Together AI
inference_host
155 signals tracked
Together AI hires Alon Gavrielov as VP of Infrastructure Strategy
Hiring Alon Gavrielov further deepens Together AI’s commitment to building AI factories that deliver the most reliable, efficient, and scalable infrastructure for AI-native teams.
Date not specified
[Info · Capability]

Together AI releases Rime Arcana V3 Turbo and V3 for multilingual voice AI
Date not specified
[High · Capability]

LLMs Reveal Hidden Knowledge Priors in Unconstrained Generation
What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious content, and Qwen outputs exam questions.
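The measurement sketched in this blurb is mechanically simple: collect unconstrained completions, then bucket them by content type. The category cues below are hypothetical stand-ins for the paper's actual classifier, chosen only to show the shape of the analysis.

```python
# Sketch of measuring a model's "knowledge prior": take completions sampled
# from an empty or minimal prompt and bucket them by content type.
# The keyword cues are illustrative, not the paper's real classifier.

from collections import Counter

CATEGORIES = {
    "code": ("def ", "import ", "class "),
    "math": ("theorem", "equation", "prove"),
    "narrative": ("once upon", "story", "she said"),
}

def categorize(text: str) -> str:
    lowered = text.lower()
    for label, cues in CATEGORIES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "other"

def knowledge_prior(samples: list[str]) -> Counter:
    """Distribution over content categories for unconstrained generations."""
    return Counter(categorize(s) for s in samples)
```

A family-level "prior" then falls out of comparing these counters across models sampled under identical (empty) prompts.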
Date not specified
[High · Capability]

Together AI launches Dedicated Container Inference for 2.6x faster AI model inference
Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.
Date not specified
[High · Capability]

OpenAI Consistency Diffusion Language Models (CDLM) - Up to 14x Faster Inference
Standard diffusion language models can't use KV caching and need too many refinement steps to be practical. CDLM fixes both with a post-training recipe that enables exact block-wise KV caching and trajectory-consistent step reduction, delivering up to 14.5x latency improvements.
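The core idea of block-wise KV caching can be shown with a toy: decode a block at a time, compute the block's key/value state exactly once when it is finalized, and reuse it for every later block. The `encode` stand-in below is not CDLM's attention math, just a placeholder for the per-block KV computation.

```python
# Toy block-wise KV caching: tokens are generated a block at a time, and a
# finished block's "KV state" is computed once and reused for every later
# block instead of being recomputed. encode() is a stand-in for attention KV.

def encode(tokens: list[int]) -> list[int]:
    # Stand-in for computing keys/values over a span of tokens.
    return [t * 2 for t in tokens]

class BlockKVCache:
    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.cached_kv: list[int] = []   # KV for all finalized blocks
        self.encode_calls = 0

    def append_block(self, block: list[int]) -> list[int]:
        """Finalize one decoded block: encode it once, reuse forever after."""
        self.encode_calls += 1
        self.cached_kv.extend(encode(block))
        return self.cached_kv
```

In a plain diffusion LM, every refinement pass would re-encode all tokens; the point of the exact block-wise scheme is that `encode_calls` grows with the number of blocks, not the number of refinement steps.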
Date not specified
[High · Capability]

Together AI: Speech Models Fail on Street Names - Synthetic Data Fix
State-of-the-art speech models like Whisper and Deepgram score near-human on benchmarks — then fail 39% of the time on street names. New research from Together AI exposes the gap and a fix.
Date not specified
[High · Capability]

Open inMute releases CoderForge-Preview: SOTA open dataset for training coding agents
Date not specified
[Critical · Capability]

Together AI introduces new visual identity
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
Date not specified
[Info · Capability]

Together AI introduces Cache-aware prefill–decode disaggregation (CPD) for faster LLM serving
Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM serving.
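The warm/cold split can be illustrated with a simple router: requests whose prompt prefix already has KV state cached skip the heavy prefill pool. The pool names and the prefix-hash cache key below are assumptions for illustration, not Together AI's implementation.

```python
# Sketch of cache-aware request routing: requests whose prompt prefix is
# already cached ("warm") go to decode-oriented workers; cache misses
# ("cold") go to prefill-oriented workers. Pool names are illustrative.

import hashlib

class CacheAwareRouter:
    def __init__(self):
        self.prefix_cache: set[str] = set()

    def _key(self, prompt: str, prefix_len: int = 64) -> str:
        # Key on the leading prompt span, where shared system prompts live.
        return hashlib.sha256(prompt[:prefix_len].encode()).hexdigest()

    def route(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.prefix_cache:
            return "warm-decode-pool"     # KV cache hit: skip heavy prefill
        self.prefix_cache.add(key)        # prefill will populate the cache
        return "cold-prefill-pool"
```

Keeping cold prefill work off the decode workers is what protects time-to-first-token for the warm traffic.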
Date not specified
[High · Capability]

OpenAI deprecating GPT-3.5 Turbo by June 2025 — migration to GPT-4o-mini required
As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to softmax exponentials.
Date not specified
[High · Deprecation]

Together AI Announces AI Native Cloud Breakthroughs
At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.
Date not specified
[High · Capability]

Together GPU Clusters: Autoscaling, Observability, and Self-Healing Added
Together GPU Clusters now include built-in autoscaling, RBAC, full-stack observability, and self-healing node repair—giving teams production-ready GPU infrastructure that scales efficiently, stays resilient, and supports shared enterprise workloads.
Date not specified
[High · Capability]

Together AI Launches NVIDIA Nemotron 3 Super on Dedicated Inference
NVIDIA Nemotron 3 Super is now available on Together AI Dedicated Inference, delivering efficient multi-agent reasoning, a 1M-token context window, and production-grade deployment on managed infrastructure.
Date not specified
[High · Capability]

Together AI launches unified real-time voice agent platform
Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.
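The sub-500ms budget is spread across three sequential hops. A minimal sketch of that STT → LLM → TTS turn structure, with per-stage latency accounting, is below; the stage functions and timings are stand-ins, not Together AI's actual APIs.

```python
# Sketch of the STT -> LLM -> TTS hop structure of a voice agent, with
# per-stage latency accounting. Stage functions and timings are stand-ins.

def run_voice_turn(audio: bytes, stt, llm, tts) -> tuple[bytes, float]:
    """One conversational turn; returns synthesized audio and total latency."""
    total_ms = 0.0
    text, ms = stt(audio)       # speech -> text
    total_ms += ms
    reply, ms = llm(text)       # text -> response
    total_ms += ms
    speech, ms = tts(reply)     # response -> audio
    total_ms += ms
    return speech, total_ms
```

Co-locating the three stages matters because each hop's network round-trip would otherwise be paid three times per turn.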
Date not specified
[High · Capability]

Together AI announces new AI innovations at NVIDIA GTC 2026
Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.
Date not specified
[Info · Capability]

OpenAI releases Mamba-3: Faster SSM for Inference
Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.
Date not specified
[High · Capability]

Together AI expands fine-tuning with tool calling, reasoning, and vision support
Together AI expands fine-tuning with native support for tool calling, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.
Date not specified
[High · Capability]

New Research: Smaller LLMs Outperform GPT-4o on Long Context Tasks
As context windows grow, LLM performance degrades in unexpected ways. We show how a "Divide & Conquer" framework — breaking long documents into parallel chunks with a planner, workers, and manager — lets smaller models like Llama-3-70B and Qwen-72B outperform GPT-4o single-shot.
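The planner/worker/manager pattern described above can be sketched in a few lines. Everything here is a deterministic stand-in for what would be three LLM roles in the real framework; the chunk size and matching logic are illustrative assumptions.

```python
# Toy sketch of a Divide & Conquer long-context pipeline. In the real
# framework each role is an LLM call; here they are deterministic stand-ins.

def plan(document: str, chunk_size: int = 200) -> list[str]:
    """Planner: split the long document into non-overlapping chunks."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

def work(chunk: str, question: str) -> str:
    """Worker: answer the question against a single chunk.
    Stand-in logic: return the chunk only if it mentions the query term."""
    return chunk if question.lower() in chunk.lower() else ""

def manage(partials: list[str]) -> str:
    """Manager: merge non-empty worker outputs into a final answer."""
    hits = [p for p in partials if p]
    return hits[0] if hits else "not found"

def divide_and_conquer(document: str, question: str) -> str:
    chunks = plan(document)
    partials = [work(c, question) for c in chunks]  # workers run in parallel
    return manage(partials)
```

The payoff claimed in the post is that each worker only ever sees one short chunk, so a 70B-class model operating well inside its reliable context range can beat a frontier model fed the entire document at once.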
Date not specified
[High · Capability]

OpenAI Releases Aurora: RL Framework for Continuous Speculative Decoding
Aurora delivers a 1.25x speedup over a well-trained static speculator. This open-source RL framework turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
Date not specified
[High · Capability]

Together AI Announces Kernel Research Team and Blackwell GPU Optimization
The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.
Date not specified
[High · Capability]