Together AI
Together AI announces new innovations at NVIDIA GTC 2026
Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.
16 Mar 2026
Info · Capability: Together AI launches unified real-time voice agent platform
Build real-time voice agents on Together AI with co-located STT, LLM, and TTS infrastructure, native Deepgram and Cartesia support, and end-to-end latency under 500ms.
12 Mar 2026
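The entry above describes a co-located STT → LLM → TTS loop with a sub-500ms latency budget. A minimal sketch of that pipeline shape, with stub stages standing in for the real Deepgram-style STT, LLM, and Cartesia-style TTS endpoints (all names and return values here are illustrative placeholders, not Together AI's actual API):

```python
import time

class VoiceAgentPipeline:
    """Illustrative three-stage voice agent loop: speech-to-text,
    LLM response generation, then text-to-speech. Each stage is a
    stub standing in for a co-located model endpoint; co-location
    removes cross-network hops between the three stages."""

    def __init__(self, stt, llm, tts):
        self.stt, self.llm, self.tts = stt, llm, tts

    def respond(self, audio_chunk):
        start = time.perf_counter()
        text = self.stt(audio_chunk)   # transcribe the user's audio
        reply = self.llm(text)         # generate a text response
        speech = self.tts(reply)       # synthesize audio for the reply
        latency_ms = (time.perf_counter() - start) * 1000
        return speech, latency_ms

# Stub stages so the sketch runs end to end without any real endpoints.
pipeline = VoiceAgentPipeline(
    stt=lambda audio: "what is the weather",
    llm=lambda text: f"You asked: {text}",
    tts=lambda reply: b"<pcm-audio:" + reply.encode() + b">",
)
speech, latency_ms = pipeline.respond(b"<mic-audio>")
```

In a real deployment each lambda would be a network call, which is why keeping the three models on the same infrastructure is what makes the end-to-end latency target feasible.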
High · Capability: Together AI Launches NVIDIA Nemotron 3 Super on Dedicated Inference
NVIDIA Nemotron 3 Super is now available on Together AI Dedicated Inference, delivering efficient multi-agent reasoning, a 1M-token context window, and production-grade deployment on managed infrastructure.
11 Mar 2026
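Together AI's inference API follows the familiar OpenAI-compatible chat-completions shape, so a request to a model like Nemotron 3 Super is an ordinary JSON payload. A sketch of building one; the model ID string below is a guess for illustration only and should be taken from the actual Together AI model catalog:

```python
import json

# Hypothetical model ID -- check the Together AI model catalog for the real one.
MODEL_ID = "nvidia/Nemotron-3-Super"

def build_chat_request(messages, max_tokens=1024):
    """Build an OpenAI-compatible chat-completions payload of the kind
    Together AI's inference endpoints accept. With a 1M-token context
    window, `messages` can carry very long documents or agent traces."""
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    [{"role": "user", "content": "Summarize the repository dump below..."}]
)
body = json.dumps(payload)  # ready to POST to the chat-completions endpoint
```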
High · Capability: Together GPU Clusters add autoscaling, observability, and self-healing
Together GPU Clusters now include built-in autoscaling, RBAC, full-stack observability, and self-healing node repair—giving teams production-ready GPU infrastructure that scales efficiently, stays resilient, and supports shared enterprise workloads.
10 Mar 2026
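To make the autoscaling idea concrete, here is a toy target-tracking policy: size the cluster so average GPU utilization lands near a target. The numbers, bounds, and policy are illustrative only, not Together GPU Clusters' actual controller:

```python
import math

def desired_nodes(current_nodes, gpu_utilization,
                  target=0.7, min_nodes=1, max_nodes=64):
    """Toy target-tracking autoscaler: if the fleet is running hotter
    than `target` utilization, grow it proportionally; if cooler,
    shrink it, clamped to [min_nodes, max_nodes]."""
    if gpu_utilization == 0:
        return min_nodes  # idle cluster collapses to the floor
    wanted = math.ceil(current_nodes * gpu_utilization / target)
    return max(min_nodes, min(max_nodes, wanted))

# An 8-node cluster at 95% utilization scales out; at 20% it scales in.
scale_out = desired_nodes(current_nodes=8, gpu_utilization=0.95)
scale_in = desired_nodes(current_nodes=8, gpu_utilization=0.2)
```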
High · Capability: Together AI Announces AI Native Cloud Breakthroughs at AI Native Conf
At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.
5 Mar 2026
High · Capability: FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
As GPU throughput outpaces memory bandwidth, kernels must evolve. We introduce FlashAttention-4, featuring new pipelining for maximum overlap, 2-CTA MMA modes to reduce shared memory traffic, and a hardware-software hybrid approach to softmax exponentials.
5 Mar 2026
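The softmax-exponential work mentioned above builds on the streaming ("online") softmax that every FlashAttention generation tiles over blocks. A minimal pure-Python sketch of that rescaling trick on a scalar stream, not the kernel itself:

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Streaming softmax-weighted sum in one pass with O(1) extra state.
    Keeps a running max `m` and running normalizer `l`, rescaling the
    accumulator whenever a new maximum appears, so exp() never overflows.
    FlashAttention applies this per block of keys/values instead of per
    scalar, which is what lets attention run without materializing the
    full score matrix in memory."""
    m = float("-inf")   # running max of scores seen so far
    l = 0.0             # running sum of exp(score - m)
    acc = 0.0           # running softmax-weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        # Rescale previous partial sums into the new max's frame.
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        w = math.exp(s - m_new)
        l = l * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / l

# Single pass over the stream; agrees with a two-pass softmax.
out = online_softmax_weighted_sum([1.0, 3.0, 2.0], [10.0, 20.0, 30.0])
```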
High · Capability: Together AI introduces cache-aware prefill–decode disaggregation (CPD) for faster LLM serving
Serving long prompts doesn't have to mean slow responses. Learn how Together AI's CPD architecture separates warm and cold inference workloads to deliver 40% higher throughput and dramatically lower time-to-first-token for long-context LLM serving.
4 Mar 2026
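The warm/cold split described above can be sketched as a routing decision: requests whose prompt prefix already has KV cache resident ("warm") skip the heavy prefill pool, while cold requests go to prefill workers first. The prefix matching below is a simplistic stand-in for real KV-cache lookup, and the pool names are invented for illustration, not Together AI's architecture:

```python
def route_request(prompt_tokens, warm_prefixes):
    """Toy cache-aware router. Returns (pool_name, cached_token_count).
    A warm request reuses the cached KV for its matched prefix and only
    needs decode capacity; a cold request needs a full prefill pass."""
    for prefix in warm_prefixes:
        if prompt_tokens[: len(prefix)] == prefix:
            return ("decode_pool", len(prefix))
    return ("prefill_pool", 0)

# One token-ID prefix currently has its KV cache resident.
warm = [(101, 2023, 2003)]

pool_warm, cached_warm = route_request((101, 2023, 2003, 7592), warm)
pool_cold, cached_cold = route_request((999, 1), warm)
```

Separating the two pools keeps long cold prefills from stalling latency-sensitive decode traffic, which is the intuition behind the throughput and time-to-first-token gains the entry cites.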
High · Capability: Introducing Together AI’s new look
We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.
2 Mar 2026
Info · Capability: CoderForge-Preview: SOTA open dataset for training coding agents
25 Feb 2026