Together AI
inference_host
255 signals tracked
HighCapability
Parcae: Stable Looped Language Model Achieves Transformer-Level Performance
Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce the first scaling laws for looping and show that increasing recurrence, not just data, is a compute-efficient path to bet…
15 Apr 2026
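The teaser gives no implementation details, but the core idea of a looped model — reusing one block's weights across depth instead of stacking distinct layers — can be sketched as a toy. Everything below (the block shape, the `tanh` residual update, the loop counts) is a hypothetical illustration, not Parcae's actual architecture:

```python
import numpy as np

def layer(h, W):
    """One weight-tied block: a linear map with a nonlinearity and a residual add."""
    return h + np.tanh(h @ W)

def looped_forward(x, W, n_loops):
    """Apply the SAME block n_loops times (recurrence in depth),
    so capacity grows with loop count while parameter count stays fixed."""
    h = x
    for _ in range(n_loops):
        h = layer(h, W)
    return h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # one shared weight matrix
x = rng.normal(size=(4, 8))              # batch of 4 hidden states

shallow = looped_forward(x, W, n_loops=2)
deep = looped_forward(x, W, n_loops=8)   # more recurrence, identical parameters
print(shallow.shape, deep.shape)         # both (4, 8)
```

The point of the sketch: `deep` spends 4× the compute of `shallow` without adding a single parameter, which is the axis the post's scaling laws vary.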
HighCapability
OpenAI launches EinsteinArena: AI agents collaborate on mathematical problems
EinsteinArena is a platform where AI agents collaborate and compete on open math problems. AI agents on EinsteinArena have already set 11 new state-of-the-art results on open math problems — including pushing the kissing number lower bound in dimension 11 from 593 to 604.
13 Apr 2026
HighCapability
Together AI introduces the AI Native Cloud
AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.
7 Apr 2026
HighCapability
OpenAI: Using LLMs to Optimize Database Query Execution
New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.
3 Apr 2026
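Why cardinality estimates matter can be shown with a toy join-order picker. The table names and row counts below are entirely hypothetical, and the "LLM-corrected" numbers just stand in for whatever estimate the research's model produces — the sketch only shows how a corrected estimate changes the plan:

```python
def choose_join_order(tables, est):
    """Greedy planner heuristic: join tables with the smallest estimated
    row counts first so intermediate results stay small."""
    return sorted(tables, key=lambda t: est[t])

tables = ["orders", "customers", "line_items"]

# Statistical-heuristic estimates (hypothetical): badly off for line_items.
stats_est = {"orders": 10_000, "customers": 500, "line_items": 2_000}
# Corrected estimates (also hypothetical): line_items is actually huge.
llm_est = {"orders": 10_000, "customers": 500, "line_items": 900_000}

print(choose_join_order(tables, stats_est))
# → ['customers', 'line_items', 'orders']  (joins the huge table early: slow)
print(choose_join_order(tables, llm_est))
# → ['customers', 'orders', 'line_items']  (defers it to last: fast)
```

Same query, same optimizer logic — only the estimate changed, and with it the plan. That is the class of error the post claims LLMs can correct.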
HighCapability
Together AI releases Wan 2.7 video suite
A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.
3 Apr 2026
HighCapability
Deepgram speech-to-text and voice models now available natively on Together AI
Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.
2 Apr 2026
HighCapability
OpenAI Releases Aurora: RL Framework for Adaptive Speculative Decoding
A 1.25x speedup over a well-trained static speculator. Aurora is an open-source RL framework that turns speculative decoding from a one-time offline setup into a self-improving system that learns from every request it serves.
31 Mar 2026
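For readers unfamiliar with the base technique Aurora improves on, here is a minimal sketch of one speculative-decoding step: a cheap draft model proposes several tokens and the target model keeps the longest verified prefix plus one token of its own. The deterministic toy "models" over integer tokens are invented for the sketch and use simplified greedy-match verification, not Aurora's (or anyone's) actual acceptance rule:

```python
def speculative_step(draft, target, prefix, k):
    """One speculative-decoding step: draft proposes k tokens,
    target verifies and keeps the longest matching prefix + one token."""
    proposal = []
    ctx = list(prefix)
    for _ in range(k):          # cheap drafting pass
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted = []
    ctx = list(prefix)
    for t in proposal:          # verification pass
        if target(ctx) == t:    # greedy-match check (simplified)
            accepted.append(t)
            ctx.append(t)
        else:
            break
    accepted.append(target(ctx))  # target always contributes one token
    return accepted

# Toy deterministic "models" over integer tokens (invented for illustration).
target = lambda ctx: (sum(ctx) + 1) % 7
draft = lambda ctx: (sum(ctx) + 1) % 7 if len(ctx) % 3 else 0  # sometimes wrong

print(speculative_step(draft, target, prefix=[1, 2], k=4))  # → [4, 1]
```

The quality of the draft model decides how many proposed tokens survive verification — which is exactly the quantity an RL framework like Aurora can keep improving online.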
HighCapability
New Research: Smaller LLMs Outperform GPT-4o on Long Context Tasks
As context windows grow, LLM performance degrades in unexpected ways. We show how a "Divide & Conquer" framework — breaking long documents into parallel chunks with a planner, workers, and manager — lets smaller models like Llama-3-70B and Qwen-72B outperform GPT-4o single-shot.
26 Mar 2026
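The planner/workers/manager pattern the teaser describes can be sketched with plain functions standing in for LLM calls. The chunking scheme, the string-matching "worker", and the concatenating "manager" are all hypothetical simplifications — the paper's actual roles are model prompts, not these stubs:

```python
def plan(document, chunk_size):
    """Planner: split the long document into fixed-size chunks."""
    return [document[i:i + chunk_size]
            for i in range(0, len(document), chunk_size)]

def worker(chunk, query):
    """Worker: answer the query over one chunk (stand-in for an LLM call)."""
    return [line for line in chunk if query in line]

def manager(partials):
    """Manager: merge the workers' partial answers into one result."""
    merged = []
    for p in partials:
        merged.extend(p)
    return merged

document = [f"line {i}: {'ERROR' if i % 5 == 0 else 'ok'}" for i in range(12)]
chunks = plan(document, chunk_size=4)            # planner
partials = [worker(c, "ERROR") for c in chunks]  # workers (parallel in spirit)
answer = manager(partials)                       # manager
print(answer)  # → ['line 0: ERROR', 'line 5: ERROR', 'line 10: ERROR']
```

Each worker only ever sees a short chunk, which is why a smaller model with a modest effective context can stay accurate where a single long-context pass degrades.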
HighCapability
Together AI expands fine-tuning with tool calling, reasoning, and vision support
Together AI expands fine-tuning with native support for tool calling, reasoning, and vision-language models, plus 100B+ model training, up to 6× higher throughput, and job cost and ETA estimates.
18 Mar 2026
HighCapability
OpenAI releases Mamba-3: Faster inference with improved SSM
Meet Mamba-3: the SSM built for inference. Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.
17 Mar 2026
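Why SSMs decode faster than Transformers comes down to the recurrence: per token they update a fixed-size state rather than attending over a growing cache. A generic linear-SSM decode step (hypothetical matrices, not Mamba-3's parameterization) looks like:

```python
import numpy as np

def ssm_decode_step(h, x, A, B, C):
    """One recurrent decode step of a linear state-space model:
    constant work per token, state size independent of sequence length."""
    h = A @ h + B * x   # update the fixed-size hidden state
    y = C @ h           # read out an output for this token
    return h, y

d = 4
A = 0.9 * np.eye(d)     # hypothetical stable state-transition matrix
B = np.ones(d)
C = np.ones(d) / d

h = np.zeros(d)
ys = []
for x in [1.0, 0.0, 0.0]:          # decode three tokens
    h, y = ssm_decode_step(h, x, A, B, C)
    ys.append(round(float(y), 3))
print(ys)  # → [1.0, 0.9, 0.81]
```

The contrast with a Transformer: its per-token decode cost grows with the context length (attention over all past keys/values), while the loop above does the same constant amount of work at token 3 or token 3 million.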
HighCapability
Together AI announces new AI innovations at NVIDIA GTC 2026
Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.
16 Mar 2026