Together AI
inference_host
155 signals tracked
Collinear Simulations and Together Evals: Dynamic AI Agent Testing
Test AI agents in the real world with Collinear TraitMix and Together Evals: dynamic persona simulations, multi-turn dialogs, and LLM-as-judge scoring.
Date not specified
High · Capability
Evaluating and Benchmarking Large Language Models (LLMs)
Understanding how to evaluate and benchmark Large Language Models (LLMs). Test, compare, and understand LLMs.
Date not specified
Medium · Capability
Together AI Launches Fastest Voice AI Stack with Sub-Second Latency
Together AI launches the fastest voice AI stack: streaming Whisper STT, serverless open-source TTS (Orpheus & Kokoro), and Voxtral transcription. Sub-second latency for production voice agents.
Date not specified
High · Capability
Together AI adds FLUX.2 multi-reference image generation
Production-grade image generation with multi-reference consistency, exact brand colors, and reliable text rendering. FLUX.2 from Black Forest Labs, now on Together AI's platform.
Date not specified
High · Capability
Together AI delivers fastest inference for top open-source models
Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell architecture.
Date not specified
High · Capability
OpenAI AutoJudge: Automated Inference Acceleration via Mismatch Detection
AutoJudge accelerates LLM inference by identifying which token mismatches actually matter. Using self-supervised learning to train a lightweight classifier, it accepts up to 40 draft tokens per cycle—delivering 1.5–2× speedups over standard speculative decoding with minimal accuracy loss.
Date not specified
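The AutoJudge entry above describes lossy speculative decoding: instead of rejecting every draft-token mismatch, a lightweight classifier decides which divergences actually change the output. A minimal sketch of that acceptance loop — with a stand-in `mismatch_matters` predicate in place of the trained classifier; names and structure are illustrative assumptions, not the paper's code:

```python
def accept_draft_tokens(draft_tokens, target_tokens, mismatch_matters):
    """Lossy speculative-decoding acceptance loop (illustrative sketch).

    draft_tokens:  token ids proposed by the fast draft model.
    target_tokens: token ids the target model would emit at each position.
    mismatch_matters: callable(position) -> bool, standing in for the
        lightweight classifier that judges whether a divergence is important.
    """
    accepted = []
    for i, (d, t) in enumerate(zip(draft_tokens, target_tokens)):
        if d == t:
            accepted.append(d)   # exact match: always safe to accept
        elif not mismatch_matters(i):
            accepted.append(d)   # divergence judged unimportant: keep draft token
        else:
            accepted.append(t)   # important divergence: take the target token
            break                # and stop accepting further draft tokens
    return accepted
```

In standard speculative decoding only the `d == t` branch accepts; tolerating "unimportant" mismatches is what lets an AutoJudge-style system keep long runs of draft tokens per verification cycle.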
High · Capability
Together AI Native Cloud: Run TorchForge RL Pipelines
Date not specified
High · Capability
Together AI integrates PyTorch RL into AI Native Cloud
Build, train, and deploy advanced AI agents with integrated reinforcement learning on the Together platform.
Date not specified
High · Capability
Together releases Python SDK v2.0 — major architectural update
Date not specified
Critical · Capability
NVIDIA Nemotron 3 Nano now available on Together AI
Nemotron 3 Nano, NVIDIA’s newest reasoning model, is now available on Together AI, the AI Native Cloud.
Date not specified
Medium · Capability
VP Dan Fu: AGI is Possible – Optimizing Existing Hardware
Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order of magnitude in performance.
Date not specified
Info · Capability
Together AI releases Rime Arcana v2 and Mist v2 TTS models
Two enterprise-grade Rime TTS models now available on Together AI. Co-locate with LLM and STT on dedicated infrastructure. Proven at billions of calls.
Date not specified
High · Capability
Together AI launches MiniMax Speech 2.6 Turbo for native multilingual TTS
MiniMax Speech 2.6 Turbo: State-of-the-art multilingual TTS with human-level emotional awareness, sub-250ms latency, and 40+ languages—now on Together AI.
Date not specified
High · Capability
OpenAI: How to Choose the Right Open Model for Production
Learn how to choose the right open-source model for production by evaluating model quality, benchmarking performance, and deploying open models that balance cost, speed, and accuracy.
Date not specified
High · Capability
Scaling Model Training with Multi-Node GPU Clusters
Learn how foundation models are trained at scale using multi-node GPU clusters, including distributed training techniques, infrastructure requirements, and practical steps to scale training efficiently.
Date not specified
High · Capability
Cursor and Together AI partner on NVIDIA Blackwell for real-time AI inference
Together AI teamed with Cursor to build the real-time inference stack that keeps in-editor agents fast and reliable. They productionized NVIDIA Blackwell (B200/GB200), tuning ARM hosts, kernels, and FP4/TensorRT quantization for low latency and rapid model rollouts.
Date not specified
High · Capability
Optimizing Inference Speed and Costs: Together AI's Lessons
Learn how to reduce inference latency without runaway costs using proven optimization tactics — improving GPU utilization and cost efficiency while balancing throughput-vs-latency tradeoffs.
Date not specified
High · Capability
OpenAI DSGym: New Framework for Evaluating Data Science Agents
Introducing DSGym—a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art performance among open-source models.
Date not specified
High · Capability
Together Evaluations now supports OpenAI, Anthropic, and Google model benchmarking
Together Evaluations now supports OpenAI, Anthropic, and Google models for cross-provider benchmarking. Compare open-source, fine-tuned, and proprietary models side-by-side to make data-driven decisions on quality, cost, and performance—all in one platform.
Date not specified
High · Capability
Fine-tuned Open LLM Judges Outperform GPT-5.2 for Evaluation
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment—at 15x lower cost and 14x faster inference speeds.
Date not specified
High · Capability
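The judge fine-tuning item above rests on Direct Preference Optimization. As a rough sketch of the published DPO objective (illustrative only, not Together's training code), the per-pair loss rewards the policy for upweighting the chosen response over the rejected one, relative to a frozen reference model:

```python
import math

def dpo_loss(pol_logp_w, pol_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen (w) and rejected (l)
    responses under the trained policy and the frozen reference model.
    """
    # Margin: how much more the policy upweights the chosen response
    # than the rejected one, each measured relative to the reference.
    margin = (pol_logp_w - ref_logp_w) - (pol_logp_l - ref_logp_l)
    # Negative log-sigmoid: minimized when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy equals the reference the margin is zero and the loss is log 2; training on preference pairs (5,400 of them, per the item above) pushes the margin positive and the loss toward zero.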