Together AI Forces Mass Model Migration as 16 Popular Models Face Immediate Deprecation
Together AI has pulled the rug out from under thousands of developers this week, deprecating 16 popular models from the Llama, FLUX, and Qwen families with immediate effect from 27 August. This isn't a gentle sunset with months of notice; it's an abrupt halt that's forcing emergency migrations across the AI ecosystem.
The Big Moves
Together AI's Brutal Model Cull Forces Emergency Migrations
Together AI's decision to deprecate 16 models simultaneously represents one of the most disruptive provider changes we've seen this year. The affected models include some of the most widely deployed options: Llama variants, FLUX image generation models, and Qwen language models. Applications relying on these models ceased functioning on 27 August, with no grace period offered.
The timing coincides suspiciously with the arrival of DeepSeek-V3.1, the new hybrid thinking model from DeepSeek now hosted on Together AI. While DeepSeek-V3.1 offers impressive capabilities, combining rapid response and deep reasoning modes within a single model, it's not a drop-in replacement for the deprecated options. Teams using FLUX for image generation, for instance, can't simply swap in a language model and expect their workflows to continue.
The migration path isn't straightforward either. DeepSeek-V3.1's hybrid architecture means developers need to restructure their applications to properly utilise both thinking and non-thinking modes. The model's 99.9% SLA and serverless deployment are attractive, but the forced migration timeline leaves little room for proper testing and optimisation. Expect significant disruption in the short term, though early adopters of DeepSeek-V3.1 are reporting impressive results for complex reasoning tasks like code generation and debugging.
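For teams triaging the outage, a thin routing shim can keep text-generation requests flowing while a proper migration is planned. The sketch below is a minimal example of that pattern; the model IDs are illustrative placeholders, not a confirmed deprecation list or an endorsed replacement mapping:

```python
# Hypothetical migration shim: swap deprecated Together AI model IDs for a
# chosen stand-in before the API call is made. All IDs here are illustrative.

DEPRECATED_REPLACEMENTS = {
    # deprecated model id (example)  -> stand-in (example)
    "meta-llama/Llama-3-8b-chat": "deepseek-ai/DeepSeek-V3.1",
    "Qwen/Qwen2-72B-Instruct":    "deepseek-ai/DeepSeek-V3.1",
}

def resolve_model(requested: str) -> str:
    """Return a supported model id, routing deprecated ones to a replacement."""
    replacement = DEPRECATED_REPLACEMENTS.get(requested)
    if replacement is not None:
        # Log the swap so forced migrations stay visible in production traces.
        print(f"model {requested!r} is deprecated; routing to {replacement!r}")
        return replacement
    return requested
```

Note that this only papers over like-for-like text models; image-generation workloads such as FLUX have no language-model stand-in and need a separate migration path.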
Google Expands Vertex AI with Gemini 2.5 Flash Image
Google has quietly released Gemini 2.5 Flash Image in preview, marking a significant expansion of Vertex AI's multimodal capabilities. The new model supports multi-reference image inputs and improved multi-turn editing, addressing key limitations in previous image generation offerings. More importantly, Vertex AI model tuning now integrates directly with the Gen AI evaluation service, enabling automated model assessment workflows.
The Vertex AI Workbench M132 release brings additional infrastructure improvements, including a new scheduler plugin supporting both Cloud Composer and Vertex AI notebook schedulers. However, buried in the release notes is a deprecation notice for older image generation endpoints. While no immediate action is required, teams should plan their migration strategy now rather than scrambling when the deprecation date arrives.
Google's approach here contrasts sharply with Together AI's abrupt changes. The preview status gives developers time to experiment and integrate the new capabilities without breaking existing workflows. The evaluation service integration is particularly noteworthy—it suggests Google is building towards more sophisticated automated model management, which could become a significant competitive advantage.
Qdrant Addresses Critical Data Integrity Issues
Qdrant's v1.15.4 release tackles several critical data integrity problems that have been plaguing production deployments. The fixes address full text indexing corruption and point reuse issues that could lead to data loss or inconsistent query results. For teams running production vector databases, these aren't minor improvements—they're essential stability fixes.
The Docker optimisations, including reduced image sizes and image signing, improve both operational efficiency and security posture. The team has also enhanced metrics monitoring and adjusted shard key formats, indicating ongoing efforts to scale the platform. While these changes don't require immediate action, the data integrity fixes make this update essential for any production Qdrant deployment.
Worth Watching
Mistral AI Hardens Security with New Completion API Parameter
Mistral AI has introduced a security parameter to prevent token-length side-channel attacks in their Completion API. While SDK users remain unaffected, applications using the mistral-common package must update to version 1.8.4 or higher to avoid parsing failures. It's a relatively small change, but one that could break strict chunk parsing implementations.
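To see why this matters: an observer who can only measure the size of each encrypted streamed chunk can still infer per-token lengths, and from them likely content. The generic mitigation for this attack class is to pad chunks to a uniform block size. The sketch below illustrates that idea only; it is not Mistral's actual mechanism, and the padding scheme is an assumption for demonstration:

```python
import secrets
import string

def observable_sizes(chunks):
    """What a network observer sees: chunk sizes, even without plaintext.
    Unpadded, these track token lengths and leak information."""
    return [len(c) for c in chunks]

def pad_chunk(chunk: str, block: int = 32) -> str:
    """Pad a streamed chunk with random filler to a multiple of `block`,
    so its observable size no longer reveals the token's true length."""
    filler_len = (-len(chunk)) % block
    filler = "".join(secrets.choice(string.ascii_letters) for _ in range(filler_len))
    return chunk + filler
```

The trade-off is bandwidth for privacy: every chunk costs up to `block - 1` extra bytes, which is why providers expose this as an opt-in parameter rather than a default.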
xAI Enters the Coding Arena with Grok Code Fast 1
xAI's new grok-code-fast-1 model targets the increasingly competitive coding AI space. The model supports TypeScript, Python, and Java, with planned updates for multimodal inputs and longer context windows. The limited-time free access provides a valuable testing opportunity, though the long-term pricing strategy remains unclear.
Cohere Expands Translation Capabilities
Cohere's Command A Translate model adds support for 23 languages, expanding their multilingual offerings. It's a straightforward capability addition that doesn't disrupt existing workflows but provides new options for international applications.
AWS Bedrock Adds OpenAI Batch Processing
AWS Bedrock now supports the OpenAI batch API, enabling bulk processing workflows. This enhancement provides more flexibility for large-scale operations without requiring changes to existing implementations.
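As a sketch of what bulk submission looks like, the OpenAI Batch API consumes a JSONL file with one request object per line, matched back to inputs via `custom_id`. Whether Bedrock accepts this exact layout should be verified against its documentation, and the model ID below is a placeholder:

```python
import json

def build_batch_lines(prompts, model="placeholder-model-id"):
    """Build the JSONL body for an OpenAI-style batch input file:
    one JSON request object per line."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",          # used to match results to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)
```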
Quick Hits
• Elasticsearch: Four maintenance releases (8.18.6, 8.19.3, 9.0.6, 9.1.3) focus on stability and bug fixes
• Replicate: Platform UX improvements include Artificial Analysis arena rankings and enhanced organisation signup
The Week Ahead
The Together AI migration crisis will dominate the coming week as teams scramble to restore functionality. Expect significant discussion around provider reliability and the need for multi-provider strategies. Google's Vertex AI updates should see increased adoption as teams look for more stable alternatives.
Watch for potential pricing announcements from xAI regarding Grok Code Fast 1's post-trial costs, and monitor Qdrant deployment metrics to assess the impact of the data integrity fixes. The Mistral AI security update may reveal additional applications affected by the mistral-common dependency.
Most critically, teams affected by the Together AI deprecations need to complete their migrations immediately. Those still evaluating alternatives should prioritise providers with clearer deprecation policies and longer notice periods. The AI provider landscape is maturing rapidly, but this week's events highlight the ongoing risks of depending on any single provider for critical infrastructure.