Critical Deprecations Hit Cohere and AI21 Labs: Week of 15 September 2025
This week brought a brutal reminder that AI provider stability is still a myth. Cohere's decision to deprecate multiple Command models and completely retire fine-tuning capabilities represents one of the most significant breaking changes we've tracked this year. Combined with AI21 Labs' API parameter deprecation, developers face immediate migration pressure across multiple platforms.
The Big Moves
Cohere Pulls the Plug on Command Models and Fine-Tuning
Cohere delivered the week's biggest shock by deprecating several Command models and removing fine-tuning capabilities entirely, effective 16 September 2025. The deprecated models include command-r, command-r-plus, command-light, and the summarisation model, forcing users to migrate to newer alternatives such as command-r-08-2024 or command-a-03-2025.
The complete removal of fine-tuning represents a fundamental shift in Cohere's strategy. Any applications relying on custom-trained models will cease to function, requiring not just a model swap but potentially a complete re-architecture of AI workflows. This isn't just about updating an endpoint; it's about losing the ability to customise models for specific use cases entirely.
For development teams, this creates an immediate crisis. The migration path to newer models may not preserve the same performance characteristics, particularly for applications that relied heavily on fine-tuned behaviour. Teams need to begin testing replacement models immediately, as the sunset date of 16 September leaves no room for delay. The broader implication is clear: Cohere is consolidating its model portfolio, but at the cost of breaking existing integrations.
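As a first step in that audit, teams can scan their configuration for the deprecated model IDs and substitute replacements. A minimal sketch, assuming the replacement choices below (the announcement names command-r-08-2024 and command-a-03-2025 as alternatives, but the right mapping for each workload should be confirmed against Cohere's migration guidance):

```python
# Deprecated Cohere model IDs mapped to suggested replacements.
# The specific pairings here are illustrative assumptions, not
# official guidance -- verify before switching production traffic.
DEPRECATED_MODEL_MAP = {
    "command-r": "command-r-08-2024",
    "command-r-plus": "command-r-08-2024",
    "command-light": "command-a-03-2025",
}

def migrate_model_id(model_id: str) -> str:
    """Return a replacement model ID if the given one is deprecated."""
    if model_id in DEPRECATED_MODEL_MAP:
        replacement = DEPRECATED_MODEL_MAP[model_id]
        print(f"warning: {model_id} is deprecated; migrating to {replacement}")
        return replacement
    return model_id

# Deprecated IDs get rewritten; current ones pass through unchanged.
print(migrate_model_id("command-r"))
print(migrate_model_id("command-a-03-2025"))
```

Running a helper like this over every stored model reference at deploy time makes the migration auditable, but it does not address the fine-tuning gap: custom-trained model IDs have no drop-in replacement and need performance testing against the base models.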
AI21 Labs Breaks Maestro API Integration Patterns
AI21 Labs deprecated the tool_resources parameter in their Maestro runs API, effective 18 September 2025. This change forces developers to migrate to the new tools parameter structure, breaking existing API integration patterns that many teams have built their workflows around.
Whilst AI21 has provided migration documentation, the timing creates pressure for teams already managing other provider changes. The Maestro API is central to many enterprise implementations using AI21's platform, making this a high-impact change despite being technically straightforward. Failure to update API calls will result in immediate service disruptions, with no graceful degradation.
The change reflects AI21's broader platform evolution, but the communication timeline could have been better. Teams using the Maestro runs API need to prioritise this migration, particularly those with automated systems that might not fail gracefully when the parameter is no longer recognised.
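For teams with many call sites, a small shim that rewrites old-style payloads can bridge the gap while code is updated. This is a hedged sketch: the exact shapes of both `tool_resources` and the new `tools` structure are assumptions for illustration, so the real translation must follow AI21's migration documentation.

```python
# Hypothetical migration shim for Maestro run payloads. Folds the
# deprecated `tool_resources` parameter into a `tools` list. Both
# structures' shapes are assumed, not taken from AI21's actual schema.
def migrate_run_payload(payload: dict) -> dict:
    """Return a copy of the payload with `tool_resources` folded into `tools`."""
    migrated = dict(payload)
    resources = migrated.pop("tool_resources", None)
    if resources is not None:
        tools = migrated.setdefault("tools", [])
        for name, config in resources.items():
            # Assumed translation: each resource entry becomes a tool entry.
            tools.append({"type": name, **config})
    return migrated

old = {
    "input": "Summarise the Q3 report",
    "tool_resources": {"file_search": {"file_ids": ["file-123"]}},
}
new = migrate_run_payload(old)
print(new)
```

Wrapping outbound requests in a shim like this also gives you one place to log which callers still send the deprecated parameter, which helps find the automated systems that would otherwise fail silently.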
xAI Launches Grok 4 Fast with Aggressive Pricing
xAI released Grok 4 Fast on 19 September, delivering roughly 40% better token efficiency alongside reduced pricing and an expanded 2 million token context window. The model includes real-time web browsing capabilities, positioning it as a direct competitor to OpenAI's latest offerings.
The efficiency improvements are significant for high-volume applications, particularly those processing large documents or requiring extensive context retention. The pricing reduction, combined with the performance gains, makes Grok 4 Fast a compelling option for teams looking to reduce operational costs whilst maintaining capability.
For existing xAI users, the upgrade path appears straightforward, but teams should test thoroughly before switching production workloads. The real-time web browsing feature opens new use cases, particularly for applications requiring current information, though this capability will need careful evaluation for accuracy and reliability in production environments.
Worth Watching
Google Vertex AI Grounding API Changes Response Structure
Google modified the Vertex AI Grounding with Google Maps API on 18 September, removing several response fields and changing widget token behaviour. The widget_token_enable flag now controls the return of widget context tokens, requiring code updates for applications that previously relied on automatic token provision. Teams using this API for location-based AI applications need to update their response parsing logic to avoid service disruptions.
AWS OpenSearch Enables Cross-Account Pipeline Sharing
Amazon introduced cross-account ingestion capabilities for OpenSearch Ingestion pipelines on 19 September, allowing teams to share data processing workflows across AWS accounts. This addresses a significant limitation for organisations with complex account structures, enabling better collaboration between teams whilst maintaining security boundaries. The feature builds on existing VPC endpoint sharing capabilities, creating a more flexible data ingestion architecture.
Together AI Dramatically Increases Batch Processing Limits
Together AI enhanced their Batch Inference API with a massive rate limit increase to 30 billion tokens, alongside a revamped UI and broader model support. This represents a 10x improvement in processing capacity for large-scale AI workloads, making batch processing significantly more viable for enterprise applications. The cost reduction compared to real-time APIs makes this particularly attractive for training data generation and large-scale inference tasks.
AWS OpenSearch Serverless Adds Disk-Based Vector Search
Amazon launched disk-based vector search for OpenSearch Serverless on 18 September, reducing operational costs for vector workloads in memory-constrained environments. This capability makes similarity search more accessible for applications with large vector datasets, particularly beneficial for RAG implementations that don't require the highest performance but need cost efficiency.
Quick Hits
• Meta Llama 3.3 70B Instruct now available on AWS Bedrock with fine-tuning support (15 September)
• Vertex AI Workbench M133 fixes Dataproc JupyterLab plugin compatibility issues (17 September)
• Mistral AI released Magistral Medium 1.2 and Small 1.2 model updates (17 September)
• Elasticsearch pushed out maintenance releases for versions 8.18.7, 8.19.4, 9.0.7, and 9.1.4 (16-18 September)
• Replicate launched search API for models and collections with SDK support (16 September)
• AI21 Labs added Data Connectors for enterprise integration with S3, Google Drive, and other platforms (16 September)
The Week Ahead
The immediate priority is managing the Cohere and AI21 Labs deprecations. Teams using affected Command models have until 16 September to complete migrations, whilst AI21 Maestro API users face an 18 September deadline. Both changes require code modifications and thorough testing.
Longer term, this week's pattern of aggressive deprecations suggests providers are accelerating their platform consolidation efforts. Teams should audit their dependencies across all AI providers and establish monitoring for deprecation announcements. The days of assuming API stability are clearly over.
Watch for potential follow-up announcements from other providers. When one major player makes breaking changes, competitors often use the opportunity to announce their own housekeeping efforts. October is traditionally a busy month for enterprise software changes, so expect more migration pressure in the coming weeks.