Google Vertex AI Expands Model Arsenal as Cohere Unleashes 111B Parameter Powerhouse
AI Provider Intelligence: Week of 10 March 2025
Google went on a model release spree this week, flooding Vertex AI with everything from Gemma 3 to video generation tools, whilst Cohere quietly dropped what might be the most compelling enterprise LLM of 2025. Meanwhile, AWS continued its steady march towards AI ubiquity with cross-region inference and computer automation capabilities.
The Big Moves
Cohere's Command A Targets Enterprise Efficiency
Cohere's release of Command A on 13 March represents a significant leap in enterprise AI capabilities. The 111B parameter model delivers substantially enhanced throughput alongside improved performance in tool use, retrieval-augmented generation, and multilingual tasks. What makes this particularly noteworthy isn't just the parameter count, but the focus on inference efficiency that directly addresses enterprise cost concerns.
The timing is strategic. As organisations grapple with AI operational costs, Command A's improved efficiency could provide a compelling alternative to larger, more resource-intensive models. Early indicators suggest stronger performance on complex reasoning tasks without sacrificing processing speed. For enterprises already invested in Cohere's ecosystem, this represents a clear upgrade path: expanded capability with lower operational overhead.
The model's enhanced multilingual capabilities also position it well for global enterprises requiring consistent performance across languages. This isn't just another model release; it's a direct challenge to the assumption that bigger always means better in enterprise AI.
Google's Vertex AI Model Expansion Accelerates
Google's 12 March announcement represents one of the most comprehensive model expansions we've seen on Vertex AI. The addition of Gemma 3, ShieldGemma 2, and CogVideoX-2b, combined with enhanced fine-tuning tools for Llama 3.1 and Gemma 2, signals Google's intent to become the Swiss Army knife of AI platforms.
The inclusion of CogVideoX-2b is particularly significant, bringing video generation capabilities directly into Vertex AI's ecosystem. This moves Google beyond text and image generation into the increasingly important video content space. For developers already embedded in the Google Cloud ecosystem, this eliminates the need to integrate separate video generation services.
The fine-tuning tool updates deserve attention beyond the headline. Enhanced customisation capabilities for both Llama 3.1 and Gemma 2 suggest Google is serious about supporting open-source models alongside its proprietary offerings. This hybrid approach could prove crucial as organisations seek to avoid vendor lock-in whilst maintaining access to cutting-edge capabilities.
Gemini 2.0 Flash Fine-Tuning Reaches Production
Google's 11 March announcement of Gemini 2.0 Flash fine-tuning reaching general availability marks a crucial milestone for developers seeking customised AI solutions. The inclusion of function calling support transforms this from a simple model update into a platform capability that could reshape how developers integrate AI into their applications.
Function calling support is the key differentiator here. It enables Gemini 2.0 to interact directly with external systems and APIs, moving beyond simple text generation to become a genuine automation platform. For enterprises looking to integrate AI into existing workflows, this capability could prove transformative.
The GA status also means this is now production-ready with full support guarantees. Early adopters who've been testing the preview can now confidently deploy these capabilities in production environments. The timing suggests Google is preparing for increased competition in the customisable AI space.
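To make the function-calling workflow concrete, here's a minimal sketch using the Vertex AI Python SDK. The model name, project configuration, and the `get_weather` tool are illustrative assumptions, not part of Google's announcement; check the current SDK documentation before relying on the exact response shapes.

```python
# Illustrative sketch of Gemini function calling via the Vertex AI Python SDK.
# The get_weather tool and model name are assumptions for illustration only.

def get_weather(city: str) -> dict:
    """Local tool the model can request via function calling (stubbed here)."""
    return {"city": city, "temperature_c": 18, "conditions": "partly cloudy"}

# Registry mapping function names the model may emit to local implementations.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(function_name: str, arguments: dict) -> dict:
    """Route a model-issued function call to the matching local tool."""
    if function_name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {function_name}")
    return TOOL_REGISTRY[function_name](**arguments)

def ask_with_tools(prompt: str) -> str:
    # Imports kept local so the pure dispatch logic above is testable offline.
    from vertexai.generative_models import (
        FunctionDeclaration, GenerativeModel, Part, Tool,
    )

    weather_decl = FunctionDeclaration(
        name="get_weather",
        description="Get current weather for a city",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
    model = GenerativeModel(
        "gemini-2.0-flash", tools=[Tool(function_declarations=[weather_decl])]
    )
    chat = model.start_chat()
    response = chat.send_message(prompt)
    call = response.candidates[0].function_calls[0]
    result = dispatch(call.name, dict(call.args))
    # Return the tool result so the model can compose a final answer.
    final = chat.send_message(
        Part.from_function_response(name=call.name, response=result)
    )
    return final.text
```

The dispatch layer is where a fine-tuned model pays off: the tighter the model's function-call outputs match your registered tools, the less defensive parsing the integration needs.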
Worth Watching
AWS Bedrock Gains Cross-Region Intelligence
AWS Bedrock's 13 March update introducing cross-region inference and structured data retrieval addresses two critical enterprise concerns: latency and data complexity. The ability to deploy knowledge bases closer to data sources could significantly improve response times for global organisations. The structured data retrieval capability, which generates SQL from natural language queries, represents a meaningful step towards making complex databases accessible through conversational interfaces.
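For a sense of what using these two features looks like, here is a hedged sketch with boto3. The knowledge-base ID, model ARN, and region are placeholders; the geography-prefix convention for cross-region inference profiles is an assumption worth verifying against AWS's documentation for your account.

```python
# Illustrative sketch of the two Bedrock capabilities discussed above.
# IDs and ARNs are placeholders; verify profile naming against AWS docs.

def cross_region_profile(geo_prefix: str, model_id: str) -> str:
    """Build a cross-region inference profile ID.

    Bedrock cross-region profiles prefix the model ID with a geography
    code such as 'us' or 'eu' (an assumption to verify per account)."""
    if geo_prefix not in {"us", "eu", "apac"}:
        raise ValueError(f"Unsupported geography prefix: {geo_prefix}")
    return f"{geo_prefix}.{model_id}"

def query_knowledge_base(question: str, kb_id: str, model_arn: str) -> str:
    """Ask a natural-language question against a Bedrock knowledge base."""
    import boto3  # local import keeps the helper above testable offline

    client = boto3.client("bedrock-agent-runtime", region_name="eu-south-1")
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

For structured data sources, the service handles the natural-language-to-SQL translation behind the same retrieval interface, so the calling code stays unchanged as knowledge bases grow more complex.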
Context Caching Goes Mainstream on Vertex AI
Google's 13 March announcement of general availability for Gemini context caching on Vertex AI tackles one of the most persistent cost and performance challenges in AI applications. For applications requiring repeated context processing, this could deliver substantial cost savings and performance improvements. The GA status means enterprises can now rely on this capability for production workloads without preview limitations.
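In practice, caching looks something like the sketch below using the `vertexai` SDK. The model name, TTL, and the idea of caching one large shared document are assumptions for illustration; minimum cacheable token counts and billing details should be checked against Google's current documentation.

```python
# Illustrative sketch of Gemini context caching on Vertex AI. Model name and
# TTL are assumptions; verify token minimums and pricing in Google's docs.
import datetime

def cache_ttl(hours: float) -> datetime.timedelta:
    """Express a cache lifetime. Cached tokens are billed for storage time,
    so keep TTLs as short as the workload allows."""
    if hours <= 0:
        raise ValueError("TTL must be positive")
    return datetime.timedelta(hours=hours)

def build_cached_session(large_context: str):
    from vertexai.caching import CachedContent
    from vertexai.generative_models import GenerativeModel

    # Cache the shared context once; subsequent requests reference it
    # instead of resending (and re-billing) the full prompt each time.
    cache = CachedContent.create(
        model_name="gemini-2.0-flash-001",
        contents=[large_context],
        ttl=cache_ttl(1),
    )
    return GenerativeModel.from_cached_content(cached_content=cache)
```

The pattern pays off for workloads such as document Q&A, where the same lengthy context would otherwise be resent with every query.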
Bedrock Agents Gain Computer Control
Amazon's 10 March introduction of computer use tools for Bedrock Agents represents a significant capability expansion, but one that requires careful consideration. The ability for AI agents to perform screen captures and file editing opens new automation possibilities whilst introducing substantial security risks. Organisations implementing this capability must establish robust isolation and access controls to prevent potential system compromises.
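As one example of the kind of isolation control this demands, the guard below confines agent-requested file edits to an explicit sandbox directory. This is generic illustration, not part of the Bedrock Agents API, and symlinked sandboxes would need additional checks.

```python
# One possible isolation control for agent-driven file edits: refuse any
# path that resolves outside an explicit sandbox directory. Illustrative
# only; this is not part of the Bedrock Agents API.
from pathlib import Path

class SandboxViolation(Exception):
    """Raised when an agent-requested path escapes the sandbox."""

def resolve_in_sandbox(sandbox: str, requested: str) -> Path:
    """Resolve an agent-requested path, refusing escapes from the sandbox.

    Resolving before comparison defeats '../' traversal and absolute-path
    escapes; sandboxes containing symlinks need further hardening."""
    root = Path(sandbox).resolve()
    target = (root / requested).resolve()
    if root != target and root not in target.parents:
        raise SandboxViolation(f"Path escapes sandbox: {requested}")
    return target
```

Screen-capture permissions deserve equivalent scoping: an agent that can read arbitrary windows can exfiltrate anything visible on them.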
Replicate Simplifies Team Collaboration
Replicate's 15 March removal of the GitHub organisation requirement for team creation eliminates a significant friction point for collaborative AI development. This change makes it easier for teams to share resources and collaborate on AI projects without requiring existing GitHub infrastructure. The timing suggests Replicate is positioning itself as a more accessible alternative to more complex enterprise platforms.
Quick Hits
AWS Bedrock expanded to Europe (Milan) and Europe (Spain) regions on 14 March, improving latency for European users. Together AI optimised ThunderKittens for NVIDIA Blackwell GPUs and became an NVIDIA Cloud Partner. Replicate rolled out UI improvements including better error messages and visual indicators for multi-output predictions.
The Week Ahead
Watch for potential follow-up announcements from Google regarding additional Vertex AI model integrations, particularly as GTC approaches. Cohere's Command A performance benchmarks should begin appearing from early adopters, providing real-world validation of the efficiency claims. AWS is likely to announce additional Bedrock regional expansions as the service continues its global rollout.
The computer use capabilities in Bedrock Agents warrant close monitoring for security advisories and best practice guidance. Early implementations will likely reveal both the potential and the risks of this powerful new capability.