Google Expands Vertex AI Accelerators as GPU Competition Heats Up
Google has quietly expanded Vertex AI's hardware arsenal with new A3 Ultra, A4, and A3 Mega accelerator support, marking the latest move in an increasingly competitive week for AI infrastructure. Whilst this might seem like routine capacity expansion, it signals Google's determination to keep pace with rivals who are rapidly democratising access to high-end compute.
The Big Moves
Google's Vertex AI Accelerator Expansion: Playing Catch-Up or Strategic Positioning?
Google's addition of A3 Ultra, A4, and A3 Mega accelerator support to Vertex AI on 15 May represents more than just hardware diversification. This expansion directly addresses the growing demand for specialised compute options as enterprises move beyond proof-of-concept deployments into production-scale AI workloads.
The timing is particularly telling. Whilst Google has historically focused on its custom TPUs, this broader accelerator support acknowledges that different workloads require different hardware optimisations. The A3 Ultra targets memory-intensive tasks, whilst the A4 series offers balanced performance for general ML workloads. For existing Vertex AI users, this is purely additive with no migration requirements, but the strategic implications run deeper.
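For teams wanting to try the new machine types, provisioning looks much like any other Vertex AI custom job. The sketch below uses the google-cloud-aiplatform Python SDK; the machine_type and accelerator_type strings are illustrative assumptions, so confirm the exact SKU names for your region against Google's machine-type documentation.

```python
# Minimal sketch: a Vertex AI custom job on an A3 Mega machine.
# The machine_type and accelerator_type values are assumptions;
# confirm the exact SKU names for your region before use.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",              # placeholder project id
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

job = aiplatform.CustomJob(
    display_name="h100-mega-fine-tune",
    worker_pool_specs=[
        {
            "machine_spec": {
                "machine_type": "a3-megagpu-8g",              # assumed A3 Mega SKU
                "accelerator_type": "NVIDIA_H100_MEGA_80GB",  # assumed enum value
                "accelerator_count": 8,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": "us-docker.pkg.dev/my-project/train/ft:latest",
            },
        }
    ],
)

job.run()  # blocks until the job completes
```

Because the new types slot into the existing worker_pool_specs shape, switching hardware is a one-line change rather than a re-architecture, which is consistent with the "purely additive" framing above.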
This move positions Google to compete more effectively with AWS's diverse instance types and Microsoft's Azure ML compute options. More importantly, it gives enterprise customers the flexibility to optimise costs across different workload types without vendor lock-in concerns. The lack of migration requirements means immediate adoption is frictionless, likely driving rapid uptake among cost-conscious enterprises.
Replicate's H100 Gambit: Democratising Premium Compute
Replicate's introduction of NVIDIA H100 GPUs and multi-GPU configurations for A100 and L40S systems on 16 May represents a more aggressive play in the AI infrastructure space. The H100 addition is significant because these GPUs have been notoriously difficult to access, with major cloud providers rationing availability.
By offering H100 access through their platform, Replicate is positioning itself as the go-to solution for developers who need cutting-edge performance without the enterprise commitments required by traditional cloud providers. The multi-GPU configurations for A100 and L40S systems address the scaling challenges that have plagued custom model development, particularly for fine-tuning large language models.
This capability expansion directly targets the growing market of AI startups and research teams who need sporadic access to premium hardware. Rather than committing to long-term contracts with major cloud providers, these users can now access H100 performance on-demand. The competitive pressure this creates for traditional cloud providers cannot be overstated, particularly as it comes with Replicate's model-centric workflow that abstracts away much of the infrastructure complexity.
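In practice, pinning a workload to the new hardware goes through Replicate's public HTTP API. Below is a minimal sketch of creating a deployment on H100s; the "gpu-h100" SKU string and the model and version values are assumptions, so list the SKUs actually available to your account before relying on them.

```python
# Minimal sketch: creating a Replicate deployment pinned to H100 hardware.
# The "gpu-h100" SKU and the model/version values are assumptions;
# GET /v1/hardware lists the SKUs actually available to your account.
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
    "Content-Type": "application/json",
}

resp = requests.post(
    "https://api.replicate.com/v1/deployments",
    headers=headers,
    json={
        "name": "my-llm",                     # hypothetical deployment name
        "model": "my-org/my-fine-tuned-llm",  # hypothetical model
        "version": "abc123",                  # placeholder version id
        "hardware": "gpu-h100",               # assumed H100 SKU string
        "min_instances": 0,
        "max_instances": 2,
    },
)
resp.raise_for_status()
print(resp.json())
```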
Oracle Enters the Model Marketplace Race
Oracle's integration of Cohere models (Command A, Embed v3.0, and Rerank 3.5) into OCI Generative AI on 14 May signals the company's serious intent to compete in the enterprise AI space. This isn't just about adding another model provider; it's about Oracle leveraging its existing enterprise relationships to capture AI workloads.
The choice of Cohere models is strategic. Command A's reported 150% throughput improvement over its predecessor and Embed v3.0's multimodal capabilities directly address enterprise use cases around search, document processing, and customer service automation. Oracle's enterprise customer base, particularly in financial services and healthcare, provides a natural market for these capabilities.
What makes this particularly interesting is Oracle's positioning against the hyperscale cloud providers. By offering curated, enterprise-focused AI capabilities rather than the overwhelming choice of platforms like AWS Bedrock, Oracle is betting that enterprises prefer simplified, purpose-built solutions. This approach could resonate with organisations that want AI capabilities without the complexity of managing multiple model providers and APIs.
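For developers, the Cohere models are reachable through the standard OCI Generative AI inference client. The sketch below shows an embedding call with the oci Python SDK; the model identifier, service endpoint, and compartment OCID are placeholder assumptions to adapt to your own tenancy and region.

```python
# Minimal sketch: embedding text with a Cohere model via OCI Generative AI.
# The model_id, service endpoint, and compartment OCID are placeholder
# assumptions; substitute the identifiers from your own tenancy.
import oci

config = oci.config.from_file()  # reads ~/.oci/config

client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

details = oci.generative_ai_inference.models.EmbedTextDetails(
    inputs=["quarterly revenue summary", "customer support transcript"],
    serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
        model_id="cohere.embed-english-v3.0"          # assumed model identifier
    ),
    compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
)

response = client.embed_text(details)
print(len(response.data.embeddings))  # one embedding per input
```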
Worth Watching
Replicate's User Experience Improvements
Replicate's addition of playground integration and audio preview support on 16 May, alongside the 'More by this user' button introduced on 12 May, reflects a platform maturing beyond basic model hosting. These seemingly minor improvements signal Replicate's evolution from a simple model repository to a comprehensive development environment. The playground integration particularly matters as it reduces friction in the model experimentation phase, potentially accelerating adoption among developers who value rapid iteration.
AWS Bedrock's Video Blueprint Support
AWS Bedrock's introduction of video blueprint support within Bedrock Data Automation (BDA) on 16 May expands the platform's multimedia processing capabilities. Whilst details remain sparse, the addition suggests AWS is building towards more comprehensive content understanding capabilities. For enterprises dealing with video content at scale, this could represent a significant workflow simplification, eliminating the need for separate video processing pipelines.
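Assuming video blueprints follow the same control-plane shape as BDA's existing document and image blueprints, registering one might look like the speculative sketch below; given how sparse the documentation currently is, the VIDEO type value and the schema fields shown are unverified assumptions.

```python
# Speculative sketch: registering a video blueprint with Bedrock Data
# Automation via boto3, assuming video reuses the existing create_blueprint
# call with a VIDEO type. The type value and schema fields are unverified
# assumptions, not documented BDA behaviour.
import json
import boto3

bda = boto3.client("bedrock-data-automation", region_name="us-east-1")

blueprint = bda.create_blueprint(
    blueprintName="video-scene-summary",  # hypothetical blueprint name
    type="VIDEO",                         # assumed enum value
    schema=json.dumps({
        "class": "video_summary",         # hypothetical output schema
        "properties": {
            "scene_description": {"type": "string"},
            "spoken_topics": {"type": "array"},
        },
    }),
)
print(blueprint)  # inspect the returned blueprint metadata
```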
Hugging Face's Continued Innovation Pace
Hugging Face maintained its typical pace of innovation this week with announcements around Transformers Library standardisation, Falcon-Edge 1.58-bit models, improved Kaggle integration, faster Whisper transcriptions, and enhanced Vision Language Models. Whilst individually these might seem incremental, collectively they demonstrate Hugging Face's commitment to remaining the de facto standard for open-source AI development. The Falcon-Edge models particularly deserve attention, as 1.58-bit (ternary weight) quantisation could significantly reduce deployment costs for edge applications.
Quick Hits
- Hugging Face improved model access for Kaggle users, streamlining the research-to-competition pipeline
- Blazingly fast Whisper transcriptions now available through Inference Endpoints (see the sketch after this list)
- Vision Language Models received performance and capability improvements across the Hugging Face ecosystem
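As a taste of the Whisper item above, here is a minimal sketch using the huggingface_hub InferenceClient; the model id is illustrative, and to hit a dedicated Inference Endpoint you would pass the endpoint URL as the model argument instead.

```python
# Minimal sketch: transcribing audio with Whisper via huggingface_hub.
# The model id is illustrative; to hit a dedicated Inference Endpoint,
# pass the endpoint URL as the model argument instead.
from huggingface_hub import InferenceClient

client = InferenceClient(model="openai/whisper-large-v3")

result = client.automatic_speech_recognition("meeting_recording.flac")
print(result.text)
```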
The Week Ahead
Watch for potential responses from AWS and Microsoft to Google's Vertex AI accelerator expansion. The competitive dynamics suggest we might see announcements around new instance types or pricing adjustments. Oracle's Cohere integration will likely prompt other enterprise-focused cloud providers to accelerate their own model marketplace strategies.
Replicate's H100 availability could trigger a broader democratisation of premium compute access, potentially forcing traditional cloud providers to reconsider their allocation strategies. Keep an eye on pricing announcements and availability commitments from the major players.
The enterprise AI market is clearly heating up, with each provider taking different approaches to capture market share. The next few weeks will likely reveal which strategies resonate most with enterprise buyers as Q2 budget cycles conclude.