Google Vertex AI Security Boost and Fine-Tuning Wars: Week of 18 August 2025
Google's decision to add VPC, CMEK, and HIPAA support to Vertex AI Agent Engine signals a clear intent to capture regulated enterprise workloads, whilst the fine-tuning battle heats up with Together AI's aggressive move into OpenAI's open-source territory.
The Big Moves
Google Vertex AI Agent Engine Gets Enterprise Security Credentials
Google rolled out VPC, CMEK, and HIPAA support for Vertex AI Agent Engine on 21 August, marking a significant shift from experimental AI tooling to enterprise-ready infrastructure. This isn't just feature padding – it's a direct play for regulated industries that have been sitting on the sidelines of the AI agent revolution.
The timing is telling. With healthcare, financial services, and government agencies finally warming to AI deployment, Google's betting that security compliance will be the key differentiator. VPC support means organisations can keep their AI agents within their own network perimeters, whilst CMEK (Customer-Managed Encryption Keys) gives them control over encryption keys. HIPAA compliance opens the door to healthcare applications that were previously off-limits.
For organisations that have been running proof-of-concepts with Vertex AI Agent Engine, this removes the final barrier to production deployment. The migration path is straightforward – existing agents can be deployed within VPC environments without code changes, though you'll need to configure your network policies and encryption settings. Expect to see a surge in enterprise AI agent deployments over the coming months, particularly in sectors where data sovereignty has been a blocker.
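To make the production checklist concrete, here is a minimal sketch of the kind of security settings such a deployment would pin down, plus a pre-deployment validation pass. The field names (`network`, `encryption_spec`, `kms_key_name`) are illustrative assumptions, not the actual Vertex AI Agent Engine API surface — consult Google's documentation for the real configuration schema.

```python
# Hypothetical deployment settings; keys are illustrative, not the
# real Vertex AI Agent Engine configuration schema.
deployment_config = {
    # Keep agent traffic inside the organisation's network perimeter (VPC)
    "network": "projects/my-project/global/networks/agents-vpc",
    "encryption_spec": {
        # CMEK: an encryption key the organisation owns and can revoke
        "kms_key_name": (
            "projects/my-project/locations/europe-west2/"
            "keyRings/ai/cryptoKeys/agent-engine"
        ),
    },
}

def validate_config(cfg: dict) -> list[str]:
    """Flag missing security settings before promoting an agent to production."""
    problems = []
    if not cfg.get("network"):
        problems.append("no VPC network configured")
    if not cfg.get("encryption_spec", {}).get("kms_key_name"):
        problems.append("no customer-managed encryption key set")
    return problems

print(validate_config(deployment_config))  # → []
```

A check like this belongs in CI so that no agent reaches production with the security settings left at their defaults.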
Together AI Declares War on Fine-Tuning Friction
Together AI's announcement of fine-tuning support for OpenAI's gpt-oss models (both 20B and 120B variants) on 19 August represents a clever strategic move. Rather than competing directly with OpenAI's closed models, they're positioning themselves as the infrastructure provider for organisations wanting to customise OpenAI's open-source offerings.
This is particularly shrewd because fine-tuning distributed models at this scale is notoriously complex. Most organisations lack the infrastructure and expertise to manage training runs across multiple GPUs whilst maintaining consistent performance. Together AI is essentially saying: "You focus on your data and use cases, we'll handle the distributed training nightmare."
The service supports both Supervised Fine-Tuning (SFT) and Direct Preference Optimisation (DPO), giving organisations flexibility in how they shape model behaviour. For teams that have been struggling with the operational overhead of fine-tuning large models, this removes a significant barrier. The cost efficiency claims are worth monitoring – if Together AI can deliver on enterprise-grade reliability whilst undercutting in-house training costs, this could reshape how organisations approach model customisation.
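The practical difference between the two methods shows up in the training data you prepare. A minimal sketch, using generic JSONL record shapes rather than Together AI's exact file format (which you should confirm against their docs): SFT learns from gold completions, while DPO learns from chosen/rejected preference pairs.

```python
import json

# SFT: the model imitates a reference completion.
sft_record = {
    "messages": [
        {"role": "user", "content": "Summarise our refund policy."},
        {"role": "assistant", "content": "Refunds are issued within 14 days of purchase."},
    ]
}

# DPO: the model learns to prefer one answer over another,
# without needing an explicit reward model.
dpo_record = {
    "prompt": "Summarise our refund policy.",
    "chosen": "Refunds are issued within 14 days of purchase.",
    "rejected": "I don't know anything about refunds.",
}

def to_jsonl(records: list[dict]) -> str:
    """Training files are typically JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

training_file = to_jsonl([sft_record])
preference_file = to_jsonl([dpo_record])
```

The upshot for teams choosing between them: SFT needs curated good answers, DPO needs comparative judgements — the latter is often cheaper to collect from human reviewers who can rank outputs more reliably than they can write them.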
Cohere Flexes with 111B Parameter Reasoning Model
Cohere's Command A Reasoning model, launched on 21 August with 111 billion parameters and a 256K context window, is their bid to stay relevant in the reasoning model arms race. The 23-language support is notable – whilst everyone else focuses on English-first reasoning, Cohere's betting on multilingual enterprise use cases.
The 256K context window puts it in direct competition with Claude and GPT-4 Turbo for document analysis and long-form reasoning tasks. For organisations dealing with complex multilingual documents or needing to maintain context across extended conversations, this could be a compelling alternative to the usual suspects. The agentic capabilities suggest Cohere is positioning this as more than just a chat model – they want it powering autonomous workflows.
Worth Watching
Groq's Automatic Prompt Caching Cuts Costs by Half
Groq's automatic prompt caching for the Kimi K2 model, effective 20 August, offers a 50% reduction in token costs with no code changes required. The automation is the key here – no need to manage cache keys or implement complex caching logic. For high-volume applications with repetitive prompts, this could significantly impact operating costs. Worth testing if you're running Kimi K2 workloads.
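The savings scale with how much of your traffic is repeated prompt material (system prompts, few-shot examples). A back-of-envelope model, assuming the advertised 50% discount applies to cached input tokens (the illustrative prices below are not Groq's actual rates):

```python
def monthly_cost(total_tokens: int, price_per_m: float,
                 cached_fraction: float, cache_discount: float = 0.5) -> float:
    """Estimate input-token spend when a fraction of tokens hit the cache.

    cache_discount=0.5 reflects the advertised 50% price cut on cached tokens.
    """
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    return (uncached + cached * (1 - cache_discount)) * price_per_m / 1_000_000

# e.g. 500M input tokens/month at a hypothetical $1.00 per million tokens,
# with 80% of them being repeated system-prompt material that hits the cache:
baseline = monthly_cost(500_000_000, 1.00, cached_fraction=0.0)
with_cache = monthly_cost(500_000_000, 1.00, cached_fraction=0.8)
print(baseline, with_cache)  # → 500.0 300.0
```

The example makes the key point plain: a 50% discount on cached tokens only halves your bill if nearly all traffic is cacheable; at 80% cache hits the saving is 40%.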
OpenSearch Continues Stability Push
AWS OpenSearch released multiple bug fix updates this week, including version 2.19.3 on 21 August. The fixes address task cancellation in aggregators, profiler timing issues, and regex query handling improvements. Whilst these are maintenance releases, the focus on query performance and system stability suggests OpenSearch is preparing for increased enterprise adoption. No action required, but expect improved reliability in production deployments.
AWS Bedrock Adds Token Count Estimation
Bedrock's new CountTokens API, available from 21 August for Claude 3.5 Haiku and Sonnet, lets developers estimate token usage before sending requests. This proactive cost management tool is particularly valuable for applications with variable prompt lengths. Useful for budget planning and avoiding rate limit surprises, though it's more of a nice-to-have than a must-have feature.
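The value of pre-flight counting is easiest to see as a budget gate. A minimal sketch: the `rough_token_estimate` heuristic below (roughly four characters per English token) is a stand-in of our own, not the Bedrock API — in production you would replace it with the actual CountTokens call so the number matches what you are billed for.

```python
def rough_token_estimate(text: str) -> int:
    """Crude stand-in for a real token count: ~4 characters per token is a
    common rule of thumb for English text, not the model's tokenizer."""
    return max(1, len(text) // 4)

def within_budget(prompt: str, max_input_tokens: int) -> bool:
    """Pre-flight check: refuse to send prompts that would blow the token
    budget or trip rate limits, before any request (and cost) is incurred."""
    return rough_token_estimate(prompt) <= max_input_tokens

print(within_budget("Summarise this quarterly report.", 1000))  # → True
```

Gating requests this way turns token spend from something you discover on the invoice into something you enforce at the call site.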
Together AI Shares Engineering Automation Playbook
Together AI's case study on automating engineering workflows with AI agents, published 21 August, provides practical insights into building reliable autonomous systems. The emphasis on verifiable tasks, stable environments, and robust monitoring reflects hard-learned lessons from production deployments. Worth reading if you're planning agent-based automation projects.
Quick Hits
- Anthropic launched higher education advisory board and AI Fluency courses, plus new admin controls for business plans
- Anthropic announced nuclear safeguards development through public-private partnership
- OpenSearch fixed null pointer exceptions and authentication issues in version 2.19.2
The Week Ahead
Watch for enterprise adoption metrics from Google following the Vertex AI Agent Engine security updates. The real test will be whether regulated industries start moving from pilot projects to production deployments. Together AI's fine-tuning service will likely see its first customer case studies emerge, providing early indicators of demand for managed fine-tuning services.
Cohere's Command A Reasoning model will need to prove itself against established players in reasoning benchmarks. Early performance comparisons should surface by month-end, particularly for multilingual reasoning tasks where Cohere claims an advantage.
The broader trend towards enterprise-ready AI infrastructure continues, with security, compliance, and operational efficiency becoming key differentiators. Expect more providers to announce similar enterprise-focused capabilities as the market matures beyond experimental use cases.