Google Forces Vertex AI Migration as Explainable AI Shuts Down
Google has pulled the plug on Vertex Explainable AI with immediate effect from 16 March 2026, leaving teams scrambling for alternative model interpretability solutions. This forced migration represents the most disruptive change in the AI provider landscape this week, affecting any organisation relying on Google's explainability service for bias detection and model transparency.
What's changing with Vertex AI endpoints?
Vertex Explainable AI's sudden deprecation effective 16 March 2026 has caught many teams off guard. Unlike typical Google sunset announcements that provide generous migration windows, this service termination offers virtually no transition period. The timing suggests Google is prioritising resources elsewhere within Vertex AI, possibly towards their newer Gemini-powered explainability features.
For organisations currently using Vertex Explainable AI for model interpretability, the impact is immediate and severe. Teams must rapidly assess alternative solutions including open-source frameworks like SHAP or LIME, third-party explainability platforms, or Google's own newer interpretability tools within the broader Vertex AI suite. The lack of a clear migration path from Google compounds the challenge, forcing technical teams to evaluate multiple options simultaneously whilst maintaining production systems.
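For teams weighing replacements, the core idea behind model-agnostic interpretability tools like SHAP and LIME is feature attribution: score each input feature by how much disturbing it degrades the model's predictions. A dependency-free sketch of one such technique, permutation importance (this is an illustrative stand-in, not the SHAP or LIME libraries themselves):

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Score each feature by the accuracy drop when its column is shuffled.

    predict: callable taking a list of feature rows, returning predicted labels.
    X: list of feature rows (lists); y: the true labels.
    """
    rng = random.Random(seed)
    baseline = sum(p == t for p, t in zip(predict(X), y)) / len(y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature intact.
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            acc = sum(p == t for p, t in zip(predict(X_perm), y)) / len(y)
            drops.append(baseline - acc)
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores zero; features the model leans on score higher. Production systems should reach for a maintained library rather than this sketch, but the mechanism is what any migration target will be providing.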
The deprecation coincides with broader updates to Vertex AI Workbench and Vector Search, indicating Google is reshaping its ML platform architecture. However, the abrupt nature of this particular sunset suggests internal strategic shifts rather than planned evolution. Teams should audit their current explainability dependencies immediately and begin migration planning to avoid service disruption.
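The audit itself can begin as a simple repository scan for the relevant client imports and explanation-related identifiers. A minimal sketch (the patterns below are illustrative starting points, not an exhaustive list of Vertex Explainable AI touchpoints):

```python
import re
from pathlib import Path

# Illustrative patterns; extend with the client calls your codebase actually uses.
PATTERNS = [
    re.compile(r"google\.cloud\.aiplatform"),
    re.compile(r"explain\s*\("),
    re.compile(r"explanation_spec", re.IGNORECASE),
]

def find_explainability_usage(root):
    """Return (path, line_number, line) tuples for lines matching any pattern."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(p.search(line) for p in PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running this across a monorepo gives a first-pass inventory of which services touch the deprecated endpoints before committing to a replacement.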
AWS expands real-time AI capabilities across multiple services
Amazon is making significant moves in real-time AI interactions this week, with Bedrock AgentCore Runtime gaining WebRTC support for bidirectional streaming on 20 March 2026. This capability transforms how AI agents can interact with users, enabling low-latency, real-time communication patterns that were previously impossible through traditional request-response APIs.
The WebRTC integration allows AgentCore Runtime to support use cases requiring immediate feedback loops, such as collaborative AI assistants, real-time data analysis agents, or interactive tutoring systems. This positions AWS ahead of competitors in the emerging market for conversational AI agents that need to maintain continuous dialogue rather than discrete interactions.
Simultaneously, Amazon Polly is expanding with 10 new generative TTS voices and its own Bidirectional Streaming API on 20 March 2026. The timing isn't coincidental: AWS is building a comprehensive real-time AI communication stack. Combined with the NIXL support and EFA integration for accelerated LLM inference launched 19 March 2026, AWS is clearly positioning itself as the platform for high-performance, interactive AI applications.
OpenAI accelerates embedding model development
OpenAI's new domain-specific embedding model creation process, announced 20 March 2026, represents a significant shift in how organisations can build retrieval-augmented generation (RAG) systems. The synthetic data generation pipeline using LLMs to create question-answer pairs eliminates the traditional bottleneck of manual data labelling whilst reducing bias introduction.
This development particularly benefits enterprises struggling with poor RAG performance due to generic embeddings that don't understand domain-specific terminology or concepts. Previously, creating custom embeddings required substantial ML expertise and weeks of data preparation. OpenAI's approach reduces this to under a day, democratising access to high-quality, domain-specific retrieval systems.
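OpenAI hasn't published the pipeline's internals, but the shape of the approach is straightforward to sketch: feed each document chunk to an LLM with a prompt requesting question-answer pairs, then parse the reply into training examples for the embedding model. A minimal sketch with the LLM call injected as a callable; the prompt wording and the Q:/A: reply format are assumptions for illustration:

```python
def build_qa_prompt(chunk):
    return (
        "Generate two question-answer pairs grounded only in the passage below.\n"
        "Format each pair as 'Q: ...' then 'A: ...' on separate lines.\n\n"
        f"Passage:\n{chunk}"
    )

def parse_qa_pairs(reply):
    """Parse 'Q: ...' / 'A: ...' lines into (question, answer) tuples."""
    pairs, question = [], None
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs

def synthesize_training_pairs(chunks, call_llm):
    """call_llm: callable mapping a prompt string to the model's reply text."""
    pairs = []
    for chunk in chunks:
        pairs.extend(parse_qa_pairs(call_llm(build_qa_prompt(chunk))))
    return pairs
```

The resulting (question, passage-grounded answer) pairs become positive training examples for contrastive embedding fine-tuning, which is what removes the manual-labelling bottleneck.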
The broader implication is that RAG system quality should improve dramatically across industries. Legal firms, medical organisations, and technical companies can now rapidly deploy retrieval systems that understand their specific jargon and concepts without extensive ML team involvement. This could accelerate enterprise AI adoption significantly.
Worth Watching
Anthropic enhances model discovery: The Models API now returns capability fields including max_input_tokens, max_tokens, and capabilities objects effective 18 March 2026. This programmatic model discovery allows applications to intelligently select Claude variants based on requirements rather than hardcoding model choices. The change suggests Anthropic expects rapid model proliferation requiring dynamic selection logic.
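In practice this lets selection logic move from hardcoded model IDs to a filter over the returned metadata. A sketch of the idea, operating on plain dicts shaped like the announced response fields rather than a live API call (the model IDs in the docstring are hypothetical):

```python
def select_model(models, min_input_tokens=0, required_capabilities=()):
    """Pick the first listed model meeting context-window and capability needs.

    models: dicts shaped like the announced Models API fields, e.g.
    {"id": "...", "max_input_tokens": 200000, "capabilities": {"vision": True}}.
    """
    for model in models:
        caps = model.get("capabilities", {})
        if model.get("max_input_tokens", 0) >= min_input_tokens and all(
            caps.get(c) for c in required_capabilities
        ):
            return model["id"]
    raise LookupError("no listed model satisfies the requirements")
```

An application can then request, say, a vision-capable model with at least a 150k-token input window and degrade gracefully when none is listed, instead of breaking when a pinned model ID is retired.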
Together AI expands fine-tuning scope: Tool calling, reasoning, and vision support arrive for Together AI's fine-tuning service on 18 March 2026, alongside 6x throughput improvements and support for 100B+ parameter models. The addition of cost and ETA estimation makes fine-tuning more predictable for production planning. This positions Together AI as a serious alternative to OpenAI's fine-tuning offerings.
NVIDIA models proliferate across platforms: Nemotron 3 Super becomes available on Amazon Bedrock 18 March 2026, whilst LM Studio adds support for NVIDIA DGX Station GB300 for local model deployment. The multi-platform availability suggests NVIDIA is prioritising broad distribution over exclusive partnerships.
China dominates open-source AI contributions: Hugging Face reports Chinese organisations now lead in model downloads and releases, surpassing US contributions in 2025. This shift reflects strategic moves towards AI sovereignty and data localisation, potentially impacting global model selection strategies.
Quick Hits
• Elasticsearch maintenance: Versions 9.3.2, 9.2.7, and 8.19.13 released with bug fixes and security updates
• Amazon Connect expansion: Voice AI agents support 13 new languages with agentic speech-to-speech in London region
• Bedrock model additions: Minimax M2.5 and GLM 5 models now available
• Claude thinking optimisation: New 'omitted' display field improves streaming performance by hiding internal reasoning
• OpenSearch deprecation: Version 3.5 reaches end-of-life 17 March 2026
The Week Ahead
The immediate priority for any team using Vertex Explainable AI is migration planning. With the 16 March 2026 effective date already passed, alternative explainability solutions must be implemented urgently. Google's image and video generation endpoints face deprecation on 30 June 2026, providing slightly more breathing room but requiring attention.
AWS's real-time AI capabilities warrant evaluation for teams building interactive agents or voice applications. The WebRTC support in AgentCore Runtime and Polly's bidirectional streaming represent genuine capability advances rather than incremental improvements.
Watch for OpenAI's acquisition of Astral to complete, potentially accelerating Codex development for Python tooling. The strategic focus on developer tools suggests OpenAI is building a comprehensive development ecosystem beyond language models.