Holotron-12B: High-Throughput Multimodal Agent with Nemotron SSM Delivers 8.9k tokens/s at 100 Concurrency
AI Impact Summary
Holotron-12B is a production-oriented multimodal agent model built on Nemotron-Nano-2 VL and optimized for long-context, interactive workloads. Its hybrid state-space/attention architecture reduces memory footprint and scales throughput: served with vLLM on a single H100, it achieves 8.9k tokens/s at 100 concurrent requests. The model is released on Hugging Face under the NVIDIA Open Model License and targets enterprise data generation, annotation, and online reinforcement learning pipelines that require high-throughput agentic reasoning.
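A quick back-of-the-envelope check puts the headline figure in per-user terms. This sketch assumes the 8.9k tokens/s is aggregate decode throughput shared evenly across all concurrent requests, which is an assumption about the benchmark setup rather than a documented guarantee:

```python
# Back-of-the-envelope check of the headline throughput figure.
# Assumption: 8.9k tokens/s is aggregate throughput divided evenly
# across concurrent streams (not stated in the announcement).

def per_request_throughput(aggregate_tps: float, concurrency: int) -> float:
    """Average tokens/s each request sees if throughput divides evenly."""
    return aggregate_tps / concurrency

if __name__ == "__main__":
    tps = per_request_throughput(8_900, 100)
    print(f"{tps:.1f} tokens/s per request")  # prints "89.0 tokens/s per request"
```

At roughly 89 tokens/s per stream, each request still decodes faster than typical human reading speed, which is the property that matters for interactive agent workloads.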
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info