NVIDIA Llama Nemotron Nano VLM released on Hugging Face Hub
AI Impact Summary
NVIDIA has released Llama Nemotron Nano VLM, a state-of-the-art 8B Vision Language Model (VLM) designed for intelligent document processing, on the Hugging Face Hub. The model combines a Vision Transformer (ViT) encoder, C-RADIOv2-VLM-H, with a Multi-Layer Perceptron (MLP) connector, and it was trained on a diverse dataset of synthetic and curated data to excel at tasks such as OCR, table extraction, and document understanding. The release gives users access to a powerful tool for automating workflows that involve complex documents.
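To make the architecture concrete, here is a minimal sketch of what an MLP connector in a vision language model typically does: it projects the patch features produced by the vision encoder into the embedding space of the language model, so that image tokens and text tokens can share one input sequence. This is an illustration only, not NVIDIA's implementation. The function name `mlp_connector`, all dimensions, the random weights, and the ReLU activation are assumptions chosen for readability.

```python
import numpy as np


def mlp_connector(patch_features, w1, b1, w2, b2):
    """Project vision-encoder patch features into the language model's embedding space.

    Hypothetical two-layer MLP; the real connector's depth and activation may differ.
    """
    hidden = patch_features @ w1 + b1
    hidden = np.maximum(hidden, 0.0)  # ReLU, assumed here for simplicity
    return hidden @ w2 + b2


rng = np.random.default_rng(0)

# Illustrative sizes only; the actual model dimensions are not given in this summary.
num_patches, vision_dim, hidden_dim, text_dim = 16, 32, 64, 48

# Stand-in for the ViT encoder output: one feature vector per image patch.
patch_features = rng.normal(size=(num_patches, vision_dim))

# Randomly initialised connector weights (a trained model would load these).
w1 = rng.normal(size=(vision_dim, hidden_dim)) * 0.02
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, text_dim)) * 0.02
b2 = np.zeros(text_dim)

visual_tokens = mlp_connector(patch_features, w1, b1, w2, b2)

# Stand-in for embedded prompt tokens, e.g. "Extract the table from this page".
text_tokens = rng.normal(size=(5, text_dim))

# The language model would consume visual and text tokens as one sequence.
sequence = np.concatenate([visual_tokens, text_tokens], axis=0)
print(sequence.shape)  # (21, 48)
```

The connector leaves the number of patches unchanged and only changes the feature width, which is why the 16 visual tokens can be concatenated directly with the 5 text tokens.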
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info