Docmatix: 2.4M Image DocVQA Dataset Released

AI Impact Summary

The release of Docmatix significantly expands the data available for Document Visual Question Answering (DocVQA): the dataset contains 2.4 million images and 9.5 million Q/A pairs derived from 1.3 million PDFs. This 240x increase in scale over previous datasets allows more robust training of Vision-Language Models (VLMs) and may help narrow the performance gap between open models such as Idefics2 and proprietary ones. Fine-tuning Florence-2 on Docmatix yielded a 20% performance increase on the DocVQA benchmark.
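To put the quoted figures in proportion, the short sketch below derives a few ratios from them. It uses only the rounded numbers stated in the summary, and it assumes that the 240x factor refers to image count, which the summary does not say explicitly. The variable names are illustrative only.

```python
# Rounded figures as quoted in the summary; not exact dataset statistics.
images = 2_400_000
qa_pairs = 9_500_000
pdfs = 1_300_000
scale_factor = 240  # assumption: the 240x figure is measured in images

# Average number of question/answer pairs attached to each image.
qa_per_image = qa_pairs / images

# Average number of page images extracted from each source PDF.
images_per_pdf = images / pdfs

# Size of the earlier dataset implied by the 240x claim, under the assumption above.
implied_baseline_images = images / scale_factor

print(f"Q/A pairs per image: {qa_per_image:.2f}")
print(f"Images per PDF: {images_per_pdf:.2f}")
print(f"Implied earlier dataset size: {implied_baseline_images:,.0f} images")
```

Under these assumptions, each image carries about four Q/A pairs on average, and the earlier dataset used for comparison would be on the order of ten thousand images.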

Affected Systems

Florence-2, Phi-3-small

Date: not specified
Change type: capability
Severity: info
