Docmatix: 2.4M Image DocVQA Dataset Released
AI Impact Summary
The release of Docmatix substantially expands the data available for Document Visual Question Answering (DocVQA): the dataset contains 2.4 million images and 9.5 million Q/A pairs derived from 1.3 million PDFs. This is a 240x increase in scale over previous datasets, which allows more robust training of open Vision-Language Models (VLMs) such as Idefics2 and could narrow the performance gap with proprietary models. Fine-tuning Florence-2 on Docmatix yielded a relative improvement of roughly 20% on the DocVQA benchmark.
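To put the headline numbers in proportion, the short Python sketch below derives a few ratios from the figures quoted in the summary. It is only a back-of-the-envelope calculation: the variable names are illustrative, and the assumption that the 240x factor is measured in images (rather than Q/A pairs or documents) is ours, since the summary does not say.

```python
# Figures as quoted in the summary above.
images = 2_400_000
qa_pairs = 9_500_000
pdfs = 1_300_000
scale_factor = 240

# Average density of the dataset.
qa_per_image = qa_pairs / images
images_per_pdf = images / pdfs

# Assumption: the 240x figure is counted in images; the summary does not specify.
implied_prior_images = images / scale_factor

print(f"{qa_per_image:.2f} Q/A pairs per image")
print(f"{images_per_pdf:.2f} images per PDF")
print(f"{implied_prior_images:,.0f} images in the implied earlier baseline")
```

Under these figures, each image carries about four Q/A pairs on average and each PDF contributes slightly fewer than two images.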
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info