Fine-tune olmOCR for faithful OCR with header/footer preservation
AI Impact Summary
A team has fine-tuned olmOCR-7B-0225-preview into a faithful OCR engine that preserves header and footer data, filling a gap for business documents that carry key fields at the top and bottom of the page. They generated 8,000 documents with Qwen2.5-VL-72B-Instruct and retrained using the original olmOCR pipeline with 4 gradient accumulation steps on 8x H100 nodes, tracking experiments in MLflow. The fine-tuned model now extracts the full page content, including header and footer sections, and still parses simple tables, which makes invoice parsing and other layout-rich document workflows more reliable.
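The training figures above can be turned into a rough batch-size estimate. The sketch below is illustrative only: it reads "8x H100" as eight GPUs, and the per-device batch size is not stated in the summary, so the value of 1 is a hypothetical placeholder. `TrainingSetup` and its methods are names invented for this example, not part of the olmOCR pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the figures quoted in the summary."""

    num_gpus: int               # 8, assuming "8x H100" means eight GPUs
    grad_accum_steps: int       # 4, as stated in the summary
    per_device_batch_size: int  # NOT stated in the summary; assumed value

    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer update across all devices.
        return self.num_gpus * self.grad_accum_steps * self.per_device_batch_size

    def optimizer_steps_per_epoch(self, num_documents: int) -> int:
        # Ceiling division: a final partial batch still costs one update.
        return -(-num_documents // self.effective_batch_size())


# Assumption: one page per device per forward pass.
setup = TrainingSetup(num_gpus=8, grad_accum_steps=4, per_device_batch_size=1)

print(setup.effective_batch_size())
print(setup.optimizer_steps_per_epoch(8_000))
```

Under these assumptions, one pass over the 8,000 documents corresponds to a few hundred optimizer updates; a different per-device batch size scales that number proportionally.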
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info