Build video generation datasets with Florence-2 and yt-dlp
AI Impact Summary
This post introduces tooling for building video generation datasets, aiming to do for video what img2dataset does for image datasets, and drawing on Stable Video Diffusion and Florence-2. It centers on a three-stage pipeline: acquisition, pre-processing/filtering, and processing. Captioning and object recognition models annotate each video clip, and clips are then curated on criteria such as watermark presence, aesthetic score, and OCR output. The team is actively developing scripts to streamline dataset creation, giving practitioners a practical route to fine-tuning video generation models.
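The curation step described above can be sketched as a simple filter over per-clip metadata. This is a minimal illustration, not the post's actual scripts: the `ClipMetadata` fields, the function names, and the threshold values are all hypothetical, and it assumes that watermark detection, aesthetic scoring, and OCR have already been run on each clip.

```python
from dataclasses import dataclass


@dataclass
class ClipMetadata:
    """Hypothetical per-clip record produced by the processing stage."""
    path: str
    has_watermark: bool
    aesthetic_score: float
    ocr_text: str


def keep_clip(clip, min_aesthetic=4.5, max_ocr_chars=20):
    """Return True if the clip passes all three curation criteria.

    The default thresholds are illustrative assumptions, not values from the post.
    """
    if clip.has_watermark:
        return False
    if clip.aesthetic_score < min_aesthetic:
        return False
    # Clips dominated by on-screen text are dropped.
    return len(clip.ocr_text.strip()) <= max_ocr_chars


def filter_clips(clips, **thresholds):
    """Keep only the clips that satisfy keep_clip."""
    return [c for c in clips if keep_clip(c, **thresholds)]


clips = [
    ClipMetadata("a.mp4", False, 5.2, ""),
    ClipMetadata("b.mp4", True, 6.0, ""),
    ClipMetadata("c.mp4", False, 3.1, ""),
    ClipMetadata("d.mp4", False, 5.0, "SUBSCRIBE NOW FOR MORE VIDEOS"),
]
kept = filter_clips(clips)
print([c.path for c in kept])
```

Keeping the criteria in one small predicate makes it easy to tighten or relax a threshold and re-run the filter without recomputing the model outputs.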
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info