Build video generation datasets with Florence-2 and yt-dlp

AI Impact Summary

This post introduces tooling for building video generation datasets, following established practice such as img2dataset and drawing on models such as Stable Video Diffusion and Florence-2. It centers on a three-stage pipeline: acquisition, pre-processing and filtering, and processing. Captioning and object recognition models are used to curate video clips by criteria such as watermark presence, aesthetic score, and OCR output. The team is still developing scripts to streamline dataset creation, with the aim of making it practical to assemble data for fine-tuning video generation models.
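The filtering stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ClipRecord` fields, the `keep_clip` and `filter_clips` helpers, and both thresholds are hypothetical and are not taken from the post's scripts. The sketch assumes each clip already carries a watermark flag, an aesthetic score, and OCR text produced by earlier model passes.

```python
from dataclasses import dataclass


@dataclass
class ClipRecord:
    """Per-clip metadata. All field names are hypothetical."""
    path: str
    has_watermark: bool      # assumed output of a watermark/object detection pass
    aesthetic_score: float   # assumed output of an aesthetic scoring model
    ocr_text: str            # assumed output of an OCR pass over sampled frames


def keep_clip(clip, min_aesthetic=5.0, max_ocr_chars=20):
    """Return True if the clip passes all three illustrative filters.

    The default thresholds are placeholders, not values from the post.
    """
    if clip.has_watermark:
        return False
    if clip.aesthetic_score < min_aesthetic:
        return False
    # Reject clips dominated by on-screen text (titles, overlays, and so on).
    return len(clip.ocr_text.strip()) <= max_ocr_chars


def filter_clips(clips, **thresholds):
    """Keep only the clips that satisfy every filter."""
    return [c for c in clips if keep_clip(c, **thresholds)]


clips = [
    ClipRecord("a.mp4", False, 6.1, ""),
    ClipRecord("b.mp4", True, 7.0, ""),    # dropped: watermark
    ClipRecord("c.mp4", False, 3.2, ""),   # dropped: low aesthetic score
    ClipRecord("d.mp4", False, 5.5, "SUBSCRIBE NOW FOR MORE VIDEOS"),  # dropped: too much text
]
kept = [c.path for c in filter_clips(clips)]
print(kept)
```

Keeping the criteria in one predicate makes it straightforward to tune thresholds per dataset, or to add further checks, without touching the acquisition or processing stages.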

Affected Systems

yt-dlp, OpenCV

Date: not specified
Change type: capability
Severity: info

Source: Hugging Face
