Hugging Face: Cosmopedia opens large-scale synthetic data pipeline for LLM pre-training (Cosmo-1B, Mixtral-8x7B-Instruct-v0.1) | SignalBreak