Hugging Face: Cosmopedia open synthetic data pipeline for LLM pre-training using Mixtral-8x7B-Instruct-v0.1 | SignalBreak