Hugging Face: TRL enables Direct Preference Optimization (DPO) for Vision-Language Models with Idefics2-8b support | SignalBreak