Hugging Face: Fine-tune Stable Diffusion with DDPO via TRL: DDPOTrainer and PPO-based optimization | SignalBreak