
TRL: Introducing MPO, GRPO, and GSPO for Vision Language Model Alignment

AI Impact Summary

This release adds three alignment techniques for Vision Language Models (VLMs) to TRL: Mixed Preference Optimization (MPO), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO). They build on the existing SFT and DPO methods. The new methods aim to improve multimodal alignment by extracting richer signals from preference data and by scaling better with modern VLMs, and they may offer performance gains over SFT and DPO alone.
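To make the "group" part of GRPO and the "sequence" part of GSPO concrete, here is a minimal sketch in plain Python. It is not TRL's implementation: the function names are hypothetical, the rewards and log-probabilities are made-up illustration values, and the use of population standard deviation is an assumption (implementations differ on this detail). It shows only the two core quantities: rewards normalized within a group of completions sampled for the same prompt, and a length-normalized importance ratio computed over a whole sequence.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: score each completion relative to its group.

    `rewards` holds one scalar reward per completion sampled for the same
    prompt. Population std is an assumption made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_importance_ratio(new_logps, old_logps):
    """GSPO-style ratio: one length-normalized ratio per sequence.

    Inputs are per-token log-probabilities of the same completion under the
    current policy and the policy that generated it.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("expected two non-empty lists of equal length")
    delta = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(delta / len(new_logps))


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# Hypothetical per-token log-probs for one completion under both policies.
ratio = sequence_importance_ratio([-0.9, -1.8, -0.4], [-1.0, -2.0, -0.5])
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to form a baseline. The sequence-level ratio is above 1 when the current policy assigns the completion a higher average token log-probability than the generating policy did.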

Affected Systems

TRL, GPT-3.5 Turbo

Date: not specified
Change type: capability
Severity: info
