Liger GRPO integration with TRL encounters shape mismatch during Qwen2.5-0.5B-Instruct training

AI Impact Summary

An attempt to run Liger GRPO with TRL on Qwen2.5-0.5B-Instruct under DeepSpeed ZeRO-3 with bf16 triggers a runtime shape mismatch in the fused PPO loss path. The traceback passes through grpo_loss.forward -> LigerFusedLinearGRPOFunction -> accumulate_chunk in fused_linear_ppo.py, indicating that the tensor shapes the fused kernel expects do not match the inputs it receives, whether because of batch layout or model dimensions. This points to a compatibility gap between Liger GRPO's fused kernels and Qwen2.5-0.5B-Instruct training under ZeRO-3, which can block training runs until a fix is applied. Engineers should verify hidden sizes, sequence lengths, and chunking compatibility between Qwen, TRL, and the Liger GRPO fused implementation.
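The shape verification suggested above can be sketched as a small pre-flight check that runs before the fused loss is called. This is a minimal illustration, not the Liger or TRL API: the function name `check_fused_loss_shapes`, the dictionary keys, and the assumed tensor layout are all hypothetical, and the numbers in the usage lines are illustrative and should be read from the actual model config and batch. One thing it flags is a projection weight that is not 2-D, since under ZeRO-3 a partitioned parameter can appear as an empty placeholder outside a gather context. Whether that is the cause here is not established by the report.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def check_fused_loss_shapes(shapes: Dict[str, Shape]) -> List[str]:
    """Return human-readable problems; an empty list means the shapes agree.

    Assumed (hypothetical) layout, not confirmed by the report:
      hidden         (B, T, H)  last hidden states fed to the fused loss
      lm_head_weight (V, H)     output projection weight
      completion_ids (B, T)     sampled token ids
      attention_mask (B, T)     completion mask
      advantages     (B,)       one advantage per sequence
    """
    problems: List[str] = []

    hidden = shapes["hidden"]
    if len(hidden) != 3:
        # Without B, T, H nothing else can be compared, so stop early.
        return [f"hidden should be 3-D (B, T, H), got {hidden}"]
    batch, seq_len, hidden_size = hidden

    weight = shapes["lm_head_weight"]
    if len(weight) != 2:
        problems.append(
            f"lm_head_weight should be 2-D (V, H), got {weight}; "
            "it may be a partitioned ZeRO-3 placeholder"
        )
    elif weight[1] != hidden_size:
        problems.append(
            f"lm_head_weight H={weight[1]} does not match hidden H={hidden_size}"
        )

    # Per-token tensors must share the batch and sequence dimensions.
    for name in ("completion_ids", "attention_mask"):
        if shapes[name] != (batch, seq_len):
            problems.append(
                f"{name} should be {(batch, seq_len)}, got {shapes[name]}"
            )

    # Per-sequence tensors must share the batch dimension.
    if shapes["advantages"] != (batch,):
        problems.append(
            f"advantages should be {(batch,)}, got {shapes['advantages']}"
        )

    return problems


# Illustrative values only; take the real ones from the tensors at the call site.
example = {
    "hidden": (4, 128, 896),
    "lm_head_weight": (0,),  # what an ungathered, partitioned weight might look like
    "completion_ids": (4, 128),
    "attention_mask": (4, 128),
    "advantages": (4,),
}
for problem in check_fused_loss_shapes(example):
    print(problem)
```

In practice the dictionary would be built from `tuple(t.shape)` for each tensor just before the loss call, so that a mismatch is reported by name rather than surfacing deep inside the fused kernel.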

Affected Systems

Liger GRPO, TRL

Date: not specified
Change type: capability
Severity: info

