StarCoder2-Instruct: Open-Source Self-Alignment for Code Generation (StarCoder2-15B-Instruct-v0.1)
AI Impact Summary
StarCoder2-Instruct introduces a fully self-aligned, permissively licensed pipeline for code generation: StarCoder2-15B generates its own instruction-response pairs, with no human annotation and no distillation from GPT-4. The pipeline extracts seed functions from The Stack v1, derives coding concepts in-context, and applies execution-guided self-validation to produce a large self-generated SFT dataset; the resulting model reaches 72.6 pass@1 on HumanEval, outperforming several larger or distillation-trained open models. This signals a tangible open-source path to high-quality code models under permissive licensing, enabling in-house fine-tuning and customization without reliance on proprietary teacher models, while raising practical considerations around data provenance, sandboxed test execution, and evaluation baselines.
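The execution-guided self-validation step can be illustrated with a minimal sketch: a candidate solution is run together with its model-generated tests, and the instruction-response pair is kept only if the tests pass. This is a simplified assumption of the filtering idea, not the paper's actual harness; in practice such execution must happen in a proper sandbox, and the function and example pair below are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

def passes_self_validation(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Run a candidate solution with its model-generated tests in a
    subprocess; keep the pair only if the tests pass (exit code 0).
    NOTE: a real pipeline would use a sandbox, not a bare subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + tests + "\n")
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

# Hypothetical instruction-response pair: solution plus model-written assertions
solution = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_self_validation(solution, tests))
```

Pairs whose tests fail or time out are discarded, which is how the pipeline filters its own generations without a human or a stronger teacher model in the loop.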
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info