Porting fairseq WMT19 translation system to transformers
AI Impact Summary
Porting the WMT19 translation pipeline from fairseq to HuggingFace Transformers centralizes the workflow around a Transformers-based deployment path. The effort includes a conversion script to translate checkpoint, dictionaries, and tokenizer files into the Transformers format, and addresses vocabulary compatibility between de-en/ en-de merged vocabularies and separate vocabularies. It leverages torch.hub to fetch pre-trained weights and integrates Moses, FastBPE, and related tooling for tokenization and BPE. This expands reuse of high-quality WMT19 models in new production environments but imposes a migration step and thorough validation of vocabularies and tokenizers.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- info