OpenMed trains 25-species mRNA language model — CodonRoBERTa-large-v2 achieves 4.10 perplexity
AI Impact Summary
OpenMed has developed an mRNA language model pipeline trained across 25 species. Its main model, CodonRoBERTa-large-v2, achieved a perplexity of 4.10 and a Spearman correlation of 0.40 with the Codon Adaptation Index (CAI), substantially outperforming ModernBERT. The result highlights the importance of pre-training data and architecture for biological sequence modeling, and is presented as a step towards automated protein design and synthesis.
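To make the two reported metrics concrete, here is a minimal Python sketch of how perplexity and a Spearman rank correlation are usually computed. It is not OpenMed's evaluation code: the function names are hypothetical, the inputs are toy values, and the summary does not say which model score was correlated with CAI.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def _average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = _average_ranks(xs), _average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

# Toy baseline (assumes codon-level tokens): a model that guesses uniformly
# over the 64 possible codons assigns probability 1/64 to every token.
uniform_ppl = perplexity([math.log(1 / 64)] * 10)

# Hypothetical per-sequence model scores and CAI values, for illustration only.
model_scores = [0.2, 0.5, 0.1, 0.9]
cai_values = [0.55, 0.60, 0.70, 0.80]
rho = spearman(model_scores, cai_values)
```

Under the codon-level assumption, the uniform baseline scores a perplexity of about 64, so a reported 4.10 would mean the model is roughly as uncertain as a uniform choice among four codons at each position. A Spearman value of 0.40 indicates a moderate positive agreement in ranking, not a close match.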
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info