SpeechT5 now available in Hugging Face Transformers — ASR, TTS, and speech-to-speech
AI Impact Summary
SpeechT5 is now available in Hugging Face Transformers. It uses a single encoder-decoder backbone for speech-to-text (ASR), text-to-speech (TTS), and speech-to-speech tasks, with task-specific pre-nets and post-nets wrapped around that shared backbone. The integration includes examples built on checkpoints such as microsoft/speecht5_tts and the microsoft/speecht5_hifigan vocoder, and shows how speaker embeddings are used for voice conversion. Production use requires a non-standard install (Transformers installed from GitHub), extra dependencies such as sentencepiece, a compatible vocoder, and audio handled at a 16 kHz sample rate.
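A minimal sketch of the TTS path mentioned above, assuming `torch` and a Transformers build that includes SpeechT5 are installed (for example via `pip install git+https://github.com/huggingface/transformers.git` plus `sentencepiece`). The helper names `synthesize` and `save_wav_16k` are hypothetical, and the all-zero speaker embedding is only a placeholder; a real x-vector would be needed for a natural-sounding voice. The heavy imports sit inside `synthesize` so the WAV helper can be used on its own.

```python
import struct
import wave

SAMPLE_RATE = 16_000  # the summary notes SpeechT5 audio is handled at 16 kHz


def save_wav_16k(path, samples):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM at 16 kHz."""
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def synthesize(text, speaker_embedding=None):
    """Run text through SpeechT5 TTS and the HiFi-GAN vocoder (illustrative)."""
    # Imported lazily so this module loads without torch/transformers present.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    if speaker_embedding is None:
        # Assumption: a zero 512-dim vector as a stand-in for a real x-vector.
        speaker_embedding = torch.zeros((1, 512))

    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )
    # Return plain Python floats so the caller does not need torch.
    return speech.detach().cpu().tolist()
```

With the dependencies and model downloads in place, something like `save_wav_16k("speech.wav", synthesize("Hello from SpeechT5"))` should produce a 16 kHz mono file. Writing the file through the standard-library `wave` module is a choice made here to keep the sketch dependency-light; it is not something the integration prescribes.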
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info