
SpeechT5 now available in Hugging Face Transformers — ASR, TTS, and speech-to-speech

AI Impact Summary

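SpeechT5 is now available in Hugging Face Transformers. It uses a single shared encoder-decoder backbone for speech-to-text (ASR), text-to-speech (TTS), and speech-to-speech tasks. Task-specific pre-nets map each task's input (text or speech) into the shared representation. Post-nets map the decoder output to the target modality (text tokens or a spectrogram). The integration includes examples with checkpoints such as microsoft/speecht5_tts and the microsoft/speecht5_hifigan vocoder, and uses speaker embeddings to select the output voice for TTS and voice conversion. Production use requires extra setup:

- installing Transformers from GitHub rather than from a release, which is a non-standard install step;
- adding dependencies such as sentencepiece;
- pairing speech-generating models with a compatible vocoder;
- handling audio at a 16 kHz sample rate.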
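The summary names the parts of a SpeechT5 pipeline without showing how they fit together. Below is a minimal sketch, assuming transformers, torch, sentencepiece, datasets, and soundfile are installed. It contains two calls:

- a TTS call that pairs microsoft/speecht5_tts with the microsoft/speecht5_hifigan vocoder and a speaker embedding;
- an ASR call that first brings its input to 16 kHz.

The helper names (`resample_linear`, `synthesize`, `transcribe`, `TARGET_SR`) are hypothetical. The `microsoft/speecht5_asr` checkpoint and the `Matthijs/cmu-arctic-xvectors` speaker-embedding dataset (row 7306) come from Hugging Face's own SpeechT5 examples, not from this summary. The resampler is a naive linear interpolation with no anti-aliasing filter. It was chosen only to keep the sketch dependency-free, not because SpeechT5 prescribes it.

```python
# Sketch: SpeechT5 TTS and ASR via Hugging Face Transformers, plus a naive 16 kHz resampler.
# Assumed installs: transformers, torch, sentencepiece, datasets, soundfile.
from collections.abc import Sequence

# SpeechT5 checkpoints expect 16 kHz mono audio.
TARGET_SR = 16_000


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR) -> list[float]:
    """Hypothetical helper: resample by linear interpolation.

    No anti-aliasing filter is applied, so downsampling can alias; a proper
    resampler (e.g. torchaudio or librosa) is preferable for real recordings.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if len(samples) == 0:
        return []
    if orig_sr == target_sr:
        return [float(s) for s in samples]
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= last:
            out.append(float(samples[last]))  # hold the final sample at the edge
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Text-to-speech with microsoft/speecht5_tts and the HiFi-GAN vocoder."""
    # Heavy dependencies are imported lazily so resample_linear works without them.
    import soundfile as sf
    import torch
    from datasets import load_dataset
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")  # tokenizer needs sentencepiece
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # Assumption: a 512-dim x-vector from Hugging Face's example dataset selects the voice.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker_embeddings = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

    with torch.no_grad():
        # The model predicts a spectrogram; the vocoder converts it to a waveform.
        speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    sf.write(out_path, speech.numpy(), samplerate=TARGET_SR)


def transcribe(samples: Sequence[float], sample_rate: int) -> str:
    """Speech-to-text; the microsoft/speecht5_asr checkpoint is an assumption from HF's examples."""
    import torch
    from transformers import SpeechT5ForSpeechToText, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_asr")
    model = SpeechT5ForSpeechToText.from_pretrained("microsoft/speecht5_asr")

    audio = resample_linear(samples, sample_rate)  # bring the input to 16 kHz first
    inputs = processor(audio=audio, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=100)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

For example, `synthesize("Hello from SpeechT5.")` writes a 16 kHz WAV file. `transcribe(samples, 44100)` resamples a 44.1 kHz recording before recognition. Both download checkpoints from the Hugging Face Hub on first use. Voice conversion follows the same pattern with `SpeechT5ForSpeechToSpeech` and the `microsoft/speecht5_vc` checkpoint.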

Affected Systems

SpeechT5, SpeechT5ForTextToSpeech

Date: not specified
Change type: capability
Severity: info
