Speech-To-Text
Information about Speech-To-Text technology
Hugging Face wav2vec2
The Wav2Vec2 model was proposed in "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
(from the Hugging Face model documentation)
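As a rough illustration of how a Wav2Vec2 checkpoint is typically used for transcription, here is a minimal sketch using the Hugging Face `transformers` library. The checkpoint name `facebook/wav2vec2-base-960h` is an assumption chosen only because it is a public English model; it is not the model listed on this page. The function name `transcribe` is likewise hypothetical.

```python
# Assumption: a public English checkpoint used as a stand-in,
# not the Common Voice model listed under "Pretrained Models".
MODEL_ID = "facebook/wav2vec2-base-960h"


def transcribe(waveform, sampling_rate=16000, model_id=MODEL_ID):
    """Transcribe a mono waveform (1-D sequence of floats) with greedy CTC decoding."""
    # Imported lazily so that defining the function needs no heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # The processor normalizes the audio and returns a batch of input tensors.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: take the most likely token at each frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

The waveform is expected at the sampling rate the checkpoint was trained on (16 kHz for the stand-in above); audio at other rates would need resampling first.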
Pretrained Models:
- Shahu Kareem’s result using Common Voice data, from Hugging Face model training week:
- Full Model
- Quantized Model: gdown --id 1m6QXhMF9Zf6P04Z1D2qFiQjEFo16Vexv
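The quantized model download can also be scripted from Python with the `gdown` package. This is only a sketch: the helper names and the output file name `wav2vec2_quantized.pt` are assumptions for illustration, since the page does not say what the file is called or what format it is in.

```python
# File ID taken from the gdown command above.
FILE_ID = "1m6QXhMF9Zf6P04Z1D2qFiQjEFo16Vexv"


def drive_url(file_id):
    """Build the Google Drive download URL that gdown accepts."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_quantized_model(output="wav2vec2_quantized.pt"):
    """Download the file to `output` (hypothetical name) and return the saved path."""
    # Imported lazily; install with `pip install gdown`.
    import gdown

    return gdown.download(drive_url(FILE_ID), output, quiet=False)
```

Calling `download_quantized_model()` requires network access and that the file is still shared publicly.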
Mozilla DeepSpeech
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
The toolchain has been shown to work well. A pretrained model is unavailable at this time.