Information about Speech-To-Text tech
Hugging Face wav2vec2
The Wav2Vec2 model was proposed in "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
(Source: model documentation)
- Shahu Kareem’s result using Common Voice data, from Hugging Face model training week:
- Full Model
- Quantized Model: gdown --id
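Fine-tuned wav2vec2 speech-recognition checkpoints are typically trained with a CTC objective, so the model emits one token prediction per audio frame and the transcript is recovered by collapsing those predictions. The sketch below illustrates that greedy decoding step in plain Python. It is not the Hugging Face implementation: the function name, the toy vocabulary, and the choice of id 0 as the blank token and "|" as the word delimiter are assumptions made for illustration only.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Turn per-frame token ids into text using greedy CTC decoding.

    blank_id and word_delimiter are illustrative assumptions; a real
    checkpoint defines its own values in its tokenizer configuration.
    """
    # Step 1: merge runs of identical consecutive ids, e.g. [4, 4, 0, 4] -> [4, 0, 4].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated characters.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: join characters and turn word delimiters into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, not taken from any real model.
vocab = {0: "<pad>", 1: "|", 2: "H", 3: "E", 4: "L", 5: "O"}

# Per-frame argmax ids as a model might emit them (made-up example data).
frames = [2, 2, 3, 0, 4, 4, 0, 4, 5, 5, 1, 1, 5, 2]

print(ctc_greedy_decode(frames, vocab))  # HELLO OH
```

The blank between the two runs of "L" is what lets the decoder keep a genuine double letter while still merging frames that repeat the same character.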
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
The toolchain has been shown to work well. A pretrained model is unavailable at this time.