Speech-To-Text

Notes on Speech-To-Text technology: available toolkits and pretrained models.

Hugging Face wav2vec2

The Wav2Vec2 model was proposed in “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations” by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.

(Source: model documentation)

Pretrained Models:

  • Shahu Kareem’s result using Common Voice data, from Hugging Face model training week:
    • Full Model
    • Quantized Model: gdown --id 1m6QXhMF9Zf6P04Z1D2qFiQjEFo16Vexv
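
For reference, running a Wav2Vec2 checkpoint through the Hugging Face transformers library might look roughly like the sketch below. It is an illustration under assumptions, not the procedure used to produce the models listed above: the default checkpoint name facebook/wav2vec2-base-960h is a public English model used as a stand-in, the helper names pcm16_to_float and transcribe are made up for this example, and the audio is assumed to be 16 kHz mono.

```python
def pcm16_to_float(samples):
    """Scale 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def transcribe(waveform, sampling_rate=16_000,
               model_name="facebook/wav2vec2-base-960h"):
    """Return a greedy transcription of a mono float waveform.

    model_name is an assumed stand-in checkpoint, not one of the
    models listed in this document.
    """
    # Imported lazily so pcm16_to_float works without these packages.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    # The processor normalizes the waveform and builds a batch of size 1.
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Most likely token per frame; batch_decode merges repeats and drops blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call such as transcribe(pcm16_to_float(samples)) downloads the checkpoint on first use. Pointing model_name at a fine-tuned checkpoint for another language should work the same way, provided that checkpoint ships a matching processor.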

Mozilla DeepSpeech

DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.

The toolchain has been shown to work well. A pretrained model is unavailable at this time.
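
Once a trained model is available, inference through the deepspeech Python package could be sketched as follows. The model path is a placeholder (no pretrained model is published here), the helper names read_pcm16_mono and transcribe_with_deepspeech are invented for this example, and the API calls assume a DeepSpeech 0.7 or later release, which expects 16-bit mono PCM at the model's sample rate.

```python
import wave


def read_pcm16_mono(path, expected_rate=16_000):
    """Read a WAV file and return its raw 16-bit mono PCM bytes."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe_with_deepspeech(model_path, wav_path):
    """Transcribe wav_path with a DeepSpeech model file (hypothetical path)."""
    # Imported lazily so the WAV helper works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    # Ask the model which sample rate it was trained on, then validate the file.
    pcm = read_pcm16_mono(wav_path, expected_rate=model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Checking the audio format before calling the model is a deliberate choice in this sketch: audio at the wrong sample rate tends to produce poor transcripts rather than an obvious error.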