Mozilla deepspeech

DeepSpeech is a speech-to-text engine written by Mozilla.

Documentation

official github https://github.com/mozilla/DeepSpeech
official discourse forums (very helpful) https://discourse.mozilla.org/c/deep-speech
intro tutorial https://progur.com/2018/02/how-to-use-mozilla-deepspeech-tutorial.html

NOTE:

DeepSpeech's Python bindings do not come with documentation. Use the Java binding documentation and the help text provided by the CLI (deepspeech --help) instead. https://github.com/mozilla/DeepSpeech/blob/master/native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech/DeepSpeechModel.java

Install

NOTE:

DeepSpeech is built on TensorFlow, which comes in two variations: one that uses NVIDIA GPUs (the deepspeech-gpu package) and one that uses the CPU only (the deepspeech package). I haven't had success installing cuda/deepspeech-gpu: it reports a missing library, even though that library exists in /opt/cuda.

# CPU version
sudo pip install deepspeech


# check installed version
deepspeech --version

# download/extract models for your version (ex: deepspeech-0.5.1-models.tar.gz)
# from https://github.com/mozilla/DeepSpeech/releases
tar -xvf deepspeech-0.5.1-models.tar.gz

Usage

Commandline

deepspeech \
    --model models/output_graph.pb \
    --alphabet models/alphabet.txt \
    --audio /var/tmp/audio.wav \
    > text.txt
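
As far as I know, the released English models were trained on 16 kHz, mono, 16-bit PCM audio, so other WAV formats tend to transcribe poorly. The sketch below checks a file before it is passed to the command above. It uses only Python's standard wave module; the function name check_wav and the expected_rate default are my own choices, not part of DeepSpeech.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of format problems for a WAV file (empty list if none).

    Assumption: the model expects 16 kHz, mono, 16-bit PCM input.
    check_wav is a hypothetical helper, not a DeepSpeech function.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit

    if rate != expected_rate:
        problems.append("sample rate is %d Hz, expected %d Hz" % (rate, expected_rate))
    if channels != 1:
        problems.append("file has %d channels, expected mono" % channels)
    if width != 2:
        problems.append("samples are %d-bit, expected 16-bit" % (width * 8))
    return problems
```

Calling check_wav("/var/tmp/audio.wav") returns an empty list when the file matches the assumed format, and one message per mismatch otherwise.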

Python

mozilla deepspeech: python recording example
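
Since the Python bindings are undocumented and their signatures vary between releases, one low-risk option is to call the CLI from Python. This is a minimal sketch that reuses only the --model, --alphabet and --audio flags shown in the Commandline section. The helper names build_command and transcribe are hypothetical, and the sketch assumes the deepspeech executable is on PATH and prints the transcript to stdout.

```python
import subprocess


def build_command(model, alphabet, audio):
    """Build the deepspeech CLI invocation as an argument list.

    Hypothetical helper; the flags are the ones used in the
    Commandline section of these notes.
    """
    return [
        "deepspeech",
        "--model", model,
        "--alphabet", alphabet,
        "--audio", audio,
    ]


def transcribe(model, alphabet, audio):
    """Run the CLI and return the transcript.

    Assumes the transcript is written to stdout; anything the CLI
    writes to stderr is captured and discarded.
    Raises subprocess.CalledProcessError if the CLI exits non-zero.
    """
    result = subprocess.run(
        build_command(model, alphabet, audio),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, transcribe("models/output_graph.pb", "models/alphabet.txt", "/var/tmp/audio.wav") runs the same command as the Commandline section and returns the text instead of writing it to text.txt.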