Python SpeechRecognition

Speech-to-text and voice command processing, with support for multiple back-ends.


Documentation

official github https://github.com/Uberi/speech_recognition#readme
realpython tutorial https://realpython.com/python-speech-recognition/

Install

sudo pip install SpeechRecognition

# if using CMUsphinx 
pacaur -S sphinxbase           # CMUsphinx
pacaur -S pocketsphinx         # libpocketsphinx (C++)
sudo pip install pocketsphinx  # pocketsphinx (python)
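
After installing, it can be handy to confirm which pieces are actually importable. The following is a minimal sketch, assuming the import names `speech_recognition`, `pyaudio`, and `pocketsphinx`; `installed` is a hypothetical helper, not part of any of these packages.

```python
import importlib.util


def installed(module_names):
    """Map each top-level module name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # speech_recognition: the SpeechRecognition package itself
    # pyaudio: needed for microphone input
    # pocketsphinx: only needed for the CMUsphinx backend
    wanted = ["speech_recognition", "pyaudio", "pocketsphinx"]
    for name, found in installed(wanted).items():
        print(name, "ok" if found else "missing")
```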

Usage

Overview

Get your device index using SpeechRecognition (which uses PyAudio as its audio backend).

import speech_recognition

# misleading name, lists all audio devices
# in order of their device-indexes.
speech_recognition.Microphone.list_microphone_names()
#> ['HD Audio Pro', ...]
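
Because a name's position in that list is its device index, a small helper can look a device up by name instead of hard-coding the number. This is only a sketch: `find_device_index` is a hypothetical helper, not part of SpeechRecognition, and the device names below are made up.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    The match is case-insensitive. Returns None if nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # skip entries that have no usable name
        if name and keyword in name.lower():
            return index
    return None


if __name__ == "__main__":
    # Illustrative names only; on a real system pass in
    # speech_recognition.Microphone.list_microphone_names() instead.
    names = ["HD Audio Pro", "USB Headset", "default"]
    print(find_device_index(names, "headset"))
```

The returned index can then be passed to `speech_recognition.Microphone`.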

Listen for speech in a loop and process commands (in this instance we use the Python pocketsphinx backend for recognition).

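import speech_recognition

device_index = 0
mic = speech_recognition.Microphone(device_index=device_index)
recognizer = speech_recognition.Recognizer()

while True:
    with mic as source:
        # sample the background noise level, then record one phrase
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # see other recognize_* methods for other backends
        text = recognizer.recognize_sphinx(audio)
    except speech_recognition.UnknownValueError:
        # speech was unintelligible; keep listening
        continue
    print(text)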
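
The loop above only prints the recognized text. One possible way to turn that text into commands is a phrase-to-handler lookup, sketched below. `dispatch` and the `COMMANDS` table are hypothetical names chosen for illustration, and plain substring matching is an assumption, not something SpeechRecognition provides.

```python
def dispatch(text, commands):
    """Run the handler of the first command phrase found in text.

    commands maps a lowercase phrase to a zero-argument callable.
    Returns the handler's return value, or None if no phrase matched.
    """
    # lowercase and collapse whitespace so matching is forgiving
    normalized = " ".join(text.lower().split())
    for phrase, handler in commands.items():
        if phrase in normalized:
            return handler()
    return None


# Hypothetical commands, for illustration only.
COMMANDS = {
    "lights on": lambda: "turning lights on",
    "lights off": lambda: "turning lights off",
}

if __name__ == "__main__":
    # In the listening loop this would be: dispatch(text, COMMANDS)
    print(dispatch("please turn the lights on", COMMANDS))
```

Substring matching is crude; with a restricted grammar (see the backend notes) the recognized text is already limited to known phrases, so an exact lookup may be enough.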

Backend Notes

CMUsphinx

CMUsphinx converts JSGF to FSG wherever a JSGF file is used for grammar.

Unfortunately, the implementation only works for a JSGF file that contains a single grammar with a single rule, where the grammar and the rule share the same name.

You can avoid this issue by creating your own FSG file using sphinx_jsgf2fsg:

sphinx_jsgf2fsg -fsg out.fsg < grammar.jsgf
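
For simple cases, a JSGF file that already satisfies the naming constraint can be generated instead. The sketch below assumes the constraint means the grammar name and the public rule name must both match the file's base name; `write_single_rule_jsgf`, the name `commands`, and the phrases are all placeholders.

```python
import os
import tempfile


def write_single_rule_jsgf(directory, name, phrases):
    """Write <name>.jsgf whose grammar and public rule are both called name."""
    alternatives = " | ".join(phrases)
    content = (
        "#JSGF V1.0;\n"
        "grammar {0};\n"
        "public <{0}> = {1};\n"
    ).format(name, alternatives)
    path = os.path.join(directory, name + ".jsgf")
    with open(path, "w") as handle:
        handle.write(content)
    return path


if __name__ == "__main__":
    # "commands" and the phrases are placeholder values.
    path = write_single_rule_jsgf(
        tempfile.mkdtemp(), "commands", ["lights on", "lights off"]
    )
    with open(path) as handle:
        print(handle.read())
```

The resulting path can then be handed to the recognizer through the `grammar` argument, e.g. `recognizer.recognize_sphinx(audio, grammar=path)`.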