Chapter 21 Transcribing

In this unit, we look at an example of speech recognition in Python. In language studies, we often need to transcribe audio data into text for further analysis. This unit shows how to use Python to automate speech-to-text transcription.

There are generally two sources of audio data:

  • From an audio file (e.g., .wav)
  • From the system microphone (e.g., speech recorded on the fly)

21.1 Working with Audio Files

  • Recognizer: First, we need to initialize a Recognizer object, which is mainly responsible for the speech-to-text recognition.
  • AudioFile: Create an AudioFile object with a path to the audio file.
  • AudioData: Process the audio file and record the data from the AudioFile into an AudioData object.
  • Choose the API for Speech-to-Text conversion: Use the Recognizer’s method, Recognizer.recognize_google(), to recognize speech in the AudioData.
import speech_recognition as sr
print(sr.__version__)
3.10.4
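# Step 1: create the Recognizer
r = sr.Recognizer()
# Steps 2-3: open the audio file and read its content into an AudioData object
havard = sr.AudioFile('demo_data/audio/語音測試.wav')
with havard as source:
  ## optionally adjust for noise (this consumes the beginning of the file)
  #r.adjust_for_ambient_noise(source)
  audio = r.record(source)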

type(havard)
<class 'speech_recognition.AudioFile'>
type(audio)
<class 'speech_recognition.audio.AudioData'>
r.recognize_google(audio, language='zh-TW')
'我現在在做基本的語音測試謝謝'
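
The four steps above can be wrapped in a small helper and applied to every file in a folder. The following is only a minimal sketch: the function names transcribe_file and transcribe_folder are hypothetical (they are not part of speech_recognition), and the sketch assumes that network errors (sr.RequestError) should simply propagate to the caller.

```python
from pathlib import Path


def transcribe_file(path, language="zh-TW"):
    """Run the four steps on one audio file; return the text, or None if unintelligible."""
    import speech_recognition as sr  # imported here so transcribe_folder works without it

    recognizer = sr.Recognizer()                 # step 1: Recognizer
    with sr.AudioFile(str(path)) as source:      # step 2: AudioFile
        audio = recognizer.record(source)        # step 3: AudioData
    try:
        # step 4: send the AudioData to the Google Web Speech API
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech could not be understood


def transcribe_folder(folder, transcribe=transcribe_file, pattern="*.wav"):
    """Map each matching file name in `folder` to its transcript (sorted by name)."""
    return {p.name: transcribe(p) for p in sorted(Path(folder).glob(pattern))}
```

With the demo data used in this section, a call such as transcribe_folder('demo_data/audio') would return a dictionary from file names to transcripts. Passing the transcribing function as an argument makes it easy to swap in a different recognizer later.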

Recognizer.recognize_google() converts the audio to the FLAC format before sending it to the API, and the library relies on the FLAC command-line utility for this conversion. If no usable FLAC converter is found on your system, the library raises an OSError. Please install FLAC before using the library.
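
A quick way to see whether a FLAC executable is reachable is to search the system PATH before calling the recognizer. This is a rough sketch using only the standard library; the helper name find_flac is hypothetical, and the executable name "flac" is an assumption about how the utility is installed. Since speech_recognition may also fall back to a converter bundled for some platforms, a missing PATH entry is a hint rather than proof that recognition will fail.

```python
import shutil


def find_flac(candidates=("flac",)):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


flac_path = find_flac()
if flac_path is None:
    print("No FLAC executable found on PATH; recognition may raise an OSError.")
else:
    print(f"FLAC found at: {flac_path}")
```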

21.2 Working with Microphone Inputs

  • Recognizer: First, we need to initialize a Recognizer object, which is mainly responsible for the speech-to-text recognition.
  • Microphone: Create a Microphone object with the device index of the system microphone.
  • AudioData: Record the speech data from the Microphone into an AudioData object.
  • Choose the API for Speech-to-Text conversion: Use the Recognizer’s method, Recognizer.recognize_google(), to recognize speech in the AudioData.
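import speech_recognition as sr
r = sr.Recognizer()
# List the available audio devices; a name's position in the list is its device index
sr.Microphone.list_microphone_names()
# Select a microphone by index (omit device_index to use the system default)
mic = sr.Microphone(device_index=0)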

with mic as source:
  audio = r.listen(source)
  
type(audio)

try:
  # print the transcript so that it is also shown when run as a script
  print(r.recognize_google(audio, language='zh'))
except sr.UnknownValueError:
  print('Unable to recognize the speech.')
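
The microphone steps above can also be combined into one function that picks the device by name instead of a hard-coded index. This is only a sketch under some assumptions: the helper names find_device_index and listen_once are hypothetical, the keyword "microphone" is a guess at what the device is called on your system, and the one-second calibration and 15-second phrase limit are arbitrary choices.

```python
def find_device_index(names, keyword):
    """Return the index of the first device name containing `keyword`, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


def listen_once(keyword="microphone", language="zh-TW"):
    """Record one utterance from a matching microphone; return the transcript or None."""
    import speech_recognition as sr  # imported lazily; Microphone also requires PyAudio

    names = sr.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)  # None falls back to the default device
    r = sr.Recognizer()
    with sr.Microphone(device_index=index) as source:
        r.adjust_for_ambient_noise(source, duration=1)  # calibrate on about 1 s of background noise
        audio = r.listen(source, phrase_time_limit=15)  # stop after at most 15 s of speech
    try:
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech was unintelligible
    except sr.RequestError as err:
        print(f"API request failed: {err}")  # e.g., no network connection
        return None
```

Calling listen_once() records from the first device whose name contains the keyword. Unlike the example above, it also handles sr.RequestError, which is raised when the API cannot be reached.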

21.3 References