Chapter 21 Transcribing
In this unit, we look at an example of speech recognition using Python. In language studies, we often need to transcribe audio data into text for further analysis. This unit shows how we can use Python to automate speech-to-text transcription.
There are generally two sources of audio data:
- From an audio file (e.g., .wav)
- From the system microphone (e.g., speech produced on the fly)
21.1 Working with Audio Files
- Recognizer: First, we need to initialize a Recognizer object, which is mainly responsible for the speech-to-text recognition.
- AudioFile: Create an AudioFile object with a path to the audio file.
- AudioData: Process the audio file and record the data from the AudioFile into an AudioData object.
- Choose the API for speech-to-text conversion: Use the Recognizer's method, Recognizer.recognize_google(), to recognize speech in the AudioData.
3.10.4
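import speech_recognition as sr

r = sr.Recognizer()
# r.recognize_google()
harvard = sr.AudioFile('demo_data/audio/語音測試.wav')
with harvard as source:
    ## adjust for noise
    # r.adjust_for_ambient_noise(source)
    audio = r.record(source)  # read the whole file into an AudioData object
type(harvard)
# <class 'speech_recognition.AudioFile'>
type(audio)
# <class 'speech_recognition.audio.AudioData'>
r.recognize_google(audio, language='zh')
# '我現在在做基本的語音測試謝謝' ("I am now doing a basic speech test, thank you")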
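The four steps above can be wrapped into a small helper. The following is a minimal sketch, not part of the original notes: the function name transcribe_file and its up-front file check are illustrative assumptions. Note that recognize_google() sends the audio to an online service, so it needs an internet connection.

```python
from pathlib import Path


def transcribe_file(path, language="zh"):
    """Transcribe one audio file (hypothetical helper, for illustration only)."""
    audio_path = Path(path)
    if not audio_path.is_file():
        # Fail early with a clear message before touching the recognizer.
        raise FileNotFoundError(f"Audio file not found: {audio_path}")

    # Imported here so the file check above works even without the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(audio_path)) as source:
        audio = recognizer.record(source)  # read the whole file into AudioData

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service returned no usable transcription for this audio.
        return None
    except sr.RequestError as err:
        # The service could not be reached or rejected the request.
        raise RuntimeError(f"Speech API request failed: {err}") from err
```

A call such as transcribe_file('demo_data/audio/語音測試.wav') would then return the transcription as a string, or None when the speech cannot be recognized.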
The speech_recognition library uses the FLAC format internally for audio processing. The library relies on an external tool, the FLAC command-line utility, to handle FLAC audio files. If this utility is not installed on your system, the library will raise an OSError. Please install FLAC before using the library.
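Before transcribing, it can help to check whether the FLAC tool is visible on your system. The sketch below is only a rough check under one assumption: it looks for an executable named flac on the system PATH using the standard library. The helper name flac_on_path is hypothetical, and the library may locate FLAC in other ways, so a negative result here does not necessarily mean transcription will fail.

```python
import shutil


def flac_on_path():
    """Return True if a `flac` executable is found on the system PATH."""
    return shutil.which("flac") is not None


if __name__ == "__main__":
    if flac_on_path():
        print("FLAC command-line tool found.")
    else:
        print("FLAC not found on PATH; consider installing it before transcribing.")
```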
21.2 Working with Microphone Inputs
- Recognizer: First, we need to initialize a Recognizer object, which is mainly responsible for the speech-to-text recognition.
- Microphone: Create a Microphone object with the device index of the system microphone.
- AudioData: Record the speech data from the Microphone into an AudioData object.
- Choose the API for speech-to-text conversion: Use the Recognizer's method, Recognizer.recognize_google(), to recognize speech in the AudioData.
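import speech_recognition as sr

r = sr.Recognizer()
mic = sr.Microphone()  # default system microphone

# List the available microphones to find the right device index
sr.Microphone.list_microphone_names()

# Select a specific microphone by its index in the list above
mic = sr.Microphone(device_index=0)
with mic as source:
    audio = r.listen(source)  # record one utterance from the microphone
type(audio)

try:
    # Print the result explicitly; a bare expression inside try shows nothing
    print(r.recognize_google(audio, language='zh'))
except sr.UnknownValueError:
    print('Unable to recognize the speech.')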
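Hard-coding device_index=0 only works if the first device happens to be the microphone you want. As a hedged sketch, the index can instead be looked up by name from the list returned by list_microphone_names(). The helper names find_device_index and listen_once, and the default keyword "microphone", are assumptions made for illustration; they are not part of the speech_recognition library.

```python
def find_device_index(names, keyword):
    """Return the index of the first device name containing `keyword`, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Entries can be None when a device name cannot be retrieved.
        if name is not None and keyword in name.lower():
            return index
    return None


def listen_once(keyword="microphone", language="zh"):
    """Record one utterance from a matching microphone and transcribe it."""
    import speech_recognition as sr  # microphone access also requires PyAudio

    names = sr.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)  # None selects the default device
    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=index) as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
```

Keeping the name lookup in its own function means it can be checked with a plain list of names, without a microphone attached.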
21.3 References
- The Ultimate Guide to Speech Recognition with Python
- Harvard Sentences: These phrases were published by the IEEE in 1965 for use in speech intelligibility testing of telephone lines. They are still used in VoIP and cellular testing today. Available recordings of these sentences can be found on the Open Speech Repository website.