Chapter 21 Transcribing
In this unit, we look at an example of speech recognition using Python. In language studies, we often need to transcribe audio data into text for further study. This unit shows how Python can be used to automate speech-to-text transcription.
There are, in general, two sources of audio data:
- An audio file (e.g., `.wav`)
- The system microphone (e.g., speech produced on the fly)
21.1 Working with Audio Files
- `Recognizer`: First, initialize a `Recognizer` object, which is responsible for the speech-to-text recognition.
- `AudioFile`: Create an `AudioFile` object with the path to the audio file.
- `AudioData`: Read the audio file and record the data from the `AudioFile` into an `AudioData` object.
- Speech-to-text API: Choose the recognition service. This unit uses the Recognizer's method `Recognizer.recognize_google()`, which sends the `AudioData` to the Google Web Speech API and returns the recognized text.
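Before passing your own recordings through these steps, it can help to check their basic properties (number of channels, sample rate, duration). The sketch below is not part of the original workflow: it uses only Python's built-in `wave` module, assumes an uncompressed PCM WAV file, and the helper names `describe_wav()` and `write_silence()` are hypothetical. `write_silence()` exists only to create a small test file so the example runs without any recordings.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, 'rb') as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            'channels': wf.getnchannels(),
            'sample_width_bytes': wf.getsampwidth(),
            'frame_rate_hz': rate,
            'duration_sec': n_frames / rate,  # frames divided by frames per second
        }


def write_silence(path: str, seconds: float = 1.0, rate: int = 16000) -> None:
    """Write `seconds` of 16-bit mono silence (all-zero samples) to `path`."""
    n_frames = int(seconds * rate)
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b'\x00\x00' * n_frames)


if __name__ == '__main__':
    # Offline demo: create one second of silence, then inspect it.
    demo_path = os.path.join(tempfile.mkdtemp(), 'silence.wav')
    write_silence(demo_path)
    print(describe_wav(demo_path))
    # {'channels': 1, 'sample_width_bytes': 2, 'frame_rate_hz': 16000, 'duration_sec': 1.0}
```

With a real recording you would call, for example, `describe_wav('demo_data/audio/語音測試.wav')` before wrapping the same path in `sr.AudioFile`.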
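```python
import speech_recognition as sr

print(sr.__version__)
# 3.8.1

# Step 1: the Recognizer does the speech-to-text work
r = sr.Recognizer()

# Step 2: wrap the audio file in an AudioFile object
havard = sr.AudioFile('demo_data/audio/語音測試.wav')

# Step 3: read the whole file into an AudioData object
with havard as source:
    ## adjust for noise (optional; this reads, and therefore skips, the first second of audio)
    # r.adjust_for_ambient_noise(source)
    audio = r.record(source)

print(type(havard))
# <class 'speech_recognition.AudioFile'>
print(type(audio))
# <class 'speech_recognition.AudioData'>

# Step 4: send the AudioData to the Google Web Speech API (requires internet access)
print(r.recognize_google(audio, language='zh'))
# 我現在在做基本的語音測試謝謝
# (English: "I am now doing a basic speech test. Thank you.")
```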
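In language studies we usually have a whole folder of recordings rather than a single file. The sketch below wraps the four steps above into a function and applies it to every `.wav` file in a folder, saving the results as a CSV file. The names `transcribe_file()`, `transcribe_folder()`, and `save_transcripts()` are hypothetical, and returning an empty string when the API cannot understand a file is an assumption you may want to change. The transcriber is passed in as a parameter, so the folder logic can be tried offline with a stand-in function; only `transcribe_file()` needs `speech_recognition` and a network connection.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def transcribe_file(path: str, language: str = 'zh') -> str:
    """Transcribe one audio file with Recognizer.recognize_google() (needs internet access)."""
    import speech_recognition as sr  # imported here so the other helpers work without it
    r = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = r.record(source)
    try:
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ''  # assumption: record unintelligible files as an empty transcript


def transcribe_folder(folder: str,
                      transcribe: Callable[[str], str] = transcribe_file) -> Dict[str, str]:
    """Return {file name: transcript} for every .wav file in `folder`, in sorted order."""
    return {wav.name: transcribe(str(wav)) for wav in sorted(Path(folder).glob('*.wav'))}


def save_transcripts(results: Dict[str, str], out_csv: str) -> None:
    """Write the transcripts to a UTF-8 CSV file with a header row."""
    with open(out_csv, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['file', 'transcript'])
        writer.writerows(results.items())


if __name__ == '__main__':
    # Offline demo: empty placeholder files and a fake transcriber that echoes the file stem.
    tmp = tempfile.mkdtemp()
    for name in ['b.wav', 'a.wav', 'notes.txt']:
        Path(tmp, name).touch()
    results = transcribe_folder(tmp, transcribe=lambda p: Path(p).stem.upper())
    print(results)  # {'a.wav': 'A', 'b.wav': 'B'}
    save_transcripts(results, os.path.join(tmp, 'transcripts.csv'))
```

For real data, `transcribe_folder('demo_data/audio')` would use `transcribe_file()` by default. Writing the CSV with `encoding='utf-8'` keeps Chinese transcripts intact.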
21.2 Working with Microphone Inputs
- `Recognizer`: First, initialize a `Recognizer` object, which is responsible for the speech-to-text recognition.
- `Microphone`: Create a `Microphone` object with the device index of a specific system microphone.
- `AudioData`: Record the speech from the `Microphone` into an `AudioData` object.
- Speech-to-text API: Use the Recognizer's method `Recognizer.recognize_google()` to recognize the speech in the `AudioData`.
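```python
import speech_recognition as sr  # sr.Microphone also requires the PyAudio package

r = sr.Recognizer()

# List the available input devices; a name's position in this list is its device index
print(sr.Microphone.list_microphone_names())

# sr.Microphone() uses the default microphone; here we pick device 0 explicitly
mic = sr.Microphone(device_index=0)

with mic as source:
    audio = r.listen(source)  # record until a pause in the speech is detected

print(type(audio))
# <class 'speech_recognition.AudioData'>

try:
    print(r.recognize_google(audio, language='zh'))
except sr.UnknownValueError:
    print('Unable to recognize the speech.')
except sr.RequestError as e:
    print(f'Could not reach the Google Web Speech API: {e}')
```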
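Hard-coding `device_index=0` only works if the desired microphone happens to be first in the list. A small helper can look the index up by name instead. This is a sketch with a hypothetical function `find_device_index()`; it assumes that the position of a name in the list returned by `sr.Microphone.list_microphone_names()` is the `device_index` to pass to `sr.Microphone()`. The device names in the demo are made up, since real names differ from system to system.

```python
from typing import List, Optional


def find_device_index(names: List[str], keyword: str) -> Optional[int]:
    """Return the index of the first device name containing `keyword`
    (case-insensitive), or None if no name matches."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None


if __name__ == '__main__':
    # Offline demo with a made-up device list.
    example_names = ['Built-in Microphone', 'USB Audio Device', 'Headset Microphone']
    print(find_device_index(example_names, 'usb'))        # 1
    print(find_device_index(example_names, 'bluetooth'))  # None

    # With speech_recognition installed, the lookup would be used like this:
    # import speech_recognition as sr
    # index = find_device_index(sr.Microphone.list_microphone_names(), 'USB')
    # mic = sr.Microphone(device_index=index)  # index=None falls back to the default microphone
```

Returning `None` when nothing matches is a deliberate choice: passing `device_index=None` to `sr.Microphone()` simply uses the default microphone.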
21.3 References
- The Ultimate Guide to Speech Recognition with Python
- Harvard Sentences: These phrases were published by the IEEE in 1965 for use in speech intelligibility testing of telephone lines. They are still used in VoIP and cellular testing today. Available recordings of these sentences can be found on the Open Speech Repository website.