Speech Recognition using the Raspberry Pi
I've finally received my Raspberry Pi, and I've immediately gotten to work transferring the speech recognition system I used for the robotic arm to the Pi. Due to its small size and low power requirements, the Raspberry Pi is an excellent platform for the Julius open-source speech recognition system. This opens up almost limitless possibilities for voice command applications.
EDIT: I am no longer working on Julius/HTK for speech recognition. Please see this post for more information.
Commercial electronic voice command modules do exist, as do the voice command applications appearing in recent smartphones (e.g. Siri). However, they are either not as versatile or not as cheap as the Raspberry Pi. Additionally, Julius is an LVCSR (Large-Vocabulary Continuous Speech Recognition) decoder, which means you can develop large vocabularies and complex grammars to build more natural voice interfaces.
In this tutorial, I will be demonstrating how to use the Raspberry Pi for a simple speech recognition system to control the Maplin USB Robotic Arm. Later on, I will demonstrate how to interface this system with other devices using the Raspberry Pi's GPIO pins.
- Raspberry Pi set up and running Debian (please follow the setup instructions from www.raspberrypi.org), preferably connected to the internet
- USB microphone
Also, you need to have followed the instructions for creating an acoustic model in my earlier tutorial here, using a full-sized computer. It's a lot easier to get it working on a full-sized computer and then transfer it to the Pi. However, you can of course follow the entire acoustic model generation tutorial right on the Pi itself.
Since we are not using the GPU very much, we'll allocate less RAM for video memory. Instructions to do that can be found here [andybold.me.uk].
To save on resources, the Raspberry Pi should be running in command line mode only, and all my instructions below in boxes should be typed into the Raspberry Pi command line.
To begin, we need to load the sound card driver. In the command line, type:
sudo modprobe snd_bcm2835
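To confirm that the driver actually loaded, one option is to look for it in /proc/modules. This is only a small sketch: module_loaded is a helper name made up for this example, and it assumes a Linux system that exposes /proc/modules.

```shell
# Return success if the named kernel module is listed in /proc/modules.
# module_loaded is a hypothetical helper, not part of any package.
module_loaded() {
    grep -q "^$1 " /proc/modules 2>/dev/null
}

if module_loaded snd_bcm2835; then
    echo "snd_bcm2835 is loaded"
else
    echo "snd_bcm2835 is not loaded"
fi
```

On Debian-based systems, adding snd_bcm2835 on its own line in /etc/modules should load it on every boot, so the modprobe step does not have to be repeated.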
There are a few packages that we need to install to get the system working properly. To get them, you need the Raspberry Pi connected to the internet. Otherwise, download the packages to the SD card and install them from there along with their dependencies. If you have a working internet connection on the Pi, just type into the command line:
sudo apt-get install alsa-tools alsa-oss flex zlib1g-dev libc-bin libc-dev-bin python-pexpect libasound2 libasound2-dev cvs
I'm not sure if libc-bin and its headers are actually required, but this is what I installed when I was trying to get it to work. Now, to test if the microphone is working, try recording 10 seconds of audio using arecord (from alsa-utils) and play it back using aplay:
arecord -d 10 -D plughw:1,0 test.wav
aplay test.wav
The -D option (plughw:1,0) selects the device you want to record from. Since the Raspberry Pi's internal sound device uses plughw:0,0, a USB microphone would typically be assigned plughw:1,0. If you're attaching several sound devices to the Raspberry Pi, you should change this to the appropriate value, as well as the ALSADEV environment variable (see the last section below).
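If you're not sure which card and device numbers the microphone got, arecord -l lists the capture devices. As a rough sketch, the helper below turns that listing into a plughw string. first_capture_device is a name made up for this example, the sample line uses an invented device name, and the parsing assumes the usual English "card N: ..., device M: ..." layout.

```shell
# Print an ALSA device string (plughw:CARD,DEVICE) for the first capture
# device whose description matches a pattern (default: USB).
# Reads "arecord -l" style output on stdin.
# first_capture_device is a hypothetical helper for illustration.
first_capture_device() {
    pattern="${1:-USB}"
    grep "^card [0-9]" | grep -i "$pattern" | head -n 1 |
        sed -E 's/^card ([0-9]+):.*device ([0-9]+):.*/plughw:\1,\2/'
}

# Example with a sample line (the device name is invented):
echo "card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]" |
    first_capture_device USB
```

On the Pi itself you would feed it the real listing: arecord -l | first_capture_device USB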
The latest stable version of the Julius LVCSR decoder (4.2.1) did not detect the ALSA headers correctly when I tried it. A search in the forums indicated that the CVS version was working, so my advice is to compile Julius from the CVS source.
cvs -z3 -d:pserver:email@example.com:/cvsroot/julius co julius4
If you're using Raspbian, set the compiler flags through the CFLAGS environment variable:
export CFLAGS="-O2 -mcpu=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard -pipe -fomit-frame-pointer"
Afterwards, go into the folder julius4 just created, and run the configure, make and make install commands:
sudo make install
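For reference, here is a sketch of the whole configure, make and make install sequence, wrapped in a function so that it stops at the first step that fails. build_julius is a name made up for this example, and the --with-mictype=alsa flag is an assumption: it asks configure to use ALSA for microphone input rather than relying on auto-detection, so drop it if your checkout does not accept it.

```shell
# Build and install Julius from inside the julius4 folder.
# build_julius is a hypothetical wrapper; --with-mictype=alsa is an
# assumption (explicitly selects ALSA for microphone input).
build_julius() {
    ./configure --with-mictype=alsa &&
    make &&
    sudo make install
}
```

Call build_julius from inside the julius4 folder, in the same shell where CFLAGS was exported so that configure picks the flags up.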
Finally, Julius needs an environment variable called ALSADEV to tell it which device to use for a microphone:
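Assuming the USB microphone is on plughw:1,0, as in the recording test earlier, setting the variable would look like this (adjust the value if your microphone was assigned different numbers):

```shell
# Assumes the USB microphone is card 1, device 0, matching the arecord test.
export ALSADEV="plughw:1,0"
```

The export only lasts for the current shell session. Adding the same line to ~/.bashrc makes it apply to every new login shell.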
To run the speech recognition, you can simply copy an existing model if you have followed my previous tutorial. Copy the entire 'voxforge' folder with your acoustic models to the Raspberry Pi home directory on the SD card. Insert the SD card into the Raspberry Pi and boot it up. On the Raspberry Pi command line, navigate to voxforge/auto. Hopefully, once the environment variable is set, you can execute Julius and begin speech input:
julius -input mic -C julius.jconf
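To act on what Julius hears, its output can be filtered for the final recognition result. The sketch below assumes that Julius prints each result on a line starting with "sentence1:" and wraps it in <s> and </s> silence markers. recognised_sentences is a name made up for this example, and DIAL ONE is just sample text.

```shell
# Read Julius output on stdin and print one recognised sentence per line,
# without the "sentence1: " prefix or the <s> ... </s> markers.
# recognised_sentences is a hypothetical helper for illustration.
recognised_sentences() {
    while IFS= read -r line; do
        case "$line" in
            "sentence1: "*)
                text=${line#sentence1: }
                text=${text#"<s> "}
                text=${text%" </s>"}
                printf '%s\n' "$text"
                ;;
        esac
    done
}

# Example with sample lines in the assumed format:
printf 'pass1_best: <s> DIAL ONE </s>\nsentence1: <s> DIAL ONE </s>\n' |
    recognised_sentences
```

With a working microphone, the same filter would sit at the end of the real command: julius -input mic -C julius.jconf | recognised_sentences. Each line it prints could then be matched in a case statement to trigger an action.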
I've found that recognition accuracy drops when I use the Raspberry Pi as compared to my laptop, even with the same acoustic model. Recompiling the HMMs on the Raspberry Pi improves this a bit, but it's still not as accurate as on my Linux laptop. If you have any idea what's causing this, please let me know!
In the meantime, to compile HTK I'd recommend:
./configure --without-x --disable-hslab