Robot Arm with Voice Control - Tutorial (part 1)

robot arm photo

NB: I will take my time creating this tutorial, as my typing speed is slower due to my wrist injury. I will be able to type much faster when my plaster comes off in about 2 weeks' time.

A couple of months ago, I found a page describing the USB protocol for a robotic arm that I had previously spotted in a shop. A few weeks ago, when I suffered a minor wrist injury and temporarily lost the use of my right hand, I decided to assemble the robot arm (decidedly a slow process with just one hand) and test out notbrainsurgery's code. It didn't take me long to realise that the most convenient way to control it was through voice commands rather than typing command-line arguments.

So what do you need to build this voice-controlled robot arm?

  1. Robot arm (from OWI, or Maplin if you're in the UK)
  2. computer running Linux
  3. microphone
  4. Hidden Markov Model Toolkit (HTK) from Cambridge University
  5. Julius speech recognition decoder developed by CSRC, Japan
  6. recording software (Audacity, Sound Recorder, etc.)
  7. libusb version 1.0
  8. a quiet environment to record voice samples

NB: While most of these packages are open source, HTK's license restricts it to personal and research use. I don't think it is an OSI-approved license, so technically HTK can't be called open source. However, we only use HTK to train the model, so once the model is created the restrictions do not extend to the model itself.

In theory you can do this on a computer running Windows or OS X, but it is generally easier on Linux (and please, if you decide to try this on OS X or Windows, don't ask me for help - Google is your friend). You can use any flavour of Linux. I'm currently using Ubuntu 11.04, but any distro with GCC and a good package manager will do.

This tutorial is mostly based on the Voxforge howto. I have added additional details and links, but if you really want to go into the details of the acoustic models, then the Voxforge tutorial is an excellent place to learn.

Creating a vocabulary and grammar

The first step is deciding on a command vocabulary. The robotic arm has 3 joints, a manipulator as well as a rotating base. Additionally it has a white LED for illumination at the manipulator.

Diagram

I have chosen the vocabulary to consist of:

  • Shoulder
  • Elbow
  • Wrist
  • Grip
  • Light
  • on/off
  • up/down
  • open/close
  • left/right

I then use this to create my robot's vocabulary file (sample.voca), in the .voca format read by Julius's mkdfa grammar compiler:

% NS_B
<s>        sil

% NS_E
</s>        sil

% JOINT_N
WRIST        r ih s t
ELBOW        eh l b ow
SHOULDER        sh ow l d er

% DEV
LIGHT        l ay t

% DEV_IN
ON        aa n
OFF        ao f

% MANIP
GRIP        g r ih p

% M_ACT
OPEN        ow p ax n
CLOSE       k l ow s

% DIRECTION
UP        ah p
DOWN      d aw n

% ROTATION
LEFT        l eh f t
RIGHT       r ay t

Each word needs to be broken into its phonemes, that is, the individual units of sound in the word. English is usually said to have about 44 phonemes, and you can use the Voxforge lexicon or the CMU Pronouncing Dictionary to look up the phonemes for the words in your vocabulary. So what are headings such as "% JOINT_N" for? They name the word category, which we use when creating the command grammar, as I describe in the section below.
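To make the file layout concrete, here is a minimal Python sketch that reads .voca-style text into a dictionary of categories, words and phonemes. It is only an illustration: the function name parse_voca is made up for this example, the embedded text is a shortened copy of the file above, and neither Julius nor HTK needs this code.

```python
# Shortened copy of sample.voca, for illustration only.
SAMPLE_VOCA = """\
% NS_B
<s>        sil

% NS_E
</s>        sil

% JOINT_N
WRIST        r ih s t
ELBOW        eh l b ow
SHOULDER        sh ow l d er
"""


def parse_voca(text):
    """Return {category: {word: [phonemes]}} from .voca-style text.

    parse_voca is a hypothetical helper, not part of Julius or HTK.
    """
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate the categories
        if line.startswith("%"):
            # A "% NAME" line opens a new word category.
            current = line[1:].strip()
            categories[current] = {}
        else:
            if current is None:
                raise ValueError("word found before any '% CATEGORY' line")
            # First field is the word, the rest are its phonemes.
            word, *phonemes = line.split()
            categories[current][word] = phonemes
    return categories


if __name__ == "__main__":
    for category, words in parse_voca(SAMPLE_VOCA).items():
        print(category, words)
```

Loading the file this way makes it easy to spot a word that has ended up under the wrong category, or one with no phonemes at all.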

Creating a grammar

Now we need to create a grammar. Languages can have complex grammars, but our robot just needs to understand a small subset. For now, I've set the possible sentences to be:

  1. Joint + up | down
  2. Grip + open | close
  3. Light + on | off
  4. Left | Right

Here Joint can be shoulder, elbow or wrist. This means that a sentence could be "Elbow up" or "Elbow down" but not "Shoulder open". Grammars are formally described using Backus-Naur Form, or BNF for short. How you structure the robot's grammar is up to you, but I've kept it simple to start with. Here is the grammar file (sample.grammar) for my robotic arm, in the format read by Julius's mkdfa grammar compiler:

S : NS_B SENT NS_E
SENT: JOINT_N DIRECTION
SENT: MANIP M_ACT
SENT: DEV DEV_IN
SENT: ROTATION
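Together, the grammar and the vocabulary define a small, finite set of commands. As a rough illustration, the Python sketch below expands the rules into every sentence they allow. The helper names (parse_grammar, expand, commands) are hypothetical, and the word lists are copied by hand from sample.voca. The approach assumes a grammar without recursion, which holds for this one but not for Julius grammars in general.

```python
from itertools import product

GRAMMAR = """\
S : NS_B SENT NS_E
SENT: JOINT_N DIRECTION
SENT: MANIP M_ACT
SENT: DEV DEV_IN
SENT: ROTATION
"""

# Words per category, copied by hand from sample.voca.
WORDS = {
    "NS_B": ["<s>"],
    "NS_E": ["</s>"],
    "JOINT_N": ["WRIST", "ELBOW", "SHOULDER"],
    "DEV": ["LIGHT"],
    "DEV_IN": ["ON", "OFF"],
    "MANIP": ["GRIP"],
    "M_ACT": ["OPEN", "CLOSE"],
    "DIRECTION": ["UP", "DOWN"],
    "ROTATION": ["LEFT", "RIGHT"],
}


def parse_grammar(text):
    """Return {symbol: [list of right-hand sides]} from 'LHS: A B' lines."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        lhs, rhs = line.split(":", 1)
        rules.setdefault(lhs.strip(), []).append(rhs.split())
    return rules


def expand(symbol, rules, words):
    """List every word sequence that `symbol` can produce (no recursion)."""
    if symbol in words:
        return [[w] for w in words[symbol]]
    results = []
    for rhs in rules[symbol]:
        parts = [expand(s, rules, words) for s in rhs]
        for combo in product(*parts):
            results.append([w for part in combo for w in part])
    return results


def commands(rules, words):
    """All sentences from S, with the <s> and </s> silence markers removed."""
    return [" ".join(sentence[1:-1]) for sentence in expand("S", rules, words)]


if __name__ == "__main__":
    for command in commands(parse_grammar(GRAMMAR), WORDS):
        print(command)
```

Listing the commands like this is a quick way to check that a grammar change has not accidentally allowed something like "SHOULDER OPEN".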

Training the Model

To train the acoustic model, you need to record samples of your voice. Software such as Audacity is useful for this. Make sure that you record in a reasonably quiet environment, at a normal speaking speed and volume. About 0.5 seconds of silence at the beginning and at the end of each recording is recommended. The Voxforge documentation mentioned earlier has more details on how to record voice samples.

Make sure that the sampling rate is set to 48000 Hz, with 16 bits per sample and the channel is set to mono. Export the files as WAV (Microsoft 16 bit PCM).
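If you want to double-check your recordings against these settings, something like the following Python sketch should do it, using only the standard wave module. The function name wav_problems is made up for this example, and the defaults simply restate the settings above (48000 Hz, 16 bits, which is 2 bytes per sample, and mono).

```python
import sys
import wave


def wav_problems(path, rate=48000, sample_bytes=2, channels=1):
    """Return a list of mismatches between a wav file and the expected format.

    wav_problems is a hypothetical helper; an empty list means the file
    matches the expected rate, sample width and channel count.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sampling rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getsampwidth() != sample_bytes:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected {sample_bytes * 8}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py sample1.wav sample2.wav ...
    for name in sys.argv[1:]:
        issues = wav_problems(name)
        print(name, "OK" if not issues else "; ".join(issues))
```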

In summary, you need to record sample1.wav, sample2.wav and so on, one wav file per line of the prompts file below. (The training scripts look for this file under the name 'prompts' in voxforge/auto.) If you add more words to your vocabulary, it is highly recommended that you also add those words to the prompts file.

*/sample1 OPEN OPEN CLOSE CLOSE OPEN OPEN CLOSE CLOSE
*/sample2 GRIP WRIST SHOULDER ELBOW LEFT RIGHT OPEN CLOSE UP DOWN
*/sample3 LEFT WRIST RIGHT WRIST LEFT WRIST RIGHT WRIST
*/sample4 UP UP DOWN DOWN OPEN OPEN CLOSE CLOSE
*/sample5 SHOULDER SHOULDER ELBOW ELBOW GRIP GRIP
*/sample6 ELBOW UP ELBOW DOWN SHOULDER UP SHOULDER DOWN
*/sample7 GRIP OPEN GRIP CLOSE GRIP OPEN GRIP CLOSE
*/sample8 LEFT LEFT RIGHT RIGHT LEFT LEFT RIGHT RIGHT
*/sample9 LIGHT RIGHT LIGHT RIGHT LIGHT RIGHT
*/sample10 ON OFF ON OFF ON OFF ON OFF
*/sample11 BOOKENDS KENNEL KENNETH KENYA WEEKEND
*/sample12 BELT BELOW BEND AEROBIC DASHBOARD DATABASE
*/sample13 GATEWAY GATORADE GAZEBO AFGHAN AGAINST AGATHA
*/sample14 ABALON ABDOMINALS BODY ABOLISH
*/sample15 ABOUNDING ABOUT ACCOUNT ALLENTOWN
*/sample16 ACHIEVE ACTUAL ACUPUNCTURE ADVENTURE
*/sample17 ALGORITHM ALTHOUGH ALTOGETHER ANOTHER
*/sample18 BATTLE BEATLE LITTLE METAL
*/sample19 BITTEN BLATANT BRIGHTEN BRITAIN
*/sample20 BROOKHAVEN HOOD BROUHAHA BULLHEADS
*/sample21 BUSBOYS CHOICE COILS COIN
*/sample22 COLLECTION COLORATION COMBINATION COMMERCIAL
*/sample23 MIDDLE NEEDLE POODLE SADDLE
*/sample24 ALRIGHT ARTHRITIS BRIGHT COPYRIGHT CRITERIA RIGHT
*/sample25 COUPLE CRADLE CRUMBLE
*/sample26 CUBA CUBE CUMULATIVE
*/sample27 CURING CURLING CYCLING
*/sample28 CYNTHIA DANFORTH DEPTH
*/sample29 DIGEST DIGITAL DILIGENT
*/sample30 AMNESIA ASIA AVERSION BEIGE BEIJING
*/sample31 HELP HELLO HELMET HELPLESS AHEAD HELP
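Since every vocabulary word should be spoken somewhere in the prompts, a small check can catch omissions before you start recording. The Python sketch below is one possible way to do it. parse_prompts and missing_words are hypothetical helper names, and the example data is a two-line excerpt of the prompts above.

```python
PROMPTS_EXCERPT = """\
*/sample1 OPEN OPEN CLOSE CLOSE OPEN OPEN CLOSE CLOSE
*/sample10 ON OFF ON OFF ON OFF ON OFF
"""


def parse_prompts(text):
    """Return {sample name: [words]} from prompts-style lines."""
    prompts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # "*/sample1" -> "sample1"
        name = fields[0].rsplit("/", 1)[-1]
        prompts[name] = fields[1:]
    return prompts


def missing_words(vocabulary, prompts):
    """Vocabulary words that never appear in any prompt line, sorted."""
    spoken = {word for words in prompts.values() for word in words}
    return sorted(set(vocabulary) - spoken)


if __name__ == "__main__":
    prompts = parse_prompts(PROMPTS_EXCERPT)
    # With only this excerpt, UP is (correctly) reported as missing.
    print(missing_words(["OPEN", "CLOSE", "ON", "OFF", "UP"], prompts))
```

The sample names returned by parse_prompts can also be compared against the wav files you have recorded, since each prompt line needs a matching recording.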

Putting everything together

As per the Voxforge documentation, create a folder 'voxforge' in your home directory, as well as 'voxforge/auto' and 'voxforge/HTK_scripts'. Also create 'voxforge/auto/train', 'voxforge/auto/train/wav', 'voxforge/auto/train/mfcc' and 'voxforge/lexicon'. Note the spelling 'mfcc': the folder name must match the paths used in codetrain.scp below.
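If you prefer to script the folder creation, here is a minimal Python sketch. The function name make_tree is made up for this example. It uses 'mfcc' for the feature folder to match the paths in codetrain.scp further down, and it also creates 'auto/scripts', which the tutorial uses in a later step.

```python
from pathlib import Path

# Folders used in this tutorial, relative to the 'voxforge' root.
SUBDIRS = [
    "auto/scripts",
    "auto/train/wav",
    "auto/train/mfcc",
    "HTK_scripts",
    "lexicon",
]


def make_tree(root):
    """Create the tutorial's folder layout under `root`; return the folders.

    make_tree is a hypothetical helper. Existing folders are left untouched.
    """
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        folder = root / sub
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in make_tree(Path.home() / "voxforge"):
        print(folder)
```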

Now,

  • place sample.grammar, sample.voca, prompts in voxforge/auto
  • put all the wav files (sample1.wav ....etc) into voxforge/auto/train/wav
  • unzip the Hidden Markov Model tool kit (the HTK_samples archive) and put the following HTK scripts into voxforge/HTK_scripts:
    • maketrihed
    • mkclscript.prl (from /samples/RMHTK/perl_scripts/)
    • prompts2mlf
    • prompts2wlist
  • put voxforge_lexicon into voxforge/lexicon

Now in the voxforge/auto folder, execute:

mkdfa sample

In the voxforge/auto folder, create a file called codetrain.scp and fill it with:

../train/wav/sample1.wav ../train/mfcc/sample1.mfc
../train/wav/sample2.wav ../train/mfcc/sample2.mfc
../train/wav/sample3.wav ../train/mfcc/sample3.mfc
../train/wav/sample4.wav ../train/mfcc/sample4.mfc
../train/wav/sample5.wav ../train/mfcc/sample5.mfc
....

Make sure there is one line for each wav file you have created. Each line pairs a wav file with the MFCC feature file (an HTK format) that it will be converted to.
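Typing these lines by hand gets tedious with 31 samples, so you may want to generate them instead. A minimal Python sketch follows, assuming your files are named sample1.wav to sampleN.wav as above. The function name codetrain_lines is hypothetical.

```python
def codetrain_lines(count, wav_dir="../train/wav", mfc_dir="../train/mfcc"):
    """Return one 'wav-path mfc-path' line per sample, numbered from 1."""
    return [
        f"{wav_dir}/sample{i}.wav {mfc_dir}/sample{i}.mfc"
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # 31 matches the number of lines in the prompts file above.
    with open("codetrain.scp", "w") as scp:
        scp.write("\n".join(codetrain_lines(31)) + "\n")
```

Run it from the folder where codetrain.scp should live, and adjust the count if you add or remove prompt lines.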

Download the training scripts from Voxforge and extract them into your 'voxforge/auto/scripts' folder.

Then, in the voxforge/auto/scripts folder, execute:

./HTK_Compile_Model.sh

If it finishes without errors, you can now proceed to the Tutorial part 2!


Comments

Hi,

I've followed each step from here, and the ./HTK_Compile model step doesn't work, here's the output:

~/voxforge/auto/scripts$ ./HTK_Compile_Model.sh

init
==============================================================

Step 1 - Task Grammar
==============================================================
already completed manually

Step 2 - Pronunciation Dictionnary
==============================================================
sorting:./interim_files/wlist to:./interim_files/wlist1
Error!! ../../lexicon/voxforge_lexicon not found!
ERROR [+5010] InitSource: Cannot open source file ../../lexicon/voxforge_lexicon
ERROR [+1410] CreateBuffer: Can't open file ../../lexicon/voxforge_lexicon
FATAL ERROR - Terminating program HDMan
***Please review the following HDMan output***:

Step 3 - Recording the Data
==============================================================
already completed manually

Step 4 - Creating Transcription Files
==============================================================
writing to mlf file ./interim_files/words.mlf
writing to ./interim_files/words.mlf file done
ERROR [+1232] NumParts: Cannot find word COMPUTER in dictionary
FATAL ERROR - Terminating program HLEd
ERROR [+1232] NumParts: Cannot find word COMPUTER in dictionary
FATAL ERROR - Terminating program HLEd

Hi,

Sorry about that, I forgot to include one step (see the part about voxforge/lexicon above). I've corrected the tutorial now. Let me know how it goes!

regards,
aO(N²)

I ran and performed tasks up until the final step and received the following error;

Step 4 - Creating Transcription Files
==============================================================
writing to mlf file ./interim_files/words.mlf
Unable to open prompt file ../prompts at ../../HTK_scripts/prompts2mlf line 30.

Step 5 - Coding the (Audio) Data
==============================================================
ERROR [+6311] SaveBuffer: cannot create file ../train/mfcc/sample1.mfc
ERROR [+1014] PutTargetFile: Could not save parm file ../train/mfcc/sample1.mfc
FATAL ERROR - Terminating program HCopy

Thank you in advance!! This is an awesome thing that you did!

Had the same problem, change the name of voxforge/auto/train/mfc to voxforge/auto/train/mfcc that worked for me.

Hi there,
I've been trying to follow your tutorial as best I can but I'm now stumped.
I figured out that I needed to switch from dash to bash to get ./HTK_Compile_Model.sh to run... but it seems to start going wrong around step 4.
Any help would be greatly appreciated.

Step 4 - Creating Transcription Files
==============================================================
writing to mlf file ./interim_files/words.mlf
Unable to open prompt file ../prompts at ../../HTK_scripts/prompts2mlf line 30.

Step 5 - Coding the (Audio) Data
==============================================================

Step 6 - Creating Monophones
==============================================================
making hmm0

ERROR [+6310] OpenParmChannel: cannot open Parm File ....
ERROR [+6313] OpenAsChannel: OpenParmChannel failed
ERROR [+6316] OpenBuffer: OpenAsChannel failed
ERROR [+2050] LoadFile: Config parameters invalid
FATAL ERROR - Terminating program HCompV
tail: cannot open `./interim_files/hmm0/proto' for reading: No such file or directory
head: cannot open `./interim_files/hmm0/proto' for reading: No such file or directory
cat: ./interim_files/hmm0/vFloors: No such file or directory
making hmm1

ERROR [+5010] InitSource: Cannot open source file sil
ERROR [+7010] LoadHMMSet: Can't find file
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
making hmm2

ERROR [+5010] InitSource: Cannot open source file ./interim_files/hmm1/macros
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+5010] InitSource: Cannot open source file ./interim_files/hmm1/hmmdefs
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
making hmm3

ERROR [+5010] InitSource: Cannot open source file ./interim_files/hmm2/macros
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+5010] InitSource: Cannot open source file ./interim_files/hmm2/hmmdefs
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest

Hi there.

Thanks for your great tutorial.

I'm toying with the idea of reproducing your project, but I'd like to be able to control the robot in Spanish.

As far as I know, the speech recognition engine that you used is language independent but I can't find how to create the vocabulary. In the voxforge website there is no dictionary for Spanish to be able to identify the representation of the phonemes for the Julius engine.

Any idea how to continue?

Thank you very much.

Hi.
Firstly, thank you for a great blog, we have been following your articles with great interest and have been trying it out with our Pi's.
Would you consider formatting this in the form of articles and allowing us to publish in our MagPi magazine please? To discuss this further please contact us at editor@themagpi.com Many Thanks

Or can it run windows 7
if it does require linux, is there an alternative method for windows
Thanks,
Adnan

I'm getting this error: init
==============================================================

Step 1 - Task Grammar
==============================================================
already completed manually

Step 2 - Pronunciation Dictionnary
==============================================================
Unable to open prompts ../prompts file for reading at ../../HTK_scripts/prompts2wlist line 16.
sorting:./interim_files/wlist to:./interim_files/wlist1
Found voxforge_lexicon
./HTK_Compile_Model.sh: line 224: HDMan: command not found
cat: ./interim_files/monophones1: No such file or directory
***Please review the following HDMan output***:

cat: logs/Step2_HDMan_log: No such file or directory

any help?