Robotic Arm with Voice Control - Part 2

Here is Part 2 of the robotic arm tutorial, continuing on from Part 1.

Running julius

If you've completed Part 1 of the tutorial successfully, then from the command line, in the 'voxforge/auto' directory, run:

julius -input mic -C julian.jconf

Julius should then wait for microphone input. If you speak into the microphone, for example 'elbow up', it should print output similar to the following:

<<< please speak >>>
### read waveform input
Stat: capture audio at 48000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 1536 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
pass1_best: <s> ELBOW UP </s>
pass1_best_wordseq: 0 2 7 1
pass1_best_phonemeseq: sil | eh l b ow | ah p | sil
pass1_best_score: -9641.213867
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 13 generated, 13 pushed, 5 nodes popped in 388
sentence1: <s> ELBOW UP </s>
wseq1: 0 2 7 1
phseq1: sil | eh l b ow | ah p | sil
cmscore1: 1.000 1.000 1.000 1.000
score1: -9581.207031
<<< please speak >>>

Note the decoder output:

sentence1: <s> ELBOW UP </s>

As well as the confidence scores:

cmscore1: 1.000 1.000 1.000 1.000
score1: -9581.207031

Here cmscore1 gives the per-word confidence scores for the sentence, in this case 'silence word word silence', while score1 is the Viterbi score calculated by Julius. All of this is printed to the tty's stdout, so all we need to do is filter out these lines and use them to control the robotic arm.

The program

I chose Python for its ease of programming as well as its wide range of modules. Since I'm not too worried about speed or memory usage, I managed to build and program the robotic arm within a week, despite my right hand being unable to type (yes, Python's that great!). I assume that if you're following this tutorial, you already have a basic knowledge of Python. At the beginning of the script we import the required modules:

import pexpect

Pexpect is a module for creating a pseudo-tty through which we can connect to the Julius LVCSR decoder and obtain its output. Once we have the output, we (1) filter out the lines we need (mainly sentence1 and cmscore1, though I've also found the Viterbi score useful in some cases), (2) discard low-confidence results, and finally (3) send the commands via USB to the robotic arm.

Obtaining the output

There are many ways to obtain the output - I initially tried the subprocess module, but settled on the pexpect module as it didn't have the pipe buffering issues that subprocess encountered. With pexpect, spawning julius and getting its output in blocks was very simple:

child = pexpect.spawn('julius -input mic -C julian.jconf')
while True:
    try:
        child.expect('please speak')    # block until the next prompt
        process_julius(child.before)    # everything printed since the last prompt
    except KeyboardInterrupt:
        child.close()
        break

As before, there are many ways to filter out the desired output and confidence scores. First, I need to determine if an output was generated at all:

def process_julius(out_text):
    match_res = re.match(r'(.*)sentence1(.*)', out_text, re.S)
    if match_res:
        get_confidence(out_text)

Here we use Python's regular expression (re) module, so add this at the beginning of the script:

import re

After making sure we have a sentence, we can then process the text block and extract sentence1, cmscore1 and score1:

def get_confidence(out_text):
    linearray = out_text.split("\n")
    for line in linearray:
        if line.find('sentence1') != -1:
            sentence1 = line
        elif line.find('cmscore1') != -1:
            cmscore1 = line
        elif line.find('score1') != -1:
            score1 = line
    #check the per-word confidence scores
    err_flag = False
    for score in cmscore1.split():
        try:
            ns = float(score)
        except ValueError:
            continue    #skip the 'cmscore1:' label
        if ns < 0.999:
            err_flag = True
            print "confidence error:", ns, ":", sentence1
    #check the overall Viterbi score
    score1_val = float(score1.split()[1])
    if score1_val < -13000:
        err_flag = True
        print "score1 error:", score1_val, sentence1
    if not err_flag:
        print sentence1
        print score1
        process_sentence(sentence1)
In this function, I set thresholds of 0.999 for cmscore1 and -13000 for score1. You can tweak these values until you get good accuracy and robustness; training the acoustic model further would also help. If the output passes all these tests, we pass the command to the process_sentence() function to move the robot arm. Before writing that, let's put the Python script aside for a bit and examine the robotic arm's USB protocol.

Robot Arm USB Protocol

According to notbrainsurgery's deconstruction of the robotic arm's USB protocol, the motors are controlled by 3-byte USB control transfers. Since these are ordinary electric motors, we can only control their direction in addition to turning them on or off.

The first byte can be divided into four half-nibbles. A half-nibble is two bits, i.e. one of 00, 01, 10 or 11.

Reading the first byte left to right,

  1. the first half-nibble controls the shoulder,
  2. the second controls the elbow,
  3. the third controls the wrist,
  4. the fourth controls the grip.

For the second byte, the fourth half-nibble controls the base rotation.

Finally, for the third byte, the last bit turns the light on or off.

The value of each half-nibble (01 or 10) commands the motor to run in one direction or the other, while 00 stops the motor.
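Putting the protocol description together, here's a small sketch that packs a single motor command into the three control bytes. The bit positions follow the left-to-right reading above (shoulder in the top half-nibble of the first byte, grip in the bottom); build_command() and the constant names are my own, not part of any official protocol.

```python
# Bit offset of each motor's half-nibble in the first byte,
# reading the byte MSB-first as described above.
SHIFTS = {'shoulder': 6, 'elbow': 4, 'wrist': 2, 'grip': 0}

# Half-nibble motor states: 01 and 10 run the motor in opposite
# directions, 00 stops it. Which of 01/10 maps to 'up' vs 'down'
# is easiest to determine by experiment.
DIR_A, DIR_B, STOP = 0b01, 0b10, 0b00

def build_command(motor, direction, light=False):
    """Pack one motor movement into the 3-byte control transfer."""
    byte0 = byte1 = byte2 = 0
    if motor == 'base':
        byte1 = direction        # fourth half-nibble of the second byte
    else:
        byte0 = direction << SHIFTS[motor]
    if light:
        byte2 = 0x01             # last bit of the third byte
    return bytes([byte0, byte1, byte2])

print(build_command('elbow', DIR_A))    # -> b'\x10\x00\x00'
```

For example, 'elbow up' sets the second half-nibble of the first byte, giving 0x10 0x00 0x00, while rotating the base only touches the second byte.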

So the Python script needs to set the USB control bits appropriately to move the motors. I chose to send each command for one second and then stop the motors automatically. This spares me from having to frantically say "stop" while the robot arm grinds against its safety gears when it reaches a movement limit, but the disadvantage is that I may have to issue a command repeatedly to get the desired movement. To do this, we need to import two more modules: the standard time module, and the pyusb module (which requires libusb) for interacting with USB devices:

import time
import usb.core

Now on to the final part of the tutorial!
