
About

PocketSphinx is an open-source speech recognition engine developed at Carnegie Mellon University. mod_pocketsphinx lets FreeSWITCH™ use it to recognize speech.

  • Works on Windows, Mac OS X and Linux.
  • 8 kHz and 16 kHz acoustic models.
  • Semi-continuous recognition.
  • Great for smaller grammars.


Install & Configure

  1. Update to at least rev 9194; earlier revisions will not work correctly. From that revision on, scores run from 0 (bad) to 100 (good).
  2. Build FreeSWITCH™ with mod_pocketsphinx enabled.
  3. The FreeSWITCH™ build automatically downloads and installs PocketSphinx.
  4. Enable mod_pocketsphinx in modules.conf.xml.
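
Step 2 can be sketched as follows. This is an illustration, not the official procedure: it assumes a source-tree build whose build-time modules.conf lists the module as a commented-out line "#asr_tts/mod_pocketsphinx" (check your own tree), and the function name is made up for this example.

```shell
# Sketch: uncomment mod_pocketsphinx in a build-time modules.conf.
# Assumption: the file contains the line "#asr_tts/mod_pocketsphinx".
# enable_pocketsphinx_build is a hypothetical helper name.
enable_pocketsphinx_build() {
  local conf="$1" tmp
  tmp="$(mktemp)" || return 1
  # Strip the leading '#' from the mod_pocketsphinx line only;
  # every other line is passed through unchanged.
  if sed 's|^#[[:space:]]*\(asr_tts/mod_pocketsphinx\)[[:space:]]*$|\1|' "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # overwrite in place, keeping the file's permissions
  fi
  rm -f "$tmp"
}
```

Call it as enable_pocketsphinx_build /path/to/freeswitch-source/modules.conf before running make. For step 4, the runtime modules.conf.xml needs a <load module="mod_pocketsphinx"/> line that is not commented out.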

Grammar Files

  • Version 1.0.4 uses JSGF grammar files.
  • More information about formatting can be found here.

pizza_yesno.gram
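
The original grammar file is not reproduced here. As a minimal sketch of what a JSGF yes/no grammar looks like, the snippet below writes one to a directory of your choice. The rule name <yesno> and the helper name write_yesno_grammar are assumptions for illustration; the demo's actual file may differ.

```shell
# Sketch: write a minimal JSGF yes/no grammar.
# Not the original pizza_yesno.gram; the rule name <yesno> is an assumption.
write_yesno_grammar() {
  local dir="${1:-.}"
  cat > "$dir/pizza_yesno.gram" <<'EOF'
#JSGF V1.0;

grammar pizza_yesno;

public <yesno> = yes | no;
EOF
}
```

For example, write_yesno_grammar /usr/local/freeswitch/grammar would place the file in the grammar directory of a default install (adjust the path to your setup).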


Setting up the Pizza Demo

  • Copy the demo scripts from the source tree to your working directory.

 

  • On an older install, you must also copy pocketsphinx.conf.xml to the conf directory.

 

  • Download the sound files from here.
  • Move the extracted pizza directory to the sounds directory under the FreeSWITCH install (e.g., /usr/local/freeswitch/sounds/en/us).
  • Newer FreeSWITCH versions already contain /usr/local/freeswitch/conf/dialplan/default/00_pizza_demo.xml which sets up 74992 or "pizza" as an extension. If you are on an older FreeSWITCH version, make an extension like this:
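
A sketch of such an extension, wrapped in a small shell helper so it can be written into a dialplan directory. It assumes the demo script is run through the javascript dialplan application as ps_pizza.js; the extension name pizza_demo and the helper name are illustrative, not taken from the original page.

```shell
# Sketch: write a dialplan file that routes 74992 or "pizza" to the demo script.
# Assumptions: the script is started via the "javascript" application and is
# called ps_pizza.js; write_pizza_extension is a hypothetical helper name.
write_pizza_extension() {
  local dir="${1:-.}"
  cat > "$dir/00_pizza_demo.xml" <<'EOF'
<include>
  <extension name="pizza_demo">
    <condition field="destination_number" expression="^(pizza|74992)$">
      <action application="javascript" data="ps_pizza.js"/>
    </condition>
  </extension>
</include>
EOF
}
```

For example, write_pizza_extension /usr/local/freeswitch/conf/dialplan/default, followed by a dialplan reload (reloadxml on the FreeSWITCH console).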

 

 
  • Edit ps_pizza.js so that it points to the location of your sound files.

 

  • Install the grammar files.

 

 
  • Give it a try by calling extension 74992 and watching the console for messages.

Other info

mod_pocketsphinx builds as part of the standard build on Linux and Mac. The build has yet to be tested on Windows.

The confidence score starts at 0; higher numbers mean more confidence.


Acoustic Model for the German Language

An acoustic model describes a language at the level of phones. A phone is roughly the smallest distinguishable sound of a language. A dictionary lists, for each word, the sequence of phones that makes it up.

PocketSphinx comes with an English acoustic model, which is meant for the English language only. For other languages you have to create your own acoustic model. This is a lot of work, especially building the required audio database (audio files, phone list, transcriptions and dictionary).

Voxforge (www.voxforge.org) offers, among other things, a German acoustic model under a GPL license, found here: [1]. Unfortunately PocketSphinx cannot use it as-is, so it has to be rebuilt.

Based on Voxforge's audio data, the following steps describe how to build a PocketSphinx-compatible acoustic model (8 kHz sample rate). They were tested on a CentOS 5.3 x86_64 GNU/Linux system.

Requirements

Make sure the following are installed:

  • Python (e.g. 2.4.3)
  • flac (e.g. 1.1.2)

Download the following from voxforge.org

Process

 

  • Create a work directory

 

  • This new directory is now our <workdir>
  • Prepare SphinxTrain

 

 

  • Set up the Sphinx training environment "voxforge_de_sphinx":
    • ./SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl -task voxforge_de_sphinx
  • Content of <workdir>/:

 

drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 bin
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 bwaccumdir
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 etc
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 feat
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 logdir
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 model_architecture
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 model_parameters
drwxr-xr-x   3 ssw voip    4096  5. Aug 11:32 python
drwxr-xr-x  20 ssw voip    4096  5. Aug 11:32 scripts_pl
drwxr-xr-x   2 ssw voip    4096  5. Aug 11:32 wav
drwxr-xr-x  14 ssw voip    4096  5. Aug 11:02 SphinxTrain-1.0
-rw-r--r--   1 ssw voip 8297682 12. Feb 17:01 SphinxTrain-1.0.tar.bz2

 

 

  • Copy the sphinxbase version from the FreeSWITCH source directory
  • Extract the acoustic model into a new directory

 

 

  • Prepare the audio data (here at an 8 kHz sample rate)
    • Put Voxforge's audio archives in <workdir>/audio
    • Extract all archives

    • Create the script "copy_and_convert_audio.sh" in <workdir>
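
The original script is not reproduced here. The following is only a sketch of what such a script could do: it copies every .wav file and decodes every .flac file found under the audio directory into a flat wav directory. The flat output naming (file basename only) is an assumption; the resulting names must match the entries in the Voxforge fileids list, so check them before relying on this. It does not resample anything, and the function name is illustrative.

```shell
# Sketch of copy_and_convert_audio.sh (not the original script).
# Assumption: output files are named after the input basename; verify that
# this matches the names in etc/voxforge_de_sphinx_train.fileids.
copy_and_convert_audio() {
  local src="${1:-./audio}" dst="${2:-./wav}" f base
  mkdir -p "$dst"
  find "$src" -type f \( -name '*.wav' -o -name '*.flac' \) -print |
  while IFS= read -r f; do
    base="$(basename "$f")"
    case "$f" in
      # flac: -d decode, -f overwrite, -s silent, -o output file
      *.flac) flac -d -f -s -o "$dst/${base%.flac}.wav" "$f" ;;
      *.wav)  cp "$f" "$dst/$base" ;;
    esac
  done
}
```

To use it as a script, add a final line such as copy_and_convert_audio ./audio ./wav and run it from <workdir>.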


 

  • Convert the audio data (some files are in FLAC format) and copy it to the <workdir>/wav directory:
    • bash ./copy_and_convert_audio.sh (you must be in the <workdir> directory)
  • Create a feature file in <workdir>:
    • vi <workdir>/my_feat.params

 

  • Create script for renaming MFC files in <workdir>.
    • vi <workdir>/renameMFC.sh

 

 

  • Copy Voxforge's configurations to <workdir>/etc
    • cp ./am_tmp/etc/* ./etc/
  • Replace feature file with our own
    • cp ./my_feat.params ./etc/feat.params
  • Adapt Voxforge's sphinx_train.cfg to our environment:
    • vi <workdir>/etc/sphinx_train.cfg

 

  • Content of <workdir>

 

  • At least one file (openpento-20080512-2_3_exp_5_1_Unit_0) is corrupt, so delete the line containing its name from:
    • ./etc/voxforge_de_sphinx_train.transcription
    • ./etc/voxforge_de_sphinx_train.fileids
    • Then delete the file "./wav/openpento-20080512-2_3_exp_5_1_Unit_0.wav"
  • Create MFC files from the WAV files
    • <workdir>/sphinxbase/bin/sphinx_fe `cat ./etc/feat.params` -c ./etc/voxforge_de_sphinx_train.fileids -di ./wav -do ./feat/ -ei wav -eo mfc -raw no -mswav yes -samprate 8000

INFO: fe_interface.c(288): You are using the internal mechanism to generate the seed.
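
The clean-up of the corrupt recording described above can be done by hand in an editor. As a sketch, the helper below (its name is made up for this example) removes every line that mentions a given utterance name from the listed files:

```shell
# Sketch: drop all lines mentioning one utterance from the given files.
# drop_utterance is a hypothetical helper name.
drop_utterance() {
  local name="$1" f tmp
  shift
  for f in "$@"; do
    tmp="$(mktemp)" || return 1
    # -F: treat the name as a fixed string; -v: keep the non-matching lines.
    # grep exits with 1 when nothing is left to print, which is not an error here.
    grep -v -F -- "$name" "$f" > "$tmp"
    cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

For the file named above, run from <workdir>: drop_utterance openpento-20080512-2_3_exp_5_1_Unit_0 ./etc/voxforge_de_sphinx_train.transcription ./etc/voxforge_de_sphinx_train.fileids, then remove the matching .wav file.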

 

  • Get rid of the ".ch1." part in the names of some MFC files
    • cd <workdir>/feat
    • bash ../renameMFC.sh
    • cd ..
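
The original renameMFC.sh is not reproduced here. A minimal sketch of the renaming step, assuming the affected files are named like "name.ch1.mfc" and should become "name.mfc" (the function name is illustrative; the parameter substitution requires bash):

```shell
# Sketch of renameMFC.sh (not the original script).
# Removes the first ".ch1." from each matching file name in the given directory.
rename_mfc() {
  local dir="${1:-.}" f base
  for f in "$dir"/*.ch1.*; do
    [ -e "$f" ] || continue          # the glob matched nothing
    base="${f##*/}"                  # file name without the directory part
    mv -- "$f" "$dir/${base/.ch1./.}"
  done
}
```

To run it the way the steps above do, add a final line rename_mfc . to the script and call it from inside <workdir>/feat.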


You are now ready to start the training process. Before you do so, you can verify all the data you have provided:

Execute "<workdir>/scripts_pl/00.verify/verify_all.pl":

 


Looks good so far. So let's start the training:

 

Now you can go and get a cup of coffee or tea or go to bed or...

[...]

For me the process ended with this:

 

The target folder "<workdir>/model_parameters/voxforge_de_sphinx.ci_semi" now looks like this:



Then I copied those files to "<fs-folder>/grammar/model/de4/".

I also copied "<workdir>/etc/voxforge_de_sphinx.dic" to "<fs-folder>/grammar/de4.dic" and created a grammar file containing the words that should be recognized.

Finally I configured "pocketsphinx.conf.xml" like this:
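
The original configuration is not reproduced here. As a sketch only, a configuration that points the narrowband (8 kHz) model and the dictionary at the files copied above might look like the one written by the helper below. It assumes the narrowband-model and dictionary parameters are resolved relative to the grammar directory, and it leaves out any other settings (thresholds and so on) that the original may have contained. The helper name is made up for this example.

```shell
# Sketch: write a pocketsphinx.conf.xml that selects the "de4" model and dictionary.
# Assumptions: model name and dictionary are looked up under <fs-folder>/grammar;
# all other settings are omitted. write_pocketsphinx_conf is a hypothetical name.
write_pocketsphinx_conf() {
  local dir="${1:-.}"
  cat > "$dir/pocketsphinx.conf.xml" <<'EOF'
<configuration name="pocketsphinx.conf" description="PocketSphinx ASR Configuration">
  <settings>
    <param name="narrowband-model" value="de4"/>
    <param name="dictionary" value="de4.dic"/>
  </settings>
</configuration>
EOF
}
```

Write it to a scratch directory first, for example write_pocketsphinx_conf /tmp, and merge the two param lines into your existing pocketsphinx.conf.xml rather than overwriting it.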

 

That's all you have to do, as far as I know. The results on my side were, well, suboptimal. After reloading mod_pocketsphinx, FreeSWITCH detected simple German words, but not very reliably. I think this is because of the small amount of prepared German audio data: Voxforge recommends 130 hours for training, but currently (March 2011) only 25 hours are available.

See Also