0. About
The FreeSWITCH Speech Phrase Management architecture provides a consistent framework for managing language-dependent voice prompts without the need to dig into the application's source code. A single application developed using the framework will work with the languages currently implemented and with languages added in the future.
1. Features
- Multilingual Support
- No Source Code required to modify prompts
- Ability to select prompts using pattern matching in XML
- Integrated support for voice and TTS in the same application
- Custom phrases can be added at any time
- Switch voice libraries with one setting
- Only load the code for the languages you want to support (less code bloat).
2. Overview
There are several ways to speak prompts in FreeSWITCH, but the Speech Phrase Management sub-system provides the most features and flexibility.
2.1 Language modules
Prompts are defined outside the application and can be modified to suit the specific implementation or language. When amounts, dates, numbers, or letters are enunciated, the proper phrases to assemble and the ordering of those phrases are determined by the mod_say_xx module (where xx stands for a two-letter language code, such as en).
Because different languages assemble the same phrases differently (and even use different words depending upon the type of object being referred to), a helper application is needed to do the job properly. This is the job of the mod_say_xx module (e.g., mod_say_en, mod_say_fr). Within this module are the functions needed to speak times, money, and counts, and to spell out letters and digits.
In order to support the English version (mod_say_en), the code expects certain prompt directories to exist in your base voice file path (for example at /var/sounds/freeswitch/en; see sounds_dir and sound_prefix in Global Variables).
Basic sounds should be installed during a vanilla install (at /usr/share/freeswitch/sounds on Debian 9), but, just in case, here are all the available sounds:
2.2 Configuration
2.2.1 Load language modules
For each language you want to support, you will need to load the appropriate mod_say_xx module in conf/autoload_configs/modules.conf.xml. (See Modules.)
<load module="mod_say_en"/>
2.3 Specify phrase directories
See section 3, Phrase primer, below to read more on phrases.
Also specify the location of the language-specific phrase directory for each language in conf/freeswitch.xml (e.g., "de" for German):
<X-PRE-PROCESS cmd="include" data="lang/de/*.xml"/>
See 6. Configuration files in Default Configuration.
3. Phrase primer
The phrases section in conf/freeswitch.xml defines the construction and enunciation of phrases in various languages.
TODO (by Tomas Bajarunas): document the optional name attribute for macros:
<macros name="optional_macros_name">
...
</macros>
I think the name can later be used when referencing phrases, for example in the dialplan:
<action application="playback" data="phrase:MyPhrase@optional_macros_name" />
or from other phrases:
<action function="phrase" phrase="MyPhrase@optional_macros_name" data="some:data" />
3.1 Phrase macros
3.1.1 macros tag
The following XML snippet illustrates the structure to define phrase macros:
<section name="phrases" description="Speech Phrase Management">
  <macros>
    ...
  </macros>
</section>
All prompts should be defined in this section.
3.1.2 language tag
The <macros> section is then sub-divided into languages as follows.
<language name="en" sound_path="/var/sounds/phrases/en" tts_engine="cepstral" tts_voice="david"> <!-- macros --> </language>
Where
- name - Defines the specific language these prompts belong to. In the above example it is en, which causes the mod_say_en module to be used to enunciate any constructed phrases (like money, date, time, etc.).
- sound_path - The base path to the voice files for this language.
- tts_engine - The text-to-speech engine to use for any TTS spoken.
- tts_voice - The specific voice to use for TTS.
See TTS page for available engines and voices.
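As a rough sketch of how two languages can sit side by side under the same macros section, the fragment below defines the same macro name once per language. The French sound path and the prompt file name are hypothetical, and leaving out the TTS attributes for French assumes that language only plays recorded files.

```xml
<section name="phrases" description="Speech Phrase Management">
  <macros>
    <!-- English prompts; attributes copied from the example above -->
    <language name="en" sound_path="/var/sounds/phrases/en" tts_engine="cepstral" tts_voice="david">
      <macro name="msgcount">
        <!-- inputs -->
      </macro>
    </language>
    <!-- French prompts; the path is a hypothetical placeholder.
         tts_engine and tts_voice are omitted on the assumption that
         this language never uses speak-text. -->
    <language name="fr" sound_path="/var/sounds/phrases/fr">
      <macro name="msgcount">
        <!-- inputs -->
      </macro>
    </language>
  </macros>
</section>
```

Because both languages define a macro called msgcount, an application can request that one name and let the selected language decide which prompts are played.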
3.1.3 macro tag
Within the language there are one or more macros defined:
<macro name="msgcount"> <!-- inputs --> </macro>
3.1.4 input tag
<macro name="msgcount">
  <input pattern="^\d+$">
    <!-- match and nomatch tags -->
  </input>
</macro>
pattern is a PCRE-compatible regular expression that is matched against the second argument to the phrase application (i.e., the actual data to speak).
For example, given the dialplan action below, the macro pattern above matches "130".
<action application="phrase" data="msgcount,130"/>
Using regexes, you can filter for specific conditions, and even "scrub" the data to ensure it is in the proper layout.
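As an illustration of filtering and scrubbing, the sketch below accepts an eight-digit number with an optional dash in the middle and speaks only the digits. The macro name say-account and the input layout are hypothetical; the say attributes assume the iterated method and number type provided by the language module.

```xml
<macro name="say-account">
  <!-- Accepts "12345678" or "1234-5678"; the dash is not captured,
       so only the two digit groups reach the say actions. -->
  <input pattern="^(\d{4})-?(\d{4})$">
    <match>
      <!-- Read each group digit by digit -->
      <action function="say" data="$1" method="iterated" type="number"/>
      <action function="say" data="$2" method="iterated" type="number"/>
    </match>
    <nomatch>
      <action function="speak-text" data="That input was invalid."/>
    </nomatch>
  </input>
</macro>
```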
Within a macro, all input patterns are tested for possible matches unless the break action is used.
See the 3.2 Phrase macro actions section below.
3.1.4.1 Example
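To achieve proper pluralization, you may define multiple input patterns and use different prompts for each, such as "You have 2 messages" versus "You have 1 message". Because every input pattern is tested, the singular pattern must match only "1" and should end with the break action so that the general pattern is not applied as well.
<macro name="msgcount">
  <input pattern="^1$">
    <!-- ... singular prompt, then break ... -->
  </input>
  <input pattern="^\d+$">
    <!-- ... plural prompt ... -->
  </input>
</macro>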
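A fuller version of the pluralization idea might look like the following sketch. The prompt files vm-message.wav and vm-messages.wav are assumed to exist under the language's sound path, and the break action is placed last in the singular branch so that no further input patterns are tested for the value "1".

```xml
<macro name="msgcount">
  <!-- Exactly one message: singular prompt, then stop -->
  <input pattern="^1$">
    <match>
      <action function="play-file" data="vm-youhave.wav"/>
      <action function="say" data="1" method="pronounced" type="items"/>
      <!-- hypothetical singular prompt file -->
      <action function="play-file" data="vm-message.wav"/>
      <action function="break"/>
    </match>
  </input>
  <!-- Any other count: plural prompt -->
  <input pattern="^(\d+)$">
    <match>
      <action function="play-file" data="vm-youhave.wav"/>
      <action function="say" data="$1" method="pronounced" type="items"/>
      <!-- hypothetical plural prompt file -->
      <action function="play-file" data="vm-messages.wav"/>
    </match>
  </input>
</macro>
```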
3.1.5 match and nomatch tags
Within an input tag there are one or more match and nomatch tags.
<macro name="msgcount">
  <input pattern="^\d+$">
    <match>
      <!-- actions -->
    </match>
    <nomatch>
      <!-- actions -->
    </nomatch>
  </input>
</macro>
These define the actions to take if the input pattern is matched (or not matched).
3.1.5.1 Example
<macro name="tts-timeleft">
  <input pattern="(\d+):(\d+)">
    <match>
      <!-- Speak the time in the format: -->
      <action function="speak-text" data="You have $1 minutes, $2 seconds remaining $strftime(%Y-%m-%d)"/>
    </match>
    <nomatch>
      <!-- The input wasn't in the format of 12:34 (or similar), hence: -->
      <action function="speak-text" data="That input was invalid."/>
    </nomatch>
  </input>
</macro>
3.1.6 action tag
Within a match or nomatch tag there are one or more actions.
<action function="execute" data="sleep(1000)"/>
<action function="play-file" data="vm-youhave.wav"/>
<action function="say" data="$1" method="pronounced" type="items"/>
These define the specific actions to take when this macro is applied. They usually consist of calling the say action, passing it the parsed data to be spoken.
The possible actions are described in the 3.2 Phrase macro actions section below.
3.2 Phrase macro actions
<action function=[phrase_macro_action] data=[arguments] [other_properties] />
Where phrase_macro_action can be:
phrase_macro_action | Description |
---|---|
execute | Calls the FreeSWITCH TODO What is the |
play-file | Play a specific audio file or play a macro in the form |
say | Use the pre-recorded sound files to read or say various things like dates, times, digits, etc. Requires the |
speak-text | Speak some text using the TTS engine. |
break | Stop parsing any more input patterns. See 3.1.4 input tag. |
3. Usage
3.1 From XML Dialplan
3.1.1 Selecting the language
The language to use is selected by setting the default_language variable (see Channel Variables Catalog) to the specific language code you want.
<!-- select English as the default language --> <action application="set" data="default_language=en"/>
If you specify a language in the API call (see the methods below), it overrides the default_language channel variable setting.
This supports prompts that should be spoken in a particular language regardless of the user's default language selection.
3.1.2 Playing prompts from the dialplan
The phrase application will call the say API using the phrases defined in the phrases section of your conf/freeswitch.xml file.
<action application="phrase" data="msgcount,10"/>
<action application="phrase" data="spell-phonetic,abc.012345 6789def#*"/>
<action application="phrase" data="spell,${caller_id_name}"/>
The data field passes two parameters:
- The phrase macro name to use. The macro names are arbitrary but should be meaningful for documentation purposes.
- The data (i.e., arguments) to pass to the macro. The data can be a literal as in the first two examples above or a string variable as in the third example.
The playback application can also be used in the same way as the phrase application.
<action application="set" data="playback_terminators=#"/>
<action application="playback" data="phrase:demo_ivr_main_menu"/>
<action application="playback" data="phrase:voicemail_message_count:16:new"/>
3.2 Playing prompts from a C application
status = switch_ivr_phrase_macro(session, "phrasename", "phrasedata", language, args);
3.3 Playing prompts from a JavaScript application
function sayphrase(phrase, args) {
    console_log("sayphrase: phrase=[" + phrase + "] args=[" + args + "]\n");
    var rtn = session.execute("phrase", phrase + "," + args);
    return(rtn);
}

if (session.ready()) {
    session.answer();
    session.execute("sleep", "1000");
    sayphrase("msgcount", "10");
    session.hangup();
}
4. Examples
4.1 Speaking a number
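A minimal sketch of a macro that speaks a number with the say action is shown below. The macro name say-number is hypothetical, and the pronounced method with the number type is assumed to be supported by the loaded language module.

```xml
<!-- Phrase definition, placed inside a <language> block -->
<macro name="say-number">
  <input pattern="^(\d+)$">
    <match>
      <!-- Speak the captured digits as a whole number, e.g. "forty-two" -->
      <action function="say" data="$1" method="pronounced" type="number"/>
    </match>
  </input>
</macro>

<!-- Dialplan action that invokes the macro with the value 42 -->
<action application="phrase" data="say-number,42"/>
```

Changing method to iterated would be expected to read the digits one at a time instead.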
4.2 Calling a macro from within a macro
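One way this could look, using the phrase action with its phrase attribute as mentioned in the phrase primer, is sketched below. Both macro names are hypothetical; the inner macro is defined here as well so the fragment is complete on its own.

```xml
<!-- Inner macro: speaks a number -->
<macro name="say-number">
  <input pattern="^(\d+)$">
    <match>
      <action function="say" data="$1" method="pronounced" type="number"/>
    </match>
  </input>
</macro>

<!-- Outer macro: plays a prompt, then hands the captured
     value to the inner macro through the phrase action -->
<macro name="announce-count">
  <input pattern="^(\d+)$">
    <match>
      <action function="play-file" data="vm-youhave.wav"/>
      <action function="phrase" phrase="say-number" data="$1"/>
    </match>
  </input>
</macro>
```

Keeping number handling in one macro means a change to how numbers are spoken only has to be made in one place.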
5. Play as Sound Files
I used the following for German prompts in conf/lang/de/de.xml:
<include>
  <language name="de" sound-path="$${base_dir}/sounds/de/de/callie" tts-engine="cepstral" tts-voice="katrin">
    <X-PRE-PROCESS cmd="include" data="demo/demo.xml"/>
    <!--voicemail_de_tts is purely implemented with tts, we need a files based implementation too -->
    <YX-PRE-PROCESS cmd="include" data="vm/tts.xml"/>
    <X-PRE-PROCESS cmd="include" data="vm/sounds.xml"/> <!-- vm/tts.xml if you want to use tts and have cepstral -->
    <X-PRE-PROCESS cmd="include" data="dir/sounds.xml"/> <!-- dir/tts.xml if you want to use tts and have cepstral -->
  </language>
</include>