International Phoneme Representation (Microsoft Speech Platform)

Microsoft Speech Platform SDK 11

International Phoneme Representation

You can create pronunciations for words that are not currently in the lexicon by using the phonemes listed in the attached appendices. The phoneme set uses a symbolic phonetic representation (SYM).

You can enter the SYM representation of a pronunciation either with the XML PRON tag or by creating a new lexicon entry. Separate the phonemes with spaces.

The engine receives each phoneme as an SPPHONEID, a USHORT value between 1 and n, where n is the total number of phonemes for that language. The SAPI PhoneConverter performs the conversion from SYM to SPPHONEID.
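
The conversion step can be pictured with a small sketch. This is not the SAPI PhoneConverter API: PHONE_IDS and sym_to_ids are hypothetical names, and the ID values are assumed for illustration (only 38 for "r" appears in this document). Consult the phoneme table for the language in question for the actual values.

```python
# Hypothetical symbol-to-ID table. The values are assumptions for illustration,
# except "r" = 38, which this document mentions for English.
PHONE_IDS = {"-": 1, "1": 8, "2": 9, "eh": 21, "h": 26, "l": 31, "ow": 35, "r": 38}


def sym_to_ids(sym):
    """Convert a space-delimited SYM string to a list of numeric phone IDs."""
    ids = []
    for symbol in sym.split():
        if symbol not in PHONE_IDS:
            # Symbols outside the phoneme set are not supported, so fail loudly.
            raise ValueError(f"unknown phoneme symbol: {symbol!r}")
        ids.append(PHONE_IDS[symbol])
    return ids


print(sym_to_ids("h eh l ow"))  # [26, 21, 31, 35]
```

Rejecting unknown symbols, rather than skipping them, keeps a typo in a SYM string from silently producing a shorter pronunciation.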

Markup Tag    Description
PRON SYM      Tag used to insert a pronunciation using symbolic representation

Example: pronunciation for "hello"

<PRON SYM = "h eh l ow"/>

For improved accuracy, you can add the primary (1) and secondary (2) stress markers and the syllabic marker (-) to the pronunciation.

Example: pronunciation for "hello" using the primary stress (1) and syllabic (-) markers:

<PRON SYM = "h eh - l ow 1"/>
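
As a sketch of how an application might assemble such a tag, the hypothetical helper below (pron_tag is not part of any SDK) joins syllables with the syllabic marker and appends a stress marker where one is given. It assumes the stress marker goes at the end of the stressed syllable, which matches the example above; this page does not state whether other placements are accepted.

```python
def pron_tag(syllables):
    """Build a PRON tag from (phones, stress) pairs.

    stress is 0 (none), 1 (primary), or 2 (secondary).
    """
    parts = []
    for phones, stress in syllables:
        if stress not in (0, 1, 2):
            raise ValueError("stress must be 0, 1, or 2")
        tokens = list(phones)
        if stress:
            # Assumption: the marker follows the last phone of the syllable.
            tokens.append(str(stress))
        parts.append(" ".join(tokens))
    # Syllables are separated by the syllabic marker, space delimited.
    sym = " - ".join(parts)
    return f'<PRON SYM = "{sym}"/>'


print(pron_tag([(["h", "eh"], 0), (["l", "ow"], 1)]))
# <PRON SYM = "h eh - l ow 1"/>
```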

SAPI-compliant engines are required to accept the PHONEID representation and produce an articulation. The specific allophonic articulation is defined by the engine. There is no provision for supporting phonemes outside the SAPI phoneme set.

Main goals for defining the language-dependent phoneme set:
  • Provide an engine-independent architecture for application developers to create user and application lexicons.
  • Make the English phonetic table simple enough to be used and understood by non-linguists who use the American English phoneme set.

Note on SAPI German Phone Set

The SAPI German phone set collapses two separate German phonemes into one representation. The final -e and -er sounds of a German word are both represented by the symbol "ax", which indicates a schwa. This conflation of the two phonemes can cause problems with words in which the -e and -er phonemes form a minimal pair, such as "koche" (I cook) and "Kocher" (stove). An imperfect workaround is to represent the -er phone using "ax R".

International phoneme use

Using the international phoneme schema, you can create a separate phoneme set for each language. Using a numeric representation instead of International Phonetic Alphabet (IPA) codes avoids some of the problems caused by differing IPA values for the same phoneme. For example, an 'r' in English corresponds to a certain number (38), and an 'r' in French may correspond to a different number. It is up to the individual engine to provide the exact IPA value for each of the two 'r's.

Each language is associated with a set of phonemes numbered from 1 to X. You can use either the symbolic representation or the numeric representation to enter the pronunciation.
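
The per-language numbering can be illustrated with a minimal sketch. The table layout, the language keys, and the function names are hypothetical; only the English value 38 for "r" comes from this document, and the French value is a placeholder chosen purely for illustration.

```python
# Hypothetical per-language tables. Only English "r" = 38 comes from this
# document; the French value is a placeholder, not the real ID.
PHONE_TABLES = {
    "en-US": {"r": 38},
    "fr-FR": {"r": 30},
}


def phone_id(language, symbol):
    """Look up the numeric ID of a symbol in one language's phoneme set."""
    return PHONE_TABLES[language][symbol]


def phone_symbol(language, number):
    """Reverse lookup: find the symbol for a numeric ID in one language."""
    for symbol, value in PHONE_TABLES[language].items():
        if value == number:
            return symbol
    raise KeyError(f"no phoneme numbered {number} in {language}")


# The same symbol resolves to a different number in each language.
print(phone_id("en-US", "r"), phone_id("fr-FR", "r"))  # 38 30
```

Keeping one table per language means a numeric ID is only meaningful together with the language it belongs to.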

Please note that consistent pronunciation is NOT a goal, but predictable pronunciation is. Using the phoneme set, an application developer can guarantee a minimal pronunciation, but not the exact allophonic expression. So the word "first" will always be pronounced as "first", never as "fist" or "feast", but the accent may differ slightly from engine to engine because their internal allophone values may differ.

For more information and definitions for international phoneme sets, please see: