International Phoneme Representation

Microsoft Speech SDK (SAPI 5.1)

You can create pronunciations for words that are not currently in the lexicon by using the phonemes listed in the attached appendices. Each phoneme in the proposed set is written in a symbolic phonetic representation (SYM).

To specify a pronunciation, enter its SYM representation either in the XML PRON tag or in a new lexicon entry. Separate the phonemes with spaces.

The engine receives each phoneme as an SPPHONEID, a USHORT value between 1 and n, where n is the total number of phonemes for that language. The conversion from SYM to SPPHONEID is performed by the SAPI PhoneConverter.
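
The conversion step can be pictured as a table lookup over the space-delimited symbols. The following is a minimal Python sketch, not the SAPI API (in native code, SAPI exposes this through ISpPhoneConverter::PhoneToId). The table is a small subset; only "r" = 38 is stated in this document, and the other IDs are assumed from the American English table and should be checked against the appendix. The names EN_US_PHONEMES and sym_to_ids are hypothetical.

```python
from typing import Dict, List

# Hypothetical subset of the American English phoneme table.
# Only "r" = 38 comes from the text; the other IDs are assumed.
EN_US_PHONEMES: Dict[str, int] = {
    "-": 1,   # syllable boundary
    "1": 8,   # primary stress
    "2": 9,   # secondary stress
    "eh": 21,
    "h": 26,
    "l": 31,
    "ow": 35,
    "r": 38,
}


def sym_to_ids(sym: str, table: Dict[str, int]) -> List[int]:
    """Convert a space-delimited SYM string to a list of phoneme IDs.

    Raises KeyError if a symbol is not in the table.
    """
    return [table[phone] for phone in sym.split()]


print(sym_to_ids("h eh l ow", EN_US_PHONEMES))  # [26, 21, 31, 35]
```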

Markup tag    Description
PRON SYM      Tag used to insert a pronunciation using the symbolic representation

Example: pronunciation for "hello"

<PRON SYM = "h eh l ow"/>
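
If your application generates pronunciations at run time, you can assemble the PRON markup from a list of symbols so that they are always space-delimited. This is a sketch with a hypothetical helper, pron_tag; it uses only the Python standard library and omits the optional spaces around "=" shown in the example above.

```python
from typing import List
from xml.sax.saxutils import quoteattr


def pron_tag(phones: List[str]) -> str:
    """Build a PRON element whose SYM attribute lists the phonemes, space-delimited."""
    for phone in phones:
        # Each symbol must be a single non-empty token with no whitespace.
        if phone.split() != [phone]:
            raise ValueError(f"invalid phoneme symbol: {phone!r}")
    # quoteattr escapes XML special characters and adds the surrounding quotes.
    return f"<PRON SYM={quoteattr(' '.join(phones))}/>"


print(pron_tag(["h", "eh", "l", "ow"]))  # <PRON SYM="h eh l ow"/>
```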

For improved accuracy, you can add the primary (1) and secondary (2) stress markers and the syllabic marker (-) to the pronunciation.

Example: pronunciation for "hello" using the primary stress (1) and syllabic (-) markers:

<PRON SYM = "h eh - l ow 1"/>
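
To see how the markers structure a pronunciation, the following sketch splits a SYM string into syllables at each "-" and records any stress marker found in each syllable. It assumes, as in the "hello" example where "1" follows "ow", that a stress marker applies to the syllable in which it appears; the function name split_syllables is hypothetical.

```python
from typing import List, Optional, Tuple

STRESS_MARKERS = {"1": "primary", "2": "secondary"}
SYLLABLE_BOUNDARY = "-"


def split_syllables(sym: str) -> List[Tuple[List[str], Optional[str]]]:
    """Split a SYM string into (phones, stress) pairs, one per syllable.

    Assumption: a stress marker applies to the syllable it appears in.
    """
    result: List[Tuple[List[str], Optional[str]]] = []
    phones: List[str] = []
    stress: Optional[str] = None
    for token in sym.split():
        if token == SYLLABLE_BOUNDARY:
            result.append((phones, stress))  # close the current syllable
            phones, stress = [], None
        elif token in STRESS_MARKERS:
            stress = STRESS_MARKERS[token]
        else:
            phones.append(token)
    result.append((phones, stress))  # close the final syllable
    return result


for phones, stress in split_syllables("h eh - l ow 1"):
    print(" ".join(phones), stress)
```

For "h eh - l ow 1" this yields two syllables: "h eh" with no stress marker and "l ow" with primary stress.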

SAPI-compliant engines must accept the SPPHONEID representation and produce an articulation for it. The specific allophonic articulation is defined by the engine. SAPI makes no provision for phonemes outside its phoneme set.
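
Because phonemes outside the set are not supported, an application can check a pronunciation before submitting it. This sketch reports every symbol missing from the language's table. The table is the same hypothetical subset as before (only "r" = 38 comes from this document), so a real check should use the full table from the appendix; the name unsupported_phones is hypothetical.

```python
from typing import Dict, List

# Hypothetical subset of the American English phoneme table.
# Only "r" = 38 comes from the text; the other IDs are assumed.
EN_US_PHONEMES: Dict[str, int] = {
    "-": 1, "1": 8, "2": 9, "eh": 21, "h": 26, "l": 31, "ow": 35, "r": 38,
}


def unsupported_phones(sym: str, table: Dict[str, int]) -> List[str]:
    """Return the symbols in a SYM string that are not in the phoneme set."""
    return [phone for phone in sym.split() if phone not in table]


print(unsupported_phones("h eh l ow", EN_US_PHONEMES))  # []
print(unsupported_phones("h eh l oe", EN_US_PHONEMES))  # ['oe']
```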

The main goals in defining the language-dependent phoneme sets are to:
  • Provide an engine-independent architecture that application developers can use to create user and application lexicons.
  • Keep the American English phoneme table simple enough for non-linguists to use and understand.

International phoneme use

The international phoneme schema lets you define a phoneme set for each language independently. Using numeric IDs rather than International Phonetic Alphabet (IPA) codes avoids some of the problems caused by differing IPA values for what is nominally the same phoneme. For example, 'r' in English corresponds to a particular number (38), while 'r' in French may correspond to a different number. It is up to each engine to determine the exact IPA value for each 'r'.
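
One way to model per-language numbering is a separate table for each language, keyed here by Windows language ID (0x0409 for English (United States), 0x040C for French (France)). This is a sketch only: the English "r" = 38 comes from this document, while the French value is a hypothetical placeholder, not taken from any real SAPI phoneme set. PHONEME_TABLES and phone_id are hypothetical names.

```python
from typing import Dict

# Per-language phoneme tables keyed by Windows language ID.
PHONEME_TABLES: Dict[int, Dict[str, int]] = {
    0x0409: {"r": 38},  # English (United States): value stated in the text
    0x040C: {"r": 30},  # French (France): hypothetical placeholder value
}


def phone_id(langid: int, phone: str) -> int:
    """Look up a phoneme's ID in the table for the given language."""
    return PHONEME_TABLES[langid][phone]


print(phone_id(0x0409, "r"))  # 38
print(phone_id(0x040C, "r"))  # 30 (hypothetical)
```

The same symbol resolves to different numbers because each language numbers its own set; an engine then maps each ID to its own IPA value.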

Each language is associated with a set of phonemes numbered from 1 to n. You can enter a pronunciation using either the symbolic representation or the numeric representation. Because most application developers are not linguists, IPA codes would likely have little meaning to them.
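
Since both forms identify the same phonemes, a pronunciation can be converted in either direction. The following sketch round-trips between SYM and numeric IDs by inverting a lookup table; in native code, SAPI's ISpPhoneConverter provides PhoneToId and IdToPhone for this, but the Python below does not call SAPI. The table is a hypothetical subset (only "r" = 38 comes from this document), and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical subset of the American English phoneme table.
# Only "r" = 38 comes from the text; the other IDs are assumed.
SYM_TO_ID: Dict[str, int] = {
    "-": 1, "1": 8, "2": 9, "eh": 21, "h": 26, "l": 31, "ow": 35, "r": 38,
}
# Inverse table: numeric ID -> symbol.
ID_TO_SYM: Dict[int, str] = {num: sym for sym, num in SYM_TO_ID.items()}


def sym_to_ids(sym: str) -> List[int]:
    """Symbolic form -> numeric form."""
    return [SYM_TO_ID[phone] for phone in sym.split()]


def ids_to_sym(ids: List[int]) -> str:
    """Numeric form -> space-delimited symbolic form."""
    return " ".join(ID_TO_SYM[num] for num in ids)


ids = sym_to_ids("h eh - l ow 1")
print(ids)              # [26, 21, 1, 31, 35, 8]
print(ids_to_sym(ids))  # h eh - l ow 1
```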

Note that consistent pronunciation across engines is NOT a goal; predictable pronunciation is. Using the phoneme set, an application developer can guarantee a minimal pronunciation, but not its exact allophonic expression. For example, the word "first" will always be pronounced as "first", never as "fist" or "feast", but the engine's accent may differ slightly because its internal allophone values may differ.

For more information and definitions for international phoneme sets, please see: