Writing a Speech Recognition Phone Converter for SAPI (Microsoft.Speech)

Microsoft Speech Platform SDK 11


An engine vendor building a speech recognition (SR) engine for a new language for which no language-specific phone converter is supplied should use the SAPI Universal Phone Set (UPS).

The engine vendor then needs to write a language-specific phone converter from SAPI phones to SR engine phones for the new language. The converter is implemented in the SR engine's locale handler. It defines the mapping between the SAPI phone set and the engine's internal SR phone set for that language; the internal phone set, in turn, defines the acoustic models the engine uses. The two sets need not match in granularity. For example, the SR engine may model only broad phonemic distinctions in the language, while the application lexicon may encode a more precise phonetic description (for instance, by including diacritics; see Diacritics (Microsoft.Speech)). It is up to the engine to decide how to map the SAPI pronunciations.
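The mapping described above can be pictured with a minimal Python sketch. It is not SAPI code and does not use any SAPI interface: the function name, the phone labels, the engine symbols, and the assumption that a pronunciation arrives as a space-delimited string of labels are all hypothetical, chosen only to show an engine folding finer phonetic detail into a broader internal phone set.

```python
# Illustrative sketch only. Every label below is a placeholder, not the real
# UPS inventory and not any real engine's internal phone set.
SAPI_TO_ENGINE = {
    "T": "t",
    "S": "s",
    "AA": "a",
    "IY": "i",
}

# Hypothetical diacritic labels that this engine's acoustic models do not
# distinguish. They are dropped rather than mapped.
IGNORED_DIACRITICS = {"ASP", "NAS"}


def convert_pronunciation(sapi_phones):
    """Map a space-delimited SAPI phone string to a list of engine phones."""
    engine_phones = []
    for label in sapi_phones.split():
        if label in IGNORED_DIACRITICS:
            # Broad phonemic model: the finer distinction is folded away.
            continue
        if label not in SAPI_TO_ENGINE:
            raise ValueError("no engine phone for SAPI phone %r" % label)
        engine_phones.append(SAPI_TO_ENGINE[label])
    return engine_phones


print(convert_pronunciation("T ASP IY"))
```

Dropping unmodeled diacritics is only one possible policy. An engine could instead map a base phone plus diacritic to a distinct internal phone if its acoustic models make that distinction.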

In addition, application developers are not restricted to any particular subset of UPS. The phone converter must therefore perform additional validity checks to ensure that the SAPI pronunciations use only phones that are valid for the language in question, and that the phone strings are well formed according to the parsing guidelines. For more information, see Parsing Guidelines for SAPI Speech Recognition Phone Converters (Microsoft.Speech).
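The two checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual parsing guidelines: the label sets are hypothetical placeholders, pronunciations are assumed to be space-delimited label strings, and the single well-formedness rule used here (a diacritic may not appear before any base phone) is a stand-in for whatever rules the guidelines define.

```python
# Illustrative sketch only. The label sets are hypothetical placeholders for
# the subset of UPS that one language-specific converter accepts.
VALID_BASE_PHONES = {"T", "S", "AA", "IY"}
VALID_DIACRITICS = {"ASP", "NAS"}


class PhoneStringError(ValueError):
    """Raised when a pronunciation fails a validity check."""


def validate_pronunciation(sapi_phones):
    """Return the list of labels if the phone string passes both checks."""
    labels = sapi_phones.split()
    if not labels:
        raise PhoneStringError("empty pronunciation")

    seen_base = False
    for position, label in enumerate(labels):
        if label in VALID_BASE_PHONES:
            seen_base = True
        elif label in VALID_DIACRITICS:
            # Stand-in well-formedness rule: a diacritic needs a base phone
            # somewhere before it.
            if not seen_base:
                raise PhoneStringError(
                    "diacritic %r at position %d has no base phone"
                    % (label, position)
                )
        else:
            # Subset check: the label is not valid for this language.
            raise PhoneStringError(
                "phone %r at position %d is not valid for this language"
                % (label, position)
            )
    return labels


print(validate_pronunciation("T ASP IY"))
```

Running validation before conversion lets the converter reject a malformed or out-of-language pronunciation with a specific error instead of producing a partial mapping.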