TTS Engine Characteristics

Microsoft Speech SDK SAPI 5.1

Engines use three characteristics (Volume, Pitch, and Rate) to partially define speech traits. At the application level, setting these values is simple: each one is set to a single number. Implementing these traits inside the engine is more complex.

Volume

At the application level, volume is a number from zero to 100, where 100 is the maximum volume for a voice. The scale is linear, so a value of 50 represents half of the loudest permitted level. Each increment should equal the engine's volume range divided by 100.
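The linear mapping described above can be sketched as follows. This is a minimal illustration, not engine code: the function name `volume_to_gain` is hypothetical, and clamping out-of-range input to 0..100 is an assumption that the text does not specify.

```python
def volume_to_gain(volume: int) -> float:
    """Map an application-level volume (0 to 100) to a linear gain in [0.0, 1.0].

    Assumption: values outside 0..100 are clamped; the source text does not
    say how an engine should treat them.
    """
    clamped = max(0, min(100, volume))
    # Linear progression: each step is 1/100 of the full range.
    return clamped / 100.0


if __name__ == "__main__":
    for v in (0, 25, 50, 100):
        print(v, volume_to_gain(v))
```

Here a gain of 1.0 stands for the loudest level the voice permits, so a volume of 50 maps to 0.5.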

Pitch adjustment

The value can range from -10 to +10. A value of zero sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at approximately three-fourths of its default pitch. A value of +10 sets a voice to speak at approximately four-thirds of its default pitch. The values between -10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the pitch by the 24th root of 2 (about 1.03). Values outside the -10 to +10 range are passed to an engine. However, SAPI 5-compliant engines may not support such extremes and may clip the pitch to the maximum or minimum the engine supports. Where an engine does support them, the same step size must apply: every increment or decrement of 1 multiplies or divides the pitch by the 24th root of 2, so values of -24 and +24 lower and raise the pitch by one octave, respectively.
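The arithmetic behind the pitch scale can be checked with a short sketch. The function name `pitch_multiplier` is hypothetical and chosen for illustration; only the step size (the 24th root of 2) comes from the text.

```python
def pitch_multiplier(adjustment: int) -> float:
    """Return the factor applied to the default pitch for a given adjustment.

    Each step of 1 multiplies the pitch by 2 ** (1 / 24), so 24 steps
    amount to one octave (a factor of 2).
    """
    return 2.0 ** (adjustment / 24.0)


if __name__ == "__main__":
    for adj in (-24, -10, 0, 1, 10, 24):
        print(f"{adj:+d}: {pitch_multiplier(adj):.4f}")
```

Evaluating the endpoints shows why the documented fractions are approximations: an adjustment of +10 gives a factor close to, but not exactly, four-thirds, while +24 gives exactly one octave up.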

Rate adjustment

The value can range from -10 to +10. A value of zero sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third of its default rate. A value of +10 sets a voice to speak at three times its default rate. The values between -10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the rate by the 10th root of 3 (about 1.1). Values outside the -10 to +10 range are passed to an engine. However, SAPI 5-compliant engines may not support such extremes and may clip the rate to the maximum or minimum rate the engine supports.
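The rate scale follows the same pattern with a different base. The sketch below is illustrative only: `rate_multiplier` and `clipped_rate_multiplier` are hypothetical names, and the default clipping limits of -10 and +10 are an assumption standing in for whatever range a particular engine supports.

```python
def rate_multiplier(adjustment: int) -> float:
    """Return the factor applied to the default rate for a given adjustment.

    Each step of 1 multiplies the rate by 3 ** (1 / 10), so +10 triples
    the rate and -10 reduces it to one-third.
    """
    return 3.0 ** (adjustment / 10.0)


def clipped_rate_multiplier(adjustment: int,
                            engine_min: int = -10,
                            engine_max: int = 10) -> float:
    """Clip the adjustment to a (hypothetical) engine-supported range first.

    Assumption: the limits are engine-specific; -10 and +10 are only
    placeholder defaults for this sketch.
    """
    clipped = max(engine_min, min(engine_max, adjustment))
    return rate_multiplier(clipped)


if __name__ == "__main__":
    for adj in (-10, -1, 0, 1, 10, 15):
        print(f"{adj:+d}: {clipped_rate_multiplier(adj):.4f}")
```

An engine that accepts a wider range would pass wider limits, in which case the same step size simply continues past three times the default rate.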
