Speech Synthesis Markup Language Reference (Microsoft.Speech)

Microsoft Speech Platform SDK 11


Speech Synthesis Markup Language (SSML) is an XML-based markup language that application developers use to control characteristics of synthesized speech (text-to-speech, or TTS) output, including voice, pitch, rate, volume, and pronunciation.

The Microsoft implementation of SSML is based on the World Wide Web Consortium (W3C) Speech Synthesis Markup Language (SSML) Version 1.0 specification.

All SSML elements belong to the SSML namespace (http://www.w3.org/2001/10/synthesis). The following elements are implemented in the Microsoft Speech Platform SDK 11.

SSML Element	Description	Usage	Attributes
audio	Supports the insertion of recorded audio files.	Optional	src
break	An empty element used to control the prosodic boundaries between words.	Optional	strength, time
emphasis	Increases the level of stress with which the contained text is spoken.	Optional	level
lexicon	Specifies a lexicon document that contains the pronunciations for the content of the document.	Optional	uri, type
mark	Designates a specific reference point in the text sequence. This element can also be used to mark an output audio stream for asynchronous notification.	Optional	name
p and s	Denote the paragraph and sentence structure of the document.	Optional	xml:lang
phoneme	Indicates the phonetic pronunciation for the contained text. Overrides the pronunciations in the lexicon, if one is specified.	Optional	ph, alphabet
prosody	Controls the pitch, rate, and volume of the speech output.	Optional	pitch, contour, range, rate, duration, volume
say-as	Indicates the type of text contained in the element (such as acronym, number, or date).	Optional	interpret-as, format, detail
speak	The required root element for all SSML documents.	Required	version, xmlns, xml:lang
sub	Specifies a string of text that should be pronounced in place of the text contained in the element.	Optional	alias
voice	Specifies the voice, and its attributes, to use for synthesized speech. Often used to switch from one voice to another.	Optional	xml:lang, gender, age, variant, name
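
To make the table concrete, here is a minimal sketch of an SSML document that uses several of the listed elements, with a small Python check run against it. The check uses only the standard library and is not part of the Microsoft Speech Platform SDK. The sample text, the attribute values, and the names SSML_DOC, IMPLEMENTED, and check_ssml are assumptions made for illustration. The namespace URI is the one defined by the W3C SSML 1.0 specification. The check looks only at element names, the namespace, and the version attribute; it does not validate attribute values or confirm what a particular speech engine accepts.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Element names taken from the table above ("p and s" are two elements).
IMPLEMENTED = {
    "audio", "break", "emphasis", "lexicon", "mark", "p", "s",
    "phoneme", "prosody", "say-as", "speak", "sub", "voice",
}

# Hypothetical sample document; the text and attribute values are illustrative.
SSML_DOC = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>
    <s>Your order ships on <say-as interpret-as="date" format="mdy">10/19/2012</say-as>.</s>
    <s>Please <emphasis level="strong">do not</emphasis> reply to this message.</s>
  </p>
  <break time="500ms"/>
  <prosody rate="slow" volume="loud">
    See the <sub alias="World Wide Web Consortium">W3C</sub> specification for details.
  </prosody>
</speak>
"""


def check_ssml(doc):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    root = ET.fromstring(doc)

    # The root must be <speak> in the SSML namespace and carry a version.
    if root.tag != "{%s}speak" % SSML_NS:
        problems.append("root element must be <speak> in the SSML namespace")
    if "version" not in root.attrib:
        problems.append("<speak> is missing the version attribute")

    # ElementTree reports namespaced tags as "{namespace-uri}local-name".
    for elem in root.iter():
        ns, _, name = elem.tag.rpartition("}")
        if ns.lstrip("{") != SSML_NS:
            problems.append("<%s> is not in the SSML namespace" % name)
        elif name not in IMPLEMENTED:
            problems.append("<%s> is not listed in the element table" % name)
    return problems


if __name__ == "__main__":
    print(check_ssml(SSML_DOC))
```

For the sample document the returned list is empty. Removing the xmlns attribute from speak, or adding an element that is not in the table, produces one or more problem messages.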

