This schema describes the SAPI 5.0 TTS XML grammar format. The SAPI TTS XML schema is built into the TTS XML parser, so it is not necessary to include the schema in the XML file when authoring a grammar. NOTE: This schema is based on the Microsoft schema language and is not fully W3C compliant. It will be rewritten to comply with the W3C standard once that standard has been approved by the W3C.

Specifies the type of context. Refer to the SAPI documentation for the various context IDs.

A string representing a phoneme for a language supported by the voice implementation synthesizing the speech. Refer to the SAPI Phoneme Spec.

Language identifier (LANGID), specified as a hexadecimal value. For example, the LANGID for English (US) in hexadecimal form is 409.

Specifies the volume as a percentage of the maximum volume of the current voice. Each voice implementation has its own maximum volume. This value must be between 0 and 100 inclusive. Values above 100 or below 0 are clipped to 100 and 0 respectively.

The value of a bookmark may be any string or integer.

The value can range from –10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of –10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Increments between –10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the pitch by the 24th root of 2 (about 1.03). Values more extreme than –10 and +10 are passed to an engine, but SAPI 5-compliant engines may not support such extremes and may instead clip the pitch to the maximum or minimum pitch they support. Values of –24 and +24 must lower and raise the pitch by one octave respectively. Every increment or decrement by 1 must multiply or divide the pitch by the 24th root of 2. When scoped, this attribute is relative.
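
The pitch scale described above can be checked numerically. The following is a minimal sketch, not part of SAPI: the function name pitch_multiplier is hypothetical, and the only thing it assumes is the rule stated above that each step of 1 scales the pitch by the 24th root of 2.

```python
def pitch_multiplier(adjust: int) -> float:
    """Factor applied to a voice's default pitch for a pitch adjustment.

    Hypothetical helper: each step of 1 multiplies the pitch by the
    24th root of 2, so 24 steps correspond to one octave.
    """
    return 2.0 ** (adjust / 24.0)


if __name__ == "__main__":
    # Print the factor for a few adjustments on the scale.
    for step in (-24, -10, -1, 0, 1, 10, 24):
        print(f"{step:+d}: {pitch_multiplier(step):.4f}")
```

Under this rule an adjustment of +10 gives a factor close to, but not exactly, 4/3, and –10 gives a factor close to ¾, which matches the rounded figures quoted in the description.
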
Number of milliseconds of silence, from zero to 65535. Value entries that exceed this range should be limited to 65535. Value entries that are below this range (negative values) should be set to zero.

The XML parser selects the first registered voice that contains all of the specified attributes. The attributes are specified as a string of semicolon-delimited sub-strings. The speak call will fail if the parser cannot find the required tags.

The value can range from –10 to +10. A value of 0 sets a voice to speak at its default rate. A value of –10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Increments between –10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the rate by the 10th root of 3 (about 1.12). Values more extreme than –10 and +10 are passed to an engine, but SAPI 5-compliant engines may not support such extremes and may instead clip the rate to the maximum or minimum rate they support. When scoped, this attribute is relative.

String name of the part of speech. Valid SAPI parts of speech are noun, verb, modifier, function, interjection and unknown.

The value can range from –10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of –10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Increments between –10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the pitch by the 24th root of 2 (about 1.03).
Values more extreme than –10 and +10 are passed to an engine, but SAPI 5-compliant engines may not support such extremes and may instead clip the pitch to the maximum or minimum pitch they support. Values of –24 and +24 must lower and raise the pitch by one octave respectively. Every increment or decrement by 1 must multiply or divide the pitch by the 24th root of 2. When scoped, this attribute is absolute.

The value can range from –10 to +10. A value of 0 sets a voice to speak at its default rate. A value of –10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Increments between –10 and +10 are logarithmically distributed: incrementing or decrementing by 1 multiplies or divides the rate by the 10th root of 3 (about 1.12). Values more extreme than –10 and +10 are passed to an engine, but SAPI 5-compliant engines may not support such extremes and may instead clip the rate to the maximum or minimum rate they support. When scoped, this attribute is absolute.

At the beginning of the SAPI tag, the voice is in the same state as at the insertion point of the SAPI tag. At the close of the SAPI tag, the voice returns to the state it had at the insertion point. SAPI tags may be nested. When a nested SAPI tag is closed, the voice state returns to what it was at the insertion point of the nested tag.

Place emphasis on the words contained by this element. It is up to the engine implementation to decide what emphasis means for that engine.

Spell out, letter by letter, the words contained by this element. NOTE: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. For words that contain punctuation, such as "U.S.A.", the engine should spell out the letters as well as the punctuation scoped within the tag.

A string representing a phoneme for a language supported by the voice implementation synthesizing the speech.
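
The scoping rule for nested SAPI tags described above (state inherited at the insertion point, restored when the tag closes) behaves like a stack. The following is a minimal sketch of that idea only. The names VoiceState, ScopedVoice, open_tag and close_tag are hypothetical, as are the default values; this is not how any SAPI engine is known to be implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceState:
    # Hypothetical defaults chosen for illustration only.
    pitch: int = 0
    rate: int = 0
    volume: int = 100


class ScopedVoice:
    """Tracks voice state across nested tags with a stack."""

    def __init__(self) -> None:
        self._stack = [VoiceState()]

    @property
    def state(self) -> VoiceState:
        return self._stack[-1]

    def open_tag(self, **changes: int) -> None:
        # A new scope starts from the state at the insertion point,
        # then applies the changes carried by the tag.
        self._stack.append(replace(self.state, **changes))

    def close_tag(self) -> None:
        # Closing a tag restores the state from before it was opened.
        if len(self._stack) == 1:
            raise ValueError("no open tag to close")
        self._stack.pop()


if __name__ == "__main__":
    voice = ScopedVoice()
    voice.open_tag(pitch=5)   # outer tag
    voice.open_tag(rate=-2)   # nested tag inherits pitch=5
    print(voice.state)
    voice.close_tag()         # back to the outer tag's state
    print(voice.state)
    voice.close_tag()         # back to the initial state
    print(voice.state)
```
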
0 to 100 (no overflow allowed).

Sets the relative pitch adjustment of synthesized speech.

Inserts a bookmark into the input stream using the bookmark element. If an application specifies interest in bookmark events, it will receive an event when synthesis has passed this element in an input stream. If the audio output destination supports handling of events, the application receives this event once the synthesized speech up to this bookmark has been output. Otherwise, the application receives a bookmark event when the voice implementation has synthesized speech up to this bookmark.

Produces silence for a specified number of milliseconds in the output audio stream.

Places emphasis on the words contained by this element.

Spells out, letter by letter, the words contained by this element. Note: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. For words that contain punctuation, such as "U.S.A.", the engine should spell out the letters as well as the punctuation scoped within the tag.

The part of speech of the contained word(s). The PARTOFSP tag is used to force a particular pronunciation of a word (for example, the word record as a noun versus the word record as a verb).

Pronounces the contained text (possibly empty) according to the provided Unicode string.

Changes the LANGID of the scoped text. When the LANGID is changed, SAPI will try to detect whether the current voice can handle the new language. If the voice does not speak the specified language, the engine must choose another language it speaks as a best attempt. This fallback can be prevented, if it is not desired, by using the VOICE tag with the REQUIRED attribute.

Sets which voice implementation is used for synthesis of the associated input stream text. SAPI selects the best voice implementation given the required and optional attributes.

Sets the relative speed adjustment at which words are synthesized.
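
The speed adjustment follows the rate scale given earlier: each step of 1 scales the rate by the 10th root of 3, and an engine may clip values beyond –10 and +10. The following is a minimal sketch under those stated rules. The function name rate_multiplier and its clip parameter are hypothetical and only model an engine that supports the guaranteed range.

```python
def rate_multiplier(adjust: int, clip: bool = True) -> float:
    """Factor applied to a voice's default rate for a rate adjustment.

    Hypothetical helper: each step of 1 multiplies the rate by the
    10th root of 3. With clip=True, values outside -10..+10 are
    clipped, modelling an engine that supports only that range.
    """
    if clip:
        adjust = max(-10, min(10, adjust))
    return 3.0 ** (adjust / 10.0)


if __name__ == "__main__":
    # Print the factor for a few adjustments, including one out of range.
    for step in (-10, -1, 0, 1, 10, 15):
        print(f"{step:+d}: {rate_multiplier(step):.4f}")
```
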
The scoped/global element VOLUME modifies the underlying numerical value of a speech block. The underlying value can never be below zero or exceed 100. All negative value entries result in zero, and all values above 100 result in 100. VOLUME may also receive an absolute value (no '-' or '+' character) that is an integer between zero and 100.

The scoped/global element PITCH modifies the underlying numerical value of a speech block. Relative attribute values, those preceded by a dash (-) or a plus sign (+), increment the underlying numerical value by the specified amount. SAPI-compliant engines have the option of supporting only the guaranteed range of values, behaving as -10 for adjustments below -10 and as +10 for values above +10.

The context can specify the type of normalization rules that should be applied to the scoped text. SAPI does not guarantee any predefined contexts.
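
The clamping rules for VOLUME and the relative adjustments for PITCH described above can be sketched as follows. The function names clamp, set_volume and adjust_pitch are hypothetical. Treating an unsigned PITCH value as a replacement of the current value is an assumption made for illustration and is not stated by the schema text.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Limit value to the inclusive range lo..hi."""
    return max(lo, min(hi, value))


def set_volume(value: int) -> int:
    # Negative entries result in 0; entries above 100 result in 100.
    return clamp(value, 0, 100)


def adjust_pitch(current: int, attr: str, guaranteed_only: bool = True) -> int:
    """Apply a PITCH attribute value to the current pitch adjustment."""
    attr = attr.strip()
    if attr.startswith(("+", "-")):
        # Relative value: increment the underlying value.
        result = current + int(attr)
    else:
        # Assumption: an unsigned value replaces the current value.
        result = int(attr)
    if guaranteed_only:
        # Model an engine that supports only the guaranteed range.
        result = clamp(result, -10, 10)
    return result


if __name__ == "__main__":
    print(set_volume(150))
    print(adjust_pitch(8, "+5"))
    print(adjust_pitch(8, "+5", guaranteed_only=False))
```
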