Microsoft Speech Platform
Grammar Format Tags
The SAPI text grammar format is composed of XML tags, which can be structured to define the phrases that the speech recognition engine recognizes. The following document explains each tag in more detail, including sample source code, sample XML grammar snippets, and relevant application scenarios.
The XML tag descriptions are organized by XML element; each element description covers the relevant attributes.
XML Tags: Elements
<DEFINE>
Summary: The DEFINE tag is used for declaring a set of string identifiers for numeric values.
XML Attributes: None
XML Parent Elements: GRAMMAR: The container for the entire XML grammar.
XML Child Elements: ID (1 or more required): The DEFINE tag can contain one or more ID tags, each of which defines one string identifier.
Detailed Description: None
XML Grammar Sample(s):
<GRAMMAR>
    <DEFINE>
        <ID NAME="TheNumberFive" VAL="5"/>
    </DEFINE>
    <!-- Note that the rule ID takes a number; "TheNumberFive" resolves to 5 -->
    <RULE ID="TheNumberFive" TOPLEVEL="ACTIVE">
        <P>five</P>
    </RULE>
</GRAMMAR>
Programmatic Equivalent: See the ID tag.
<DICTATION>
Summary: The DICTATION tag is used in rules or phrases that need basic dictation support.
XML Attributes:
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of dictation words that can be recognized. The MAX value must be greater than or equal to the MIN value. The application can request a pseudo-infinite maximum by specifying INF as the MAX; the pseudo-infinite value is actually 255 dictation words. An application that needs free-form dictation, such as the subject line of an email, should use a large MAX. An application that needs to recognize a person's name may want a much smaller value, such as 5 words.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of dictation words that must be recognized. If the grammar author specifies a MIN value and the recognizer does not meet the minimum, the rule will fail to be recognized. A value greater than one can make sense when, for example, the application asks for a first and last name.
PROPID (optional, type=VT_I4): Specifies the semantic property's numeric identifier.
PROPNAME (optional): Specifies the semantic property's string identifier.
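The MIN and MAX defaulting rules above can be sketched in plain C++. This is only an illustrative model, not SAPI code: `DictationRange`, `kPseudoInfinite`, and `resolveDictationRange` are hypothetical names, and the attribute values are assumed to arrive as optional strings taken from the XML.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of SAPI: models the documented defaults
// for the DICTATION tag's MIN and MAX attributes.
struct DictationRange {
    int min;
    int max;
};

// "INF" is documented as a pseudo-infinite maximum of 255 words.
constexpr int kPseudoInfinite = 255;

DictationRange resolveDictationRange(const std::optional<std::string>& minAttr,
                                     const std::optional<std::string>& maxAttr) {
    // MIN defaults to 1 when the attribute is absent.
    const int min = minAttr ? std::stoi(*minAttr) : 1;

    // MAX defaults to MIN; the literal "INF" maps to the pseudo-infinite value.
    int max = min;
    if (maxAttr) {
        max = (*maxAttr == "INF") ? kPseudoInfinite : std::stoi(*maxAttr);
    }

    // The documentation requires MAX >= MIN.
    if (max < min) {
        throw std::invalid_argument("MAX must be greater than or equal to MIN");
    }
    return {min, max};
}
```

For example, `resolveDictationRange(std::nullopt, std::string("INF"))` yields a range of 1 to 255 words, while omitting both attributes yields exactly one word.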
XML Parent Elements: LIST, L: List of phrases which can be recognized. PHRASE, P: Phrase that must be recognized for the containing rule to be recognized. OPT, O: Optional phrase that may be recognized. RULE: Rule that contains phrases or text to be recognized.
XML Child Elements: None.
Detailed Description: The DICTATION tag is designed for applications that need to integrate command & control and dictation support into a CFG. For example, an application may allow the user to speak free-form dictation into a command (e.g. "save document as our family's budget", where "our family's budget" is free-form dictation). The application may also create a CFG which supports a set of specific phrases or words, and also includes a single DICTATION tag in case of an unexpected user phrase. For example, a CFG may include a set of known address book names; if the user speaks another name, the application prompts the user to validate the dictated result. Note that the SR engine's accuracy may suffer when dictation and CFG phrases are mixed, since many words sound similar, and a CFG is generally preferred for application development with known words.
The grammar author can also use a special character, the asterisk (*), instead of the entire XML tag. See XML Grammar Format: Special Dictation Tag.
By using semantic properties, the application can easily retrieve the exact text that was dictated by the speaker. To specify a semantic property for the DICTATION tag, the grammar author should specify the PROPID and/or PROPNAME attributes. The SAPI run time will automatically set the semantic tag's starting phrase element, allowing the application to search for the specific semantic property in the properties hierarchy (see SPPHRASEPROPERTY.ulFirstElement). If multiple dictation words are recognized by the SR engine (e.g. DICTATION MAX > 1), the SAPI run time will generate multiple semantic properties, one for each word, all with the same numeric ID and/or string NAME.
If the speech recognition engine supports multiple dictation topics (e.g. spelling, general, legal, medical, etc.), the DICTATION tag in the grammar refers to the topic that was selected when ISpRecoGrammar::LoadDictation was called. If the topic was not explicitly selected, the default SR engine dictation topic will be loaded. Currently, it is not possible to load multiple dictation topics inside a single command & control grammar. Applications that need multiple dictation topics should create multiple grammar objects.
If there is ambiguity between a dictation phrase and a CFG phrase, the speech recognition engine will typically choose the CFG phrase. Preferring CFGs over dictation prevents dictation from automatically consuming all CFG phrases.
The speech recognition engine must support dictation inside a CFG for the grammar to load and activate successfully. The application can determine whether an engine supports the DICTATION tag by retrieving the SR engine's object token (see ISpRecognizer::GetRecognizer), and then checking for the existence of the engine attribute "DictationInCFG" (see ISpObjectToken::MatchesAttributes). The engine can specify support for the DICTATION tag anywhere in the CFG phrase (attribute value="Anywhere"), or only at the end (attribute value="Trailing").
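The "DictationInCFG" check described above can be modeled without SAPI as a small decision function. This is a sketch under stated assumptions: the enum and both function names are hypothetical, the attribute value is assumed to have been read from the engine token already, and treating unrecognized values as unsupported is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the engine's "DictationInCFG" attribute.
enum class DictationInCfg { Unsupported, Trailing, Anywhere };

// Maps the attribute value (absent when the engine does not expose it)
// to one of the documented support levels.
DictationInCfg classifyDictationSupport(const std::optional<std::string>& attributeValue) {
    if (!attributeValue) return DictationInCfg::Unsupported;  // attribute missing
    if (*attributeValue == "Anywhere") return DictationInCfg::Anywhere;
    if (*attributeValue == "Trailing") return DictationInCfg::Trailing;
    // Assumption: any other value is treated as unsupported.
    return DictationInCfg::Unsupported;
}

// Returns true when a grammar whose dictation placement is described by
// dictationOnlyAtEnd would be expected to load on such an engine.
bool grammarCanLoad(DictationInCfg support, bool dictationOnlyAtEnd) {
    switch (support) {
    case DictationInCfg::Anywhere:    return true;
    case DictationInCfg::Trailing:    return dictationOnlyAtEnd;
    case DictationInCfg::Unsupported: return false;
    }
    return false;
}
```

An application could run a check like this before choosing between a grammar with embedded dictation and a fallback grammar that uses only CFG phrases.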
XML Grammar Sample(s):
<GRAMMAR>
    <!-- basic command to create a self-note for the user with free-form text -->
    <RULE ID="SelfNote" TOPLEVEL="ACTIVE">
        <P>note to self</P>
        <DICTATION MAX="INF"/>
    </RULE>

    <!-- command to query a name from an address book -->
    <RULE ID="QueryName" TOPLEVEL="ACTIVE">
        <P>list first names of all persons with last name</P>
        <!-- Store only one word for the last name, more will fail command -->
        <DICTATION MAX="1"/>
    </RULE>

    <!-- command to handle first and last names with semantic properties -->
    <!-- By using semantic properties, the application can ignore all of the text returned, except for the text associated with the dictation tags' semantic properties "PID_FirstName" and "PID_LastName" -->
    <RULE ID="SubmitName" TOPLEVEL="ACTIVE">
        <P>
            my first name is
            <!-- Note the implicit maximum is only one word -->
            <DICTATION PROPID="PID_FirstName"/>
            and my last name is
            <!-- Note the explicit maximum is two words -->
            <DICTATION PROPID="PID_LastName" MAX="2"/>
        </P>
    </RULE>
</GRAMMAR>
Programmatic Equivalent: To programmatically create a dictation transition (i.e. DICTATION tag) in a CFG, the application developer can use the ISpGrammarBuilder::AddRuleTransition with a special rule handle, called SPRULETRANS_DICTATION. For example, the following code creates a simple command called "SendMail" which recognizes the command "send mail to DICTATION".
SPSTATEHANDLE hsSendMail;
// Create new top-level rule called "SendMail"
hr = cpRecoGrammar->GetRule(L"SendMail", NULL,
                            SPRAF_TopLevel | SPRAF_Active,
                            TRUE, &hsSendMail);
// Check hr

// Create an interim state before the dictation transition
SPSTATEHANDLE hsBeforeDictation;
hr = cpRecoGrammar->CreateNewState(hsSendMail, &hsBeforeDictation);
// Check hr

// Add the command words "send mail to"
hr = cpRecoGrammar->AddWordTransition(hsSendMail, hsBeforeDictation,
                                      L"send mail to", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add trailing dictation transition (the fourth argument is the weight)
hr = cpRecoGrammar->AddRuleTransition(hsBeforeDictation, NULL,
                                      SPRULETRANS_DICTATION, 1.0f, NULL);
// Check hr

// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
Note that the previous sample code only supports one dictation word. To support more than one word, the code would need to build more dictation transition states, each of which begins at the previous dictation state - effectively, a series of consecutive single-word dictation transitions.
<GRAMMAR>
Summary: The GRAMMAR tag is the outermost container for the XML grammar definition.
XML Attributes:
LANGID (optional, type=numeric): The language identifier of the grammar. The identifier will be compared against the supported languages of the speech recognition engine. If the language is not supported, the grammar load call will fail (e.g. ISpRecoGrammar::LoadCmdFromFile). It is recommended that all XML grammars include the LANGID attribute, to avoid the scenario where the SR engine tries to load a grammar with an unspecified language ID and fails because it cannot resolve the words. SAPI supports fuzzy language ID matching: the SR engine can report that it supports the major portion of the language ID (e.g. 0x009 in 0x409), which means the SR engine will try to load and recognize any grammar that matches the major portion of the language ID.
LEXDELIMITER (optional): Specifies the delimiter for explicit lexicon entries in the grammar. Grammar authors can specify lexicon information by using a special sequence of characters: LEXDELIMITERDisplayFormLEXDELIMITERLexicalFormLEXDELIMITERPronunciation; The default delimiter is the forward slash character "/". See also PHRASE.
WORDTYPE (optional): Specifies the type of the word(s) when they are added to the grammar. The default value is "LEXICAL", which is also the only supported value.
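The fuzzy language ID matching described for LANGID can be illustrated with a few lines of standard C++. This is a sketch, not SAPI's implementation: `primaryLanguage` and `fuzzyLanguageMatch` are hypothetical names, and the 10-bit mask mirrors what the Windows PRIMARYLANGID macro extracts.

```cpp
#include <cassert>
#include <cstdint>

// Extracts the primary ("major") language portion of a Windows LANGID,
// i.e. the low 10 bits (0x009 for 0x409).
constexpr std::uint16_t primaryLanguage(std::uint16_t langId) {
    return static_cast<std::uint16_t>(langId & 0x3FF);
}

// Hypothetical model of fuzzy matching: an engine that reports support for
// the major portion accepts any grammar sharing that primary language.
constexpr bool fuzzyLanguageMatch(std::uint16_t engineLangId,
                                  std::uint16_t grammarLangId) {
    return primaryLanguage(engineLangId) == primaryLanguage(grammarLangId);
}
```

Under this model, an engine registered for 0x409 would also attempt to load a grammar marked 0x809, because both share the primary language 0x009, while a grammar marked 0x407 would not match.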
XML Parent Elements: None
XML Child Elements: DEFINE (optional): Specifies the constant definitions for the grammar. RULE (1 or more required): Specifies the rules, including top-level and non-top-level.
Detailed Description: Every XML grammar must have the container tag, GRAMMAR.
XML Grammar Sample(s):
<!-- Language ID = US English -->
<GRAMMAR LANGID="409" LEXDELIMITER="|" WORDTYPE="LEXICAL">
    <RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
        <!-- when the user says the following pronunciation, "Hiya" will be displayed -->
        <P>|Hiya|Hello|h eh l ow;</P>
    </RULE>
</GRAMMAR>
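To make the explicit lexicon entry format concrete, the following sketch splits an entry such as `|Hiya|Hello|h eh l ow;` into its display form, lexical form, and pronunciation. It is not the grammar compiler's parser: `LexiconEntry` and `parseLexiconEntry` are hypothetical names, and the sketch assumes all three fields are present and that the entry ends with a semicolon.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical holder for the three documented parts of a lexicon entry.
struct LexiconEntry {
    std::string display;
    std::string lexical;
    std::string pronunciation;
};

// Parses DELIM display DELIM lexical DELIM pronunciation ';'.
// Returns std::nullopt when the text does not follow that shape.
std::optional<LexiconEntry> parseLexiconEntry(const std::string& text, char delimiter) {
    if (text.size() < 4 || text.front() != delimiter || text.back() != ';') {
        return std::nullopt;
    }
    const std::size_t second = text.find(delimiter, 1);
    if (second == std::string::npos) return std::nullopt;
    const std::size_t third = text.find(delimiter, second + 1);
    if (third == std::string::npos) return std::nullopt;

    LexiconEntry entry;
    entry.display = text.substr(1, second - 1);
    entry.lexical = text.substr(second + 1, third - second - 1);
    // Everything after the third delimiter, minus the trailing ';'.
    entry.pronunciation = text.substr(third + 1, text.size() - third - 2);
    return entry;
}
```

Calling `parseLexiconEntry("|Hiya|Hello|h eh l ow;", '|')` separates the sample entry into "Hiya", "Hello", and "h eh l ow".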
Programmatic Equivalent: To programmatically set the language ID of a new grammar, the application developer should call ISpGrammarBuilder::ResetGrammar. The application developer does not need to change the LEXDELIMITER or the WORDTYPE, since the ISpLexicon interface can be used to modify the lexicon.
<ID>
Summary: The ID tag is used for declaring a string identifier for numeric values.
XML Attributes: NAME (required): The NAME attribute defines the string identifier that will be associated with the constant value. VAL (required, type=VT_UI4,VT_I4,VT_R4,VT_R8): The VAL attribute defines the constant value that will be associated with the string identifier.
XML Parent Elements: DEFINE: The container for the constant definitions.
XML Child Elements: None
Detailed Description: The ID tag should be used by the grammar author to make the grammar easier to read and maintain. The grammar author can use string identifiers which succinctly explain the use of the identifier (e.g. RID_FileNew, PVAL_MAIN_WINDOW, etc.). The grammar compiler stores the identifiers in the binary format, and string identifiers are typically much larger than numeric identifiers. Also, the application developer can use a simple numeric comparison to handle rule and semantic property logic, rather than performing a more complex string comparison.
XML Grammar Sample(s):
<GRAMMAR>
    <DEFINE>
        <ID NAME="RuleId_A" VAL="1"/>
        <ID NAME="PropId_B" VAL="2"/>
        <ID NAME="PropVal_AB" VAL="3"/>
    </DEFINE>
    <!-- Note that the rule ID, phrase PROPID and VAL take numeric values. -->
    <RULE ID="RuleId_A" TOPLEVEL="ACTIVE">
        <P PROPID="PropId_B" VAL="PropVal_AB">five</P>
    </RULE>
</GRAMMAR>
Programmatic Equivalent: The Grammar Compiler that ships in the Microsoft Speech SDK includes a command line argument to generate a C-style header (see "-h"), which includes the programmatic constant definitions for all of the IDs defined in the XML grammar. The application developer can include the header file and easily use the same identifiers inside the application logic, without needing to redefine and maintain the numeric values. The XML Grammar Sample above would create the following C-style header file:
#define RuleId_A 1
#define PropId_B 2
#define PropVal_AB 3
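A minimal sketch of how application logic might consume those constants with plain numeric comparisons follows. The three `#define` lines repeat the header content for the sample grammar; `describeRecognition` and its parameters are hypothetical and stand in for values the application would read from the recognition result.

```cpp
#include <cassert>
#include <string>

// Constants as they would appear in the header generated for the sample grammar.
#define RuleId_A 1
#define PropId_B 2
#define PropVal_AB 3

// Hypothetical handler: dispatches on numeric identifiers instead of
// comparing rule or property names as strings.
std::string describeRecognition(unsigned long ruleId,
                                unsigned long propId,
                                long propValue) {
    switch (ruleId) {
    case RuleId_A:
        if (propId == PropId_B && propValue == PropVal_AB) {
            return "RuleId_A with PropVal_AB";
        }
        return "RuleId_A";
    default:
        return "unknown rule";
    }
}
```

Because the grammar and the application share one set of names, renumbering an ID in the DEFINE block only requires regenerating the header.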
<LIST>, <L>
Summary: The LIST tag is used for specifying a list of phrases or transitions.
XML Attributes: PROPID (optional, type=VT_I4): The numeric identifier that will be inherited by all semantic properties in the child elements (e.g. phrases). PROPNAME (optional): The string identifier that will be inherited by all semantic properties in the child elements (e.g. phrases).
XML Parent Elements: LIST, L: List of phrases or rules which can be recognized. PHRASE, P: Phrase that must be recognized for the containing rule to be recognized. OPT, O: Optional phrase causing the rule reference to be implicitly optional. RULE: Rule that contains phrases or text to be recognized.
XML Child Elements: RULEREF: Import, or reference, another rule's contents. PHRASE, P: Specifies text or leaf nodes. LIST, L: Specifies a list of phrases or transitions for recognition. TEXTBUFFER: Specifies a reference to the run-time application maintained text-buffer. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description: The LIST tag is a quick and efficient way to support lists of phrases or text. Instead of creating separate rules for each piece of text, the LIST tag can be used where its children are the phrase, rule reference, or other tags. The grammar author can use the shorthand version of the LIST tag, the L tag. The LIST tag is more of a virtual tag, since it does not affect the semantic property hierarchy (LIST children are not child properties). While it allows the grammar author to specify a string or numeric identifier, the identifier is only used to pass on to the child element as a default property identifier.
XML Grammar Sample(s):
<GRAMMAR>
    <!-- Note that this rule is not top-level and is only used as a reusable component rule -->
    <RULE NAME="Numbers">
        <!-- The list tag includes a semantic property Id, "PID_Value", which is inherited by all child phrase elements -->
        <LIST PROPID="PID_Value">
            <!-- If the user says "one" then the semantic property returned will be the name/value pair "PID_Value"/"1" -->
            <P VAL="1">one</P>
            <P VAL="2">two</P>
            <P VAL="3">three</P>
            <P VAL="4">four</P>
            <P VAL="5">five</P>
        </LIST>
    </RULE>

    <!-- The rule contains a list of various types of transitions -->
    <RULE NAME="Sampler" TOPLEVEL="ACTIVE">
        <!-- the list property specifies a default property name of "TYPE_NUMBER", which will be overridden by specific list children -->
        <LIST PROPNAME="TYPE_NUMBER">
            <P VAL="1">one</P>
            <P VAL="2">two</P>
            <P VAL="3">three</P>
            <P PROPNAME="TYPE_STRING" VALSTR="FOUR">four</P>
            <P PROPNAME="TYPE_NONE">five</P>
            <RULEREF NAME="Numbers" PROPNAME="TYPE_RULEREF"/>
            <TEXTBUFFER PROPNAME="TYPE_TEXTBUFFER"/>
            <DICTATION PROPNAME="TYPE_DICTATION"/>
        </LIST>
    </RULE>
</GRAMMAR>
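The way a LIST passes its identifier down as a default can be modeled in a few lines. This is only an illustration of the inheritance rule, not SAPI code: `effectivePropertyName` is a hypothetical name, and returning an empty string when neither element names a property is an assumption made for the sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model: a child's own PROPNAME wins; otherwise the child
// inherits the PROPNAME of the enclosing LIST.
std::string effectivePropertyName(const std::optional<std::string>& listPropName,
                                  const std::optional<std::string>& childPropName) {
    if (childPropName) {
        return *childPropName;  // explicit name on the child overrides the default
    }
    // Assumption: no name anywhere is represented as an empty string.
    return listPropName.value_or("");
}
```

Applied to the "Sampler" rule, the phrase "one" would resolve to "TYPE_NUMBER", while the phrase "four" would resolve to its own "TYPE_STRING".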
Programmatic Equivalent: To programmatically create a list, or a set of sibling/parallel transitions, the application needs to create a start state, then create multiple transitions out of the state. For example, the following sample code shows how to make a list of phrases (e.g. "one", "two", "three").
SPSTATEHANDLE hsList;
// Create new top-level rule called "List"
hr = cpRecoGrammar->GetRule(L"List", NULL,
                            SPRAF_TopLevel | SPRAF_Active,
                            TRUE, &hsList);
// Check hr

// Add the word "one" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL, L"one", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the word "two" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL, L"two", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the word "three" to the list
hr = cpRecoGrammar->AddWordTransition(hsList, NULL, L"three", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
The application developer can use similar code to create a list of rule references, dictation, or text buffer transitions. To change the type of list item, change the ::AddWordTransition call to ::AddRuleTransition.
<OPT>, <O>
Summary: The OPT tag is used for specifying optional text in a command phrase.
XML Attributes:
DISP (optional): Specifies the display form of the phrase text.
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of times the user can repeat the phrase and still be successfully recognized.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of times the user must repeat the phrase to be successfully recognized.
PRON (optional): Specifies the pronunciation to be used by the recognizer when listening for the text.
PROPID (optional, type=VT_I4): Specifies the numeric identifier to associate with the phrase tag's semantic property.
PROPNAME (optional): Specifies the string identifier to associate with the phrase tag's semantic property.
VAL (optional, type=VT_I4): Specifies the semantic property's numeric value.
VALSTR (optional): Specifies the semantic property's string value.
WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The probability that the user will speak the contents of the OPT tag, versus another sibling transition or phrase.
XML Parent Elements: RULEREF: Import, or reference, another rule's contents. PHRASE, P: Specifies text or leaf nodes. OPT, O: Optional phrase causing the rule reference to be implicitly optional. LIST, L: Specifies a list of phrases or transitions for recognition. TEXTBUFFER: Specifies a reference to the run-time application maintained text-buffer. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
XML Child Elements: RULEREF: Import, or reference, another rule's contents. PHRASE, P: Specifies text or leaf nodes. OPT, O: Optional phrase causing the rule reference to be implicitly optional. LIST, L: Specifies a list of phrases or transitions for recognition. TEXTBUFFER: Specifies a reference to the run-time application maintained text-buffer. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description: The OPT tag and the PHRASE tag are the only tags that can directly contain recognizable text. The grammar author can use the shorthand version of the OPT tag, the O tag. The grammar author can also specify custom word pronunciations and display text by using the PRON and DISP attributes. For example, a grammar might contain application-specific or domain-specific text which has a custom pronunciation. The author can specify the pronunciation on a specific OPT tag to avoid the need for updating the user or application lexicon (especially if the pronunciation is command specific). The grammar author can also use special shorthand characters inside the content section of the OPT tag (e.g. dictation, wildcard, etc.). See the XML Special Characters.
XML Grammar Sample(s):
<GRAMMAR>
    <!-- Create a simple "hello world" rule -->
    <!-- the second word is optional -->
    <RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
        <P>hello</P>
        <OPT>world</OPT>
    </RULE>

    <!-- Create a rule that changes the pronunciation and the display form of the phrase. When the user says "eh" the display text will be "I don't understand". Note the user didn't say "what". The pronunciation for "what" is specific to this phrase tag and is not changed for the user or application lexicon, or even other instances of "what" in the grammar -->
    <RULE NAME="Question_Pron" TOPLEVEL="ACTIVE">
        <P DISP="I don't understand" PRON="eh">what</P>
    </RULE>

    <!-- Create a phrase with an attached semantic property -->
    <!-- Speaking "one two three" will return three unique semantic properties, with different names and different values -->
    <!-- Speaking "one three" will return two unique semantic properties, with different names and different values -->
    <!-- Speaking "one two" will return two unique semantic properties, with different names and different values -->
    <!-- Speaking "one" will return a single semantic property -->
    <!-- Note that the number of semantic properties returned is variable, and that the application should be designed to handle all of the variations -->
    <RULE NAME="UseProps" TOPLEVEL="ACTIVE">
        <!-- named property, without value -->
        <P PROPNAME="NOVALUE">one</P>
        <!-- named property, with numeric value -->
        <O PROPNAME="NUMBER" VAL="2">two</O>
        <!-- named property, with string value -->
        <O PROPNAME="STRING" VALSTR="three">three</O>
    </RULE>

    <!-- Create a rule for optional command prefix -->
    <!-- Note that the entire rule reference is optional. In cases where there are properties associated with the rule reference, the semantic property tree may change -->
    <!-- the rule supports the phrases "play cards", "please play cards", and "pretty please play cards" -->
    <RULE NAME="PlayCard" TOPLEVEL="ACTIVE">
        <O><RULEREF NAME="PLEASE"/></O>
        <P>play cards</P>
    </RULE>

    <!-- The first word "pretty" is optional, while the second is required -->
    <RULE NAME="PLEASE">
        <O>pretty</O>
        <P>please</P>
    </RULE>
</GRAMMAR>
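One way to check which phrases a rule with optional parts accepts is to enumerate them. The sketch below does this for the "PLEASE" and "PlayCard" rules in plain C++; it is an illustrative model only, and `Item` and `expandPhrases` are hypothetical names that do not correspond to any SAPI type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one element in a rule: the phrases it can contribute
// and whether it may be skipped (as with the OPT tag).
struct Item {
    std::vector<std::string> alternatives;
    bool optional;
};

// Enumerates every phrase accepted by a sequence of items.
std::vector<std::string> expandPhrases(const std::vector<Item>& items) {
    std::vector<std::string> results{""};
    for (const Item& item : items) {
        std::vector<std::string> next;
        for (const std::string& prefix : results) {
            if (item.optional) {
                next.push_back(prefix);  // path that skips the optional item
            }
            for (const std::string& alt : item.alternatives) {
                next.push_back(prefix.empty() ? alt : prefix + " " + alt);
            }
        }
        results = std::move(next);
    }
    return results;
}
```

Expanding "PLEASE" (optional "pretty", required "please") gives "please" and "pretty please"; wrapping that result in an optional item followed by "play cards" gives the three phrases the "PlayCard" rule supports.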
Programmatic Equivalent: To add an optional phrase to a rule, SAPI provides an API called ISpGrammarBuilder::AddWordTransition. The application developer can add the optional structure as follows:
SPSTATEHANDLE hsHelloWorld;
// Create new top-level rule called "HelloWorld"
hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL,
                            SPRAF_TopLevel | SPRAF_Active,
                            TRUE, &hsHelloWorld);
// Check hr

// create an interim state
SPSTATEHANDLE hInterim;
hr = cpRecoGrammar->CreateNewState(hsHelloWorld, &hInterim);
// Check hr

// Add the command word "hello" which terminates at the interim state
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, hInterim, L"hello", NULL,
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the optional command word "world"
hr = cpRecoGrammar->AddWordTransition(hInterim, NULL, L"world", NULL,
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the epsilon transition, which means no word need be spoken
hr = cpRecoGrammar->AddWordTransition(hInterim, NULL, NULL, NULL,
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
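The role of the epsilon transition can be seen in a small stand-alone model of the intended state graph: "hello" leads to an interim state, from which either "world" or nothing at all completes the rule. This is a sketch, not SAPI code; `Transition`, `kEndOfRule`, and `accepts` are hypothetical names, and the recursive matcher assumes the graph has no cycles made only of epsilon transitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stands in for a NULL destination state, which ends the rule.
constexpr int kEndOfRule = -1;

// Hypothetical model of a word transition; an empty word is an epsilon
// transition, meaning no word needs to be spoken.
struct Transition {
    int from;
    int to;
    std::string word;
};

// Returns true when the word sequence can be consumed from `state`
// to the end of the rule.
bool accepts(const std::vector<Transition>& graph, int state,
             const std::vector<std::string>& words, std::size_t pos = 0) {
    if (state == kEndOfRule) {
        return pos == words.size();  // all spoken words must be consumed
    }
    for (const Transition& t : graph) {
        if (t.from != state) continue;
        if (t.word.empty()) {
            if (accepts(graph, t.to, words, pos)) return true;
        } else if (pos < words.size() && words[pos] == t.word) {
            if (accepts(graph, t.to, words, pos + 1)) return true;
        }
    }
    return false;
}
```

With the transitions `{0, 1, "hello"}`, `{1, kEndOfRule, "world"}`, and `{1, kEndOfRule, ""}`, the model accepts both "hello" and "hello world" and rejects "world" on its own.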
<PHRASE>, <P>
Summary: The PHRASE tag and the OPT tag are the sole methods of explicitly specifying text to be recognized by the speech recognition engine.
XML Attributes:
DISP (optional): Specifies the display form of the phrase text.
MAX (optional, type=VT_I4, default=MIN): Specifies the maximum number of times the user can repeat the phrase and still be successfully recognized.
MIN (optional, type=VT_I4, default=1): Specifies the minimum number of times the user must repeat the phrase to be successfully recognized.
PRON (optional): Specifies the pronunciation to be used by the recognizer when listening for the text.
PROPID (optional, type=VT_I4): Specifies the numeric identifier to associate with the phrase tag's semantic property.
PROPNAME (optional): Specifies the string identifier to associate with the phrase tag's semantic property.
VAL (optional, type=VT_I4): Specifies the semantic property's numeric value.
VALSTR (optional): Specifies the semantic property's string value.
WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The probability that the user will speak the contents of the PHRASE tag, versus another sibling transition or phrase.
XML Parent Elements: RULEREF: Import, or reference, another rule's contents. PHRASE, P: Specifies text or leaf nodes. OPT, O: Optional phrase causing the rule reference to be implicitly optional. LIST, L: Specifies a list of phrases or transitions for recognition. TEXTBUFFER: Specifies a reference to the run-time application maintained text-buffer. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
XML Child Elements: RULEREF: Import, or reference, another rule's contents. PHRASE, P: Specifies text or leaf nodes. OPT, O: Optional phrase causing the rule reference to be implicitly optional. LIST, L: Specifies a list of phrases or transitions for recognition. TEXTBUFFER: Specifies a reference to the run-time application maintained text-buffer. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic.
Detailed Description: The PHRASE tag along with the OPT tag are the only tags that can directly contain recognizable text. Except for grammars that contain rule references, every grammar must have at least one PHRASE tag. The grammar author can use the shorthand version of the PHRASE tag, the P tag. The grammar author can also specify custom word pronunciations and display text by using the PRON and DISP attributes. For example, a grammar might contain application or domain specific text, which has a custom pronunciation. The author can specify the pronunciation on a specific PHRASE tag to avoid the need for updating the user or application lexicon (especially if the pronunciation is command specific). The grammar author can also use special shorthand characters inside of the content section of the PHRASE tag (e.g. dictation, wildcard, etc.). See the XML Special Characters.
XML Grammar Sample(s):
<GRAMMAR>
    <!-- Create a simple "hello world" rule -->
    <RULE NAME="HelloWorld" TOPLEVEL="ACTIVE">
        <P>hello world</P>
    </RULE>

    <!-- Create a more advanced "hello world" rule that changes the display form. When the user says "hello world" the display text will be "Hiya there!" -->
    <RULE NAME="HelloWorld_Disp" TOPLEVEL="ACTIVE">
        <P DISP="Hiya there!">hello world</P>
    </RULE>

    <!-- Create a rule that changes the pronunciation and the display form of the phrase. When the user says "eh" the display text will be "I don't understand". Note the user didn't say "what". The pronunciation for "what" is specific to this phrase tag and is not changed for the user or application lexicon, or even other instances of "what" in the grammar -->
    <RULE NAME="Question_Pron" TOPLEVEL="ACTIVE">
        <P DISP="I don't understand" PRON="eh">what</P>
    </RULE>

    <!-- Create a rule demonstrating repetition -->
    <!-- the rule will only be recognized if the user says "hey diddle diddle" -->
    <RULE NAME="NurseryRhyme" TOPLEVEL="ACTIVE">
        <P>hey</P>
        <P MIN="2" MAX="2">diddle</P>
    </RULE>

    <!-- Create a list with variable phrase weights -->
    <!-- If the user says similar phrases, the recognizer will use the weights to pick a match -->
    <RULE NAME="UseWeights" TOPLEVEL="ACTIVE">
        <LIST>
            <!-- Note the higher likelihood that the user is expected to say "recognize speech" -->
            <P WEIGHT=".95">recognize speech</P>
            <P WEIGHT=".05">wreck a nice beach</P>
        </LIST>
    </RULE>

    <!-- Create a phrase with an attached semantic property -->
    <!-- Speaking "one two three" will return three unique semantic properties, with different names and different values -->
    <RULE NAME="UseProps" TOPLEVEL="ACTIVE">
        <!-- named property, without value -->
        <P PROPNAME="NOVALUE">one</P>
        <!-- named property, with numeric value -->
        <P PROPNAME="NUMBER" VAL="2">two</P>
        <!-- named property, with string value -->
        <P PROPNAME="STRING" VALSTR="three">three</P>
    </RULE>
</GRAMMAR>
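The effect of MIN and MAX on a PHRASE tag can be made concrete by listing the repetitions they allow. The helper below is a hypothetical illustration in standard C++, not part of SAPI; `expandRepeats` simply produces one phrase for each allowed repetition count.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lists the phrases accepted when `phrase` must be
// spoken between `min` and `max` times, inclusive.
std::vector<std::string> expandRepeats(const std::string& phrase, int min, int max) {
    assert(min >= 0 && max >= min);  // MAX must not be smaller than MIN
    std::vector<std::string> out;
    for (int count = min; count <= max; ++count) {
        std::string repeated;
        for (int i = 0; i < count; ++i) {
            if (i > 0) repeated += ' ';
            repeated += phrase;
        }
        out.push_back(repeated);
    }
    return out;
}
```

For the "NurseryRhyme" rule, `expandRepeats("diddle", 2, 2)` yields only "diddle diddle", which is why the rule is recognized only for "hey diddle diddle".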
Programmatic Equivalent: To add a phrase to a rule, SAPI provides an API called ISpGrammarBuilder::AddWordTransition. The application developer can add the sentences as follows:
SPSTATEHANDLE hsHelloWorld;
// Create new top-level rule called "HelloWorld"
hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL,
                            SPRAF_TopLevel | SPRAF_Active,
                            TRUE, &hsHelloWorld);
// Check hr

// Add the command words "hello world"
// Note that the lexical delimiter is " ", a space character.
// By using a space delimiter, the entire phrase can be added
// in one method call
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, NULL, L"hello world", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the command words "hiya there"
// Note that the lexical delimiter is "|", a pipe character.
// By using a pipe delimiter, the entire phrase can be added
// in one method call
hr = cpRecoGrammar->AddWordTransition(hsHelloWorld, NULL, L"hiya|there", L"|",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
<RESOURCE>
Summary: The RESOURCE tag is used by grammar authors who want to store arbitrary string data on rules (e.g. for use by a CFG interpreter, or an SR engine that is aware of the resources).
XML Attributes: NAME: specifies the name of the resource to attach to the rule.
XML Parent Elements: RULE: The rule that contains the resource reference.
XML Child Elements: [CDATA] (required): The resource value is specified by a CDATA section. For example, <![CDATA[This is a test string]]> The RESOURCE tag contains the CDATA element, which itself contains the string.
Detailed Description: The RESOURCE tag is a facility that allows the grammar author to attach information to rules and communicate it to a CFG interpreter or a speech recognition engine that is aware of the resource information.
XML Grammar Sample(s):
<GRAMMAR>
    <!-- Note resource value can be any string -->
    <RULE ID="RID_TestResource" TOPLEVEL="ACTIVE">
        <RESOURCE NAME="AResource">
            <![CDATA[AResource's Value: String]]>
        </RESOURCE>
        <P>test an embedded resource</P>
    </RULE>
</GRAMMAR>
Programmatic Equivalent: To add a resource to a rule, SAPI provides an API called ISpGrammarBuilder::AddResource. The application developer can add the aforementioned resource (see XML Grammar Sample) with the following code:
SPSTATEHANDLE hsTestResource;
// Create new top-level rule with the ID RID_TestResource
hr = cpRecoGrammar->GetRule(NULL, RID_TestResource,
                            SPRAF_TopLevel | SPRAF_Active,
                            TRUE, &hsTestResource);
// Check hr

// Add the command words "test an embedded resource"
hr = cpRecoGrammar->AddWordTransition(hsTestResource, NULL,
                                      L"test an embedded resource", L" ",
                                      SPWT_LEXICAL, 1.0f, NULL);
// Check hr

// Add the resource named "AResource"
hr = cpRecoGrammar->AddResource(hsTestResource, L"AResource",
                                L"AResource's Value: String");
// Check hr

// save/commit changes
hr = cpRecoGrammar->Commit(NULL);
// Check hr
Then, the SR engine can retrieve the resource value when it is processing the rule updates or CFG recognition by making the following call:
// set hRule to the handle of the rule that carries the resource
WCHAR *pwszResValue = NULL;
hr = cpSREngineSite->GetResource(hRule, L"AResource", &pwszResValue);
if (S_OK == hr)
{
    // pwszResValue contains the value
    // perform value-sensitive processing
    // release value memory
    ::CoTaskMemFree(pwszResValue);
}
<RULE>
Summary: The RULE tag is the core tag for defining which commands are available for recognition. Every grammar must have at least one top-level rule, and every rule must have at least one rule reference or recognizable text.
XML Attributes:
DYNAMIC (optional, default is FALSE): Specifies whether the rule supports dynamic modifications at run time. By default, an application cannot modify rules in an XML grammar. To modify a rule, the rule must be marked DYNAMIC, and the grammar must be loaded with the dynamic flag (see ISpRecoGrammar and SPLOADOPTIONS). Dynamic rules cannot be marked EXPORT.
EXPORT (optional, default is FALSE): Specifies whether the rule allows external grammars to reference it. For example, a grammar author who wants to allow other grammar authors to reuse her rules must mark each of the reusable rules with EXPORT="TRUE". Exported rules cannot be marked DYNAMIC.
ID (required if NAME is omitted, type=VT_I4): Specifies the numeric identifier of the rule. The ID or the NAME must be specified, or both. The identifier must be unique in the rule namespace, which is the entire grammar (see GRAMMAR).
INTERPRETER (optional, default is FALSE): Specifies whether the rule should use the CFG interpreter when it is recognized. For example, a rule might contain semantic properties or text that should be modified at run time (e.g. replace the value of the semantic property named "TODAY" with the system's current date and time).
NAME (required if ID is omitted): Specifies the string identifier of the rule. The NAME or the ID must be specified, or both. The identifier must be unique in the rule namespace, which is the entire grammar (see GRAMMAR).
TOPLEVEL (optional): Specifies that the rule is directly recognizable by a user. If the TOPLEVEL attribute is not specified, the rule is not recognizable unless it is referenced by a top-level rule. For example, component rules (see RULEREF) do not need to specify the TOPLEVEL attribute. When a grammar author specifies a rule as TOPLEVEL, she must also specify whether the rule is enabled by default. If the rule is enabled by default (TOPLEVEL="ACTIVE"), the rule is activated when the application activates the default set of rules (e.g. ISpRecoGrammar::SetRuleState(NULL, NULL, SPRS_ACTIVE)). If a rule is specified as TOPLEVEL="INACTIVE", it is activated only when explicitly set to active (see ISpRecoGrammar::SetRuleState and ISpRecoGrammar::SetRuleIdState).
XML Parent Elements: GRAMMAR: The container for the entire XML grammar.
XML Child Elements: RULEREF: Imports, or references, another rule's contents. PHRASE, P: Specifies text or leaf nodes. LIST, L: Specifies a list of phrases for recognition. OPT, O: Specifies an optional piece of text that can be spoken. TEXTBUFFER: Specifies a reference to the text buffer maintained by the application at run time. WILDCARD: Specifies a garbage word; one or more non-silence, ignorable words. DICTATION: Specifies a piece of text recognized by the loaded dictation topic. RESOURCE: Specifies a labeled piece of arbitrary string data which can be accessed by a special SR engine, or a CFG interpreter.
Detailed Description: The RULE tag is the core of the XML grammar text format. The purpose of creating a CFG is to define a specific set of words and phrases that can be spoken by the user and recognized by the speech recognition engine. The rules can be written by the grammar author in a way that makes them reusable, textually maintainable, and conducive to application logic that is based on semantic properties or actions (not on phrase text). Each rule must contain at least one piece of text, or a rule reference (which has the same requirements). Effectively, every rule will eventually end with a piece of text (i.e. leaf or terminal node). The rule can be identified by either a numeric identifier (ID) or a string identifier (NAME). The grammar author can use the DEFINE tag to define constant string identifiers for numeric values. By using the constant string identifiers, the grammar author can avoid magic numbers (i.e. hard-coded numbers that can cause maintenance problems when updating code/grammar). See the ID tag for more information on constant identifiers. By using rule importing (references) and rule exporting, grammar authors can leverage reusable grammar components (e.g. numbers or date grammars). Similarly, grammar authors can abstract certain portions of the grammar text away from the semantic content by using semantic properties, or tags. Semantic properties are name/value pairs which are associated with rule nodes in the rule hierarchy, and can even contain relevant information from the recognized text (see SPPHRASEPROPERTY.ulStartingElement and SPPHRASEPROPERTY.ulCountOfElements). The grammar author can also use a CFG interpreter, which is a COM object that can re-process the semantic property tree and phrase text to modify the content at run time. For example, an application may load a grammar which includes a "days of the week" rule. 
By integrating a CFG interpreter with the grammar, the interpreter could replace the "days of the week" properties (e.g. Sunday, Monday, Tuesday, etc.) with the actual calendar dates relative to the application's host system (e.g. GetSystemTime). SAPI supports a feature called "semantic property pushing" which enables applications to detect the semantic property structure more accurately at recognition time. "Property pushing" is done by SAPI at compile time, whereby the compiler moves semantic properties to the last terminal node within a rule which remains unambiguous. For example, the phrases "a b c d" and "a b e f g" share the prefix "a b". The compiler will automatically split the phrases into three separate phrases, "a b", "c d", and "e f g", where the first phrase is the common prefix of both recognizable phrases. The purpose of this feature is to enable applications that place properties on the phrases to detect which branch is being hypothesized as soon as the first unambiguous (non-common) portion of the phrase is spoken. When the user speaks "a b", it is not clear whether the user will say "a b c d" or "a b e f g". If the user then says "e", the application can eliminate the "a b c d" option. If the grammar author attached properties to the end of both phrases, the semantic property is returned as soon as the user speaks the first unambiguous portion of the text (e.g. "c" or "e"). See Semantic Properties, Hypotheses, and "Property Pushing."
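The prefix split described above can be illustrated with a small standalone sketch. This is not SAPI's compiler logic; it only shows, under the assumption that phrases are compared word by word, how two phrases divide into a common prefix and two unambiguous tails. The names SplitWords, JoinWords, PrefixSplit, and SplitOnCommonPrefix are hypothetical and are not part of SAPI.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a phrase into whitespace-separated words.
std::vector<std::string> SplitWords(const std::string& phrase) {
    std::istringstream in(phrase);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Join words[begin, end) back into one space-separated phrase.
std::string JoinWords(const std::vector<std::string>& words,
                      std::size_t begin, std::size_t end) {
    std::string out;
    for (std::size_t i = begin; i < end; ++i) {
        if (!out.empty()) out += ' ';
        out += words[i];
    }
    return out;
}

// Hypothetical result type: the shared prefix plus each phrase's own tail.
struct PrefixSplit {
    std::string commonPrefix;
    std::string tailA;
    std::string tailB;
};

// Find the longest run of leading words shared by both phrases and
// return the prefix together with the two remaining (unambiguous) tails.
PrefixSplit SplitOnCommonPrefix(const std::string& a, const std::string& b) {
    const std::vector<std::string> wa = SplitWords(a);
    const std::vector<std::string> wb = SplitWords(b);
    std::size_t n = 0;
    while (n < wa.size() && n < wb.size() && wa[n] == wb[n]) ++n;
    return {JoinWords(wa, 0, n),
            JoinWords(wa, n, wa.size()),
            JoinWords(wb, n, wb.size())};
}
```

Calling SplitOnCommonPrefix("a b c d", "a b e f g") yields the prefix "a b" and the tails "c d" and "e f g". The first word of each tail ("c" or "e") is the point at which a pushed property could be reported.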
XML Grammar Sample(s):
<GRAMMAR>
  <DEFINE>
    <ID NAME="RID_Hello" VAL="1"/>
    <ID NAME="RID_World" VAL="2"/>
    <ID NAME="RID_AddNumbers" VAL="3"/>
    <ID NAME="RID_Numbers" VAL="4"/>
    <ID NAME="RID_Numbers_Exportable" VAL="5"/>
    <ID NAME="RID_Names" VAL="6"/>
    <!-- PID_Value is used by the PROPID attributes below, so it must be defined as well -->
    <ID NAME="PID_Value" VAL="1"/>
  </DEFINE>
  <!-- Create a simple top-level rule that uses a constant defined identifier -->
  <RULE ID="RID_Hello" TOPLEVEL="ACTIVE">
    <P>hello</P>
  </RULE>
  <!-- Create a simple top-level rule that is inactive by default -->
  <RULE NAME="Hiya" TOPLEVEL="INACTIVE">
    <P>hiya</P>
  </RULE>
  <!-- Create a rule, which a CFG-interpreter can re-process to modify the semantic properties -->
  <RULE NAME="InterpretedRule" TOPLEVEL="ACTIVE" INTERPRETER="TRUE">
    <P PROPNAME="TODAY">what is today's date</P>
  </RULE>
  <!-- Create a simple top-level rule that references another non top-level rule -->
  <RULE ID="RID_AddNumbers" TOPLEVEL="ACTIVE">
    <P>add</P>
    <RULEREF REFID="RID_Numbers"/>
    <P>to</P>
    <RULEREF REFID="RID_Numbers"/>
  </RULE>
  <!-- Note that this rule is not top-level and is only used as a reusable component rule -->
  <RULE ID="RID_Numbers">
    <LIST PROPID="PID_Value">
      <P VAL="1">one</P>
      <P VAL="2">two</P>
      <P VAL="3">three</P>
      <P VAL="4">four</P>
      <P VAL="5">five</P>
    </LIST>
  </RULE>
  <!-- Mark the rule as dynamic so the application can update the list of names at run time -->
  <RULE ID="RID_Names" DYNAMIC="TRUE">
    <LIST>
      <P>bob</P>
      <P>jane</P>
      <P>kate</P>
      <P>tom</P>
    </LIST>
  </RULE>
  <!-- Mark the rule as exportable, so other external grammars can access it -->
  <RULE ID="RID_Numbers_Exportable" EXPORT="TRUE">
    <LIST PROPID="PID_Value">
      <P VAL="6">six</P>
      <P VAL="7">seven</P>
      <P VAL="8">eight</P>
      <P VAL="9">nine</P>
      <P VAL="10">ten</P>
    </LIST>
  </RULE>
</GRAMMAR>
Programmatic Equivalent: Application developers can programmatically add rules to a grammar by using the ISpGrammarBuilder interface inherited by ISpRecoGrammar. The following sample code shows how to add a rule to a grammar. To choose the rule attributes, see the ISpGrammarBuilder::GetRule method and SPCFGRULEATTRIBUTES.

    SPSTATEHANDLE hHelloWorld;

    // Create new rule called "HelloWorld"
    // Note that the second parameter is the ID, which can also be specified
    // Note also that the rule is marked as top-level and active
    hr = cpRecoGrammar->GetRule(L"HelloWorld", NULL, SPRAF_TopLevel | SPRAF_Active, TRUE, &hHelloWorld);
    // Check hr

    // Add the text "hello world"
    hr = cpRecoGrammar->AddWordTransition(hHelloWorld, NULL, L"hello world", L" ", SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Save the grammar changes
    hr = cpRecoGrammar->Commit(NULL);
    // Check hr

The following sample code shows how to modify a rule in an existing grammar. Specifically, the code updates the list-of-names rule shown in the XML Grammar Sample(s) section. After the names rule is updated and ::Commit is called, all rules that reference it automatically recognize the updated names.

    SPSTATEHANDLE hNames;

    // Get a handle to the existing rule
    // Note the use of the constant identifier RID_Names, which was defined in the
    // XML sample. See the ID tag for information on generating a C-style header
    hr = cpRecoGrammar->GetRule(NULL, RID_Names, NULL, TRUE, &hNames);
    // Check hr

    // Clear the rule to update the entire list
    hr = cpRecoGrammar->ClearRule(hNames);
    // Check hr

    // Add name "sally"
    hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"sally", NULL, SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Add name "jim"
    hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"jim", NULL, SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Add name "diane"
    hr = cpRecoGrammar->AddWordTransition(hNames, NULL, L"diane", NULL, SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Save grammar changes
    hr = cpRecoGrammar->Commit(NULL);
    // Check hr
<RULEREF>
Summary: The RULEREF tag is used for importing rules from the same grammar, or another grammar. The RULEREF tag is especially useful for reusing component or off-the-shelf rules and grammars.
XML Attributes: NAME (required): Specifies the string identifier of the rule to reference. The NAME or the REFID must be specified. If both are specified, they must refer to the same rule. OBJECT (optional): Specifies the programmatic identifier (ProgId) of the COM object which contains the compiled grammar. PROPID (optional, type=VT_I4): Specifies the numeric identifier of the semantic property attached to the rule reference. PROPNAME (optional): Specifies the string identifier of the semantic property attached to the rule reference. REFID (required, type=VT_I4): Specifies the numeric identifier of the rule to reference. The NAME or the REFID must be specified. If both are specified, they must refer to the same rule. URL (optional): Specifies the uniform resource locator (URL) of the rule to reference. The URL can be prefixed by "http://", "file://", or no prefix for a relative address. The URL can reference either a compiled grammar (e.g. *.cfg) or an uncompiled XML grammar (e.g. *.xml) which will be compiled by SAPI on demand. VAL (optional): Specifies the numeric value that will be associated with the semantic property attached to the rule reference. VALSTR (optional): Specifies the string value that will be associated with the semantic property attached to the rule reference. WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): The probability of the contents of the rule (which is referenced) being spoken by the user.
XML Parent Elements: LIST, L: List of phrases or rules which can be recognized. PHRASE, P: Phrase that must be recognized for the containing rule to be recognized. OPT, O: Optional phrase causing the rule reference to be implicitly optional. RULE: Rule that contains phrases or text to be recognized.
XML Child Elements: None
Detailed Description: The RULEREF tag is provided to grammar authors to allow for grammar reusability, and for structuring semantic properties into a hierarchy. Grammar reusability is provided by allowing rules to reference other rules. For example, an independent software vendor (ISV) could develop a series of grammars that support mathematical operations and easy-to-speak numbers. The ISV could redistribute its grammars via a web site (URL, http), a COM object (ProgId), or a compiled grammar. Grammar authors who want to use the ISV's grammars would only need to add a RULEREF tag to their grammar which references the appropriate file or resource location. Similarly, grammar authors can build basic rule components into their grammars (e.g. spelling, numbers, or proper names), then build complex commands by reusing the basic rule components (local rule reference). Structured, hierarchical semantic properties are built on top of RULEs and RULEREFs. All of the semantic properties specified inside of a rule are siblings (ordered by order of declaration in the recognized transition path). The semantic properties that are in rules referenced by another rule are child properties of the rule that made the reference. For example, examine the following grammar:

<RULE NAME="A" TOPLEVEL="ACTIVE">
  <P PROPNAME="ROOT">
    <RULEREF NAME="B" PROPNAME="ROOT_SIBLING"/>
  </P>
</RULE>
<RULE NAME="B">
  <P PROPNAME="CHILD">hello</P>
  <P PROPNAME="LEAF">world</P>
</RULE>

The grammar contains two rules: one top-level rule, which references another rule. The top-level rule contains two semantic properties, one attached to a phrase tag ("ROOT"), and the other attached to the rule reference tag ("ROOT_SIBLING"). The second rule also contains two semantic properties, each attached to a phrase tag ("CHILD" and "LEAF"). If the recognized phrase is "hello world", the semantic property structure is as follows:

    SPPHRASE->pProperties->pszName == "ROOT"
    SPPHRASE->pProperties->pNextSibling->pszName == "ROOT_SIBLING"
    SPPHRASE->pProperties->pFirstChild->pszName == "CHILD"
    SPPHRASE->pProperties->pFirstChild->pNextSibling->pszName == "LEAF"

Note that no matter how many phrases or semantic properties are contained in a single RULE, all of the properties are siblings. Child semantic properties are only created by using rule references. See also the Whitepaper, Designing Grammar Rules: Retrieving Semantic Properties.
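To make the sibling/child layout concrete, the following minimal sketch walks a property tree depth first. PhraseProperty is a simplified, hypothetical stand-in that copies only three field names (pszName, pFirstChild, pNextSibling) from SPPHRASEPROPERTY; the real structure carries more fields, and CollectProperties is an illustrative helper, not a SAPI function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for SPPHRASEPROPERTY (assumption: only the
// three fields needed for traversal are modeled here).
struct PhraseProperty {
    const wchar_t* pszName;
    const PhraseProperty* pFirstChild;
    const PhraseProperty* pNextSibling;
};

// Depth-first walk: visit a property, then its children, then its
// next sibling. Each name is recorded with two spaces of indentation
// per level of depth.
void CollectProperties(const PhraseProperty* prop, std::size_t depth,
                       std::vector<std::wstring>& out) {
    for (; prop != nullptr; prop = prop->pNextSibling) {
        out.push_back(std::wstring(depth * 2, L' ') + prop->pszName);
        CollectProperties(prop->pFirstChild, depth + 1, out);
    }
}
```

An application would start the walk at the phrase's first property with depth 0. Properties reported at the same indentation level are siblings, and properties indented one level further were introduced through a rule reference.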
XML Grammar Sample(s): <GRAMMAR> <DEFINE> <ID NAME="RID_Numbers" VAL="1"/> <ID NAME="RID_AddNumbers" VAL="2"/> <ID NAME="PID_Value" VAL="1"/> </DEFINE> <!-- create a simple rule that reuses the local numbers rule component --> <RULE ID="RID_AddNumbers" TOPLEVEL="ACTIVE"> <P>add</P> <!-- the first operand will be a number from the numbers rule--> <!-- the application can retrieve the child property of this property "operand_1" which has a value of 1-5 --> <RULEREF REFID="RID_Numbers" PROPNAME="operand_1"/> <P>to</P> <!-- the second operand will be a number from the numbers rule--> <!-- the application can retrieve the child property of this property "operand_2" which has a value of 1-5 --> <RULEREF REFID="RID_Numbers" PROPNAME="operand_2"/> </RULE> <!-- Note that rule is not top-level and is only used as a reusable component rule --> <RULE ID="RID_Numbers"> <LIST PROPID="PID_Value"> <P VAL="1">one</P> <P VAL="2">two</P> <P VAL="3">three</P> <P VAL="4">four</P> <P VAL="5">five</P> </LIST> </RULE> <RULE NAME="SearchWeb" TOPLEVEL="ACTIVE"> <P>search web for site named</P> <!-- Reference a fictitious rule located on the web which contains a daily updated list of SR-friendly web site names --> <RULEREF NAME="SiteNames" URL="http://www.msn.com/WebServices/SpeechObjects.cfg"/> </RULE> <RULE NAME="SearchAddressBook" TOPLEVEL="ACTIVE"> <P>find address of</P> <!-- Reference a fictitious rule located in a registered COM object, which contains a dynamic list of Exchange server address book names --> <RULEREF NAME="FullNames" OBJECT="Exchange.SpeechGrammars"/> </RULE> </GRAMMAR>
Programmatic Equivalent: Application developers can programmatically import rules from URLs by using the following format:

    Rule Name = "URL:" + FILENAME + "\\" + RULENAME

For example, to import a rule called "Numbers" from the file "A.cfg", use the following sample code:

    SPSTATEHANDLE hSpeakNumber;
    SPSTATEHANDLE hsBeforeImport;
    SPSTATEHANDLE hsRuleImport;

    // Create new rule called "SpeakNumber"
    hr = cpRecoGrammar->GetRule(L"SpeakNumber", NULL, NULL, TRUE, &hSpeakNumber);
    // Check hr

    // Create new state for the beginning text
    hr = cpRecoGrammar->CreateNewState(hSpeakNumber, &hsBeforeImport);
    // Check hr

    // Add the beginning text "speak the number"
    hr = cpRecoGrammar->AddWordTransition(hSpeakNumber, hsBeforeImport, L"speak the number", L" ", SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Import the rule "Numbers" from A.cfg
    hr = cpRecoGrammar->GetRule(L"URL:file://A.cfg\\Numbers", 0, SPRAF_Import, TRUE, &hsRuleImport);
    // Check hr

    // Reference the "Numbers" rule after the beginning text
    hr = cpRecoGrammar->AddRuleTransition(hsBeforeImport, NULL, hsRuleImport, 1.0f, NULL);
    // Check hr

    // Save the grammar changes
    hr = cpRecoGrammar->Commit(NULL);
    // Check hr
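The import-name format can be wrapped in a small helper so the prefix and separator are written in one place. This is only a sketch: BuildImportRuleName is a hypothetical name, not a SAPI function, and it assumes the caller passes the file location exactly as it should appear after the "URL:" prefix.

```cpp
#include <string>

// Hypothetical helper: builds "URL:" + fileUrl + "\" + ruleName.
// Note that "\\" in C++ source is a single backslash character.
std::wstring BuildImportRuleName(const std::wstring& fileUrl,
                                 const std::wstring& ruleName) {
    return L"URL:" + fileUrl + L"\\" + ruleName;
}
```

For the sample above, BuildImportRuleName(L"file://A.cfg", L"Numbers") produces the same string that is passed to GetRule, and its c_str() could be used as the rule-name argument.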
<TEXTBUFFER>
Summary: The TEXTBUFFER tag is used for applications needing to integrate a dynamic text box or text selection with a voice command.
XML Attributes: PROPID (optional, type=VT_I4): Specifies the semantic property's numeric identifier. PROPNAME (optional): Specifies the semantic property's string identifier. WEIGHT (optional, type=VT_UI4,VT_I4,VT_R4,VT_R8, default=1/n_sibling_transitions): Specifies the probability of the TEXTBUFFER-based phrase being spoken by the user.
XML Parent Elements: LIST, L: List of phrases which can be recognized. PHRASE, P: Phrase that must be recognized for the containing rule to be recognized. OPT, O: Optional phrase that may be recognized. RULE: Rule that contains phrases or text to be recognized.
XML Child Elements: None
Detailed Description: The TEXTBUFFER tag is useful for applications that have a dynamic buffer of text, and want to allow the user to speak portions of the text. The most obvious example is likely the text selection user interface. The application offers a buffer of text, and allows the user to select any contiguous subset of the buffer. For example, when the text is "a b c d e", the user can select "a b c" and "c d e", but not "b e" since it is not a contiguous subset of the text buffer. The TEXTBUFFER tag allows the grammar author to define a command, and reference the dynamic text buffer which will be set and maintained at application run time. For example, the grammar might contain the command "select TEXTBUFFER_PORTION", which, when using the previous text sample, would allow the phrases "select a b c" and "select c d e", but not "select b e". The grammar author should focus her efforts on building commands to operate on the text buffer, while the application developer need only focus on maintaining the text buffer (see ISpRecoGrammar::SetWordSequenceData and ISpRecoGrammar::SetTextSelection) and responding to the TEXTBUFFER-based commands. The TEXTBUFFER has three main components: the complete text buffer, the allowed text subsets in the buffer, and the active selection. The complete text buffer is a string of text characters, which is double-NULL terminated. The reason for using a double-NULL terminator is to allow multiple exclusive subsets of the buffer to be active (e.g. each subset is a paragraph). The recognition engine will not recognize phrases which span the exclusive subsets (delimited by a single NULL character). The third component is the active selection, or current portion of the buffer that should be recognizable (e.g. the application can update the selection to include only the text visible on the screen, or only the text selected by the user).
Note that any portion of the buffer that is not included in the TEXTBUFFER's active selection is not recognizable. The TEXTBUFFER tag is shared across all of the commands associated with a single grammar object. For applications that need to support multiple text buffers, the application has three options. If the text buffers use the same commands, but do not need to be active simultaneously, the application can use the active selection feature (of the TEXTBUFFER) to switch between buffers. If the text buffers are unique, but the buffers need to be active simultaneously, the application can use the single-NULL terminated subsets of the TEXTBUFFER (noting that each set is exclusive and non-contiguous). Finally, if the application has multiple text buffers, requires the buffers to be active simultaneously, and uses different commands for each buffer, the application can use a single grammar object for each buffer. The application should use semantic properties (see attributes PROPNAME and PROPID) to quickly and easily parse the TEXTBUFFER-related text out of the command. SAPI will automatically set the semantic property's phrase element range to match the elements taken from the TEXTBUFFER. The speech recognition engine must support text-buffers inside of a CFG for the grammar to load and activate successfully. The application can determine if an engine supports the TEXTBUFFER tag by retrieving the SR engine's object token (see ISpRecognizer::GetRecognizer), and then checking for the existence of the engine attribute "WordSequences" (see ISpObjectToken::MatchesAttributes).
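The "contiguous subset" rule can be illustrated without an engine. The sketch below is not how a recognizer works internally; it only checks, word by word, whether a phrase appears as an unbroken run inside a buffer. TokenizeBuffer and IsContiguousSubset are hypothetical names, and whitespace-separated words are an assumption.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split text into whitespace-separated words.
std::vector<std::string> TokenizeBuffer(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Returns true when every word of `phrase` appears in `buffer`
// consecutively and in the same order. An empty phrase is rejected.
bool IsContiguousSubset(const std::string& buffer, const std::string& phrase) {
    const std::vector<std::string> b = TokenizeBuffer(buffer);
    const std::vector<std::string> p = TokenizeBuffer(phrase);
    if (p.empty()) return false;
    return std::search(b.begin(), b.end(), p.begin(), p.end()) != b.end();
}
```

With the buffer "a b c d e", the phrases "a b c" and "c d e" pass the check, while "b e" fails because the words are not adjacent in the buffer.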
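XML Grammar Sample(s):
<GRAMMAR>
  <DEFINE>
    <!-- PROPID takes a number, so the string identifier is defined here -->
    <ID NAME="PID_SelectedText" VAL="1"/>
  </DEFINE>
  <!-- basic command to perform text selection -->
  <RULE NAME="SelectText" TOPLEVEL="ACTIVE">
    <P>select the words</P>
    <TEXTBUFFER PROPID="PID_SelectedText"/>
  </RULE>
</GRAMMAR>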
Programmatic Equivalent: To programmatically create a text-buffer transition in a CFG, the application developer can use ISpGrammarBuilder::AddRuleTransition with a special rule handle, called SPRULETRANS_TEXTBUFFER. For example, the following code creates a simple command called "SelectText" which recognizes the command "select TEXTBUFFER".

    SPSTATEHANDLE hsSelectText;

    // Create new top-level rule called "SelectText"
    hr = cpRecoGrammar->GetRule(L"SelectText", NULL, SPRAF_TopLevel | SPRAF_Active, TRUE, &hsSelectText);
    // Check hr

    // Create an interim state before the text-buffer transition
    SPSTATEHANDLE hsBeforeTextBuffer;
    hr = cpRecoGrammar->CreateNewState(hsSelectText, &hsBeforeTextBuffer);
    // Check hr

    // Add the command word "select"
    hr = cpRecoGrammar->AddWordTransition(hsSelectText, hsBeforeTextBuffer, L"select", L" ", SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Add text-buffer transition
    hr = cpRecoGrammar->AddRuleTransition(hsBeforeTextBuffer, NULL, SPRULETRANS_TEXTBUFFER, 1.0f, NULL);
    // Check hr

    // Save/commit changes
    hr = cpRecoGrammar->Commit(NULL);
    // Check hr

    // ... perform other processing/setup

    // Set up the text buffer
    // Place the contents of the text buffer into pwszCoMem and
    // the length of the text in cch
    SPTEXTSELECTIONINFO tsi;
    tsi.ulStartActiveOffset = 0;
    tsi.cchActiveChars = cch;
    tsi.ulStartSelection = 0;
    tsi.cchSelection = cch;

    WCHAR *pwszCoMem2 = (WCHAR *)CoTaskMemAlloc(sizeof(WCHAR) * (cch + 2));
    if (pwszCoMem2)
    {
        // SetWordSequenceData requires a double NULL terminator.
        memcpy(pwszCoMem2, pwszCoMem, sizeof(WCHAR) * cch);
        pwszCoMem2[cch] = L'\0';
        pwszCoMem2[cch+1] = L'\0';

        // Set the text buffer data
        hr = cpRecoGrammar->SetWordSequenceData(pwszCoMem2, cch + 2, NULL);
        // Check hr

        // Set the text selection information independently
        hr = cpRecoGrammar->SetTextSelection(&tsi);
        // Check hr

        CoTaskMemFree(pwszCoMem2);
    }
    CoTaskMemFree(pwszCoMem);

    // The SR engine is now capable of recognizing the contents of the text buffer
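When the buffer holds several exclusive subsets (e.g. one per paragraph), the layout described for TEXTBUFFER is a single NULL between subsets and a double NULL at the end. The sketch below assembles such a buffer with the standard library only. BuildWordSequenceBuffer is a hypothetical helper, not a SAPI function, and passing the resulting data and size to SetWordSequenceData is an assumption based on the sample above, which counts both terminators in the length.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the subsets with a single L'\0' between
// them and appends the double L'\0' terminator for the whole buffer.
std::wstring BuildWordSequenceBuffer(const std::vector<std::wstring>& subsets) {
    std::wstring buffer;
    for (std::size_t i = 0; i < subsets.size(); ++i) {
        if (i != 0) buffer.push_back(L'\0');  // single NULL separates subsets
        buffer += subsets[i];
    }
    buffer.append(2, L'\0');                  // double NULL ends the buffer
    return buffer;
}
```

Because std::wstring tracks its own length, the embedded NULL characters are preserved, and size() reports the full character count including both terminators.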
<WILDCARD>
Summary: The WILDCARD tag is used in rules or phrases that need added robustness and flexibility for the speaker's phrasing.
XML Attributes: None
XML Parent Elements: LIST, L: List of phrases which can be recognized. PHRASE, P: Phrase that must be recognized for the containing rule to be recognized. OPT, O: Optional phrase that may be recognized. RULE: Rule that contains phrases or text to be recognized.
XML Child Elements: None
Detailed Description: The WILDCARD tag is designed for applications that would like to recognize some phrases without failing due to irrelevant, or ignorable, words. For example, an application may have a command with the phrase "save document". Many users may trivially modify the phrase by saying "save my document", "save the document", "save this document", etc. With a pure CFG, the latter phrases would all fail to be recognized due to the extra words. The grammar author can add a wildcard, or garbage field, which will consume the extra words, and allow the application to successfully handle all of the phrases. In the aforementioned case, the grammar would need a wildcard before the word "document". The WILDCARD is different from DICTATION in that the application will never see the recognized garbage words, even though they were recognized. Consequently, the application and grammar author should not place wildcards in places which may affect the intended user action (e.g. "cancel save" is not the same as "please save"). The grammar author can also use a special character sequence, the ellipsis (...), instead of the entire XML tag. See XML Grammar Format: Special Wildcard Tag. The speech recognition engine must support wildcards inside of a CFG for the grammar to load and activate successfully. The application can determine if an engine supports the WILDCARD tag by retrieving the SR engine's object token (see ISpRecognizer::GetRecognizer), and then checking for the existence of the engine attribute "WildcardInCFG" (see ISpObjectToken::MatchesAttributes). The engine can specify support for the WILDCARD tag to be anywhere in the CFG phrase (attribute value="Anywhere"), or only at the end (attribute value="Trailing").
XML Grammar Sample(s):
<GRAMMAR>
  <!-- basic command to play the queen of hearts -->
  <RULE NAME="PlayCard" TOPLEVEL="ACTIVE">
    <P>play <WILDCARD/> queen of hearts</P>
  </RULE>
  <!-- basic command to play the queen of hearts, using special ellipsis -->
  <RULE NAME="PlayCard_Ellipsis" TOPLEVEL="ACTIVE">
    <P>play ... queen of hearts</P>
  </RULE>
</GRAMMAR>
Programmatic Equivalent: To programmatically create a wildcard transition in a CFG, the application developer can use the ISpGrammarBuilder::AddRuleTransition with a special rule handle, called SPRULETRANS_WILDCARD. For example, the following code creates a simple command called "PlayCard" which recognizes the command "play WILDCARD queen of hearts".
    SPSTATEHANDLE hsPlayCard;

    // Create new top-level rule called "PlayCard"
    hr = cpRecoGrammar->GetRule(L"PlayCard", NULL, SPRAF_TopLevel | SPRAF_Active, TRUE, &hsPlayCard);
    // Check hr

    // Create an interim state before the wildcard transition
    SPSTATEHANDLE hsBeforeWildcard;
    hr = cpRecoGrammar->CreateNewState(hsPlayCard, &hsBeforeWildcard);
    // Check hr

    // Add the command word "play"
    hr = cpRecoGrammar->AddWordTransition(hsPlayCard, hsBeforeWildcard, L"play", L" ", SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Create an interim state after the wildcard transition
    SPSTATEHANDLE hsAfterWildcard;
    hr = cpRecoGrammar->CreateNewState(hsPlayCard, &hsAfterWildcard);
    // Check hr

    // Add interim wildcard transition
    hr = cpRecoGrammar->AddRuleTransition(hsBeforeWildcard, hsAfterWildcard, SPRULETRANS_WILDCARD, NULL, NULL);
    // Check hr

    // Add the command words "queen of hearts"
    hr = cpRecoGrammar->AddWordTransition(hsAfterWildcard, NULL, L"queen of hearts", L" ", SPWT_LEXICAL, 1.0f, NULL);
    // Check hr

    // Save/commit changes
    hr = cpRecoGrammar->Commit(NULL);
    // Check hr
The previous sample code will support any of the following phrases: "play the queen of hearts", "play a queen of hearts", "play the left queen of hearts", etc. Note that the words matched by the wildcard ("the", "a", and "the left" in these examples) will be recognized by the speech recognition engine, but will not be returned to the application. The application should not place any text that its logic depends on inside of a wildcard, since that text is not returned.
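The effect of the wildcard on what the application sees can be modeled with a small standalone sketch. This is not the engine's matching algorithm. It assumes a single wildcard between a fixed prefix and a fixed suffix, and it assumes the wildcard consumes one or more words, following the WILDCARD description under the RULE child elements. MatchWithWildcard is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true when `spoken` is `prefix`, then one or more ignorable
// words, then `suffix`. On success, `visible` receives only the
// prefix and suffix words; the wildcard words are dropped, mirroring
// the fact that they are not returned to the application.
bool MatchWithWildcard(const std::vector<std::string>& prefix,
                       const std::vector<std::string>& suffix,
                       const std::vector<std::string>& spoken,
                       std::vector<std::string>& visible) {
    // Assumption: at least one word must be consumed by the wildcard.
    if (spoken.size() < prefix.size() + suffix.size() + 1) return false;
    if (!std::equal(prefix.begin(), prefix.end(), spoken.begin())) return false;
    if (!std::equal(suffix.begin(), suffix.end(), spoken.end() - suffix.size()))
        return false;
    visible = prefix;
    visible.insert(visible.end(), suffix.begin(), suffix.end());
    return true;
}
```

With the prefix {"play"} and the suffix {"queen", "of", "hearts"}, the utterance "play the left queen of hearts" matches and the visible words are "play queen of hearts". Any decision that depended on "the left" would therefore be unavailable to the application.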