About Speech Recognition Grammars (Microsoft.Speech)

Microsoft Speech Platform SDK 11


A speech recognition engine matches spoken input to words and phrases that are defined by the rules in a speech recognition grammar. A simple grammar can be designed to recognize a small set of phrases. A more complex grammar can be designed to recognize and organize semantic content from a variety of user utterances. A speech recognition grammar also defines a set of properties that are specific to the grammar, such as locale, semantic format, and mode.

Rules

A speech recognition grammar contains one or more rules. Each rule defines a set of language constraints that a speech recognition engine uses to restrict the possible word or sentence choices during the speech recognition process. In effect, the rules limit what the engine can recognize to a predetermined list of word and phrase choices and the orders in which those choices can be combined. For more information, see Grammar Rules (Microsoft.Speech) and rule Element (Microsoft.Speech).

A grammar can define which rules are active when the grammar loads. For XML-format grammars that conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0 and grammars created using members of the Microsoft.Speech.Recognition.SrgsGrammar namespace, the grammar designates one rule as the root rule of the grammar, which is active when the grammar loads. Any rules that the root rule references are also active when the grammar loads.
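The activation behavior described above can be sketched in a few lines of Python. This is only an illustration, not the Microsoft.Speech API: the rule names and the RULE_REFERENCES table are hypothetical, and the sketch assumes that references are followed transitively (a rule referenced by a referenced rule is also treated as active).

```python
from collections import deque

# Hypothetical grammar: each rule name maps to the rule names it references.
RULE_REFERENCES = {
    "DayStatement": ["RelativeDay", "DayName"],
    "RelativeDay": [],
    "DayName": [],
    "Unused": ["DayName"],  # never referenced from the root rule
}


def active_rules(references, root):
    """Return the root rule plus every rule reachable from it by reference.

    Assumption: references are followed transitively.
    """
    active = {root}
    pending = deque([root])
    while pending:
        current = pending.popleft()
        for target in references.get(current, []):
            if target not in active:
                active.add(target)
                pending.append(target)
    return active


print(sorted(active_rules(RULE_REFERENCES, "DayStatement")))
```

In this sketch, the Unused rule stays inactive because nothing reachable from the root rule refers to it.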

States and Transitions

Each rule in a grammar can be represented as a directed graph data structure that describes all of the possible phrases defined by that rule. The graph can be presented as a set of states, each with a set of possible transitions to other states. Each rule is defined by a single start state, a single end state, and a set of zero or more intermediate states.

The following illustration shows a graph representation of a rule that recognizes sentences such as "Today is Monday" and "Tomorrow will be Tuesday", or any other valid path through the rule. A speaker must first say one of three words ("Yesterday", "Today", or "Tomorrow"), causing a transition to State 1, State 2, or State 3, respectively. From each of these intermediate states, a specific word or phrase must be spoken to cause a transition to the next intermediate state, State 4. From State 4, the user can say the name of any day of the week (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday) to cause a transition to the end state of the rule.
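As a rough model of the graph described above, the rule can be written as a table of transitions and walked word by word. The sketch below is plain Python, not the Microsoft.Speech API. The state names follow the text, and the linking phrases "is" and "will be" come from the example sentences; the linking word "was" for "Yesterday" is an assumption.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# (source state, spoken phrase) -> target state.
# "was" is an assumed label; "is" and "will be" come from the example sentences.
TRANSITIONS = {
    ("Start", "yesterday"): "State 1",
    ("Start", "today"): "State 2",
    ("Start", "tomorrow"): "State 3",
    ("State 1", "was"): "State 4",
    ("State 2", "is"): "State 4",
    ("State 3", "will be"): "State 4",
}
# From State 4, any day name leads to the end state.
TRANSITIONS.update({("State 4", day.lower()): "End" for day in DAYS})


def matches(utterance):
    """Return True if the utterance follows a path from Start to End."""
    words = utterance.lower().split()
    state, position = "Start", 0
    while position < len(words):
        for (source, phrase), target in TRANSITIONS.items():
            phrase_words = phrase.split()
            end = position + len(phrase_words)
            if source == state and words[position:end] == phrase_words:
                state, position = target, end
                break
        else:
            return False  # no transition from this state matches the next words
    return state == "End"


print(matches("Today is Monday"))
print(matches("Today will be Monday"))
```

The first call follows Start, State 2, State 4, End and succeeds; the second fails because State 2 has no transition labeled "will be".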

Rule Reference Transitions

All of the transitions between states in the preceding illustration occur as a result of an application user speaking the word shown in the transition arc. In addition to representing words, transitions can also represent references to other rules within the grammar. For example, instead of listing the names of the days of the week explicitly, the rule represented in the preceding illustration could have used a reference to a DayName rule as the transition from State 4 to the end state. For more information about rule references, see ruleref Element (Microsoft.Speech), the SrgsRuleRef class, the AppendRuleReference methods, and Grammar Rule Reference Referencing (Microsoft.Speech).
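The DayName example above can be sketched as follows. This is a conceptual Python model, not the Microsoft.Speech API: the RuleRef class, the RULES table, and the DayStatement rule name are hypothetical, and the linking word "was" is assumed. The sketch also assumes that rules do not reference themselves, directly or indirectly.

```python
from itertools import product


class RuleRef:
    """Marks a transition that refers to another rule instead of a word."""

    def __init__(self, name):
        self.name = name


# Each rule is a list of alternatives; each alternative is a sequence of
# words (or short phrases) and RuleRef objects.
RULES = {
    "DayName": [[day] for day in ("Monday", "Tuesday", "Wednesday", "Thursday",
                                  "Friday", "Saturday", "Sunday")],
    "DayStatement": [
        ["Yesterday", "was", RuleRef("DayName")],   # "was" is an assumption
        ["Today", "is", RuleRef("DayName")],
        ["Tomorrow", "will be", RuleRef("DayName")],
    ],
}


def expand(rule_name, rules):
    """Return every phrase the named rule can produce, resolving references.

    Assumes the rule references contain no cycles.
    """
    phrases = []
    for alternative in rules[rule_name]:
        options = [
            expand(part.name, rules) if isinstance(part, RuleRef) else [part]
            for part in alternative
        ]
        phrases.extend(" ".join(choice) for choice in product(*options))
    return phrases


phrases = expand("DayStatement", RULES)
print(len(phrases))
print("Tomorrow will be Tuesday" in phrases)
```

Because the day names live in one rule, any other rule in the grammar can reuse them through a reference rather than repeating the list.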

Properties of Grammars

You can set the properties of grammars to optimize them for specific recognition environments and tasks. For example, grammar properties specify the language that the grammar contains, whether the grammar is used for recognizing speech or dual-tone multi-frequency (DTMF) tones, which of the grammar's rules to use as the root rule, and the format for the grammar's semantic content.

How you specify a grammar's properties depends on the authoring format.

  • You specify the properties for grammars created using the GrammarBuilder class by setting properties on the class.

  • You specify the properties for grammars created using members of the Microsoft.Speech.Recognition.SrgsGrammar namespace by setting values for SrgsDocument properties.

  • You specify the properties for SRGS-compliant, XML-format grammars by entering values for attributes and elements within the grammar Element (Microsoft.Speech).
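For the XML format, the properties map onto attributes of the grammar element defined by SRGS 1.0. The sketch below uses Python's standard xml.etree.ElementTree module to build a minimal grammar element and set those attributes. It does not use the Microsoft.Speech API, and the rule id dayStatement, the locale en-US, and the rule content are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SRGS elements in the default namespace (no prefix).
ET.register_namespace("", SRGS_NS)

grammar = ET.Element(
    f"{{{SRGS_NS}}}grammar",
    {
        "version": "1.0",
        f"{{{XML_NS}}}lang": "en-US",      # language of the grammar (assumed locale)
        "mode": "voice",                   # "voice" for speech, "dtmf" for tones
        "root": "dayStatement",            # rule that is active when the grammar loads
        "tag-format": "semantics/1.0",     # format of the semantic content
    },
)

# A single illustrative rule so that the root attribute refers to a real rule id.
rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": "dayStatement"})
rule.text = "today is Monday"

srgs_xml = ET.tostring(grammar, encoding="unicode")
print(srgs_xml)
```

The printed string is the kind of XML-format grammar that the third bullet refers to; the other two authoring formats expose comparable settings as properties on their classes.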

Whether the authoring format is GrammarBuilder, Microsoft.Speech.Recognition.SrgsGrammar, or XML, a completed grammar must be built into a Grammar object, using one of the Grammar class constructors, before it can be loaded by a speech recognition engine. A Grammar object has additional properties that describe, among other things, whether the grammar is loaded by a speech recognition engine, whether it is enabled for recognition, its priority relative to other grammars, and its weight (degree of influence) in the ranking of recognition alternatives.

See Also