CoffeeS5

Microsoft Speech SDK


Microsoft Speech SDK SAPI 5.1


Introduction

CoffeeS5 is the sixth sample application in a tutorial series named Coffee. The series uses a consistent coffee shop motif: customers enter the shop, go to the service counter, and speak to order drinks or to enter the front office.

The samples are intended to demonstrate speech recognition capabilities within an application. They are designed for the application-level (API) programmer and for those not familiar with speech technology. Each sample progressively adds new features and increases in complexity. The tutorial chapters explain the particulars of the code in detail, and you are encouraged to read each chapter. Writing engines, such as speech recognition or text-to-speech engines (also called device driver programming), is covered separately. The samples can use engines provided by the SAPI SDK or third-party SAPI-compliant engines.

Using CoffeeS5

CoffeeS5 expands the concepts of resources and resource management introduced in CoffeeS4. Using the information about available voices obtained by polling tokens, CoffeeS5 allows users to change the active voice. To do so, it uses a dynamic grammar. In the previous Coffee samples, all the speech commands were determined ahead of time and could not be changed. For example, the drinks were limited to five basic types and a new one could not be added. A dynamic grammar allows commands to be added or removed during program execution.

To change the voice, first enter the office by saying, “go to the office” or “enter office.” Once there, display the voice list by saying, “manage the employees.” A list of available voices is displayed on the right side of the screen, with the active voice shown in red. To hear the employee speak, say, “hear them speak.” The statement “I will be the best employee you've ever had. Let me work.” is then spoken in the current voice.

To change the voice, say the voice name as it appears on the screen. For example, if "Microsoft Mary" is displayed, say, "Microsoft Mary." The highlighting moves to the selected voice, and the next time the employee speaks, the newly selected voice is used. Additionally, the list of available voices may be filtered by gender. The left side of the screen displays the available commands for this. For example, "Show males only" displays only the male voices.
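The selection and filtering behavior described above can be modeled without a speech engine. The following is a plain C++ sketch, not CoffeeS5 source code: the Voice struct and the filterByGender and selectVoice functions are hypothetical names chosen for illustration, and the case-insensitive name match is an assumption. In the real sample, the voice list comes from SAPI voice tokens rather than from a vector supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one entry in the on-screen voice list.
enum class Gender { Male, Female };

struct Voice {
    std::string name;   // name as displayed, e.g. "Microsoft Mary"
    Gender gender;
};

// Mirrors the three gender commands shown on the left side of the screen.
enum class GenderFilter { MalesOnly, FemalesOnly, Both };

// Return only the voices that the given filter would display.
std::vector<Voice> filterByGender(const std::vector<Voice>& all,
                                  GenderFilter filter) {
    std::vector<Voice> shown;
    for (const Voice& v : all) {
        const bool keep =
            filter == GenderFilter::Both ||
            (filter == GenderFilter::MalesOnly && v.gender == Gender::Male) ||
            (filter == GenderFilter::FemalesOnly && v.gender == Gender::Female);
        if (keep) shown.push_back(v);
    }
    return shown;
}

// Lower-case a copy of the string so names compare case-insensitively.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Return the index of the voice whose displayed name matches the spoken
// phrase. If nothing matches, the current selection is left unchanged.
std::size_t selectVoice(const std::vector<Voice>& shown,
                        const std::string& spoken,
                        std::size_t current) {
    const std::string wanted = toLower(spoken);
    for (std::size_t i = 0; i < shown.size(); ++i) {
        if (toLower(shown[i].name) == wanted) return i;
    }
    return current;
}
```

Returning the current index when no name matches keeps the highlight where it was, which is one reasonable way to treat an unrecognized phrase.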

Some voices may not be suitable for this example. For instance, Sample TTS Voice is a composite voice for use with the SDK application MkVoice. The voice contains only seven words, with an eighth word used as the default for all other words. As a result, it will say "blah" most of the time. Similarly, the MS Simplified Chinese Voice will spell out the content rather than speak it.

New Commands List

Choosing one word or phrase from each line of a category forms the command. Words in parentheses are optional and do not need to be included. Words or phrases separated by slashes are alternatives: any of the listed choices may be used, but only one may be selected. Sections marked RULEREF indicate that words or phrases may be chosen from the corresponding rule ID. Rule names are the same as those listed in the corresponding XML configuration file.

XML rule ID: VID_OtherRules

(show) males only / (show) females only / (show) both genders
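To make the notation concrete, the following C++ sketch expands one line of the command list into the individual phrases it stands for. It is an illustration only: it is not part of CoffeeS5 and does not use SAPI, and expandCommandLine and its helper functions are hypothetical names. The parser assumes only the two conventions described above (parenthesized optional words and slash-separated alternatives) and does not handle nesting or RULEREF sections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trim leading and trailing spaces from a string.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Append a word or phrase to an existing phrase with a single space.
static std::string join(const std::string& head, const std::string& tail) {
    return head.empty() ? tail : head + " " + tail;
}

// Expand one alternative such as "(show) males only" into every phrase it
// allows: each parenthesized word may be either omitted or spoken.
static std::vector<std::string> expandOptional(const std::string& alt) {
    std::vector<std::string> results{""};
    std::size_t pos = 0;
    while (pos < alt.size()) {
        if (alt[pos] == '(') {
            const std::size_t close = alt.find(')', pos);
            const std::size_t len = (close == std::string::npos)
                                        ? std::string::npos
                                        : close - pos - 1;
            const std::string word = trim(alt.substr(pos + 1, len));
            if (!word.empty()) {
                // Every phrase so far forks: one without the word, one with.
                std::vector<std::string> next;
                for (const std::string& r : results) {
                    next.push_back(r);
                    next.push_back(join(r, word));
                }
                results = next;
            }
            pos = (close == std::string::npos) ? alt.size() : close + 1;
        } else {
            const std::size_t open = alt.find('(', pos);
            const std::size_t len = (open == std::string::npos)
                                        ? std::string::npos
                                        : open - pos;
            const std::string literal = trim(alt.substr(pos, len));
            if (!literal.empty()) {
                for (std::string& r : results) r = join(r, literal);
            }
            pos = (open == std::string::npos) ? alt.size() : open;
        }
    }
    return results;
}

// Split a command line on '/' and expand each alternative in turn.
std::vector<std::string> expandCommandLine(const std::string& line) {
    std::vector<std::string> phrases;
    std::size_t start = 0;
    while (start <= line.size()) {
        const std::size_t slash = line.find('/', start);
        const std::size_t end =
            (slash == std::string::npos) ? line.size() : slash;
        const std::string alt = trim(line.substr(start, end - start));
        for (const std::string& p : expandOptional(alt)) {
            if (!p.empty()) phrases.push_back(p);
        }
        start = end + 1;
    }
    return phrases;
}
```

Applied to the VID_OtherRules line above, this yields six phrases: each of "males only", "females only", and "both genders", with and without the leading "show".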

Rule ID: DYN_TTSVOICERULE

This is a dynamic rule generated at run time, so no XML code is present for it. Once generated, it contains the names of all the available voices. The contents of the rule are displayed on the right side of the screen in the CoffeeS5 office after the "manage the employees" command is issued. The rule is generated when that command is issued and is destroyed when the user leaves the office.
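The lifecycle of such a rule can be sketched as follows. This is a simplified model in portable C++, not the CoffeeS5 implementation and not SAPI code: the DynamicRule class, its members, and the two event functions are hypothetical. In SAPI itself, dynamic rules are built and committed through the ISpRecoGrammar interface; the model mirrors only the idea that phrases are staged, committed, and later cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a run-time rule such as DYN_TTSVOICERULE.
class DynamicRule {
public:
    // Stage a phrase; it is not recognized until commit() is called.
    void addPhrase(const std::string& phrase) { pending_.insert(phrase); }

    // Make every staged phrase active.
    void commit() { active_ = pending_; }

    // Remove all phrases, both staged and active.
    void clear() {
        pending_.clear();
        active_.clear();
    }

    // True if the phrase is currently part of the active rule.
    bool recognizes(const std::string& phrase) const {
        return active_.count(phrase) != 0;
    }

    // Number of phrases the active rule currently contains.
    std::size_t size() const { return active_.size(); }

private:
    std::set<std::string> pending_;
    std::set<std::string> active_;
};

// Models the "manage the employees" command: rebuild the rule from the
// voice names that are available at that moment.
void onManageEmployees(DynamicRule& rule,
                       const std::vector<std::string>& voiceNames) {
    rule.clear();
    for (const std::string& name : voiceNames) rule.addPhrase(name);
    rule.commit();
}

// Models leaving the office: the rule's contents are discarded.
void onLeaveOffice(DynamicRule& rule) { rule.clear(); }
```

Rebuilding the rule each time the command is issued means the phrase list always reflects the voices available at that moment, at the cost of regenerating it on every visit to the office.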
