Dictation Pad

Microsoft Speech SDK, SAPI 5.1


Dictation Pad is an example of a speech-enabled word processor. This sample application is intended to demonstrate many SAPI 5 features in a single coherent application. It is not a full-featured speech-enabled application, although the foundations for many options are present.

Using Dictation Pad, you can speak into a microphone and, following successful speech recognition (SR), Dictation Pad displays the sentence on the screen as text. The words can also be spoken back in a text-to-speech (TTS) voice, with each word highlighted as it is spoken. Features include the following:

  • Dictation - Recognizes words in any context.
  • Command and control - Recognizes a limited set of words and uses them to control Dictation Pad. This includes using speech to select items from menus and to switch the SR mode from dictation to command and control.
  • Playback - Plays back words appearing on the screen in a TTS voice.
  • Speakback - Keeps an audio record of the actual spoken content. You can play it back to confirm or verify speech recognition.
  • Phrase tracking - Maintains information about the elements of each recognized phrase. This makes it possible to locate the parts of an SR phrase even if the dictation becomes broken or disjointed. It also demonstrates text replacement such as inverse text normalization, the process of converting spoken-form text into written form, such as "one two three" into "1-2-3" or "first" into "1st."
  • Word Alternates - Displays a list of alternates for the recognized text. From this list you can select a replacement for the original text.
  • Adding words to a grammar - Demonstrates the SR engine's capability to add words or phrases to an existing grammar or word database. Adding a word allows it to be recognized on subsequent occurrences.
  • Document management - Saves and reopens documents, retaining the data associated with the recognized SR results.
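Inverse text normalization, mentioned under Phrase tracking, can be pictured as a rewrite from spoken form to written form. The sketch below is a deliberately tiny, hypothetical Python model, not SAPI code. A real engine applies ITN with its own rules and reports the result as text replacements on the recognized phrase. The tables `DIGITS` and `ORDINALS` and the function `normalize` are illustrative assumptions, and they cover only the two examples given in the feature list.

```python
# Hypothetical inverse text normalization (ITN) sketch: rewrites a spoken-form
# word sequence into written form. Covers only the two example cases.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd"}


def normalize(words):
    """Return the written form of a list of spoken-form words."""
    out, run = [], []
    for word in words + [None]:          # None is a sentinel that flushes the last run
        if word is not None and word.lower() in DIGITS:
            run.append(DIGITS[word.lower()])
            continue
        if run:                          # a run of spoken digits ends here
            out.append("-".join(run))    # "one two three" -> "1-2-3"
            run = []
        if word is not None:
            out.append(ORDINALS.get(word.lower(), word))
    return " ".join(out)


print(normalize("call one two three".split()))   # call 1-2-3
print(normalize("the first item".split()))       # the 1st item
```

A table lookup is enough to show the idea. Production ITN must also handle context such as telling "one" as a pronoun apart from "one" as a number, which this sketch ignores.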


The complete code base for Dictation Pad is included with the SDK, and you are encouraged to examine it. It is intended to be a training aid and to demonstrate as many features as possible.


Note About SR/TTS Engines

Dictation Pad uses common SAPI features, such as various user interface calls. However, SR and TTS engines are not required to provide every feature. Dictation Pad queries the current engine for each feature it supports. If the engine provides a feature, Dictation Pad uses it; otherwise, the feature is unavailable and its menu item is inactive.

For example, Dictation Pad uses the SAPI Add/Remove Words interface. The Microsoft ASR Version 5 engine supports this feature, so it is available in Dictation Pad. The SAPI 5 Sample Engine from the SAPI SDK does not support it, so Dictation Pad disables the Add/Delete Word(s) menu item.
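The query-then-enable pattern described above can be modeled in a few lines. This is a hypothetical Python sketch. In a SAPI 5.1 application, the Add/Remove Words check would be made against the engine, for example with `ISpRecognizer::IsUISupported` and the `SPDUI_AddRemoveWord` UI type. Here the engine is replaced by a plain set of feature names. `MENU_FEATURES`, `menu_states`, and the feature strings are assumptions made for illustration.

```python
# Hypothetical model of capability-gated menu items. A real application would
# ask the engine through SAPI; here the engine is just a set of feature names.
MENU_FEATURES = {
    "Add/Delete Word(s)": "AddRemoveWord",   # assumed feature name
    "Voice Training": "UserTraining",        # assumed feature name
    "Microphone Setup": "MicTraining",       # assumed feature name
}


def menu_states(engine_features):
    """Map each menu item to True (enabled) or False (inactive)."""
    return {item: feature in engine_features
            for item, feature in MENU_FEATURES.items()}


# An engine that supports Add/Remove Words keeps the menu item active...
full_engine = {"AddRemoveWord", "UserTraining", "MicTraining"}
# ...while a hypothetical engine that lacks it gets the item disabled.
limited_engine = {"UserTraining"}

print(menu_states(full_engine)["Add/Delete Word(s)"])     # True
print(menu_states(limited_engine)["Add/Delete Word(s)"])  # False
```

Querying once per engine change and deriving every menu state from the result keeps the menus consistent with whichever engine is currently loaded.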


Dictation Pad Menu/Toolbar


The main window of Dictation Pad contains both a toolbar and a menu bar that you can use to control all of the application's functions. The toolbar is a convenience; many of its functions can also be accessed through the menus.


File Menu


File menu items manage the documents used in the application.

New
Creates a new document. Only one document can be open at a time, so the current document must be saved and closed, or discarded, before the new one is created.
Open
Opens a previously saved document. As with New, any current document must be saved first.
Save
Saves the current file. Files are saved in a proprietary format with the .dpd extension.
Save As
Saves the current file under a new or different name.
Exit
Quits Dictation Pad. Unsaved files may be saved before exiting.


Edit Menu


Edit menu items control cutting, copying, and pasting of text in the current document.

Cut
Copies the selected text to the clipboard and removes it from the document.
Copy
Copies the selected text to the clipboard.
Paste
Inserts the clipboard text into the document. The text is placed at the insertion point or, if text is selected, replaces the selection.


Voice Menu


Voice menu items control the application's speech features.


Listen for Dictation
Enables Dictation Pad to receive speech as dictation rather than as commands. You can speak any words or combination of words and they will be recognized. This option is mutually exclusive with Listen for Commands.
Listen for Commands
Enables Dictation Pad to receive speech as commands rather than as dictation. Only words defined for controlling the application are recognized; these include a spoken equivalent for most menu items and buttons. This option is mutually exclusive with Listen for Dictation.
Playback
Reads back the text. To have a portion of the text read back, select the text you want. If you do not select any text, reading begins at the cursor insertion point. If the insertion point is at the end of the document, the entire document will be read.

If the text was originally dictated, the playback will be the recorded audio of your voice. If you did not dictate the text, but rather typed or pasted it into the document, the TTS voice will read the text.

Grammar Activation
Turns grammars on or off. Words or phrases added through the Add/Delete Word(s) option are neither available nor recognized if the grammar is turned off. By default, this option is turned off.
Add/Delete Word(s)
Brings up the SR engine-specific user interface so you can add words to or delete words from the lexicon (or dictionary).
Select Whole Words
Controls how text selection treats words. If this option is selected, selecting part of a word automatically extends the selection to the entire word. Otherwise, only the characters actually selected remain selected.
Shared recognition engine
Controls resource sharing. By default, Dictation Pad uses the in-process resource model (called InProc in SAPI 5). In this model, the SR engine runs in the same process as Dictation Pad, and other applications cannot use the resources this application needs. These resources include the microphone, so all audio input goes to Dictation Pad rather than to another running application. If this option is selected, Dictation Pad uses a shared recognition engine, which may run in a separate process and can be shared with other speech applications.
Voice Training
Opens the speech recognition training wizard for initial or additional training. This wizard can also be opened from Speech properties in Control Panel: on the SR tab, click Train Profile.
Microphone Setup
Opens the Microphone Wizard for adjusting the microphone setup. This wizard can also be opened from Speech properties in Control Panel: on the SR tab, click Configure Microphone.
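The Select Whole Words option amounts to widening a selection to the nearest word boundaries. The hypothetical Python function `expand_to_words` below sketches that behavior using half-open `[start, end)` character offsets. It assumes that a word is a run of non-whitespace characters. This is a simplification and not necessarily the rule Dictation Pad applies.

```python
def expand_to_words(text, start, end):
    """Widen a non-empty half-open selection [start, end) to whole words.

    Assumption: a "word" is a maximal run of non-whitespace characters.
    """
    if start == end:
        return start, end                # a bare insertion point is left alone
    # If the selection starts inside a word, move start back to that word's start.
    if start < len(text) and not text[start].isspace():
        while start > 0 and not text[start - 1].isspace():
            start -= 1
    # If the selection ends inside a word, move end forward to that word's end.
    if end > 0 and not text[end - 1].isspace():
        while end < len(text) and not text[end].isspace():
            end += 1
    return start, end


text = "speak into a microphone"
s, e = expand_to_words(text, 2, 8)   # user dragged over "eak in"
print(text[s:e])                     # speak into
```

With the option turned off, an application would simply keep the `(start, end)` pair the user dragged over.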


Using Dictation Pad

Speech Recognition

By default, Dictation Pad starts in dictation mode with the microphone off. To start speech recognition, click Microphone on the toolbar, select Voice->Microphone, or press CTRL+M, and then begin speaking. An ellipsis ("...") appears in the window to indicate that speech is being processed. As SAPI begins returning words or phrases, the text appears dimmed until a final recognition is made. When the final recognition is determined, the text darkens and the insertion point advances.
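Behind this display, SAPI reports interim hypotheses (`SPEI_HYPOTHESIS` events) followed by a final recognition (`SPEI_RECOGNITION` event) for each utterance. The Python sketch below models only the display logic and does not call SAPI. The class `DictationView` and its methods are hypothetical, event delivery is reduced to two method calls, and text is always appended at the end of the document for simplicity.

```python
class DictationView:
    """Hypothetical model of the dim-then-darken dictation display."""

    def __init__(self):
        self.final_text = ""     # committed text, shown normally
        self.pending = ""        # latest hypothesis, shown dimmed
        self.caret = 0           # insertion point (character offset)

    def on_hypothesis(self, text):
        # Each new hypothesis replaces the previous one; the caret stays put.
        self.pending = text

    def on_recognition(self, text):
        # The final result replaces the dimmed hypothesis and is committed.
        # Simplification: text is always appended at the end of the document.
        self.pending = ""
        if self.final_text:
            text = " " + text
        self.final_text += text
        self.caret = len(self.final_text)   # insertion point advances


view = DictationView()
view.on_hypothesis("speak")
view.on_hypothesis("speak into")      # still dimmed, caret unchanged
view.on_recognition("speak into a microphone")
print(view.final_text, view.caret)    # speak into a microphone 23
```

Keeping the hypothesis separate from committed text means a revised hypothesis can be redrawn freely without disturbing text the user has already accepted.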


Below the insertion point is a small box. Click this box to display a list of alternate words. SAPI 5 places its final recognition result on the screen, but you can choose a different word from the alternates list, and your choice replaces the existing word.


Text-to-Speech

Text may be read back using the TTS voice. Select Voice->Playback or click Play to hear the text.

If no text is selected, Dictation Pad reads from the insertion point to the end of the document; if the insertion point is at the end of the document, the entire document is read. If you select text, only the selection is read. In either case, the word currently being read is highlighted, and highlighting continues word by word until the selection or remaining text has been read. Any text that was selected before playback remains selected afterward.
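The rules for choosing what Playback reads can be written as a small function. This is a hypothetical Python model, not Dictation Pad's code. `playback_range` takes the document length, the caret offset, and an optional `(start, end)` selection. It returns the half-open character range to read.

```python
def playback_range(doc_len, caret, selection=None):
    """Return the half-open range [start, end) that Playback should read.

    selection is an optional (start, end) pair; an empty or missing
    selection means only the insertion point (caret) is set.
    """
    if selection and selection[0] != selection[1]:
        return min(selection), max(selection)   # read only the selection
    if caret >= doc_len:
        return 0, doc_len                       # caret at end: read everything
    return caret, doc_len                       # read from caret to the end


print(playback_range(100, 40))            # (40, 100)
print(playback_range(100, 100))           # (0, 100)
print(playback_range(100, 0, (10, 25)))   # (10, 25)
```

Normalizing the selection with `min` and `max` covers selections made by dragging backward, where the anchor lies after the caret.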

To stop playback, click Play on the toolbar again, select Voice->Playback again, or press CTRL+P. You can also click anywhere in the edit window or press ESC to stop playback.

You can change voice characteristics on the TTS tab of Speech properties in Control Panel. For example, you can select a different voice or change its speaking rate. A newly selected voice is automatically previewed so you can confirm the choice, and the Speed slider controls the speaking rate.


Command and Control

You can use speech to control the program rather than to dictate text. Controlling the menus, menu items, and cursor by speech in this way is collectively referred to as command and control. It differs fundamentally from dictation, both programmatically and functionally. One difference is that the vocabulary is severely limited: it consists essentially of the names of menus, menu items, and buttons, plus logical cursor commands. Other words are not recognized.


You can switch to command mode in one of three ways: select Voice->Listen for Commands, click Command on the toolbar, or say "command" while dictating. Whichever method you use, the Command button on the toolbar appears pressed as visual confirmation of the current mode.
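Because the two listening modes are mutually exclusive, the mode can be modeled as a single value that every switching path updates, with the Command button's state derived from it. In SAPI terms, switching modes would typically mean changing grammar states, for example with `ISpRecoGrammar::SetDictationState` and `ISpRecoGrammar::SetRuleState`. The Python sketch below leaves SAPI out entirely. The `Mode` enum and the `ModeController` class are hypothetical. So is the assumption that the spoken keyword "command" is consumed rather than inserted as text.

```python
from enum import Enum


class Mode(Enum):
    DICTATION = "dictation"
    COMMAND = "command"


class ModeController:
    """Hypothetical model: one mode value shared by every switching path."""

    def __init__(self):
        self.mode = Mode.DICTATION        # Dictation Pad starts in dictation mode

    def set_mode(self, mode):
        # Menu item and toolbar button both end up here, so the two modes
        # can never be active at the same time.
        self.mode = mode

    @property
    def command_button_pressed(self):
        # The toolbar button simply mirrors the current mode.
        return self.mode is Mode.COMMAND

    def on_dictated_word(self, word):
        """Return the word to insert, or None if it switched modes."""
        if self.mode is Mode.DICTATION and word.lower() == "command":
            self.set_mode(Mode.COMMAND)   # assumed: keyword is not inserted
            return None
        return word


ctl = ModeController()
print(ctl.on_dictated_word("hello"))    # hello
print(ctl.on_dictated_word("command"))  # None
print(ctl.command_button_pressed)       # True
```

Deriving the button state from the single mode value, instead of storing it separately, is what keeps the visual confirmation correct however the switch was made.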


Once in command mode, you can speak commands. If a command is recognized, its action takes place: a menu command opens the corresponding menu, a menu item command carries out that item's action directly, and a cursor command moves the cursor. If a command is not understood, no action takes place.

A command may not be recognized for several reasons. The word may not be on the command list, it may not have been spoken clearly, or background noise may have obscured it. The menu item may also not apply in the current situation. If the reason is unclear, repeat the command clearly and perhaps more slowly. There may also be a slight delay in response, depending on the computer's capability.
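The behavior described above can be sketched as a table lookup: a small fixed vocabulary, where anything unrecognized or inapplicable is silently ignored. In SAPI, a vocabulary like this would normally be defined as a command-and-control grammar, for example one loaded with `ISpRecoGrammar::LoadCmdFromFile`, so the engine only returns phrases from that list. The Python below skips the engine and models only the dispatch step. The command table, its phrases, and the `dispatch` function are hypothetical. They do not reproduce Dictation Pad's actual command list.

```python
# Hypothetical command table: spoken phrase -> (kind, target).
COMMANDS = {
    "file": ("menu", "File"),
    "edit": ("menu", "Edit"),
    "save": ("menu_item", "File/Save"),
    "paste": ("menu_item", "Edit/Paste"),
    "go to top": ("cursor", "document_start"),
}


def dispatch(phrase, applicable=lambda target: True):
    """Return the action for a recognized phrase, or None to do nothing.

    applicable is a hypothetical hook reporting whether a menu item makes
    sense right now (for example, Paste with an empty clipboard).
    """
    entry = COMMANDS.get(phrase.strip().lower())
    if entry is None:
        return None                      # not in the command list: ignore it
    kind, target = entry
    if kind == "menu_item" and not applicable(target):
        return None                      # item does not apply in this context
    return kind, target


print(dispatch("Save"))     # ('menu_item', 'File/Save')
print(dispatch("banana"))   # None
print(dispatch("paste", applicable=lambda t: t != "Edit/Paste"))  # None
```

Returning `None` rather than raising an error mirrors the documented behavior that an unrecognized or inapplicable command simply has no effect.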