Automation Interfaces and Objects

Microsoft Speech SDK

The Automation interfaces provide object-oriented access to the speech recognition and text-to-speech capabilities of SAPI.

All automation interface names begin with "ISpeech", and all automation object names begin with "Sp". Applications can create automation objects explicitly and assign them to object variables, using the "CreateObject" function or the "New" keyword in a "Dim" or "Set" statement. Object variables that refer to automation interfaces, on the other hand, cannot be created directly; they are returned only by the methods, properties and events of automation objects.
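As a rough illustration of this distinction, the following Python sketch creates an automation object by its ProgID and then receives an interface-typed object from one of that object's methods. It assumes a Windows machine with SAPI registered and the third-party pywin32 package installed. The helper name list_voice_descriptions and its dispatch parameter are hypothetical; the parameter exists only so the sketch can be exercised with a stand-in where SAPI is not available.

```python
def list_voice_descriptions(dispatch=None):
    """Return the description of every installed voice.

    `dispatch` is a hypothetical hook: any callable that maps a ProgID to
    an automation object. When omitted, pywin32's Dispatch is used.
    """
    if dispatch is None:
        # Assumption: pywin32 is installed and SAPI is registered (Windows only).
        from win32com.client import Dispatch as dispatch

    # An automation object ("Sp..." name) is created explicitly, much like
    # CreateObject("SAPI.SpVoice") in Visual Basic.
    voice = dispatch("SAPI.SpVoice")

    # An automation interface (ISpeechObjectTokens) is not created directly;
    # here it is returned by a method of the SpVoice object.
    tokens = voice.GetVoices()

    # Each item in the collection is an SpObjectToken.
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]
```

On a suitable machine, calling list_voice_descriptions() with no arguments should return one string per installed voice.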

Additionally, some automation interfaces are implemented by automation objects, and the properties and methods of those interfaces are inherited by the objects. For example, the ISpeechBaseStream interface defines a set of properties and methods for storing and manipulating audio data in memory. The SpFileStream, SpMemoryStream and SpCustomStream objects implement the ISpeechBaseStream interface; as a result, the methods and properties of the ISpeechBaseStream interface are available in all three objects.
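The relationship between an interface and the objects that implement it can be modeled in a few lines of Python. This is a toy model, not the SAPI API: the class names mirror the SAPI names for readability, but the method names (write, read, seek), their signatures, and the helper round_trip are simplified assumptions made for illustration.

```python
import io


class ISpeechBaseStream:
    """Toy stand-in for the shared stream interface (not the real SAPI type)."""

    def __init__(self, raw):
        self._raw = raw  # any binary file-like object

    def write(self, data):
        """Append bytes at the current position; return the count written."""
        return self._raw.write(data)

    def read(self, count):
        """Read up to `count` bytes from the current position."""
        return self._raw.read(count)

    def seek(self, position):
        """Move to an absolute position; return the new position."""
        return self._raw.seek(position)


class SpMemoryStream(ISpeechBaseStream):
    """Toy object that keeps its data in memory."""

    def __init__(self):
        super().__init__(io.BytesIO())


class SpFileStream(ISpeechBaseStream):
    """Toy object backed by a file on disk."""

    def __init__(self, path):
        super().__init__(open(path, "w+b"))

    def close(self):
        self._raw.close()


class SpCustomStream(ISpeechBaseStream):
    """Toy object that wraps a stream the caller already owns."""

    def __init__(self, existing_stream):
        super().__init__(existing_stream)


def round_trip(stream, payload):
    """Use only the shared interface members, so any stream object works."""
    stream.write(payload)
    stream.seek(0)
    return stream.read(len(payload))
```

Because round_trip touches only members defined on the shared interface, it accepts any of the three stream objects without knowing which one it was given.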

Automation Interfaces and Objects

SAPI 5.1 Automation consists of the following interfaces and objects:


Interfaces Description
ISpeechAudio Supports the control of real-time audio streams, such as those connected to a live microphone or telephone line.
ISpeechAudioBufferInfo Defines the audio stream buffer information.
ISpeechAudioStatus Provides control over the operation of real-time audio streams.
ISpeechBaseStream Defines properties and methods common to all audio stream objects.
ISpeechDataKey Provides access to the speech configuration database.
ISpeechGrammarRule Defines the properties and methods of a speech grammar rule.
ISpeechGrammarRules Represents a collection of ISpeechGrammarRule objects.
ISpeechGrammarRuleState Presents the properties and methods of a speech grammar rule state.
ISpeechGrammarRuleStateTransition Returns data about a transition from one rule state to another, or from a rule state to the end of a rule.
ISpeechGrammarRuleStateTransitions Represents a collection of ISpeechGrammarRuleStateTransition objects.
ISpeechLexiconPronunciation Provides access to the pronunciations of a speech lexicon word.
ISpeechLexiconPronunciations Represents a collection of ISpeechLexiconPronunciation objects.
ISpeechLexiconWord Provides access to a speech lexicon word.
ISpeechLexiconWords Represents a collection of ISpeechLexiconWord objects.
ISpeechObjectTokens Represents a collection of SpObjectToken objects.
ISpeechPhraseAlternate Enables applications to retrieve alternate phrase information from an SR engine, and to update the SR engine's language model to reflect committed alternate changes.
ISpeechPhraseAlternates Represents a collection of ISpeechPhraseAlternate objects.
ISpeechPhraseElement Provides access to information about a word or phrase.
ISpeechPhraseElements Represents a collection of ISpeechPhraseElement objects.
ISpeechPhraseInfo Contains properties detailing phrase elements.
ISpeechPhraseProperties Represents a collection of ISpeechPhraseProperty objects.
ISpeechPhraseProperty Stores the information for a semantic property.
ISpeechPhraseReplacement Specifies a replacement, or text normalization, of one or more spoken words.
ISpeechPhraseReplacements Represents a collection of ISpeechPhraseReplacement objects.
ISpeechPhraseRule Contains information about a speech phrase rule.
ISpeechPhraseRules Represents a collection of ISpeechPhraseRule objects.
ISpeechRecognizerStatus Returns the status of the speech recognition engine represented by the recognizer object.
ISpeechRecoGrammar Enables applications to manage the words and phrases for the SR engine.
ISpeechRecoResult Returns information about the recognition engine's hypotheses, recognitions, and false recognitions.
ISpeechRecoResultTimes Contains the time information for speech recognition results.
ISpeechVoiceStatus Contains status information about an SpVoice object.

Objects Description
SpAudioFormat Defines an audio format.
SpCustomStream Supports the use of existing IStream objects in SAPI.
SpFileStream Provides the ability to open files as audio streams and save audio streams as files.
SpInProcRecoContext Defines a recognition context, or a collection of settings, that requests a specific type of recognition as determined by the needs of an application.
SpInProcRecoContext (Events) Defines the types of events that a recognition context can receive.
SpInProcRecognizer Represents a speech recognition engine that runs in the application's own process.
SpLexicon Provides access to lexicons, which contain information about words that can be recognized or spoken.
SpMemoryStream Supports audio stream operations in memory.
SpMMAudioIn Represents the audio implementation for the standard Windows wave-in multimedia layer.
SpMMAudioOut Represents the audio implementation for the standard Windows wave-out multimedia layer.
SpObjectToken Supports object token entries.
SpObjectTokenCategory Represents a class of object tokens.
SpPhoneConverter Supports conversion from the SAPI character phoneset to the Id phoneset.
SpPhraseInfoBuilder Provides the ability to rebuild phrase information from audio data saved to memory.
SpSharedRecoContext Defines a recognition context, or a collection of settings, that requests a specific type of recognition as determined by the needs of an application.
SpSharedRecoContext (Events) Defines the types of events that a recognition context can receive.
SpSharedRecognizer Represents a speech recognition engine that can be shared by multiple applications.
SpTextSelectionInformation Provides access to the text selection information pertaining to a word sequence buffer.
SpUnCompressedLexicon Provides access to lexicons, which contain information about words that can be recognized or spoken.
SpVoice Enables an application to perform text-to-speech synthesis operations.
SpVoice (Events) Defines the types of events that can be received by an SpVoice object.
SpWaveFormatEx Defines the format of waveform-audio data.
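
To show how two entries from these tables fit together, the following Python sketch speaks a string through an SpVoice object and then reads the ISpeechVoiceStatus object returned by its Status property. It assumes Windows with SAPI registered and the third-party pywin32 package installed. The function name speak_and_report and its dispatch parameter are hypothetical, introduced only so that a stand-in object can be supplied where SAPI is not available.

```python
def speak_and_report(text, dispatch=None):
    """Speak `text`, then return the voice's RunningState value.

    `dispatch` is a hypothetical hook: any callable that maps a ProgID to
    an automation object. When omitted, pywin32's Dispatch is used.
    """
    if dispatch is None:
        # Assumption: pywin32 is installed and SAPI is registered (Windows only).
        from win32com.client import Dispatch as dispatch

    # SpVoice is an automation object, so it is created explicitly.
    voice = dispatch("SAPI.SpVoice")

    # With no flags supplied, Speak is called with its default behavior.
    voice.Speak(text)

    # Status returns an ISpeechVoiceStatus object; it is obtained from the
    # SpVoice object rather than created by the application.
    return voice.Status.RunningState
```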