SPPHRASE

Microsoft Speech SDK


Microsoft Speech SDK SAPI 5.1


SPPHRASE contains speech recognition information, including hypotheses, false recognitions, recognitions, and alternate recognitions. The phrase information includes the language, audio and event timing, text (display and lexical forms), inverse text normalization replacements, semantic tags (that is, properties), and, depending on the engine, an optional block of engine-specific phrase data.

SAPI typically provides the application with a pointer to a block of memory allocated with CoTaskMemAlloc. The application must free this block with CoTaskMemFree when it has finished with the phrase information.
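In C++, the CoTaskMemFree obligation can be tied to scope with std::unique_ptr and a custom deleter, so the phrase block is released even on early returns. The following is a minimal sketch under stated assumptions, not SAPI code: to keep it portable and testable, std::malloc and std::free stand in for CoTaskMemAlloc and CoTaskMemFree, and `Phrase`, `TaskMemDeleter`, `AllocatePhraseBlock`, and `ReadLangId` are hypothetical names. In real code the deleter would call CoTaskMemFree, and the pointer would come from a call such as ISpPhrase::GetPhrase.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical, trimmed-down stand-in for SPPHRASE (not the real SAPI layout).
struct Phrase {
    unsigned long cbSize;
    unsigned short LangID;
};

// Counts releases so the ownership behaviour can be observed.
static int g_freeCount = 0;

// Deleter that releases the phrase block. Real SAPI code would call
// CoTaskMemFree here; std::free is a portable stand-in (assumption).
struct TaskMemDeleter {
    void operator()(Phrase* p) const noexcept {
        ++g_freeCount;
        std::free(p);
    }
};

using PhrasePtr = std::unique_ptr<Phrase, TaskMemDeleter>;

// Simulates SAPI handing back a caller-owned block
// (CoTaskMemAlloc in real code, std::malloc here).
Phrase* AllocatePhraseBlock() {
    void* raw = std::malloc(sizeof(Phrase));
    if (raw == nullptr) return nullptr;
    Phrase* p = new (raw) Phrase{};  // trivially destructible, so free() alone is enough
    p->cbSize = sizeof(Phrase);
    p->LangID = 0x0409;              // en-US
    return p;
}

unsigned short ReadLangId() {
    PhrasePtr phrase(AllocatePhraseBlock());
    if (!phrase) return 0;
    assert(phrase->cbSize == sizeof(Phrase));  // validate the size field before use
    return phrase->LangID;                     // block is freed when `phrase` leaves scope
}
```

The design choice is that no code path can forget the free: the deleter runs exactly once when the owning pointer is destroyed.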

typedef struct SPPHRASE
{
    ULONG                       cbSize;
    LANGID                      LangID;
    WORD                        wReserved;
    ULONGLONG                   ullGrammarID;
    ULONGLONG                   ftStartTime;
    ULONGLONG                   ullAudioStreamPosition;
    ULONG                       ulAudioSizeBytes;
    ULONG                       ulRetainedSizeBytes;
    ULONG                       ulAudioSizeTime;
    SPPHRASERULE                Rule;
    const SPPHRASEPROPERTY     *pProperties;
    const SPPHRASEELEMENT      *pElements;
    ULONG                       cReplacements;
    const SPPHRASEREPLACEMENT  *pReplacements;
    GUID                        SREngineID;        
    ULONG                       ulSREnginePrivateDataSize;
    const BYTE                 *pSREnginePrivateData;
} SPPHRASE;

Members

cbSize: The size of this structure in bytes.
LangID: The language ID of the phrase elements.
wReserved: Reserved for future use.
ullGrammarID: ID of the grammar that contains the top-level rule used to recognize this phrase.
ftStartTime: The absolute start time of the phrase audio, as a 64-bit FILETIME value (the number of 100-nanosecond intervals since January 1, 1601, UTC), in the format produced by the Win32 functions GetSystemTime and SystemTimeToFileTime. When an application uses WAV file input, SAPI sets the stream position and start time information to zero.
ullAudioStreamPosition: The starting offset of the phrase, in bytes, relative to the start of the audio stream. If the audio stream is downsampled, ullAudioStreamPosition is the byte position within the original stream.
ulAudioSizeBytes: Size of audio data, in bytes, for this phrase.
ulRetainedSizeBytes: Size, in bytes, of the retained audio data (in the user-specified retained-audio format). See ISpRecoContext::SetAudioOptions for more information about specifying the retained audio format.
ulAudioSizeTime: Length of phrase audio in 100-nanosecond units.
Rule: Information about the top-level rule (and rule-reference hierarchy) used to recognize this phrase.
pProperties: Pointer to the root of the semantic-tag property tree.
pElements: Pointer to the array of phrase elements (the number of elements is stored in Rule). Each phrase element contains position and text information, including the lexical and display forms.
cReplacements: Number of text replacements. Text replacements are generally based on engine-defined Inverse Text Normalization (ITN) rules (for example, rendering the spoken words "five dollars" as "$5").
pReplacements: Pointer to the array of text replacements.
SREngineID: GUID that identifies the particular speech recognition (SR) engine that recognized this phrase.
ulSREnginePrivateDataSize: Size of the engine's private data, in bytes.
pSREnginePrivateData: Pointer to the engine's private data. Engine private data is specific to each SR engine; SAPI does not define its format or structure.
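Several of the members above use the Windows 100-nanosecond time unit: ftStartTime counts 100-ns intervals since January 1, 1601 (UTC), and ulAudioSizeTime is a duration in the same unit. The sketch below is a minimal illustration, not SAPI code: it converts these values to Unix seconds and milliseconds and derives the end of the phrase's byte range in the stream. All helper names are hypothetical, and the byte-to-time conversion assumes fixed-rate PCM audio whose average byte rate (for example, WAVEFORMATEX::nAvgBytesPerSec) is known.

```cpp
#include <cassert>
#include <cstdint>

// 100-ns ticks per second and per millisecond (FILETIME resolution).
constexpr std::uint64_t kTicksPerSecond = 10000000ULL;
constexpr std::uint64_t kTicksPerMillisecond = 10000ULL;

// Ticks between 1601-01-01 and 1970-01-01 (UTC): 11644473600 s * 10^7.
constexpr std::uint64_t kEpochDeltaTicks = 116444736000000000ULL;

// ftStartTime (FILETIME ticks) -> whole seconds since the Unix epoch.
// ftStartTime is zero for WAV-file input, so callers should check for
// zero first; this helper assumes a timestamp no earlier than 1970.
std::int64_t FileTimeToUnixSeconds(std::uint64_t ftStartTime) {
    assert(ftStartTime >= kEpochDeltaTicks);
    return static_cast<std::int64_t>((ftStartTime - kEpochDeltaTicks) / kTicksPerSecond);
}

// ulAudioSizeTime (100-ns units) -> milliseconds, truncated.
std::uint32_t AudioTimeToMilliseconds(std::uint32_t ulAudioSizeTime) {
    return ulAudioSizeTime / static_cast<std::uint32_t>(kTicksPerMillisecond);
}

// ulAudioSizeBytes -> duration in 100-ns units, assuming fixed-rate PCM
// with a known average byte rate (hypothetical parameter).
std::uint64_t AudioBytesToTicks(std::uint32_t ulAudioSizeBytes, std::uint32_t avgBytesPerSec) {
    assert(avgBytesPerSec != 0);
    return static_cast<std::uint64_t>(ulAudioSizeBytes) * kTicksPerSecond / avgBytesPerSec;
}

// One-past-the-end byte offset of the phrase within the audio stream.
std::uint64_t PhraseEndStreamPosition(std::uint64_t ullAudioStreamPosition,
                                      std::uint32_t ulAudioSizeBytes) {
    return ullAudioStreamPosition + ulAudioSizeBytes;
}
```

For example, 48,000 bytes of 16 kHz, 16-bit mono PCM (32,000 bytes per second) correspond to 15,000,000 ticks, or 1,500 ms.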

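The relationship between pElements and pReplacements can be illustrated by assembling display text: where a replacement covers a run of elements, its text is emitted in place of those elements' display forms. This is a hedged sketch using hypothetical, simplified stand-ins (`Element`, `Replacement`, `BuildDisplayText`) rather than the real SPPHRASEELEMENT and SPPHRASEREPLACEMENT types. It assumes each replacement identifies its run by a first-element index and an element count, assumes replacements do not overlap, and ignores display attributes such as spacing rules. Applications normally obtain the final text from ISpPhrase::GetText rather than rebuilding it by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognized word.
struct Element {
    std::string displayText;
};

// Hypothetical stand-in for one ITN replacement: replaces `count`
// consecutive elements starting at `firstElement` with `text`.
struct Replacement {
    std::size_t firstElement;
    std::size_t count;
    std::string text;
};

// Builds space-separated display text, substituting each replacement for
// the element run it covers. Assumes replacements do not overlap.
std::string BuildDisplayText(const std::vector<Element>& elements,
                             const std::vector<Replacement>& replacements) {
    std::string out;
    std::size_t i = 0;
    while (i < elements.size()) {
        const Replacement* match = nullptr;
        for (const Replacement& r : replacements) {
            if (r.firstElement == i && r.count > 0) { match = &r; break; }
        }
        if (!out.empty()) out += ' ';
        if (match != nullptr) {
            assert(match->firstElement + match->count <= elements.size());
            out += match->text;
            i += match->count;  // skip the elements the replacement covers
        } else {
            out += elements[i].displayText;
            ++i;
        }
    }
    return out;
}
```

With the elements "it", "costs", "five", "dollars" and a single replacement covering elements 2 and 3 with "$5", the function yields "it costs $5"; with no replacements it yields "it costs five dollars".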