ISpSREngine::RecognizeStream

Microsoft Speech SDK (SAPI 5.1)

ISpSREngine::RecognizeStream begins recognition processing on a stream. From this point on, the engine can read data, perform recognition, and send results and events back to SAPI. When all the data has been recognized and read, or the application has deactivated recognition, the engine finishes processing and returns from this method.

HRESULT RecognizeStream(
   REFGUID               rguidFmtId,
   const WAVEFORMATEX   *pWaveFormatEx,
   HANDLE                hRequestSync,
   HANDLE                hDataAvailable,
   HANDLE                hExit,
   BOOL                  fNewAudioStream,
   BOOL                  fRealTimeAudio,
   ISpObjectToken       *pAudioObjectToken
);   

Parameters

rguidFmtId
[in] The GUID (passed by reference) of the input audio format to recognize. For wave-format audio, this is SPDFID_WaveFormatEx.
pWaveFormatEx
[in] The WAVEFORMATEX structure describing the input format (if it is a wave format). Only a format that the engine has already indicated it can process (by returning the format from ISpSREngine::GetInputAudioFormat) will be used.
hRequestSync
[in] This is a Win32 event handle that is set whenever tasks, such as grammar changes, are waiting for the engine to respond to them. The tasks are processed when the engine calls ISpSREngineSite::Synchronize. The engine can call Synchronize regularly, or only when this event is set.
hDataAvailable
[in] This is a Win32 event handle that is set when data is available for reading. The amount of data that must be available before this event is set can be controlled by calling ISpSREngineSite::SetBufferNotifySize. By default, the event is set whenever any amount of data is available. This event can be used as an alternative to calling ISpSREngineSite::DataAvailable.
hExit
[in] This is a Win32 event handle indicating when the engine should exit. The engine exits on either of two conditions:
  • When there is no more data in the stream and it has finished processing, or
  • When this event is set. A return value of S_FALSE from ISpSREngineSite::Recognition or ISpSREngineSite::Synchronize also indicates that this event has been set.
fNewAudioStream
[in] Indicates whether the input is a new stream. TRUE indicates a newly created stream; FALSE otherwise. For example, if an application deactivates its rules, RecognizeStream returns; if the application later activates some rules, the next RecognizeStream call has this parameter set to FALSE because the same stream was already in use. This parameter is TRUE only if the application has called ISpRecognizer::SetInput to create a new stream. Some engines may find this information useful when deciding whether to reset channel adaptation, for example at the start of a new telephone call.
fRealTimeAudio
[in] Indicates whether the input is real-time audio. TRUE means it is real-time audio; FALSE otherwise. Real-time inputs in SAPI are those that implement the ISpAudio interface, for example the standard multimedia microphone input. Non-real-time streams are those that implement only ISpStreamFormat, for example input from wave files using the ISpStream object. With non-real-time streams, all the data is available for reading immediately, so the hDataAvailable event is always set and ISpSREngineSite::DataAvailable always returns INFINITE.
pAudioObjectToken
[in] The object token interface for the audio object that the stream was created from. Engines do not need to do anything with this parameter, but it may be useful in some circumstances.
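The parameter descriptions above imply a little per-call logic: reacting to the three event handles and deciding whether to reset adaptation. The following portable C++ sketch is illustrative only. EventState, NextAction, ChooseNextAction, ChannelState, and OnStreamStart are hypothetical names; plain booleans stand in for the Win32 event handles (a real engine would wait on the handles themselves, for example with WaitForMultipleObjects); and the priority order (exit, then pending tasks, then audio) is one reasonable choice, not something SAPI prescribes.

```cpp
#include <cassert>

// Hypothetical snapshot of the three event handles passed to RecognizeStream.
// A real engine would wait on the Win32 handles themselves; plain booleans
// keep this sketch portable.
struct EventState {
    bool exitSet;           // hExit is signaled
    bool requestSyncSet;    // hRequestSync is signaled
    bool dataAvailableSet;  // hDataAvailable is signaled
};

enum class NextAction { Exit, Synchronize, ReadAudio, Wait };

// Decide what the recognition loop should do next. The priority order
// (exit, then pending tasks, then audio) is an assumption of this sketch.
NextAction ChooseNextAction(const EventState& ev) {
    if (ev.exitSet) return NextAction::Exit;
    if (ev.requestSyncSet) return NextAction::Synchronize;  // call Synchronize
    if (ev.dataAvailableSet) return NextAction::ReadAudio;  // call Read
    return NextAction::Wait;                                // nothing to do yet
}

// Hypothetical per-stream adaptation state kept by the engine.
struct ChannelState {
    double noiseFloor = 0.0;  // adapted background level
    int utterances = 0;       // utterances seen on this channel
};

// Reset channel adaptation only when fNewAudioStream is TRUE (for example a
// new telephone call); keep it when rules are merely reactivated.
void OnStreamStart(ChannelState& ch, bool fNewAudioStream) {
    if (fNewAudioStream) {
        ch = ChannelState{};
    }
}
```

For a non-real-time stream, hDataAvailable is always set, so in this sketch ChooseNextAction never returns Wait for such input.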

Return values

Value Description
S_OK Function completed successfully. This should be returned if the engine is exiting because the stream has ended or because SAPI signaled it to exit.
FAILED(hr) An appropriate error code if the engine is terminating for an unexpected reason.

Remarks

The engine can read audio data using ISpSREngineSite::Read, and determines how much data is available for reading with ISpSREngineSite::DataAvailable or the hDataAvailable event handle. The engine has no direct access to the input audio device, so it behaves consistently whether the input comes from desktop audio, wave files, or a custom audio device. The audio format is given by the rguidFmtId and pWaveFormatEx parameters, and additional details about the audio source can be obtained from the fNewAudioStream, fRealTimeAudio, and pAudioObjectToken parameters. When a Read call indicates that no more data is available, the engine should finish processing the data it already has and return from RecognizeStream.
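A minimal single-threaded read loop of this kind might look like the sketch below. FakeSite is a hypothetical in-memory stand-in for the engine site, loosely modeled on a Read(buffer, byte count, bytes-read) call; the sketch assumes that a read returning fewer bytes than requested is how the site signals that no more data is available. RecognizeAll is likewise hypothetical, and its "processing" is only a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the engine site's Read method.
// It copies up to cb bytes into pv and reports the count in *pcbRead;
// in this sketch, a short read means the stream has ended.
class FakeSite {
public:
    explicit FakeSite(std::vector<std::uint8_t> audio) : audio_(std::move(audio)) {}

    void Read(void* pv, unsigned long cb, unsigned long* pcbRead) {
        std::size_t n = std::min<std::size_t>(cb, audio_.size() - pos_);
        if (n > 0) {
            std::memcpy(pv, audio_.data() + pos_, n);
        }
        pos_ += n;
        *pcbRead = static_cast<unsigned long>(n);
    }

private:
    std::vector<std::uint8_t> audio_;
    std::size_t pos_ = 0;
};

// Single-threaded loop: read fixed-size chunks, process each one, and stop
// once a read comes back short. Returns the total number of bytes processed
// and records each chunk size in chunkSizes.
std::size_t RecognizeAll(FakeSite& site, unsigned long chunkBytes,
                         std::vector<std::size_t>& chunkSizes) {
    assert(chunkBytes > 0);
    std::vector<std::uint8_t> buf(chunkBytes);
    std::size_t total = 0;
    for (;;) {
        unsigned long got = 0;
        site.Read(buf.data(), chunkBytes, &got);
        if (got > 0) {
            chunkSizes.push_back(got);  // placeholder for acoustic processing
            total += got;
        }
        if (got < chunkBytes) {
            break;  // no more data: finish up and return
        }
    }
    return total;
}
```

Because all input is routed through the site, the same loop runs unchanged whether the bytes originate from a microphone or a wave file.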

The engine recognizes from all rules and/or dictation grammars that have been activated. If there are multiple active rules and/or dictations, the engine is expected to recognize from all things "in parallel." That is, the user is able to say something from any rule that is active. It is possible for this method to be called with nothing active. In this case, the engine can just read data and then discard it, or use it to gather environmental noise information.
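To make "in parallel" concrete, the sketch below treats each rule as a hypothetical list of exact phrases and accepts a phrase from any active rule, which is equivalent to searching the union of all active rules. Real SAPI grammars are far richer than this. Rule, MatchActiveRules, AnyRuleActive, and UpdateNoiseEstimate are hypothetical names, and the exponential smoothing used for the noise estimate is an illustrative choice, not SAPI behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, highly simplified rule: a name, the exact phrases it
// accepts, and whether the application has activated it.
struct Rule {
    std::string name;
    std::vector<std::string> phrases;
    bool active;
};

// All active rules are considered together, so the user may say a phrase
// from any of them; this is equivalent to searching their union.
// Returns the matching rule's name, or "" if nothing active matched.
std::string MatchActiveRules(const std::vector<Rule>& rules, const std::string& heard) {
    for (const Rule& r : rules) {
        if (!r.active) continue;
        for (const std::string& p : r.phrases) {
            if (p == heard) return r.name;
        }
    }
    return "";
}

bool AnyRuleActive(const std::vector<Rule>& rules) {
    for (const Rule& r : rules) {
        if (r.active) return true;
    }
    return false;
}

// With nothing active, the engine may still consume audio; here each frame's
// energy is folded into a running background-noise estimate (exponential
// smoothing with weight alpha, an illustrative choice).
double UpdateNoiseEstimate(double current, double frameEnergy, double alpha = 0.1) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    return (1.0 - alpha) * current + alpha * frameEnergy;
}
```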

Because the engine remains in the RecognizeStream method for as long as it is recognizing, SAPI has effectively given the engine one thread on which to perform recognition. It is possible to write an engine that processes everything on this one thread and therefore needs no additional threads, critical sections, or other thread synchronization.

Alternative arrangements with additional threads are also possible. For example, one thread could read data while another performs the actual recognition processing. SAPI places no restrictions on which threads call which methods, or on whether the methods are called simultaneously.
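A reader/recognizer split of this kind can be sketched with a standard producer/consumer queue. This is a generic C++ pattern, not a SAPI facility: ChunkQueue and RunTwoThreads are hypothetical names, a vector of samples stands in for data obtained through ISpSREngineSite::Read, and an empty chunk is used here as an end-of-stream marker.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe queue that hands audio chunks from the reading thread to the
// recognition thread. An empty chunk is used as the end-of-stream marker.
class ChunkQueue {
public:
    void Push(std::vector<short> chunk) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(chunk));
        }
        cv_.notify_one();
    }

    std::vector<short> Pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::vector<short> chunk = std::move(q_.front());
        q_.pop();
        return chunk;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::vector<short>> q_;
};

// One thread "reads" audio (a vector stands in for ISpSREngineSite::Read),
// another does the "recognition". Returns the number of samples processed.
std::size_t RunTwoThreads(const std::vector<short>& audio, std::size_t chunkSamples) {
    assert(chunkSamples > 0);
    ChunkQueue queue;
    std::size_t processed = 0;  // written only by the recognizer thread

    std::thread reader([&] {
        for (std::size_t i = 0; i < audio.size(); i += chunkSamples) {
            std::size_t end = std::min(audio.size(), i + chunkSamples);
            queue.Push(std::vector<short>(audio.begin() + static_cast<std::ptrdiff_t>(i),
                                          audio.begin() + static_cast<std::ptrdiff_t>(end)));
        }
        queue.Push({});  // end of stream
    });

    std::thread recognizer([&] {
        for (;;) {
            std::vector<short> chunk = queue.Pop();
            if (chunk.empty()) break;
            processed += chunk.size();  // placeholder for recognition work
        }
    });

    reader.join();
    recognizer.join();
    return processed;  // safe to read: both threads have been joined
}
```

The mutex and condition variable are the cost of this arrangement; the single-thread design described earlier avoids them entirely.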

The engine uses ISpSREngineSite::Synchronize to be notified of any grammar or other changes that are pending, and uses ISpSREngineSite::UpdateRecoPos to keep SAPI informed of how much of the stream has been recognized. The engine passes details of events and recognition results back to SAPI with ISpSREngineSite::AddEvent and ISpSREngineSite::Recognition.
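The loop below ties these calls together in toy form. MockSite is hypothetical: its method names echo ISpSREngineSite (Read, Synchronize, UpdateRecoPos, AddEvent, Recognition), but the signatures are simplified for illustration and are not the SAPI ones; strings replace the real event and result structures. Synchronize returning false mimics the S_FALSE return described under hExit. The "recognizer" simply treats a run of non-zero samples as a word.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical mock of the engine site. Method names echo ISpSREngineSite,
// but the signatures are simplified and are NOT the SAPI ones.
struct MockSite {
    std::vector<int> audio;  // stand-in for the input stream
    std::size_t readPos = 0;
    std::size_t exitAfterSyncs = std::numeric_limits<std::size_t>::max();  // simulates hExit
    std::size_t syncCalls = 0;
    unsigned long long recoPos = 0;   // last value passed to UpdateRecoPos
    std::vector<std::string> events;  // what AddEvent received
    std::vector<std::string> results; // what Recognition received

    std::size_t Read(int* out, std::size_t n) {
        std::size_t got = std::min(n, audio.size() - readPos);
        auto first = audio.begin() + static_cast<std::ptrdiff_t>(readPos);
        std::copy(first, first + static_cast<std::ptrdiff_t>(got), out);
        readPos += got;
        return got;
    }
    // Returns false when the engine should exit (the real method returns S_FALSE).
    bool Synchronize(unsigned long long /*processedPos*/) { return ++syncCalls <= exitAfterSyncs; }
    void UpdateRecoPos(unsigned long long pos) { recoPos = pos; }
    void AddEvent(const std::string& e) { events.push_back(e); }
    void Recognition(const std::string& r) { results.push_back(r); }
};

// Toy engine loop: a "word" is a run of non-zero samples. It reports a
// phrase-start event when a run begins and a result when the run ends,
// and keeps the site informed of how far it has processed.
void RecognizeLoop(MockSite& site, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<int> buf(chunk);
    unsigned long long pos = 0;  // samples consumed so far
    bool inWord = false;
    std::size_t wordLen = 0;
    for (;;) {
        if (!site.Synchronize(pos)) {
            return;  // exit requested: stop without further results
        }
        std::size_t got = site.Read(buf.data(), chunk);
        for (std::size_t i = 0; i < got; ++i) {
            if (buf[i] != 0) {
                if (!inWord) {
                    inWord = true;
                    wordLen = 0;
                    site.AddEvent("PHRASE_START@" + std::to_string(pos + i));
                }
                ++wordLen;
            } else if (inWord) {
                inWord = false;
                site.Recognition("word of " + std::to_string(wordLen) + " samples");
            }
        }
        pos += got;
        site.UpdateRecoPos(pos);  // report progress through the stream
        if (got < chunk) {
            break;  // end of stream
        }
    }
    if (inWord) {  // stream ended mid-word: flush the final result
        site.Recognition("word of " + std::to_string(wordLen) + " samples");
    }
}
```

For the input {0, 1, 1, 0, 0, 2, 2, 2, 0} read in chunks of 4, this sketch emits phrase-start events at positions 1 and 5, two results, and a final recognition position of 9.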