Speech Recognition Properties - Microsoft Speech Platform (Server)

Microsoft Speech Platform SDK 11

Speech Recognition Properties

Introduction

This document describes the ISpProperties elements for speech recognition (SR) engines that comply with SAPI 5. It defines these attributes for SR engines only. Developers building a SAPI 5-compliant speech recognition engine should use this document as a reference. For more information, refer to the help documents in the Microsoft Speech Platform (Server) SDK.

ISpProperties

ISpProperties is an interface that enables SR and TTS engines to get or set various attributes for an object. The attributes are passed to the engine via the ISpProperties interface. Each attribute is identified by a unique LONG value. SAPI defines certain attributes known as system attributes, whose identifiers range from 0x0001 to 0xffff. Vendor-specific attributes are identified by a unique high word value (two ANSI characters that identify the engine vendor).

Attributes may be LONGs, strings, or memory addresses.
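The identifier scheme described above can be illustrated with a minimal Python sketch. It is not SAPI code: the function names are hypothetical, and the byte order used to pack the two vendor characters into the high word (first character in the high byte) is an assumption, since the text does not specify it.

```python
SYSTEM_ID_MIN = 0x0001  # lowest system attribute identifier (from the text)
SYSTEM_ID_MAX = 0xFFFF  # highest system attribute identifier (from the text)


def is_system_id(attribute_id: int) -> bool:
    """Return True if the identifier falls in the SAPI system range."""
    return SYSTEM_ID_MIN <= attribute_id <= SYSTEM_ID_MAX


def vendor_high_word(vendor_code: str) -> int:
    """Pack two ANSI characters into a 16-bit value.

    Assumption: the first character occupies the high byte.
    """
    if len(vendor_code) != 2 or any(ord(c) > 0xFF for c in vendor_code):
        raise ValueError("vendor code must be exactly two ANSI characters")
    return (ord(vendor_code[0]) << 8) | ord(vendor_code[1])


def make_vendor_id(vendor_code: str, low_word: int) -> int:
    """Combine a vendor high word with a vendor-chosen 16-bit low word."""
    if not 0 <= low_word <= 0xFFFF:
        raise ValueError("low word must fit in 16 bits")
    return (vendor_high_word(vendor_code) << 16) | low_word
```

Under these assumptions, any identifier built with make_vendor_id has a nonzero high word and therefore cannot collide with the system range.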

SR Properties

The following entries list the SR properties that are set by the application and passed to the SR engine via SAPI. These properties are not required for SAPI compliance. However, the range given for each property is required; the exact interpretation of values within that range is left to the SR engine and may differ for each property. The SAPI ranges and defaults for each property are also shown.

Each entry gives the property's dwAttrib value, its WCHAR value, its meaning, and its range.

dwAttrib value: SPPROP_RESOURCE_USAGE
WCHAR value: ResourceUsage

Meaning: ResourceUsage specifies the engine's CPU consumption. Higher values require more CPU power.

Range: 0 - 100 (default = 50)

dwAttrib values: SPPROP_HIGH_CONFIDENCE_THRESHOLD, SPPROP_NORMAL_CONFIDENCE_THRESHOLD, SPPROP_LOW_CONFIDENCE_THRESHOLD
WCHAR values: HighConfidenceThreshold, NormalConfidenceThreshold, LowConfidenceThreshold

Meaning: The threshold values divide the confidence scale into four portions: rejected, low, medium, and high. The positions of the low, normal, and high confidence thresholds control how the confidence of a word is labeled. The HighConfidenceThreshold (HCT) separates the high and medium confidence ranges. The NormalConfidenceThreshold (NCT) separates the medium and low confidence ranges. The LowConfidenceThreshold (LCT) separates the low and rejected confidence ranges.

Note: SPPROP_LOW_CONFIDENCE_THRESHOLD is not used by the Microsoft Speech Platform (Server).

If all three thresholds are equal to 0, all words have high confidence. If all three thresholds are equal to 100, all words have low confidence.

Range: 0 - 100 (defaults: LCT = 20, NCT = 50, HCT = 80)
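The labeling described for the three thresholds can be sketched as follows. This is an illustrative Python model, not engine code: label_confidence is a hypothetical name, and treating a score equal to a threshold as belonging to the higher range is an assumption. Passing lct=None models a platform that ignores the low threshold, as noted for the Microsoft Speech Platform (Server).

```python
from typing import Optional


def label_confidence(score: int, hct: int = 80, nct: int = 50,
                     lct: Optional[int] = 20) -> str:
    """Label a word confidence score using the three thresholds.

    Assumption: a score equal to a threshold falls in the higher range.
    """
    thresholds = [t for t in (hct, nct, lct) if t is not None]
    if any(not 0 <= t <= 100 for t in thresholds):
        raise ValueError("thresholds must be in the range 0 - 100")
    if score >= hct:
        return "high"
    if score >= nct:
        return "medium"
    # When the low threshold is unused, nothing is labeled "rejected".
    if lct is None or score >= lct:
        return "low"
    return "rejected"
```

With the default thresholds, a score of 60 is labeled medium and a score of 10 is labeled rejected; with lct=None the same score of 10 is labeled low.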

dwAttrib value: SPPROP_REJECTION_CONFIDENCE_THRESHOLD
WCHAR value: CFGConfidenceRejectionThreshold

Meaning: The speech recognition engine accepts full utterances with phrase confidence scores above or equal to this threshold, and rejects full utterances with phrase confidence scores below this threshold. This property accepts the following values:

  • The value -1 causes the engine to use its default value.
  • A value in the range of 0-100 sets the phrase confidence rejection threshold to the specified value. If this value is set to 0, the speech recognition engine accepts all utterances. If this value is set to 100, the speech recognition engine rejects all utterances.

This property is not to be confused with SPPROP_HIGH_CONFIDENCE_THRESHOLD, SPPROP_NORMAL_CONFIDENCE_THRESHOLD, or SPPROP_LOW_CONFIDENCE_THRESHOLD, which are used to determine how any given confidence value is categorized (low, medium, or high).

Range: -1, or 0 - 100
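The accept-or-reject rule for the rejection threshold can be modeled with a short Python sketch. The function names are hypothetical, and the engine default of 20 is a placeholder chosen for illustration only; the text does not state what default an engine uses when the property is -1.

```python
# Placeholder value for illustration; the real engine default is not documented here.
ASSUMED_ENGINE_DEFAULT = 20


def effective_rejection_threshold(setting: int,
                                  engine_default: int = ASSUMED_ENGINE_DEFAULT) -> int:
    """Resolve the property value to the threshold actually applied."""
    if setting == -1:
        return engine_default  # -1 means "use the engine default"
    if 0 <= setting <= 100:
        return setting
    raise ValueError("value must be -1 or in the range 0 - 100")


def accepts_utterance(phrase_confidence: int, setting: int,
                      engine_default: int = ASSUMED_ENGINE_DEFAULT) -> bool:
    """Accept scores above or equal to the threshold, reject scores below it."""
    return phrase_confidence >= effective_rejection_threshold(setting, engine_default)
```

For example, with the property set to 50, an utterance scored 50 is accepted and one scored 49 is rejected.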

dwAttrib value: SPPROP_RESPONSE_SPEED
WCHAR value: ResponseSpeed

Meaning: The amount of silence the engine looks for before completing a recognition. This attribute is used when the recognition is not ambiguous. For example, in the case of a context-free grammar (CFG) which has two sentences: 1) new game please and 2) new game, a non-ambiguous recognition would be "new game please."

Range: 0 - 10,000 ms

dwAttrib value: SPPROP_COMPLEX_RESPONSE_SPEED
WCHAR value: ComplexResponseSpeed

Meaning: The amount of silence the engine looks for before completing a recognition. This attribute is used when the recognition is ambiguous. For example, in the case of a CFG which has two sentences: 1) new game please and 2) new game, an ambiguous recognition would be "new game." This property's value must be greater than the ResponseSpeed value.

Range: ResponseSpeed - 10,000 ms
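The choice between ResponseSpeed and ComplexResponseSpeed can be illustrated with the "new game" grammar from the text. In this Python sketch a recognition is treated as ambiguous when it is a word-for-word prefix of a longer sentence in the grammar; that test, the function names, and the timeout values in the usage example are assumptions made for illustration, not the engine's actual algorithm or defaults.

```python
from typing import Sequence


def is_ambiguous(recognized: str, sentences: Sequence[str]) -> bool:
    """True if a longer grammar sentence starts with the recognized words."""
    words = recognized.split()
    for sentence in sentences:
        other = sentence.split()
        if len(other) > len(words) and other[:len(words)] == words:
            return True
    return False


def end_silence_ms(recognized: str, sentences: Sequence[str],
                   response_speed: int, complex_response_speed: int) -> int:
    """Pick the silence duration the engine would wait for."""
    if not 0 <= response_speed <= 10_000:
        raise ValueError("ResponseSpeed must be in the range 0 - 10,000 ms")
    if not response_speed < complex_response_speed <= 10_000:
        raise ValueError(
            "ComplexResponseSpeed must be greater than ResponseSpeed "
            "and at most 10,000 ms")
    if is_ambiguous(recognized, sentences):
        return complex_response_speed
    return response_speed


grammar = ["new game please", "new game"]
# Illustrative timeout values only.
wait_short = end_silence_ms("new game please", grammar, 150, 500)
wait_long = end_silence_ms("new game", grammar, 150, 500)
```

Here "new game" could still be extended to "new game please", so the longer ComplexResponseSpeed timeout applies, while "new game please" cannot be extended and uses ResponseSpeed.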

dwAttrib value: SPPROP_ENGINE_THREAD_PRIORITY
WCHAR value: EngineThreadPriority

Meaning: Sets the priority of the engine thread(s). The range of permitted values is defined by the OS.

  • The minimum value is min(THREAD_PRIORITY_IDLE,THREAD_PRIORITY_HIGHEST)
  • The maximum value is max(THREAD_PRIORITY_IDLE,THREAD_PRIORITY_HIGHEST)

The values of these constants are defined in winbase.h.

Range: Defined by the OS
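The permitted priority range can be computed as a small Python sketch. The two constants below mirror the values these names have in the Windows headers (-15 and 2); they are reproduced here for illustration and should be checked against the headers of the target SDK. The function name is hypothetical.

```python
# Values as defined in the Windows headers; verify against the target SDK.
THREAD_PRIORITY_IDLE = -15
THREAD_PRIORITY_HIGHEST = 2

MIN_ENGINE_PRIORITY = min(THREAD_PRIORITY_IDLE, THREAD_PRIORITY_HIGHEST)
MAX_ENGINE_PRIORITY = max(THREAD_PRIORITY_IDLE, THREAD_PRIORITY_HIGHEST)


def is_valid_engine_thread_priority(priority: int) -> bool:
    """Check a candidate EngineThreadPriority value against the OS range."""
    return MIN_ENGINE_PRIORITY <= priority <= MAX_ENGINE_PRIORITY
```

Using min and max rather than assuming which constant is smaller matches the way the range is stated in the text.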

dwAttrib value: SPPROP_ADAPTATION_ON
WCHAR value: AdaptationOn

Meaning: Indicates whether the recognition engine should adapt the acoustic model.

Range: 1 or 0 (default = 1)

dwAttrib value: SPPROP_PERSISTED_BACKGROUND_ADAPTATION
WCHAR value: PersistedBackgroundAdaptation

Meaning: Turns background adaptation on or off, and persists the setting in the registry.

Range: 0 or any nonzero value (default = 1, adaptation is on)

dwAttrib value: SPPROP_PERSISTED_LANGUAGE_MODEL_ADAPTATION
WCHAR value: PersistedLanguageModelAdaptation

Meaning: Controls whether language model adaptation is performed, and persists the setting in the registry. This determines whether the engine learns new words and learns how the user combines words to form sentences.

Range: 0 or any nonzero value (default = 1, adaptation is on)

dwAttrib value: SPPROP_ASSUME_CFG_TRUSTED_SOURCE
WCHAR value: AssumeCFGFromTrustedSource

Meaning: Bypasses file integrity checks when loading a CFG, to reduce load time. This should *only* be used by applications that can guarantee that the CFG they are loading has previously been compiled by the application (or by another application it trusts) and has been stored in a secure location where it could not be edited by a malicious agent.

Range: 0 (default, property is OFF) or 1 (property is ON)