Microsoft DirectX 9.0 SDK Update (Summer 2003)
Voice Codecs
The compression/decompression (codec) algorithms provided with Microsoft® DirectPlay® are optimized for low-bandwidth voice compression and decompression. All of these codecs operate on 8 kHz, 16-bit, mono audio data, and DirectPlay Voice handles all the details of converting voice data to and from this intermediate format. Non-Microsoft codecs are not supported, and you cannot write proprietary codecs for use with DirectPlay Voice.
As a codec's bandwidth requirement drops, the audio quality of the voice data drops with it. The following table lists the supported codecs, the bandwidth each one requires in kilobits per second (Kbps), and the compression globally unique identifier (GUID) used to select it. The compression GUIDs are defined in Dvoice.h.
Codec | Bandwidth | Compression GUID |
---|---|---|
Voxware VR12 | variable (1.2 Kbps, avg.) | DPVCTGUID_VR12 |
Voxware SC03 | 3.2 Kbps | DPVCTGUID_SC03 |
Voxware SC06 | 6.4 Kbps | DPVCTGUID_SC06 |
TrueSpeech | 8 Kbps | DPVCTGUID_TRUESPEECH |
Global System for Mobile Communications (GSM) | 13 Kbps | DPVCTGUID_GSM |
Microsoft Adaptive Delta Pulse Code Modulation (MS-ADPCM) | 32 Kbps | DPVCTGUID_ADPCM |
Pulse Code Modulation (PCM) | 64 Kbps | DPVCTGUID_NONE |
The first three codecs (VR12, SC03, and SC06) provide a high level of compression and have approximately the same resource demands: on a 500 MHz Pentium III class computer, these codecs use approximately 1.5 percent of the CPU capacity. The VR12 codec sounds tinny and robotic, but the SC03 and SC06 codecs provide reasonable fidelity. The PCM codec provides the highest sound quality; its data is essentially uncompressed 8 kHz, 16-bit, mono audio.
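Because quality falls as bandwidth falls, one simple selection rule is to pick the highest-bandwidth codec that fits a per-stream budget. The C++ sketch below encodes the table as data and applies that rule. `CodecInfo`, `SupportedCodecs`, and `PickCodecForBudget` are hypothetical helpers written for illustration, not part of DirectPlay. The GUID names are kept as strings so the sketch compiles without Dvoice.h, VR12 is represented by its listed average rate, and bandwidth is treated only as a rough proxy for quality (CPU cost is ignored).

```cpp
#include <string>
#include <vector>

// Nominal per-stream bandwidth for each supported codec, taken from the table.
struct CodecInfo {
    std::string name;      // display name
    std::string guidName;  // name of the compression GUID constant in Dvoice.h
    double kbps;           // nominal bandwidth per voice stream, in Kbps
};

// Hypothetical helper: the codec table as static data.
const std::vector<CodecInfo>& SupportedCodecs() {
    static const std::vector<CodecInfo> codecs = {
        {"Voxware VR12", "DPVCTGUID_VR12", 1.2},  // variable rate; average used
        {"Voxware SC03", "DPVCTGUID_SC03", 3.2},
        {"Voxware SC06", "DPVCTGUID_SC06", 6.4},
        {"TrueSpeech", "DPVCTGUID_TRUESPEECH", 8.0},
        {"GSM", "DPVCTGUID_GSM", 13.0},
        {"MS-ADPCM", "DPVCTGUID_ADPCM", 32.0},
        {"PCM", "DPVCTGUID_NONE", 64.0},
    };
    return codecs;
}

// Hypothetical helper: returns the highest-bandwidth codec whose nominal rate
// fits within budgetKbps, or nullptr if no codec fits.
const CodecInfo* PickCodecForBudget(double budgetKbps) {
    const CodecInfo* best = nullptr;
    for (const CodecInfo& codec : SupportedCodecs()) {
        if (codec.kbps <= budgetKbps &&
            (best == nullptr || codec.kbps > best->kbps)) {
            best = &codec;
        }
    }
    return best;
}
```

For example, with a 10 Kbps per-stream budget, `PickCodecForBudget(10.0)` returns the TrueSpeech entry. In a real application, the host would then place the corresponding GUID from Dvoice.h in the guidCT member of DVSESSIONDESC.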
Selecting a Codec
As with all other game setup parameters, the host controls which codec is used for the voice session, and all members of the voice session must use the same codec. Remember that in a peer-to-peer voice session, the voice-session host is not necessarily the same as the game-data host. The host selects the codec when it calls IDirectPlayVoiceServer::StartSession, by setting the guidCT member of the DVSESSIONDESC structure to the compression GUID of the codec that you want to use. A client can determine which codec the session uses by calling IDirectPlayVoiceClient::GetSessionDesc to retrieve this structure.
The same codec might not be ideal for the entire duration of a game. For instance, you might want to use one codec for the lobby chat feature that players use to set up the game, and another to handle voice communication after the game is launched. You cannot dynamically change codecs during a voice session. To switch to another codec, you must terminate the current voice session and create a new voice session with the new codec. However, you can stop and restart a voice session without terminating the underlying DirectPlay core session.
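The rule that a codec is fixed for the life of a voice session can be modeled in a few lines. In the C++ sketch below, the codec is set when a voice session starts, so switching codecs means ending the voice session and starting a new one, while the underlying core session is left in place. `VoiceHostModel` is a hypothetical class for illustration only; it makes no DirectPlay calls, and mapping it onto the real interface (for example, the host calling IDirectPlayVoiceServer::StopSession and then StartSession with a new DVSESSIONDESC) is an assumption to check against the API reference.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical model (not DirectPlay code): a voice session layered over a
// core session. The voice session's codec cannot change while it runs.
class VoiceHostModel {
public:
    explicit VoiceHostModel(int coreSessionId) : coreSessionId_(coreSessionId) {}

    // Starts a voice session using the given compression GUID name.
    void StartVoice(const std::string& codecGuid) {
        if (voiceRunning_) {
            throw std::logic_error("voice session already running");
        }
        codecGuid_ = codecGuid;
        voiceRunning_ = true;
    }

    // Ends the voice session only; the core session is untouched.
    void StopVoice() {
        voiceRunning_ = false;
        codecGuid_.clear();
    }

    // No in-place codec change: stop the current voice session, then start
    // a new one with the new codec.
    void SwitchCodec(const std::string& newCodecGuid) {
        if (voiceRunning_) {
            StopVoice();
        }
        StartVoice(newCodecGuid);
    }

    bool voiceRunning() const { return voiceRunning_; }
    const std::string& codec() const { return codecGuid_; }
    int coreSessionId() const { return coreSessionId_; }

private:
    int coreSessionId_;           // stands in for the DirectPlay core session
    bool voiceRunning_ = false;
    std::string codecGuid_;
};
```

In the lobby scenario, the host would call `StartVoice` with one compression GUID for lobby chat and `SwitchCodec` with another once the game launches; `coreSessionId()` reports the same core session before and after the switch.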
As with any form of network communication, it is important to analyze the cost of voice communication to ensure that enough bandwidth is available for both the game data and the voice data. Estimating voice bandwidth consumption is straightforward: multiply the number of simultaneous voice streams that you anticipate by the sum of the bandwidth required by the codec and the protocol overhead. CPU consumption is another factor to consider when choosing a codec. Like network bandwidth, CPU consumption is additive per stream.
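The bandwidth and CPU estimates described above are both a stream count multiplied by a per-stream cost. The C++ sketch below writes them out. The function names are hypothetical, and `overheadKbps` is an input you must supply, because this section gives no figure for protocol overhead. Likewise, `perStreamCpuPercent` is an assumed or measured value; the approximately 1.5 percent quoted earlier for the Voxware codecs is usable here only if it is read as a per-stream figure.

```cpp
// Estimated voice bandwidth, in Kbps:
//   streams * (codec bandwidth + protocol overhead), both per stream.
// overheadKbps is an assumed input; this section gives no value for it.
double EstimateVoiceKbps(int simultaneousStreams, double codecKbps,
                         double overheadKbps) {
    return simultaneousStreams * (codecKbps + overheadKbps);
}

// CPU cost is also additive per stream: streams * per-stream CPU percentage.
// perStreamCpuPercent is an assumed or measured figure, not an SDK constant.
double EstimateVoiceCpuPercent(int simultaneousStreams,
                               double perStreamCpuPercent) {
    return simultaneousStreams * perStreamCpuPercent;
}
```

For instance, four simultaneous GSM streams (13 Kbps each) with an assumed overhead of 3 Kbps per stream give `EstimateVoiceKbps(4, 13.0, 3.0)`, or 64 Kbps, which must fit alongside the game data.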