Lossy audio formats: quality, multichannel and looping

FMOD Studio API

Quality and bit rate

What is the relationship between bit rate and the 'compression quality' property?

Within FMOD Designer, the compression quality property is found in the wave bank property panel. In FSBankEx, the quality property is in the format options. For constant bit rate compression, the relationship between bit rate and the compression quality property is approximately:

bit rate (kbps) = quality * 3.2

This is the case for MP2/MP3 but may differ for XMA and other bitrate based formats.
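
As a rough illustration of the relationship above, the following sketch converts a quality setting into an approximate bit rate. The function name is hypothetical, and the unit (kbps) and the 1 to 100 quality range are assumptions that are consistent with the MPEG bit rates listed on this page.

```python
def approx_bit_rate_kbps(quality: float) -> float:
    """Approximate constant bit rate for an MP2/MP3 quality setting.

    Assumption: quality is in the range 1..100 and the result is in kbps.
    """
    if not 1 <= quality <= 100:
        raise ValueError("quality is assumed to lie between 1 and 100")
    return quality * 3.2


if __name__ == "__main__":
    for quality in (10, 50, 100):
        print(quality, approx_bit_rate_kbps(quality))
```

Because the relationship is only approximate, the value returned need not match one of the discrete bit rates an encoder actually offers.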

Bit rates and sample rates for MPEG data

The following table shows the bit rates and sample rates available for MPEG data within FMOD:

Note: the table refers to the MPEG version, not the MPEG 'layer'. Layers 2 and 3 are commonly known as MP2 and MP3. An MP3 file, for example, could be MPEG 1 or MPEG 2, but is still 'layer 3'.
MP2 support and MP3 support share the same MPEG versions and the same bit rate and sample rate capabilities.

MPEG 1 Bitrates (kbps)	MPEG 1 Sample rates (kHz)	MPEG 2 Bitrates (kbps)	MPEG 2 Sample rates (kHz)
32	32	8*	8*
48	44.1	16	11.025*
56	48	24	12*
64		32	16
80		40	22.05
96		48	24
112		56
128		64
160		80
192		96
224		112
256		128
320		144
384		160

* Values marked with an asterisk are not supported by FSBankEx, even though they are specified as part of the MPEG format specification.

If the user attempts to use a sample rate that is not listed, FMOD will automatically resample the file upwards to the next valid sample rate. For example, a file with a sample rate of 15 kHz will be resampled to 16 kHz.
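
The upward resampling rule can be sketched as a lookup of the next supported rate. This is only an illustration: the function name is hypothetical, the list of rates is taken from the table above with the unsupported starred values left out, and the behaviour for rates above 48 kHz is not described here, so the sketch simply raises an error in that case.

```python
from bisect import bisect_left

# Sample rates in Hz from the table above, without the starred values
# that FSBankEx does not support.
SUPPORTED_RATES_HZ = [16000, 22050, 24000, 32000, 44100, 48000]


def next_valid_sample_rate(rate_hz: int) -> int:
    """Return the lowest supported rate that is >= rate_hz."""
    index = bisect_left(SUPPORTED_RATES_HZ, rate_hz)
    if index == len(SUPPORTED_RATES_HZ):
        # Assumption: the text does not say what happens above 48 kHz.
        raise ValueError("no supported sample rate at or above %d Hz" % rate_hz)
    return SUPPORTED_RATES_HZ[index]


if __name__ == "__main__":
    print(next_valid_sample_rate(15000))
```

A rate that is already in the list is returned unchanged, which matches the idea that only unlisted rates are resampled.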

Multi-channel MPEG Encoding

FMOD is able to create MPEG files with up to 16 channels (eight stereo pairs). To do this, the build process:

Encodes each stereo pair into fixed-size MPEG frames. The size of the frames is determined by the bit rate. The size of each frame must be a multiple of 16 bytes. To ensure this, padding of 0 to 15 bytes is placed at the end of each frame.

Interleaves a frame from each stereo pair into a multi-channel frame.

This process is illustrated in the figure below.

Figure 1: Encoding a multi-channel MPEG file

For example, consider a six-channel MPEG file using a constant bit rate of 128 kbps. The six channels are encoded as three stereo pairs. Each frame of stereo MPEG data is 432 bytes (including 14 bytes of padding). FMOD interleaves the stereo frames every 432 bytes into a multi-channel MPEG frame. The size of the multi-channel MPEG frame can be calculated as frame size * number of stereo pairs. In this example, the multi-channel MPEG frame is 432 * 3, giving 1296 bytes.
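
The padding and interleaving steps can be sketched as follows. This is not FMOD's implementation. The function names are hypothetical, the use of zero bytes for padding is an assumption, and the unpadded frame size of 418 bytes is inferred from the example (432 bytes minus 14 bytes of padding).

```python
from typing import List

FRAME_ALIGNMENT = 16  # each stereo frame must be a multiple of 16 bytes


def pad_frame(frame: bytes) -> bytes:
    """Append 0 to 15 bytes so the frame length is a multiple of 16."""
    padding = -len(frame) % FRAME_ALIGNMENT
    # Assumption: padding bytes are zero; the text does not specify a value.
    return frame + bytes(padding)


def interleave(pair_streams: List[List[bytes]]) -> List[bytes]:
    """Build multi-channel frames from per-pair frame lists.

    pair_streams[p][i] is encoded frame i of stereo pair p.
    """
    multichannel_frames = []
    for frames_at_index in zip(*pair_streams):
        padded = [pad_frame(frame) for frame in frames_at_index]
        multichannel_frames.append(b"".join(padded))
    return multichannel_frames


if __name__ == "__main__":
    # Three stereo pairs (six channels), two dummy 418-byte frames each.
    streams = [[bytes([pair]) * 418] * 2 for pair in range(3)]
    frames = interleave(streams)
    print(len(frames), len(frames[0]))
```

With three stereo pairs and 418-byte frames, each padded frame is 432 bytes and each multi-channel frame is 432 * 3 bytes long.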

Encoding mp3 files for seamless looping

Typically, when an mp3 file is looped, an audible gap can be heard when playback loops back to the start. This gap is obvious when the loop requires sample-accurate stitching from the last sample to the first. This occurs for a number of reasons, the two major factors being:

MPEG 1 layer 3 encodes the audio data into frames of 1152 samples. If the audio data doesn't fill a frame (most importantly the last frame), the encoder will pad the frame with silent samples (some encoders will even add an entire silent frame!)

The decoding of an mp3 frame is dependent on the previous frame. When a loop occurs, the decoder will require data from the last frame to smoothly loop back to the first frame.
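
To make the first factor concrete, the sketch below estimates how many silent samples a standard encoder would append to fill the last frame. The function name is hypothetical, and the calculation is a simplification that ignores any additional samples or frames a particular encoder may add.

```python
SAMPLES_PER_FRAME = 1152  # MPEG 1 layer 3


def silent_padding_samples(total_samples: int) -> int:
    """Samples of silence needed to fill the last 1152-sample frame."""
    if total_samples < 0:
        raise ValueError("total_samples must not be negative")
    return -total_samples % SAMPLES_PER_FRAME


if __name__ == "__main__":
    # One second of audio at 44.1 kHz.
    print(silent_padding_samples(44100))
```

A result of zero means the audio already ends exactly on a frame boundary.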

Without special encoding, it is not possible for mp3 data to loop seamlessly. Fortunately, FMOD provides a method to do just that. The FMOD mp3 encoder can be accessed via FMOD Designer or FSBankEx. For Designer users, the special encoder is used automatically if the sound definition instance is set to loop and the wave bank compression property is set to 'MP3'. Note: if the sound definition instance is set to 'one-shot', the standard mp3 encoding is used. Users of the lower level API can instruct FSBankEx to encode mp3 data for seamless looping.

So what does FMOD do to provide seamless looping of mp3 data?

Firstly, FMOD's encoder will resample and stretch the last frame to ensure that all 1152 samples of the frame are used. This will ensure the frame is not padded with silent samples.

When used on some sources, this process may cause a slightly audible pitch change artifact. If this is the case, users are encouraged to repeat the audio within the file to increase the file length, so that the time stretch becomes less significant. Users may also resize their audio to a length that is a multiple of the frame size. The table below lists the frame size for various formats.

Format	Frame size (samples)

MPEG 1	1152
MPEG 2 (2.5)	576
XMA	2048
VAG	28
GCADPCM	36
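
When resizing audio to a multiple of the frame size, it can help to compute the nearest frame-aligned lengths on either side of the current length. The following is a minimal sketch using the frame sizes from the table above; the function and dictionary names are hypothetical.

```python
from typing import Tuple

# Frame sizes in samples, copied from the table above.
FRAME_SIZES = {
    "MPEG 1": 1152,
    "MPEG 2 (2.5)": 576,
    "XMA": 2048,
    "VAG": 28,
    "GCADPCM": 36,
}


def frame_aligned_lengths(total_samples: int, fmt: str) -> Tuple[int, int]:
    """Return the nearest frame-aligned lengths at or below and at or above."""
    frame = FRAME_SIZES[fmt]
    lower = (total_samples // frame) * frame
    upper = lower if lower == total_samples else lower + frame
    return lower, upper


if __name__ == "__main__":
    for name in FRAME_SIZES:
        print(name, frame_aligned_lengths(44100, name))
```

Trimming to the lower value or extending to the upper value both give a length that fills the last frame exactly.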

With the removal of any padding within the last frame, FMOD's encoder must then prime the first frame with data from the last frame. The last frame is then removed. This allows FMOD's decoder to avoid issues of frame dependency between the first and last frame and provide a seamless loop.

In most situations, FMOD's encoder and decoder will loop mp3 content perfectly. However, some audible artifacts can be introduced, as illustrated below.

Figure 2: Encoding MPEG frames for seamless looping

When the first frame contains silence and the last frame contains an audible signal, the interpolation used in priming the first frame will result in an audible 'pop'. Should users require silence in the first frame of their loop, they should:

make sure the original wave loops properly, or

pad the end of the file with a frame of silence.
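
The second option can be sketched as appending one frame of zero-valued samples to the source audio before encoding. This is an illustration only: the function name is hypothetical, and it assumes interleaved signed PCM samples (where zero is silence) and the MPEG 1 frame size of 1152 samples per channel.

```python
from typing import List, Sequence


def pad_with_silent_frame(samples: Sequence[int], channels: int,
                          samples_per_frame: int = 1152) -> List[int]:
    """Return the interleaved samples followed by one frame of silence."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    # Assumption: signed PCM, so a sample value of 0 is silence.
    return list(samples) + [0] * (samples_per_frame * channels)


if __name__ == "__main__":
    stereo = [100, -100, 200, -200]  # two stereo sample frames
    print(len(pad_with_silent_frame(stereo, channels=2)))
```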

XMA Quality and Compression

As specified (in part) in the Xbox SDK documentation:

The XMA encoder allows the Sound Designer to specify a quality setting between 1 and 100, where:

1 provides the highest compression level and the lowest quality, and
100 provides the lowest compression level and the highest quality.

XMA's variable bit rate compression is content dependent, meaning compression ratios can vary greatly between pieces of content.

This means the quality settings do not translate directly to specific compression ratios.

The Xbox 360 Development Kit suggests that a compression ratio between 8:1 and 15:1 will provide adequate quality for most game audio assets.
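
Because the quality setting does not map directly to a compression ratio, one way to check an encoded asset is to measure the ratio after encoding. The sketch below does this from two byte counts. The function names are hypothetical, and it assumes the sizes of the uncompressed PCM data and the encoded data are already known.

```python
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Return the ratio N for a compression ratio of N:1."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return uncompressed_bytes / compressed_bytes


def in_suggested_range(ratio: float, low: float = 8.0, high: float = 15.0) -> bool:
    """True if the ratio lies within the suggested 8:1 to 15:1 range."""
    return low <= ratio <= high


if __name__ == "__main__":
    ratio = compression_ratio(1_000_000, 100_000)
    print(ratio, in_suggested_range(ratio))
```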
