Firelight Technologies FMOD Studio API
Performance Tutorial
Introduction
Measuring and tuning performance is an important part of any application, and being able to scale FMOD from low-power portable devices to the very latest consoles is key to our design. This guide should give you a solid understanding of how to configure FMOD to fit within your audio budget, with specific tips that apply no matter which platform you are targeting.
Before we jump into the details, let's first consider how performance is measured in FMOD. The primary metric we use when discussing how expensive something is is CPU percentage. We calculate this by measuring the time spent performing an action and comparing it against a known time window. The most common example of this is DSP or mixer performance.
What is the mixer and how is it measured? When we talk about mixer performance we are actually talking about the production of audio samples being sent to the output (usually your speakers). At regular intervals our mixer will produce a buffer of samples which represents a fixed amount of time for playback. We call this the DSP block size. It often defaults to 512 samples, which at a 48 kHz playback rate represents ~10ms of audio.
With a fixed number of samples being produced regularly, we can now measure how long it takes to produce those samples and express the result as a percentage. For example, if it took us 5ms of CPU time to produce 10ms of audio, our mixer performance would be 50%. As the CPU time approaches 10ms we risk not delivering the audio in time, which results in an audio discontinuity known as stuttering.
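The calculation described above can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not part of the FMOD API; the function names blockDurationMs and mixerCpuPercent are hypothetical.

```cpp
#include <cassert>

// Duration of one DSP block in milliseconds, e.g. 512 samples at 48000 Hz.
// (Hypothetical helper, not an FMOD function.)
double blockDurationMs(int blockSizeSamples, int sampleRateHz) {
    assert(blockSizeSamples > 0 && sampleRateHz > 0);
    return 1000.0 * blockSizeSamples / sampleRateHz;
}

// Mixer load: CPU time spent producing one block, expressed as a
// percentage of the playback time that block represents.
double mixerCpuPercent(double cpuTimeMs, int blockSizeSamples, int sampleRateHz) {
    assert(cpuTimeMs >= 0.0);
    return 100.0 * cpuTimeMs / blockDurationMs(blockSizeSamples, sampleRateHz);
}
```

Without rounding, a 512 sample block at 48 kHz lasts about 10.7ms, so 5ms of CPU time works out to roughly 47% rather than exactly 50%. A result approaching 100% means the mixer is close to missing its deadline.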
What else can be measured?
Another key performance area is update. This operation is called regularly to do runtime housekeeping. Our recommendation is that you call update once per render frame, which is often 30 or 60 times per second. Using the frame time at 30 or 60 FPS (frames per second) as the known time window, we can measure the CPU time spent performing this action to get a percentage.
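As a rough sketch of that measurement, the following C++ times a single call and compares it against the frame budget. The names frameBudgetMs and updateCpuPercent are hypothetical, and the callable passed in is a stand-in for your per-frame update call. Note that std::chrono::steady_clock measures elapsed wall-clock time, which is only an approximation of CPU time.

```cpp
#include <cassert>
#include <chrono>

// Time available per frame in milliseconds, e.g. ~16.7ms at 60 FPS.
double frameBudgetMs(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}

// Times one invocation of `update` (a stand-in for the per-frame update
// call) and returns the elapsed time as a percentage of the frame budget.
template <typename UpdateFn>
double updateCpuPercent(UpdateFn&& update, double framesPerSecond) {
    const auto start = std::chrono::steady_clock::now();
    update();
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return 100.0 * elapsed.count() / frameBudgetMs(framesPerSecond);
}
```

In practice you would average the result over many frames, since a single measurement can be skewed by scheduling noise.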
Armed with the ability to measure performance, we now need to identify the things that cost the bulk of the CPU time. The most commonly quoted contributor is voice count, following the logic that playing more sounds will take up more CPU time. The main contributors to the cost of sound playback are:
- Decoding compressed audio to PCM.
- Resampling the PCM to the appropriate pitch.
- Applying DSP effects to the sound.
- Mixing the audio with other sounds to produce the final output you hear.
Choosing the correct compression format for the kind of audio you want to play and the platform you want to play it on is a big part of controlling the CPU cost. For recommendations on format choice please consult the performance reference for this platform.
Voice Limiting
Once you've settled on a compression format you need to decide how many sounds of that format you want to be audible at the same time. There are three ways to control the number of playable sounds:
- System::init(maxChannels, ...): The maximum number of voices playing at once.
- System::setSoftwareChannels(numSoftwareChannels): The maximum number of audible voices.
- FMOD_ADVANCEDSETTINGS max???Codecs: The maximum number of decoders, where ??? is the compression format.
For a deep dive into how the virtual voice system works and ways to further control voice count please consult the virtual voices tutorial.
It's often hard to gauge what good values are for the above three settings. In rough terms, maxChannels should be high enough that you don't hit the cap under normal circumstances, so 256, 512 or even 1024 are reasonable choices. Selecting the values for numSoftwareChannels and max???Codecs will depend on the platform and format used. To help choose these values we have provided some recommendations and benchmarks in the performance reference document for this platform.
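To see how the three limits interact, here is a deliberately simplified model in C++. It is not FMOD's virtual voice algorithm and makes no FMOD calls. It assumes every sound uses the same compressed format, so a single codec limit applies, and the types VoiceLimits and VoiceCounts and the function estimateVoices are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the three limits described above.
struct VoiceLimits {
    int maxChannels;          // value passed to System::init
    int numSoftwareChannels;  // value passed to System::setSoftwareChannels
    int maxCodecs;            // FMOD_ADVANCEDSETTINGS decoder limit for the one format in use
};

struct VoiceCounts {
    int playing;  // voices playing at once, audible or not
    int audible;  // voices that are actually decoded and mixed
};

// Simplified model: playing voices are capped by maxChannels, and audible
// voices are further capped by the software channel and decoder limits.
VoiceCounts estimateVoices(int requested, const VoiceLimits& limits) {
    assert(requested >= 0);
    assert(limits.maxChannels >= 0 && limits.numSoftwareChannels >= 0 && limits.maxCodecs >= 0);
    const int playing = std::min(requested, limits.maxChannels);
    const int audible = std::min({playing, limits.numSoftwareChannels, limits.maxCodecs});
    return {playing, audible};
}
```

Under this model, 300 requested sounds with limits of 256, 64 and 32 give 256 playing voices, of which 32 are audible. The smallest limit decides how much decoding and mixing work is done.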
Tips and Tricks
With a correctly configured compression format and an appropriate voice count you are well on your way to an efficient setup. Next up is a series of tips to consider for your project. Not all will be applicable, but each is worth considering to get the best performance from FMOD.
Sample Rate
There are two sample rates you need to think about when optimizing: the System rate and the source audio rate.
You can control the System sample rate by using System::setSoftwareFormat(sampleRate, ...), which by default is 48 kHz. Reducing this can give some big wins in performance because less data is being produced. This setting is a trade-off between performance and quality.
To control the source audio rate you can resample using your favorite audio editor, or use the sample rate settings when compressing with the FSBank tool or the FSBankLib API. All audio is sent to a resampler when it is played at runtime. If the source sample rate and the System rate match, the resampler can be essentially skipped, saving CPU time. Be aware that this will only happen if there are no pitch / frequency settings applied to the Channel, so this trick is often good for music.
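The condition described above can be written as a small predicate. This is a sketch of the rule as stated in this guide, not FMOD's internal logic. The function names are hypothetical, and it assumes a pitch of exactly 1.0 means that no pitch / frequency change has been applied.

```cpp
#include <cassert>

// True when, following the rule above, the resampler has no work to do:
// the source and System rates match and no pitch change is applied.
// Assumption: pitch == 1.0 represents "no pitch / frequency setting".
bool resamplerCanBeSkipped(int sourceRateHz, int systemRateHz, double pitch) {
    assert(sourceRateHz > 0 && systemRateHz > 0 && pitch > 0.0);
    return sourceRateHz == systemRateHz && pitch == 1.0;
}

// Source samples consumed per output sample; 1.0 means no rate conversion.
double resampleRatio(int sourceRateHz, int systemRateHz, double pitch) {
    assert(sourceRateHz > 0 && systemRateHz > 0 && pitch > 0.0);
    return pitch * sourceRateHz / systemRateHz;
}
```

For example, a 48 kHz music track played at its original pitch on a 48 kHz System satisfies the predicate, while a 44.1 kHz source on the same System does not.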
DSP Block Size
As mentioned earlier, the DSP block size is the fixed number of samples produced at regular intervals to be sent to the speakers. There is a fixed amount of overhead in producing each block of samples, so making the block size larger reduces the overall CPU cost. You can control this setting with System::setDSPBufferSize(blockLength, ...), which often defaults to 512 or 1024 samples depending on the platform.
The trade-off with this setting is CPU cost against mixer granularity. For more information about the implications of changing this setting please consult the API reference for that function.
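The effect of block size on fixed overhead can be illustrated with some simple arithmetic in C++. The function names are hypothetical, and the per-block overhead is an input you would have to measure yourself; no real figure is implied here.

```cpp
#include <cassert>

// How many DSP blocks the mixer must produce per second of audio.
double blocksPerSecond(int blockSizeSamples, int sampleRateHz) {
    assert(blockSizeSamples > 0 && sampleRateHz > 0);
    return static_cast<double>(sampleRateHz) / blockSizeSamples;
}

// Total fixed overhead in milliseconds per second of audio, given a
// measured per-block overhead in microseconds (a hypothetical input).
double fixedOverheadMsPerSecond(double overheadPerBlockUs, int blockSizeSamples, int sampleRateHz) {
    assert(overheadPerBlockUs >= 0.0);
    return overheadPerBlockUs * blocksPerSecond(blockSizeSamples, sampleRateHz) / 1000.0;
}
```

At 48 kHz, a 512 sample block means 93.75 blocks per second, while 1024 samples means 46.875. Doubling the block size therefore halves the number of times the fixed overhead is paid, at the cost of coarser mixer granularity.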
Channel Count
Controlling how many channels of audio are being played can have a big impact on performance. Consider the simple math: 7.1 surround (8 channels) has four times as much data to process as stereo (2 channels). There are a few different places where channel count can be controlled to improve performance.
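The simple math mentioned above can be made concrete with a short C++ sketch. The function names are hypothetical, and the only assumption is that the data volume scales linearly with the number of channels.

```cpp
#include <cassert>

// Number of individual sample values in one DSP block:
// one value per channel for each sample frame.
long long sampleValuesPerBlock(int blockSizeFrames, int channelCount) {
    assert(blockSizeFrames > 0 && channelCount > 0);
    return static_cast<long long>(blockSizeFrames) * channelCount;
}

// How much more data one channel layout carries than another.
double relativeDataVolume(int channelCount, int baselineChannelCount) {
    assert(channelCount > 0 && baselineChannelCount > 0);
    return static_cast<double>(channelCount) / baselineChannelCount;
}
```

With a 512 sample block, stereo carries 1024 values per block and 7.1 surround carries 4096, which is the factor of four quoted above.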
Choose the source sound channel count carefully. Mono sources are often best, especially for sounds that will be positioned in 3D. Reducing the channel count at the source is an easy win and will also decrease the decoding time for that sound.
Setting the System channel count will control how 3D sounds are panned when they are given a position in the world. You set this channel count by specifying a speaker mode that represents a well known speaker configuration such as 7.1 surround or stereo. To do this use System::setSoftwareFormat(..., speakerMode, ...). The default will match your output device settings.
As a more advanced setting you can limit the number of channels produced by a sub-mix or the number of channels entering a particular DSP effect. This can be especially useful for limiting the channels into an expensive DSP effect. The API to control this is DSP::setChannelFormat(..., speakerMode). By default this will be the output of the previous DSP unit.
DSP Choice
Not all DSPs are created equal. Some are computationally simple and use very little CPU, while others can be quite expensive. When deciding to use a particular effect it is important to profile on the target hardware to fully understand the CPU implications.
Positioning of the DSP can make a big difference, as placing the effect on every voice could cost a lot of CPU time. There are no strict rules for where each effect should be positioned. As an example, low and high pass DSP effects can often be used per voice efficiently, but reverb will often have only one instance, with all voices sending to a sub-mix.
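The difference between the two placements can be sketched with a rough cost model in C++. This is a simplification for illustration only: the function names are hypothetical, the per-instance and per-send costs are inputs you would obtain by profiling on your target hardware, and the model assumes costs simply add up.

```cpp
#include <cassert>

// Total cost (in CPU percent) of running one effect instance on every voice.
double perVoiceEffectCost(int voiceCount, double costPerInstancePercent) {
    assert(voiceCount >= 0 && costPerInstancePercent >= 0.0);
    return voiceCount * costPerInstancePercent;
}

// Total cost of a single shared instance on a sub-mix, plus a
// (hypothetical, profiled) cost for each voice's send into that sub-mix.
double sharedEffectCost(int voiceCount, double costPerInstancePercent, double costPerSendPercent) {
    assert(voiceCount >= 0 && costPerInstancePercent >= 0.0 && costPerSendPercent >= 0.0);
    return costPerInstancePercent + voiceCount * costPerSendPercent;
}
```

With made-up figures of 0.5% per instance and 0.01% per send, 32 voices would cost 16% with a per-voice effect and under 1% with a shared instance. This is why expensive effects such as reverb are usually shared.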
Wrapping Up
Hopefully you now have a good understanding of the options available for optimizing your usage of FMOD. If in doubt about your particular setup, please contact Firelight Technologies support. We are more than happy to discuss your specific requirements.