Text-to-Speech Tutorial

Microsoft Speech SDK SAPI 5.1

This tutorial covers a very basic text-to-speech (TTS) example. The console application is one of the simplest demonstrations of speech. It is the "Hello World" equivalent for TTS. An equivalent sample for a Windows application using a graphical interface (and event pump) is available in Using Events with TTS.

The sample builds up from the simplest (though nonfunctional) COM framework to speaking a sentence. Steps are provided for each new function. The sample even goes one step further by demonstrating the use of XML tags to modify speech. The Complete Sample Application is at the bottom of the page.

Step 1: Setting up the project
Step 2: Initialize COM
Step 3: Setting up voices
Step 4: Speak!
Step 5: Modifying Speech

Step 1: Setting up the project

While it is possible to write an application from scratch, it is easier to start from an existing project. In this case, use Visual Studio's application wizard to create a Win32 console application. Choose "Hello, world" as the sample when asked during wizard setup. After generating the project, open the StdAfx.h file and paste the following code after "#include <stdio.h>" but before the "#endif" directive. This sets up the additional dependencies SAPI requires.

#define _ATL_APARTMENT_THREADED

#include <atlbase.h>
//You may derive a class from CComModule and use it if you want to override something, 
//but do not change the name of _Module
extern CComModule _Module;
#include <atlcom.h>

Code Listing 1

Next, add the paths to the SAPI.h and SAPI.lib files. The paths shown are for a standard SAPI SDK install. If the compiler is unable to locate either file, or if a nonstandard install was performed, substitute the actual paths to the files. Change the project settings to reflect the paths.

To set the SAPI.h path:

  1. Select the Project->Settings menu item.
  2. Click the C/C++ tab and select Preprocessor from the Category drop-down list.
  3. Enter the following in "Additional include directories":
    C:\Program Files\Microsoft Speech SDK 5.1\Include

To set the SAPI.lib path:

  1. Select the Link tab from the same Settings dialog box.
  2. Choose Input from the Category drop-down list.
  3. Add the following path to the "Additional library path":
    C:\Program Files\Microsoft Speech SDK 5.1\Lib\i386
  4. Also add "sapi.lib" to the "Object/library modules" line. Be sure that the name is separated from the other entries by a space.

Step 2: Initialize COM

SAPI is a COM-based API, so COM must be initialized before SAPI is used and must remain initialized for as long as SAPI is active. In most cases, this is for the lifetime of the host application. The following code (Code Listing 2) initializes COM. The application does nothing beyond initialization, but it does confirm that COM starts successfully.

#include "stdafx.h"
#include <sapi.h>

int main(int argc, char* argv[])
{
    if (FAILED(::CoInitialize(NULL)))
        return FALSE;

    ::CoUninitialize();
    return TRUE;
}

Code Listing 2
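Because every successful initialization must be paired with an uninitialize call, the pairing can be tied to a scope. The following is a minimal sketch, not part of SAPI or the original sample, and it assumes a C++11 compiler. The class name ScopedInit and its ok() method are hypothetical. It is written generically so that it compiles without the Windows headers; the comment shows how it might wrap the COM calls from Code Listing 2.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (an assumption, not a SAPI or COM type): runs `init`
// once on construction and runs `cleanup` on scope exit only if `init`
// reported success.
//
// Possible use with COM (Windows only):
//   ScopedInit com([] { return SUCCEEDED(::CoInitialize(NULL)); },
//                  [] { ::CoUninitialize(); });
//   if (!com.ok()) return FALSE;
class ScopedInit
{
public:
    ScopedInit(const std::function<bool()>& init, std::function<void()> cleanup)
        : ok_(init()), cleanup_(std::move(cleanup)) {}

    ~ScopedInit()
    {
        if (ok_ && cleanup_)
            cleanup_();
    }

    // Copying would run the cleanup twice, so it is disabled.
    ScopedInit(const ScopedInit&) = delete;
    ScopedInit& operator=(const ScopedInit&) = delete;

    bool ok() const { return ok_; }

private:
    bool ok_;
    std::function<void()> cleanup_;
};
```

If a guard like this is used, declare it before any interface pointers so that it is destroyed last; COM objects should be released before COM is uninitialized.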

Step 3: Setting up voices

Once COM is running, the next step is to create the voice. A voice is simply a COM object. Additionally, SAPI uses intelligent defaults. During initialization of the object, SAPI assigns most values automatically so that the object may be used immediately afterward. This represents an important improvement from earlier versions. The defaults are retrieved from Speech properties in Control Panel and include such information as the voice (if more than one is available on your system), and the language (English, Japanese, etc.). While some defaults are obvious, others are not (speaking rate, pitch, etc.). Nevertheless, all defaults may be changed either programmatically or in Speech properties in Control Panel.

Setting the pVoice pointer to NULL is not required, but it is useful for error checking: it ensures that an invalid pointer is not reused, and it serves as a reminder of whether the pointer is currently allocated or has been released.

#include "stdafx.h"
#include <sapi.h>

int main(int argc, char* argv[])
{
    ISpVoice * pVoice = NULL;

    if (FAILED(::CoInitialize(NULL)))
        return FALSE;

    HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
    if( SUCCEEDED( hr ) )
    {
        pVoice->Release();
        pVoice = NULL;
    }

    ::CoUninitialize();
    return TRUE;
}

Code Listing 3. The new code in this example declares pVoice, creates the voice with CoCreateInstance, and releases it.
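The release-then-reset pattern for pVoice can be folded into one small function so the two steps are never separated. This is a minimal sketch, assuming a C++11 compiler; SafeRelease is a hypothetical helper name, not something the SAPI headers provide, and it works with any type that exposes a Release() method.

```cpp
// Hypothetical helper (not part of SAPI): releases a COM-style interface
// pointer and resets it to null so that a stale pointer cannot be reused.
// Calling it on a pointer that is already null does nothing.
template <typename T>
void SafeRelease(T*& ptr)
{
    if (ptr != nullptr)
    {
        ptr->Release();
        ptr = nullptr;
    }
}
```

In the listing above, the two lines inside the SUCCEEDED block could then be written as SafeRelease(pVoice);. Since atlbase.h is already included by the project, ATL's CComPtr smart pointer is another option that releases the interface automatically.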

Step 4: Speak!

The actual speaking of the phrase is an equally simple task: one line calling the Speak function. When the instance of the voice is no longer needed, you can release the object.

#include "stdafx.h"
#include <sapi.h>

int main(int argc, char* argv[])
{
    ISpVoice * pVoice = NULL;

    if (FAILED(::CoInitialize(NULL)))
        return FALSE;

    HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
    if( SUCCEEDED( hr ) )
    {
        hr = pVoice->Speak(L"Hello world", 0, NULL);
        pVoice->Release();
        pVoice = NULL;
    }

    ::CoUninitialize();
    return TRUE;
}

Code Listing 4. The new code in this example is the call to Speak.

Step 5: Modifying Speech

Voices may be modified using a variety of methods. The most direct way is to apply XML commands directly to the stream. The commands are outlined in XML Schema. In this case, a relative value of -10 lowers the pitch of the voice. The SPF_IS_XML flag tells Speak to parse the text as XML.

#include "stdafx.h"
#include <sapi.h>

int main(int argc, char* argv[])
{
    ISpVoice * pVoice = NULL;

    if (FAILED(::CoInitialize(NULL)))
        return FALSE;

    HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&pVoice);
    if( SUCCEEDED( hr ) )
    {
        hr = pVoice->Speak(L"Hello world", 0, NULL);

        // Change pitch
        hr = pVoice->Speak(L"This sounds normal <pitch middle = '-10'/> but the pitch drops half way through", SPF_IS_XML, NULL );
        pVoice->Release();
        pVoice = NULL;
    }
    ::CoUninitialize();
    return TRUE;
}

Code Listing 5. The new code in this example is the second Speak call, which uses the pitch tag. This is the complete code sample.
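When the pitch value is chosen at run time, the tagged string has to be assembled rather than typed as a literal. The following is a minimal sketch of one way to do that, assuming a C++11 compiler. WithPitch is a hypothetical helper, not a SAPI function, and the clamp to the range -10 to 10 is an assumption about the values the pitch tag accepts. The text is inserted as-is, so it should not itself contain characters that would be read as markup.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of SAPI): prefixes `text` with an empty
// pitch tag in the same form used in Code Listing 5, for example
// "<pitch middle = '-10'/>". The tag applies to the text that follows it.
// Assumption: values outside [-10, 10] are clamped to that range.
std::wstring WithPitch(const std::wstring& text, int middle)
{
    const int clamped = std::max(-10, std::min(10, middle));
    return L"<pitch middle = '" + std::to_wstring(clamped) + L"'/>" + text;
}
```

The result could be passed to the voice as pVoice->Speak(WithPitch(L"Hello world", -10).c_str(), SPF_IS_XML, NULL);.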