DirectX Audio Programming

Zamrony P. Juhara

This article covers the basics of DirectX Audio: its architecture, the most frequently used terminology and, of course, the most interesting part, how to use DirectX Audio to play an audio file.

Why DirectX Audio?

Windows has long offered multimedia features through the Media Control Interface (MCI), the Wave API and the MIDI API. For simple applications, MCI is good enough to play a WAV or MIDI file.

But if you need to develop a feature-rich sound recording application, such as Cakewalk's products, or a game with impressive sound quality, MCI is not suitable.

Microsoft introduced DirectX Audio for developing high-performance audio applications on the Windows platform. With DirectX Audio, you can do the following things with ease:

  • Play WAV, MIDI or SGT files.
  • Play two or more samples simultaneously.
  • Change pitch, tempo and reverb, and apply sound effects.
  • Record sound.
  • Play 3D sound.

DirectX Audio Architecture

DirectX Audio consists of two components: DirectSound and DirectMusic. DirectSound, which was created first, was designed for fast and efficient access to audio hardware and provides a low-level mechanism for it. DirectMusic, which came later, provides a low-level programming interface for MIDI devices, called DirectMusic Core, and a high-level programming interface for loading and playing music, called DirectMusic Performance.

In DirectX 7, DirectMusic and DirectSound were two separate components (see Figure 1). MIDI and style-based music were loaded by DirectMusic Performance and then sent to DirectMusic Core, which handled the MIDI device and synthesizer directly. Sound in wave format was loaded into memory and prepared by the application and then sent to DirectSound, which handled the wave hardware.

Figure 1. DirectMusic and DirectSound architecture on DirectX 7 or lower.

After DirectMusic was released, application developers preferred DirectMusic Performance for loading and playing music files. In DirectX 8, the DirectMusic and DirectSound architectures were unified, so that DirectMusic Performance handles loading and playback of both MIDI and wave formats. This unification of DirectMusic and DirectSound is called DirectX Audio (Figure 2). The architecture did not change in DirectX 9.

Figure 2. DirectMusic and DirectSound architecture on DirectX 8 and 9.

In DirectX 8, Microsoft added new functionality to the Performance object, enabling it to load wave files and play wave data stored in a segment. The most significant architectural change is the introduction of the audiopath, which separates DirectMusic Performance from DirectMusic Core. An audiopath is a virtual audio channel that controls the flow of sound data. DirectMusic Performance manages one or more audiopaths.

Performance

The performance is the workhorse of DirectX Audio. It is responsible for initializing audio, scheduling segments, creating audiopaths and mapping them to segments, and controlling sound playback.

Loader

The loader handles file input/output and loads audio data from a file, from memory or from a resource. It simplifies the programmer's task of loading audio data.

Segment

A segment represents any playable audio data. It can be a MIDI file, a wave file or a DirectMusic segment (the native DirectMusic format, with the SGT file extension). You can create one or more segments in an application, and you can even play two or more segments simultaneously.

AudioPath

An audiopath is the DirectMusic object that manages the route taken by a musical instrument or sound effect. Each segment is played through an audiopath. An application can have one or more audiopaths.


Creating Performance and Loader

DirectMusic does not provide helper functions for creating performance and loader instances. You must use CoCreateInstance() to create them (see Listing 1).

Listing 1

  CoCreateInstance(
    CLSID_DirectMusicPerformance,
    nil,
    CLSCTX_INPROC,
    IID_IDirectMusicPerformance8,
    FPerformance);

  CoCreateInstance(
    CLSID_DirectMusicLoader,
    nil,
    CLSCTX_INPROC,
    IID_IDirectMusicLoader8,
    FLoader);

If the code in Listing 1 executes successfully, FPerformance holds a pointer to an IDirectMusicPerformance8 interface instance, while FLoader holds an IDirectMusicLoader8 instance.
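
Listing 1 ignores the HRESULT returned by CoCreateInstance(). The fragment below is a minimal sketch of the same two calls with error checking. It assumes that the DirectMusic declarations come from a header translation unit named DirectMusic (the actual unit name depends on the DirectX headers you use) and that FPerformance and FLoader are declared as shown. CreatePerformanceAndLoader is a hypothetical helper name. OleCheck() comes from the ComObj unit and raises an exception when the HRESULT signals failure.

```pascal
// Fragment for the implementation section of a unit.
uses
  ActiveX, ComObj,
  DirectMusic; // assumption: name of the DirectX header translation unit

var
  FPerformance: IDirectMusicPerformance8;
  FLoader: IDirectMusicLoader8;

// Hypothetical helper: creates both COM objects or raises an exception.
procedure CreatePerformanceAndLoader;
begin
  // OleCheck raises EOleSysError if CoCreateInstance fails,
  // for example when COM has not been initialized yet.
  OleCheck(CoCreateInstance(CLSID_DirectMusicPerformance, nil,
    CLSCTX_INPROC, IID_IDirectMusicPerformance8, FPerformance));
  OleCheck(CoCreateInstance(CLSID_DirectMusicLoader, nil,
    CLSCTX_INPROC, IID_IDirectMusicLoader8, FLoader));
end;
```

Raising an exception here is only one possible design. If you prefer to handle the error locally, test the return value with Failed() instead.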

Before you call CoCreateInstance(), you need to initialize COM by calling CoInitialize() or CoInitializeEx() (see Listing 2). When you are finished with COM, you need to call CoUninitialize(). In Delphi, the COM functions are declared in the ActiveX.pas unit.

Listing 2

unit ..;
interface
...
implementation
uses ...,ActiveX,..;
...
initialization
  CoInitialize(nil);
finalization
  CoUninitialize;
end.

Personally, I prefer to initialize COM in the unit's initialization section, so that the COM initialization function is called automatically whenever the unit is used.

Audio Initialization

After COM is initialized and the performance is created, you need to call InitAudio(), a member of IDirectMusicPerformance8, to initialize audio (Listing 3). InitAudio() needs to be called only once.

Listing 3

function InitAudio (ppDirectMusic: PIDirectMusic; 
ppDirectSound: PIDirectSound;
hWnd: hWnd;
dwDefaultPathType,
dwPChannelCount,
dwFlags: DWORD;
pParams: PDMUS_AudioParams) : HResult; stdcall;

Listing 4 shows an example of calling InitAudio().

Listing 4

FPerformance.InitAudio(nil,
    nil,
    WindowHandle,
    DMUS_APATH_SHARED_STEREOPLUSREVERB,
    64,
    DMUS_AUDIOF_ALL,
    nil);

Let us go through the parameters of InitAudio(). ppDirectMusic and ppDirectSound are the addresses of variables holding DirectMusic and DirectSound instances, respectively. If you want the DirectMusic or DirectSound instance to be created automatically, pass nil. hWnd is the window handle used to create the IDirectSound instance; if ppDirectSound contains a valid DirectSound instance, hWnd is ignored. dwDefaultPathType holds the type of the default audiopath, as listed in Table 1. If you don't need a default audiopath, set it to zero.

Table 1. Audiopath Type

Value                               Description
DMUS_APATH_DYNAMIC_3D               Audiopath for 3D sound
DMUS_APATH_DYNAMIC_MONO             Mono audiopath
DMUS_APATH_DYNAMIC_STEREO           Stereo audiopath
DMUS_APATH_SHARED_STEREOPLUSREVERB  Stereo and reverb

dwPChannelCount is the number of performance channels needed. Each channel has its own volume and balance settings. dwFlags specifies which features are needed. Table 2 lists the flags you can use. Other flags exist, but they are not implemented yet.

Table 2. Feature Flag

Flag                   Description
DMUS_AUDIOF_3D         3D buffers
DMUS_AUDIOF_ALL        All features
DMUS_AUDIOF_BUFFERS    Multiple buffers
DMUS_AUDIOF_STREAMING  Waveform streaming support

pParams holds the address of a DMUS_AUDIOPARAMS structure. If it is nil, default audio parameters are used. To comply with Delphi naming conventions, the structure is also declared as TDMus_AudioParams. This record has the following fields.

  •  dwSize, the size of the structure. It must be set to sizeof(TDMus_AudioParams).
  •  fInitNow, a boolean. If it is set to true, the synthesizer is created as well.
  •  dwValidData, a DWORD holding flags that specify which fields contain valid data. Table 3 lists the available flags.
  •  dwFeatures, the features needed. See Table 2 for the available flags.
  •  dwVoices, the number of voices. The default value is 64.
  •  dwSampleRate, the sample rate used. Valid values range from 11025 Hz to 96000 Hz. If it is not set, the default value is 22050 Hz.
  •  clsidDefaultSynth, the default synthesizer used.

Table 3. Flag for dwValidData

Flag                           Description
DMUS_AUDIOPARAMS_FEATURES      dwFeatures contains valid data
DMUS_AUDIOPARAMS_VOICES        dwVoices contains valid data
DMUS_AUDIOPARAMS_SAMPLERATE    dwSampleRate contains valid data
DMUS_AUDIOPARAMS_DEFAULTSYNTH  clsidDefaultSynth is valid. If this flag is not set, the Microsoft software synthesizer is used
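
The fields and flags above can be put together as in the following sketch, which passes a filled TDMus_AudioParams record to InitAudio() instead of nil. InitAudioWithParams is a hypothetical helper name, and the voice count and sample rate are illustrative values only. The fragment assumes that the Windows and ComObj units and the DirectMusic header unit are in the uses clause and that FPerformance has already been created.

```pascal
// Hypothetical helper: initializes audio with explicit parameters.
procedure InitAudioWithParams(WindowHandle: HWND);
var
  Params: TDMus_AudioParams;
begin
  // Clear the record first so unused fields are zero.
  ZeroMemory(@Params, SizeOf(Params));
  Params.dwSize := SizeOf(Params);
  Params.fInitNow := True;
  // Mark which of the optional fields below carry valid data.
  Params.dwValidData := DMUS_AUDIOPARAMS_FEATURES or
    DMUS_AUDIOPARAMS_VOICES or DMUS_AUDIOPARAMS_SAMPLERATE;
  Params.dwFeatures := DMUS_AUDIOF_ALL;
  Params.dwVoices := 32;         // illustrative value
  Params.dwSampleRate := 44100;  // illustrative value, in Hz
  // OleCheck (ComObj) raises an exception if InitAudio fails.
  OleCheck(FPerformance.InitAudio(nil, nil, WindowHandle,
    DMUS_APATH_SHARED_STEREOPLUSREVERB, 64, DMUS_AUDIOF_ALL, @Params));
end;
```

clsidDefaultSynth is left untouched here, so DMUS_AUDIOPARAMS_DEFAULTSYNTH is not set and the Microsoft software synthesizer is used.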

Loading Audio File

To load a MIDI, WAV or segment (SGT) file, call LoadObjectFromFile() of the loader object (Listing 5).

Listing 5

function LoadObjectFromFile(const rguidClassID: TGUID; 
const iidInterfaceID: TGUID;
pwzFilePath: PWideChar;
out ppObject): HResult; stdcall;

rguidClassID is the class ID of the object to be loaded. To load an audio file into a segment, we use CLSID_DirectMusicSegment.

iidInterfaceID is the interface identifier. Set it to IID_IDirectMusicSegment8 to create a segment (IDirectMusicSegment8). In Delphi, you can use the interface name in place of the interface identifier, so replacing IID_IDirectMusicSegment8 with IDirectMusicSegment8 is allowed.

pwzFilePath is the name of the file you want to load. It is a wide string (2 bytes per character). If you use an ANSI string, make sure you convert it to a WideString first, then typecast the WideString to PWideChar.

ppObject is the variable that receives the address of the created object instance. In our case, the object is an IDirectMusicSegment8 instance. Listing 6 shows how to load a WAV file into a segment named FSound1.

Listing 6

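procedure LoadAudioFile;
var
  afilename: widestring;
  apath: string;
begin
  // ExtractFilePath() keeps the trailing backslash of the EXE folder,
  // but the 'sound' subfolder needs its own trailing backslash.
  apath := ExtractFilePath(Application.ExeName) + 'sound\';
  afilename := apath + 'explosion.wav';
  FLoader.LoadObjectFromFile(
    CLSID_DirectMusicSegment,
    IID_IDirectMusicSegment8,
    PWideChar(afilename),
    FSound1);
end;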
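
As a variation on Listing 6, the following sketch wraps the call in a reusable function that checks that the file exists and checks the HRESULT. LoadSegment is a hypothetical helper name. The fragment assumes that the SysUtils and ComObj units and the DirectMusic header unit are in the uses clause and that FLoader has already been created.

```pascal
// Hypothetical helper: loads a WAV, MIDI or SGT file into a new segment.
function LoadSegment(const FileName: string): IDirectMusicSegment8;
var
  WideName: WideString;
begin
  if not FileExists(FileName) then
    raise Exception.CreateFmt('Audio file not found: %s', [FileName]);
  // Assigning to a WideString converts the name to 2-byte characters.
  WideName := FileName;
  // OleCheck (ComObj) raises an exception if loading fails.
  OleCheck(FLoader.LoadObjectFromFile(CLSID_DirectMusicSegment,
    IID_IDirectMusicSegment8, PWideChar(WideName), Result));
end;
```

It could be called as FSound1 := LoadSegment(ExtractFilePath(Application.ExeName) + 'sound\explosion.wav'), assuming the WAV file sits in a folder named sound next to the executable.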

LoadObjectFromFile(), as its name suggests, can only load data from a file. So how do you load data from memory? For that, you need GetObject() (Listing 7).

Listing 7

function GetObject (const pDesc: TDMus_ObjectDesc; 
const riid : TGUID;
out ppv) : HResult; stdcall;

pDesc is the description of the object to be loaded. We will discuss the TDMus_ObjectDesc type shortly. riid is the interface identifier. ppv is the variable that receives the pointer to the object instance.

TDMus_ObjectDesc

This data type describes the object to be loaded. It contains the following fields:

  • dwSize, data structure size. It must be set to sizeof(TDMus_ObjectDesc).
  • dwValidData, flag which determines what fields contain valid data. Table 4 lists available flag.
  • guidObject, object GUID.
  • guidClass, object class identifier.
  • ftDate, date when object was modified.
  • vVersion, version information.
  • wszName, object name.
  • wszCategory, object category.
  • wszFilename, name of file to be loaded.
  • llMemLength, buffer size of pbMemData.
  • pbMemData, data buffer.
  • pStream, address of IStream instance.

Table 4. Flags for TDMus_ObjectDesc

Flag               Description
DMUS_OBJ_CATEGORY  wszCategory contains valid data
DMUS_OBJ_CLASS     guidClass contains valid data
DMUS_OBJ_OBJECT    guidObject contains valid data
DMUS_OBJ_DATE      ftDate contains valid data
DMUS_OBJ_VERSION   vVersion contains valid data
DMUS_OBJ_NAME      wszName contains valid data
DMUS_OBJ_FILENAME  wszFilename contains valid data
DMUS_OBJ_FULLPATH  wszFilename contains valid data and holds a full path
DMUS_OBJ_MEMORY    llMemLength and pbMemData contain valid data
DMUS_OBJ_STREAM    pStream contains valid data

Listing 8 shows how to load audio data that arrives in a stream by copying it into a memory buffer. I usually initialize the data structure with ZeroMemory() before filling in the fields.

To load data from memory into a segment, you need the address and the size of the buffer. You also need to set the guidClass field to tell DirectMusic to create a segment instance. Therefore, you need at least the DMUS_OBJ_CLASS and DMUS_OBJ_MEMORY flags.

Listing 8

procedure LoadFromStream(Stream: TStream);
var
  objdesc: TDMus_ObjectDesc;
begin
  // Keep a private copy of the data; the loader may read it later.
  if FInternalStream = nil then
    FInternalStream := TMemoryStream.Create;

  // Count = 0 copies the whole source stream from its beginning.
  FInternalStream.CopyFrom(Stream, 0);

  ZeroMemory(@objdesc, sizeof(TDMus_ObjectDesc));
  objdesc.dwSize := sizeof(TDMus_ObjectDesc);
  // Describe a segment that lives in a memory buffer.
  objdesc.dwValidData := DMUS_OBJ_CLASS or DMUS_OBJ_MEMORY;
  objdesc.guidClass := CLSID_DirectMusicSegment;
  objdesc.llMemLength := FInternalStream.Size;
  objdesc.pbMemData := FInternalStream.Memory;

  FLoader.GetObject(objdesc,
    IID_IDirectMusicSegment8,
    FSegment);
end;

The loader caches data. A buffer that contains loaded data must not be freed until the loader is freed, because the caching mechanism may need to access the data in the buffer at any time.

If you use DMUS_OBJ_FILENAME or DMUS_OBJ_FULLPATH, GetObject() does the same thing as LoadObjectFromFile().
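
A minimal sketch of that file-based use of GetObject() follows. LoadSegmentByDesc is a hypothetical helper name. The fragment assumes that the Windows, SysUtils and ComObj units and the DirectMusic header unit are in the uses clause, that FLoader has already been created, and that wszFilename is declared as a fixed-size array of WideChar in your header translation.

```pascal
// Hypothetical helper: loads a segment by describing the file
// in a TDMus_ObjectDesc record instead of calling LoadObjectFromFile.
function LoadSegmentByDesc(const FullPath: string): IDirectMusicSegment8;
var
  Desc: TDMus_ObjectDesc;
begin
  ZeroMemory(@Desc, SizeOf(Desc));
  Desc.dwSize := SizeOf(Desc);
  // The class, the file name and the "name is a full path" hint are valid.
  Desc.dwValidData := DMUS_OBJ_CLASS or DMUS_OBJ_FILENAME or
    DMUS_OBJ_FULLPATH;
  Desc.guidClass := CLSID_DirectMusicSegment;
  // Refuse names that do not fit the fixed-size field (plus terminator).
  if Length(FullPath) >= Length(Desc.wszFilename) then
    raise Exception.Create('File name too long for TDMus_ObjectDesc');
  StringToWideChar(FullPath, @Desc.wszFilename[0],
    Length(Desc.wszFilename));
  // OleCheck (ComObj) raises an exception if loading fails.
  OleCheck(FLoader.GetObject(Desc, IID_IDirectMusicSegment8, Result));
end;
```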

Download Segment into Performance

Before a segment can be played, it needs to be downloaded into a performance or an audiopath. Downloading a segment means copying the instrument data and waveforms used by the segment to the performance. To download a segment, we use Download() of the IDirectMusicSegment8 interface (Listing 9).

Listing 9

function Download (pAudioPath: IUnknown) : HResult; stdcall;

The code in Listing 10 downloads the segment named FSound1 into the performance.

Listing 10

FSound1.Download(FPerformance);

Playing sound

To play the sound stored in a segment, you can use PlaySegment() or PlaySegmentEx() (see Listing 11). The first function is available in both IDirectMusicPerformance and IDirectMusicPerformance8, while the second is available only in IDirectMusicPerformance8. PlaySegmentEx() is an extension of PlaySegment().

Listing 11

function PlaySegment (pSegment: IDirectMusicSegment; 
dwFlags: DWORD;
i64StartTime: LongLong;
ppSegmentState: PIDirectMusicSegmentState) : HResult; stdcall;
function PlaySegmentEx (pSource: IUnknown;
pwzSegmentName: PWChar;
pTransition: IUnknown;
dwFlags: DWORD;
i64StartTime: int64;
out ppSegmentState: IDirectMusicSegmentState;
pFrom, pAudioPath: IUnknown) : HResult; stdcall;

pSegment and pSource are the segment object to be played. pwzSegmentName is the segment name; it is not used at present, so it must be set to nil. dwFlags holds flags that determine how the segment is played. Some of the available flags are listed in Table 5. There are many more, but they are not relevant while you are still learning the basics.

Table 5. Flag PlaySegment

Flag                 Description
DMUS_SEGF_REFTIME    Time is given in REFERENCE_TIME units
DMUS_SEGF_SECONDARY  Segment is played as a secondary segment
DMUS_SEGF_QUEUE      Segment is played after the primary segment has finished. For a secondary segment, it is played after the segment given in pFrom
DMUS_SEGF_CONTROL    Segment is played as a control segment
i64StartTime is the performance time at which playback of the segment starts; set it to zero to start as soon as possible. ppSegmentState is the interface variable that receives the state of the segment. pFrom holds the segment that is stopped when the segment in pSource starts playing; it can be set to nil. pAudioPath is the audiopath used to play the segment. If it is set to nil, the default audiopath is used.

The code in Listing 12 plays the segment named FSound1 from its start using the default audiopath. The segment state is stored in a variable named state.

Listing 12

FPerformance.PlaySegmentEx(FSound1,
nil,nil,
0,0,
state,
nil,nil);

The code in Listing 13 produces the same result as Listing 12.

Listing 13

FPerformance.PlaySegment(FSound1,
  0,0,nil);

You can try the DM1.dpr and DM2.dpr demos available on the CD/DVD.


Playing many sounds at the same time

If you try the DM1.dpr or DM2.dpr demo, every time you press a button, the segment that is currently playing is stopped. You cannot play two or more segments simultaneously with the technique explained above. It is only suitable for a sound player application where music is played one piece at a time.

In a game, many sounds can play at the same time. In a war game, the sounds of cannons, guns and wounded soldiers must be able to play simultaneously to create the illusion of a real war.

To play many segments at the same time, you need a separate audiopath for each segment. In the DM1.dpr and DM2.dpr demos, we used only one audiopath, the default audiopath.

To create an audiopath, you can use CreateStandardAudioPath() of the IDirectMusicPerformance8 interface (Listing 14).

Listing 14

function CreateStandardAudioPath (dwType,      
dwPChannelCount: DWORD;        
fActivate: BOOL;       
out ppNewPath: IDirectMusicAudioPath) : HResult; stdcall;

dwType is the audiopath type; see Table 1 for the available values. dwPChannelCount is the number of channels you want. Set fActivate to true to activate the audiopath immediately. If you set it to false, you can activate it at any time with Activate(), a member of IDirectMusicAudioPath8. ppNewPath receives the pointer to the audiopath instance. See Listing 15 for example code.

Listing 15

FPerformance.CreateStandardAudioPath(
DMUS_APATH_SHARED_STEREOPLUSREVERB,
64,true,  FAudioPath1);

The new audiopath is then passed to PlaySegmentEx(). In addition, you need to play the segment as a secondary segment, that is, with the DMUS_SEGF_SECONDARY flag (Listing 16). Without this flag, the segment is played as a primary segment. Only one primary segment can play at a time, so a new primary segment replaces the primary segment that is currently playing.

Listing 16

FPerformance.PlaySegmentEx(FSound1,nil,nil,
DMUS_SEGF_SECONDARY,0,
state,nil, FAudioPath1);


You can try the DM3.dpr demo. If you press the cannon, gun and bomb buttons repeatedly, you will feel as if you were in the middle of a war.

Looping sound playback

There are times when you need to play background music over and over. To set how many times a segment is repeated, call SetRepeats() of the IDirectMusicSegment8 interface. Its parameter is the number of repeats you want. If it is set to 0, the segment is played once without any repeats. If it is set to DMUS_SEG_REPEAT_INFINITE, the segment is played over and over until it is explicitly stopped. Listing 17 shows example calls to SetRepeats().

Listing 17

//repeat segment once
FSegment1.SetRepeats(1);
//loop continuously
FSegment1.SetRepeats(DMUS_SEG_REPEAT_INFINITE);

Is segment playing?

To find out whether a segment is currently playing, you can use IsPlaying(), a member of the IDirectMusicPerformance8 interface (Listing 18).

Listing 18

function IsPlaying (pSegment: IDirectMusicSegment; 
pSegState: IDirectMusicSegmentState) : HResult; stdcall;

You can check by segment or by segment state. If pSegment is nil, pSegState is used; if pSegState is nil, pSegment is used. The method returns S_OK if the segment is currently playing and S_FALSE otherwise.
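
Because S_FALSE is also a success code, the return value has to be compared with S_OK explicitly rather than tested with Succeeded(). A minimal sketch follows; SegmentIsPlaying is a hypothetical helper name, and the fragment assumes that the Windows unit and the DirectMusic header unit are in the uses clause and that FPerformance has been initialized.

```pascal
// Hypothetical helper: True only while the given segment is playing.
function SegmentIsPlaying(const Segment: IDirectMusicSegment8): Boolean;
begin
  // S_OK means "playing"; S_FALSE means "not playing".
  // Succeeded() would be True for both, so compare with S_OK.
  Result := FPerformance.IsPlaying(Segment, nil) = S_OK;
end;
```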


Stop Sound Playback

A segment that is currently playing can be stopped with Stop() or StopEx() (Listing 19). Both methods are members of the performance object. StopEx() is an extension of Stop() and is available in the IDirectMusicPerformance8 interface.

Listing 19

function Stop(pSegment: IDirectMusicSegment;
  pSegmentState: IDirectMusicSegmentState;
  mtTime: TMusic_Time;
  dwFlags: DWORD) : HResult; stdcall;
function StopEx(pObjectToStop: IUnknown;
  i64StopTime: int64;
  dwFlags: DWORD) : HResult; stdcall;

pSegment is the segment to be stopped and pSegmentState is the segment state to be stopped. If both are set to nil, all segments currently playing are stopped. pObjectToStop is the segment, segment state or audiopath to be stopped. mtTime and i64StopTime are the stop time; if they are set to zero, the segment is stopped immediately. dwFlags determines how the segment is stopped. It can be set to zero or to one of the flags in Table 5. Listing 20 shows sample code for stopping a segment.

Listing 20

//stop all segments
FPerformance.Stop(nil,nil,0,0);
//stop segment FSound1
FPerformance.StopEx(FSound1,0,0);

Pause Sound Playback

DirectX Audio does not provide a dedicated function to pause playback. However, you can achieve a similar effect with StopEx() and PlaySegmentEx().

Before you stop playback of a segment, save its current playback position. When you want to resume, use the saved position as the new start position of playback. You change the start position with SetStartPoint(), a member of the IDirectMusicSegment8 interface. After calling SetStartPoint(), play the segment with PlaySegmentEx() as usual.
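
The following fragment is a rough sketch of this pause and resume technique, not a definitive implementation. It makes several assumptions: the position is read with GetSeek() of the segment state interface, which your header translation is assumed to declare with an out parameter; SetStartPoint() is called on the segment, where the DirectX SDK declares it; FState is the segment state returned by the PlaySegmentEx() call that started FSound1; and ComObj is in the uses clause for OleCheck(). PauseSound, ResumeSound, FState and FPausedAt are hypothetical names.

```pascal
var
  FState: IDirectMusicSegmentState; // filled in by PlaySegmentEx
  FPausedAt: TMusic_Time;           // saved position inside the segment

// Hypothetical helper: remembers the position, then stops the segment.
procedure PauseSound;
begin
  if FState = nil then
    Exit; // the segment has not been played yet
  // Assumption: GetSeek returns the position through an out parameter.
  OleCheck(FState.GetSeek(FPausedAt));
  OleCheck(FPerformance.StopEx(FSound1, 0, 0));
end;

// Hypothetical helper: restarts the segment from the saved position.
procedure ResumeSound;
begin
  OleCheck(FSound1.SetStartPoint(FPausedAt));
  OleCheck(FPerformance.PlaySegmentEx(FSound1, nil, nil,
    0, 0, FState, nil, nil));
end;
```

The start point stays set on the segment, so call SetStartPoint(0) before the next playback that should begin at the start. The seek position may run slightly ahead of what is actually audible, so treat the resume point as approximate.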


Close DirectX Audio

When you are done with DirectX Audio, you must call CloseDown(), a member of the performance object. This method takes no parameters.

Before you call CloseDown(), it is recommended that you call Unload() to unload all segments previously downloaded into the performance. Its parameter is the same as for Download(): the audiopath or performance into which the segment was downloaded.

Strictly speaking, calling Unload() is not mandatory when closing the performance, because CloseDown() automatically unloads any segments. Listing 21 shows how to shut down DirectX Audio.

Listing 21

FSound1.Unload(FPerformance);
FPerformance.CloseDown();
FPerformance:=nil;


The loader uses a caching mechanism to speed up loading. An object created with GetObject() or LoadObjectFromFile() might refer to other objects. A segment (SGT) file, for example, might refer to other files such as styles or DLS instrument collections. The loader automatically creates the referenced objects and adds them to the cache.

To free those objects completely, there are a few things you must do.

  • Call ReleaseObjectByUnknown() for each object created with GetObject() or LoadObjectFromFile().
  • Call CollectGarbage().

If you don't use objects that reference other objects, you don't need to call ReleaseObjectByUnknown() and CollectGarbage(). For example, if you load a WAV file into a segment, you don't need the steps above, because a WAV file doesn't reference external files. Still, it is good programming practice to make sure every resource you used is completely freed (Listing 22).

Listing 22

FLoader.ReleaseObjectByUnknown(FSound1);
FLoader.CollectGarbage();
FLoader:=nil;

Application Demo

You can get the demo source code from the CD/DVD that accompanies the magazine. The DM1.dpr demo shows how to initialize DirectX Audio and play a segment using PlaySegment(). The DM2.dpr demo is almost identical to DM1, but it uses PlaySegmentEx().

DM3.dpr is an extension of DM2.dpr in which many sounds can be played simultaneously. DM4.dpr improves on DM3.dpr by adding MIDI background music that loops until the application shuts down. The user interface of the demo application is shown in Figure 3.

Figure 3. Application user interface.

Source code of demo application is available for download here

Summary

You have reached the end of this article. I hope you now understand the basic techniques for using DirectX Audio in your application. In this article, you have learned about the DirectX Audio architecture, DirectX Audio initialization, loading and playing audio files with DirectX Audio, and how to play many sounds simultaneously.