Dynamic Music in Games: Building Procedural Music
A brief history of dynamic music in games, how it's used in modern titles, and examples of how to apply these methodologies in your own work
The purpose of this article is purely educational. Unless otherwise specified, I do not claim ownership of any copyrighted material.
What are we doing here?
In this article we’re going to briefly explore the vast history of Dynamic Music in games. I’ve included a lot of hyperlinks to my main research points if you’d like to continue reading about any of them. In short, the core idea behind procedural music is pretty simple - you write a piece of music, break it down into its elements, and then play those elements back in different combinations.
I’ve written this assuming you (the reader) already have a basic understanding of music composition. If you don’t, feel free to open this music dictionary in another tab to reference as needed.
Why have Dynamic Music at all?
When playing any game, one of the strongest emotional tone-setters is the music. Studies show that playing with the music on changes how you play; turn it off and you’ll see wildly different play patterns. Rich auditory feedback brings a high level of engagement to play - there’s no doubting that. If you’re playing any game to the DOOM (2016) soundtrack, your heart’s racing, blood’s pumping, and you’re locked in. But you’ll have a wildly different experience with Animal Crossing music in the background.
One of the core problems with game music is its constant repetition. In film and television, the director and composer can work together to ensure the listening experience is as comfortable as possible, but a game’s composer can’t know how long any player will spend in a given space. Many games with longer playtimes from the early 2000s suffer from this. We’ve all played a title for too long, only to get annoyed when we hear the same track looping yet again, for the hundredth time. The Occam’s razor solution is to simply make our songs longer and make more of them.
In some cases this does mitigate the repetition problem. One of my favorite examples is the OST of SimCity 4 (2004). With 35 tracks and a runtime of 3 hours and 15 minutes, it rivals other OSTs such as The Elder Scrolls V: Skyrim’s in length. Both of these OSTs aim to solve the issue of annoying repetition through sheer quantity, and to be fair, in my opinion they do. They also play tracks in random orders and at different times to break up the repetitiveness.
But rather than simply ask how we might solve this issue of repetition, composers and designers have treated the problem as an opportunity to explore Dynamic Music. Games are dynamic pieces of media - the player is constantly changing what’s happening in their experience - so why should our music remain the same throughout?
What actually is Dynamic Music?
In a paper written by Professor Elizabeth Medina-Gray of Ithaca College, titled Modular Structure and Function in Early 21st-Century Video Game Music, the history of modular and dynamic game music is extensively outlined. If you’re interested in an extremely deep dive, starting with musical dice games in the 18th century and coming up to the modern day, I would highly suggest giving it a read.
The strongest takeaway here is that modular music is not new; as mentioned above, we can see examples dating back to the 1700s. Across games which use a modular music setup, there are three common steps to implementing the music, defined in the paper as “The modular musical process”:
Creation of Modules & Definition of Rules
Procedural assembly of those modules constrained to the rules
Sounding of Music
What does this look like today? To give a simple and straightforward example correlating with the above steps:
We, as the composer, are told Mario will move through eight worlds, so we compose one song for each of them.
We also tell the designers that, whenever Mario enters a new world, they should play only the song associated with that space.
Then in the game those tracks are triggered accordingly by gameplay.
In this case each step is simplified as much as possible. Our rules are simple, our procedural assembly is just “play this song”, and the sounding of the music is the playback of that one song.
Earlier forms of modular music, like those dice games, involved writing a series of modules, usually one measure long each, which could be built into a larger composed whole of 16 measures. Each module was assigned a number on a chart, and a piece could be composed by rolling a set of dice.
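To make the process concrete, here’s a minimal sketch in C# (the language I’ll use for all code in this article) of that dice-game assembly; the 2-to-12 indexing mirrors how those games mapped two-dice totals onto a chart of interchangeable measures:

```csharp
using System;
using System.Collections.Generic;

// A toy version of an 18th-century musical dice game: each of the 16
// measures is picked by rolling two dice and looking the total up in a
// chart of pre-written, interchangeable one-measure modules.
public static class DiceGameComposer
{
    public static List<int> ComposePiece(int measures = 16)
    {
        var rng = new Random();
        var piece = new List<int>(measures);
        for (int i = 0; i < measures; i++)
            piece.Add(rng.Next(1, 7) + rng.Next(1, 7)); // totals 2..12
        return piece;
    }
}
```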
Today, similar methodologies are used in several forms of digital music composition. Modern software allows users to purchase sample packs which follow a set of musical rules to ensure they fit together, letting amateur composers build significantly larger pieces of music than they otherwise could.
Extremely similar practices are used in games today. These can be broken down into Horizontal and Vertical structures, and most dynamic soundtracks have both.
Horizontal structures see modules being played one after another, either in looping or non-looping sequences.
Vertical structures see modules layered on top of one another, moving in and out of the mix simultaneously as things change in the game-space.
Our simple example of Mario moving through worlds and triggering one track after another is a Horizontal structure. If you imagine a timeline, each track comes after one another horizontally.
A classic example of a Vertical structure can be seen in Portal 2. The music in the environment changes in real time based on the state of the game, and these sounds are layered on top of one another vertically.
In the video above, if you listen closely you can hear three dynamic vertical elements:
Instrumental layer playing when you are near an activated laser
Instrumental layer playing when you launch the cube towards the door
Instrumental layer playing as the player flies through the air
Each of these elements is a rule that applies throughout the game - different lasers play unique layered tracks, puzzle elements which signify solutions add unique elements to the audio mix, and the player’s movement state (both grounded and aerial) plays its own contextual audio layer.
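To illustrate how rules like these might be wired up (this is my own sketch, not Valve’s implementation), each gameplay condition can simply gate the volume of one always-playing layer:

```csharp
using UnityEngine;

// A sketch of rule-driven vertical layering: all layers play in sync
// the whole time, and each rule only fades its layer in or out of the
// mix. The field names and rules here are illustrative.
public class VerticalLayerRules : MonoBehaviour
{
    [SerializeField] AudioSource laserLayer;
    [SerializeField] AudioSource cubeLayer;
    [SerializeField] AudioSource aerialLayer;

    public bool NearActiveLaser { get; set; }
    public bool CubeInFlight { get; set; }
    public bool PlayerAirborne { get; set; }

    void Update()
    {
        laserLayer.volume  = Mathf.MoveTowards(laserLayer.volume,  NearActiveLaser ? 1f : 0f, Time.deltaTime);
        cubeLayer.volume   = Mathf.MoveTowards(cubeLayer.volume,   CubeInFlight    ? 1f : 0f, Time.deltaTime);
        aerialLayer.volume = Mathf.MoveTowards(aerialLayer.volume, PlayerAirborne  ? 1f : 0f, Time.deltaTime);
    }
}
```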
The most complex example of these dynamic structures can be found in DOOM (2016) and DOOM Eternal (2020). In this case, the dynamic soundtrack can be considered both Horizontally and Vertically Modular. Rather than build a series of modules which could be played back to back, or only layer the musical elements vertically, composer Mick Gordon and the team at id Software decided to take another approach: procedural song creation.
Content Warning for video below:
Extreme Violence, Body Horror, Gore, and Dismemberment
As the player moves through a combat encounter, the modules change both vertically and horizontally. The number of enemies killed reflects the player’s progress through combat, allowing the music to move horizontally through its different phases. Notice how it ramps up at the beginning, is in full swing by the middle, and begins to trail out by the time there are only one or two enemies left?
Whenever the player performs a glory or chainsaw kill, new instruments are brought in and out of the song, expanding and contracting the mix. This provides a level of dynamism which many players won’t consciously notice. It serves to keep the gameplay fresh and moving, while keeping the player’s blood pumping.
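As a rough sketch of how that behavior could be modeled (to be clear, this is not id Software’s actual system, just an illustration of the two axes), the horizontal phase can follow kill progress while kills toggle a vertical layer:

```csharp
using UnityEngine;

// Illustrative only: horizontal phase selection driven by combat
// progress, plus a vertical layer toggled by glory/chainsaw kills.
// The phase thresholds and field names are my own assumptions.
public class CombatMusicSketch : MonoBehaviour
{
    [SerializeField] AudioSource[] phaseSources; // e.g. build-up, full swing, trail-out
    [SerializeField] AudioSource killLayer;      // extra instrument layer

    public void OnEncounterProgress(int killed, int total)
    {
        float progress = (float)killed / total;
        int phase = progress < 0.25f ? 0 : progress < 0.85f ? 1 : 2;
        for (int i = 0; i < phaseSources.Length; i++)
            phaseSources[i].mute = (i != phase); // all sources loop in sync
    }

    public void OnGloryOrChainsawKill()
    {
        killLayer.mute = !killLayer.mute; // expand or contract the mix
    }
}
```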
While this vertical & horizontal structure is great, that doesn’t mean more complexity is necessarily better. Hades (2020) uses an entirely horizontal structure and a changing time signature to dictate the pace of combat (there’s a fantastic explanation of this by YouTuber Jonathan Barouch).
Hades composer Darren Korb switches between complex and simple time signatures to keep the player engaged. During the earlier parts of combat, the time signature is deliberately irregular - usually a looping set of 4/4, 4/4, then 3/4 measures - which makes it extremely hard to follow. Once the second phase occurs, the time signature sticks to 4/4, creating a much more stable beat that the player can follow during the action, signifying the intensity of the phase. Listen to the song below and try tapping your foot - you’ll see what I mean.
The video below shows one of the tracks in question. For the first half of the song (before 2:55) it’s in an inconsistent time signature, making it hard to keep pace. Once the second half hits (after 2:55), you can feel a distinct shift as the action and pace pick up.
This is a great example of a creative use of an entirely horizontal structure. Vertical structures appear on paper to be the obvious and easiest solution for creating more dynamic and interactive gameplay soundtracks, but I think this is an amazing example of an ingenious methodology which works within constraints to influence the player’s emotions and gameplay intensity.
Examples of Dynamic Music in my own work, and how you can build them too
Here’s the part of the article where I show you how I failed (the good stuff), how I implemented some of these ideas into my own work, and some good tips to help smooth over the process for you.
Which audio tools should I use?
First off, let’s address the elephant in the room and discuss Wwise and FMOD. They’re both great extensions for building dynamic audio systems, and can be super helpful. However, I found it far too difficult to hook these systems into Unity, and even more difficult to use them with version control systems such as Git - especially when collaborators were working on different operating systems. I’d recommend doing some extensive research to see whether Wwise or FMOD is the right choice for you.
Personally, I programmed all the audio functionality myself so that I would have full control over it in the engine. That said, Wwise and FMOD provide a suite of tools not built into Unity that professional audio engineers can take advantage of.
Overall, it really comes down to what you feel most comfortable using and the circumstances of your project.
Engine?
For these examples and prototypes I’ll be working in Unity and C#. However, the principles applied here are transferable to other engines such as Godot or Unreal Engine, as many of them are the same at their core.
Praetor’s Vertical Module Builder
Below is an example of an audio system I built in a very large prototype I worked on called Praetor. The game was a solo endeavor and aimed to improve on areas I felt were lacking in my previous project, Rock Hopper - one of those areas being dynamic music. Previously I had simply played a static looping “calm” track or “combat” track depending on what the player was doing. In Praetor I wanted to change things up. The video below showcases a dynamic audio system which exchanges six instrument tracks:
Guitar
Bass
Synth
Drums
Rhythm
Atmosphere
Every two measures of the song, if a kill has occurred since the start of the previous 2-measure block, each of the instruments in the mix changes to a different module. This makes for a high-octane action music structure, where you never hear the same four measures twice.
I found that the pacing of module changes drastically helped control the pace of combat. Praetor ended up being a very small metroidvania. During the early levels, I would space the music changes out to as much as 8 measures to slow the pace of combat, even if the tempo remained fast. Once the player reached the later levels, I lowered that to a minimum of 2 measures.
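Condensed into code, the swap rule looks something like the sketch below. The names are illustrative, and it assumes every module source is already looping in sync with “Play On Awake” enabled (more on that setup in the implementation steps later):

```csharp
using System.Collections;
using UnityEngine;

[System.Serializable]
public class Instrument
{
    public AudioSource[] moduleSources; // one looping source per pre-loaded module
}

// Every block of measures, if a kill occurred during the previous
// block, each instrument switches to a randomly chosen module by
// muting all of its sources except the pick.
public class ModuleSwapper : MonoBehaviour
{
    [SerializeField] Instrument[] instruments;      // guitar, bass, synth, drums, rhythm, atmosphere
    [SerializeField] float blockLengthSeconds = 4f; // length of one 2-measure module

    bool killThisBlock;

    // Called by the combat code whenever an enemy dies.
    public void RegisterKill() => killThisBlock = true;

    IEnumerator Start()
    {
        while (true)
        {
            if (killThisBlock)
            {
                killThisBlock = false;
                foreach (var inst in instruments)
                {
                    int pick = Random.Range(0, inst.moduleSources.Length);
                    for (int i = 0; i < inst.moduleSources.Length; i++)
                        inst.moduleSources[i].mute = (i != pick);
                }
            }
            yield return new WaitForSeconds(blockLengthSeconds);
        }
    }
}
```

Raising blockLengthSeconds to the duration of 8 measures gives the slower early-game pacing described above.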
This system was modeled after the dynamic music systems in DOOM (2016), I would highly recommend watching the GDC talk Mick Gordon gave on the subject.
How do I make this Procedural Song Builder?
So without FMOD or Wwise, how is this done technically in Unity? Well, I tried lots of methods, but if you want to build something similar, here’s what you need to know.
Step 1: Build the song
First things first - you’ll need to build a track. Easier said than done, obviously, but if you’ve read this far and are this interested in the topic, I’ll assume you have some basis in music production.
Some key points I’ve found when making this:
Limit your instruments. The more you have the more you’ll have to fix.
Keep your measures short.
Keep your phrases simple.
In the screenshot below I have a series of different instrument samples, each of which is two measures long. If you look carefully, you’ll notice there are three for each instrument: three leads, three basses, three drums, and so on for rhythm, FX, and atmosphere. To be successful here, take one base element (in my case, the lead) and create three different 2-measure combinations of instruments surrounding it.
So, for example, start with a lead, then write the bass, drums, etc. This is your first combination: Lead 1, Bass 1, Drums 1, and so on. After you have all of these, mute the instruments you’ve just added one by one, and record each instrument again with different phrases that fit the current combination, including your lead. If you do this, you’ll have multiple different 2-measure clips for each of your instruments.
Next, mute all your tracks, then unmute one of each instrument - how does it sound? In my experience this has always worked out, and I’ve been able to make pretty strong modules. Now that the song is complete, we can move on to the implementation step.
Step 2: Exporting/Importing for Unity
Now, there are no fancy tricks here. If you’ve worked with audio in Unity before, you’ll know how difficult and temperamental some of its systems can be to interact with.
Because of this, this is very important:
When your song is complete, you must export each individual instrument track in the .ogg format. Make sure all clips are exactly the same length - exactly 2 measures (or however many measures you base your modules on).
Unity’s audio playback for all other audio formats (including mp3, mp4, wav, etc.) causes a moment of lag regardless of how assets are stored in memory (streamed or otherwise); instant playback works best with .ogg audio files. To showcase what happens when you don’t follow this rule, I’ve created the video below:
As you can see, there’s an unacceptable level of skipping, cutting, and dead space every 2 measures, which is a huge loss in quality. If you import your files in the .ogg format, you should be able to avoid this issue before it even arises.
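Since the whole system depends on every module being exactly the same length, it’s worth verifying that at startup. Here’s a small sanity check I’d suggest (my own addition, not a Unity feature):

```csharp
using UnityEngine;

// Warns at startup if any imported module clip differs in length from
// the first one, since the timing loop in Step 4 assumes equal lengths.
public class ClipLengthCheck : MonoBehaviour
{
    [SerializeField] AudioClip[] moduleClips;

    void Awake()
    {
        if (moduleClips.Length == 0) return;
        float expected = moduleClips[0].length;
        foreach (var clip in moduleClips)
            if (!Mathf.Approximately(clip.length, expected))
                Debug.LogWarning($"{clip.name} is {clip.length:F4}s, expected {expected:F4}s");
    }
}
```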
Step 3: Audio Playback in Unity
There are lots of ways of doing this step, but in my experimentation two have been majorly successful. Before we discuss those, though, let’s talk about what not to do.
Before anything else, I tried to set up one audio source for each track, with data sets holding each available clip, and then simply swap the clips out for each instrument. Unfortunately, even when using .ogg, loading sound files into memory and then triggering playback is not instant in Unity. This results in desync between track elements (we’ll cover more on timing later). So, what are the possible solutions?
The first is to set up two audio sources for each instrument of your song and play one at a time. While one source plays, the other is loaded with a new clip; once the measure completes, the opposite source takes over. This allows you to change your instrument clips in and out using minimal resources, but it sacrifices control.
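A sketch of that double-buffered approach might look like this (my own illustration, assuming equal-length clips and one such component per instrument):

```csharp
using System.Collections;
using UnityEngine;

// Two sources alternate: while one plays, the idle one is pre-loaded
// with the next module, then takes over at the block boundary.
public class DoubleBufferedInstrument : MonoBehaviour
{
    [SerializeField] AudioSource sourceA;
    [SerializeField] AudioSource sourceB;
    [SerializeField] AudioClip[] modules; // equal-length 2-measure clips

    IEnumerator Start()
    {
        AudioSource active = sourceA, idle = sourceB;
        active.clip = modules[Random.Range(0, modules.Length)];
        active.Play();
        while (true)
        {
            // Pre-load the idle source while the active one plays.
            idle.clip = modules[Random.Range(0, modules.Length)];
            yield return new WaitForSeconds(active.clip.length);
            idle.Play();
            (active, idle) = (idle, active); // swap roles
        }
    }
}
```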
The second (and my preferred) method is to set up one audio source per exported track. Now, I know this might sound like a lot - and to be honest, it is - but it allows you significantly more control over your sources than the method above. There are a few reasons I found more success with this: it’s far easier to use the “Play On Awake” parameter, which allows for a quick visualization of which tracks are active in the editor; you can more directly adjust the audio sources, add filters, and use FX on each individual object; and you don’t have to deal with nearly as much specific information handling as with the previous method, where serializing your information to give each track properties and FX quickly becomes a nightmare.
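One concrete benefit of the per-track approach: each source object can carry its own FX components. For example (a sketch of my own, not taken from Praetor’s code), a low-pass filter on a single track can be driven independently of the rest of the mix:

```csharp
using UnityEngine;

// Attach to one track's object; drives an AudioLowPassFilter so that
// just this track can be muffled or opened up without touching the rest.
[RequireComponent(typeof(AudioSource), typeof(AudioLowPassFilter))]
public class TrackFx : MonoBehaviour
{
    AudioLowPassFilter lowPass;

    void Awake() => lowPass = GetComponent<AudioLowPassFilter>();

    // 0 = fully muffled, 1 = fully open.
    public void SetOpenness(float t) =>
        lowPass.cutoffFrequency = Mathf.Lerp(500f, 22000f, Mathf.Clamp01(t));
}
```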
Step 4: Triggering Audio by Rules
This is the final step! Now you have to trigger audio by your own rules. In any case, you will want to trigger the next set of modules on the beat rather than instantly. If you’re a programmer reading this, you’re probably already screaming at the idea of having to build a custom beat tracker on top of Unity’s AudioSettings.dspTime, but I assure you there is an easier way. Here’s how I do it:
Each audio clip in Unity exposes its precise length in seconds as AudioClip.length, and if you remember, earlier I told you to ensure all clips were the same length. From there, all you have to do is set up a coroutine which loops at that length in real-time seconds and triggers audio changes at either the beginning or end of each clip, depending on your needs. As long as your elements are all the same length and your sources are playing pre-selected, pre-loaded audio, you should be able to set this up without a hitch (this is obviously easier said than done, but I hope my explanation is helpful).
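A minimal version of that timing loop might look like this (a sketch under the assumptions above - equal-length, pre-loaded clips):

```csharp
using System.Collections;
using UnityEngine;

// Fires an event at every block boundary. Any module clip can act as
// the reference, since all clips are exported at exactly the same length.
public class BlockClock : MonoBehaviour
{
    [SerializeField] AudioClip referenceClip;
    public event System.Action OnBlockBoundary;

    IEnumerator Start()
    {
        var wait = new WaitForSeconds(referenceClip.length);
        while (true)
        {
            OnBlockBoundary?.Invoke(); // swap modules, toggle mutes, etc.
            yield return wait;
        }
    }
}
```

Anything that needs to change on the beat - module swaps, mute toggles, even the kill rule from Praetor - can simply subscribe to OnBlockBoundary.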
By relying on the clip lengths themselves rather than attempting to program a custom beat timer, I’ve managed to pull off some pretty neat effects. One good example of this is the door opening animations in Rock Hopper.
EOS-503’s Single Layer Vertical Music
EOS-503 is a game, released quite recently, that I had the pleasure of working on as both a developer and composer. The game features a series of narrative events with important characters, each of whom is from a different part of the world.
As I was in charge of the music, I wanted to ensure there was a level of interest and dynamism when you spoke to each of these people. To do so, I researched the musical styles of the regions of the world the characters were from. I built one baseline layer which plays whenever you are in a narrative event, then gave each narrative event NPC their own unique instrumental layer to sit on top of it.
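Structurally this is about as simple as vertical layering gets; here’s a sketch of the idea (names are illustrative):

```csharp
using UnityEngine;

// One looping baseline for every narrative event, plus one pre-authored
// layer per character. All layers start together so they stay aligned;
// only the current character's layer is unmuted. Pass -1 for events
// with no character layer.
public class NarrativeMusic : MonoBehaviour
{
    [SerializeField] AudioSource baseline;
    [SerializeField] AudioSource[] characterLayers; // same length as the baseline loop

    public void BeginEvent(int characterIndex)
    {
        baseline.Play();
        for (int i = 0; i < characterLayers.Length; i++)
        {
            characterLayers[i].mute = (i != characterIndex);
            characterLayers[i].Play();
        }
    }
}
```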
Here are a few examples of this audio in action:
Baseline track with no instrumentals added, played during the first narrative event
Baseline narrative event music track with Arbah’s unique instrumental layer
Baseline narrative event music track with Thandiwe’s unique instrumental layer
Baseline narrative event music track with Shiraishi’s unique instrumental layer
(my personal favorite of them all)
Chromatic Isle’s Dynamic Audio Puzzles
While working on my short audio-puzzle game, Chromatic Isle, I built a series of puzzles which relied on both horizontal and vertical modular song structures. There are five puzzles in the game, but for brevity I’ll only touch on the three I feel are most pertinent. I designed one puzzle themed around each season, plus one finale, and made sure to keep the scope low enough that I could complete the project in a weekend on my own. However, I’ve considered taking another look at the sound systems in this game to see if there are any other ideas I’d like to build.
Winter: Drum Machine (Vertical Structure)
For this puzzle I wanted to teach the player how to use a drum machine, while also making them listen for specific parts of the track in the space. The goal is to line your camera up with the floating objects in the space to create the symbols as seen on the door.
Throughout the entire space the metronome of the drum machine can be heard. The player can also go up and play with the drum machine. Many (if not all) playtesters spent time building their own little song before they moved on to try and solve the puzzle.
Once they begin to recognize that the symbols on the machine correlate to the symbols in the space, they start to relate the audio and visual patterns. When the player manages to find the perspective that lines up a symbol in the space, the audio of that symbol plays back. The player then has to recreate that vertical module of audio on the drum machine for each symbol in order to proceed.
This sounds like an extremely complex problem in writing, but if you watch the video below you may find it easier to digest. I made countless modifications in testing to ensure the level was approachable for any player.
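Under the hood, the win condition boils down to comparing the player’s grid of on/off steps against a target pattern - something like this sketch (the structure is assumed, not the actual project code):

```csharp
// Each row is one symbol's instrument, each column one step of the
// pattern; the puzzle is solved when the grids match exactly.
public static class PatternCheck
{
    public static bool Matches(bool[,] playerGrid, bool[,] targetGrid)
    {
        if (playerGrid.GetLength(0) != targetGrid.GetLength(0) ||
            playerGrid.GetLength(1) != targetGrid.GetLength(1))
            return false;

        for (int row = 0; row < playerGrid.GetLength(0); row++)
            for (int step = 0; step < playerGrid.GetLength(1); step++)
                if (playerGrid[row, step] != targetGrid[row, step])
                    return false;

        return true;
    }
}
```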
Summer: Day/Night Cycle (Horizontal Structure)
In the summer themed puzzle I wanted to focus on a combination of music and nature sounds. I also wanted to build a day/night cycle as that was something I had never properly worked with before.
The concept of this puzzle is simple, really. It is the only area of the game with a day/night cycle, and the only interactable elements in the space are small stone statues with static symbols and faces you can rotate. As the days go by, nature sounds and music move in and out - days are filled with birds and horns, while nights are filled with crickets and strings.
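The audio side of such a cycle can be as simple as crossfading two beds against the time of day (a sketch with parameter names of my own choosing):

```csharp
using UnityEngine;

// timeOfDay runs 0..1 over a full cycle (0 = midnight, 0.5 = noon);
// a cosine curve smoothly crossfades the looping day and night beds.
public class DayNightAudio : MonoBehaviour
{
    [SerializeField] AudioSource dayBed;   // birds and horns
    [SerializeField] AudioSource nightBed; // crickets and strings
    [SerializeField, Range(0f, 1f)] float timeOfDay;

    void Update()
    {
        float daylight = 0.5f - 0.5f * Mathf.Cos(timeOfDay * 2f * Mathf.PI);
        dayBed.volume = daylight;
        nightBed.volume = 1f - daylight;
    }
}
```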
Of all the puzzles in the game, this one has yielded the most epiphanic “Oh! I get it!” moments. It’s all so simple, all the pieces are right in front of the player. All it takes is one small leap to understand what to do.
This puzzle is a static horizontal module setup. While nothing changes dynamically, the loop does have elements moving in and out as part of the puzzle. It’s a short example, but it shows that you can build interesting combinations without crazy dynamism. Realistically there’s very little dynamism in this setup - even though the sounds play generatively rather than deterministically, you could accomplish a similar effect with a looping audio track.
Fall: Soundscape Machine (Vertical Structure)
Finally, we’ll discuss the Fall puzzle. This is my favorite of them all, as it functions as both a fun interactive sound toy and a difficult puzzle. The idea is that the player has a soundboard where they can mute and unmute musical and ambient audio. When audio is played, each source has a procedurally animated colored bar to represent its audio.
Here’s the puzzle: inside each of the different ruined buildings there are bars which correlate in motion and color to the elements on the board, but some are grey or fallen on the ground. To solve the puzzle, the player must listen carefully to find the right combinations and build the correct soundscapes.
Each of the individual rooms features a horizontally looping structure where the audio plays back without any change, but the board itself is a vertically built set of modules. There’s a similar setup here to the drum machine puzzle, but each uses a different part of the player’s understanding of music: while the drum machine asks you to listen carefully to the timing, the soundboard asks you to listen to the details.
Final Thoughts
As a whole, dynamic music in games is an extremely wide and complex world of interaction. It asks a lot of the designer, but in my opinion it is more than worth the result. I hope this article has been informative, and I’m curious to ask my readers - are there any games in particular with either good or bad dynamic music that have stood out to you? I’d love to hear.