New Project: A Procedural Music API for Games

Hi there folks,

I’m an Open University student in the UK, and after too many years of study, I’m working on the final project for my degree. I’m aiming to develop an API and class library to make it easier for people to use procedurally generated, adaptive or generative music in games. I’m going to be implementing it in Java, and I’m just beginning the design process.

I’m here to try to do a bit of research with potential users of the system - this is an academic project, but ideally I’d like it to be something that’s actually useful for people, that I might be able to Open Source and release for use at some point in the future. In my searches, I’ve come across this thread, which suggests that there’s a demand for this sort of library, but it doesn’t look as if a library or API has come out of that project (correct me if I’m wrong?)

Rather than reinvent my existing description of what I’m working on, this is my problem statement from my project so far (apologies for the more formal than normal style, and references that don’t go anywhere from here):

[quote]Modern computer games are fully multimedia experiences, featuring high quality graphics, sound effects and soundtracks. Traditional music composition techniques can cause problems when used in game soundtracks for a variety of reasons, such as being unable to predict how long a player will be in a particular scene, or the music composed for a scene not matching the events in game at a particular time (Berndt et al., 2012). Procedural Music generation has been used since the 1980s to alleviate some of these issues (Collins, 2009). A variety of algorithms and techniques have been developed which allow music to be varied within given parameters, or generated and adapted based on events in the game (Berndt et al., 2012; Brown & Kerr 2009).

Despite the advantages which procedural music offers in games, it is not used as often as may be expected. Collins (2009, p12) states that “games often do not have budgets or scheduling to allow for the time that it takes to instigate complex control logics”, while Plans & Morelli (2012, p197) state that “the state of the art still relies on engines such as iMuse, technology from 1991”. They go on to point out that the libpd composition environment can be embedded within programming languages. This method, however, still requires music and adaptive algorithms to be developed in a separate programming environment, which is then linked to through the game source code using libpd as a bridge (Brinkmann et al., 2011).

The proposed solution is to design and implement a Java class library to allow a programmer to define parameters within which a game soundtrack can be generated. It will then give the programmer the facility to vary the music, in ways determined by a set of defined procedural music algorithms. These actions should all be possible through an Application Programming Interface (API), without detailed knowledge of the method or logic used by the algorithms.

Definition of the parameters to be used, the code to control playback, and execution of the algorithms should all be possible within the game code being written, with no need for external tools. It should also be possible to define new algorithms to be used by the library using only Java code.

The aim of the project is to simplify the use of procedural music techniques and to allow them to be used by game developers without expending large quantities of time developing and implementing their own code, and without requiring them to learn another programming language or environment specifically for the task. Ease of use and adaptability should be the primary concerns of the interface.

To deliver this aim, the code library will be accompanied by documentation. This will take the form of documentation of the code in the form of Javadoc files, as well as a Report to Stakeholders which will provide descriptions and examples of how the code should be used.

The scope of the project shall be limited to defining and implementing the code libraries and interfaces to be used, and the documentation of these. While it may be possible, and necessary, to define some algorithms within the library to demonstrate its use, the focus of the project will not be developing new, complex algorithms to generate music.
[/quote]
I’ve got some use cases and requirements that I’ve drawn up myself, but rather than state these here I’d like to keep things fairly open to begin with to see what sorts of answers I get to see if I’m on the right track.

Things I’d like feedback on, if people can spare the time to answer (feel free to answer any, all or none of the questions - I’m more than happy to have general conversation around the subject as well if people don’t have specific feedback on these points):

  • Have you made use of procedural/generative music in your games? How did you accomplish this if you have?
  • If you haven’t, but would like to, what stops you from doing so?
  • Is a procedural music API something that you would like to have available for use in your games?
  • What features would be the ‘must have’ core functionality for it to be useful for you?

I would reiterate that my project is to define the framework and interface at this stage - implementing a collection of algorithms using Markov chains, genetic algorithms and statistical processing to generate chord sequences and melodies is beyond what I’ll have time to do. The project is meant to be in the region of 300 hours work, and the initial research etc has been around 100 of those so far. Having said that, see the intro paragraph above - I’d like this to be something that I can continue to work on. Once I have the API defined and project written up, starting to work on algorithms would be the next step to making it useful. Feedback in the form of ‘It’d need to allow use of xyz algorithm to generate a nice tune’ will be interpreted as ‘It would need to support the ability to define xyz algorithm’ for now. Hope that makes sense :slight_smile:

Thanks for reading, and thanks in advance to anyone who can spare the time to give me some ideas of what people would like to see :slight_smile:

tldr alert… :point:

Very cool project. I will be watching with great interest! Hopefully you will get some feedback from people who have actually tried to use generative music in their games. I don’t know that there are very many, though. The only ones I know of are professional teams that make use of FMOD’s music branching algorithms.

I have a music degree, compose classically, program Java, and have some definite opinions on this subject.

In my puzzle game, which kind of stalled out, I had a most primitive algorithm for the background music. Basically, the score consists of a rich drone and a windchime effect. Also, there are game-event driven “larger” (lower-pitched) bells that are similar in timbre to the windchime bells.

The windchime bells have a simple “game state” correlation: as the player gets closer to an answer, the chimes (which play randomly) play more frequently. The idea is to subliminally increase excitement and tension as the player approaches the final and most difficult part of the puzzle. No attempt is made to generate “melodies” or harmony changes. (This was my first effort along these lines.)

I’ve added a few more tools to expand these stochastic “sound fields” for use with background sound effects. But I haven’t built any demos, yet.

I’m currently working on an event system, and can now schedule sound events on an ongoing basis. But the first iteration is also very simple. It makes use of what I’m calling an “AdvanceEvent” command. The AdvanceEvent sets the time for the next “decision” point for scoring. At that time, game state can be consulted and scoring scheduling decisions can be made for that segment, including the schedule of the next AdvanceEvent.
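To give a rough idea of the shape of that (this is a sketch with made-up names, not my actual code):

[code]
// Rough sketch of the AdvanceEvent idea - all names here are placeholders.
// At each decision point we consult game state, schedule sound events for the
// next segment, then schedule the next decision point.
interface GameState { double tension(); }            // e.g. 0.0 calm .. 1.0 frantic

interface Scheduler {
    void schedule(long frame, Runnable action);       // run the action at a given audio frame
}

class AdvanceEvent implements Runnable {
    private static final long SEGMENT_FRAMES = 44_100L * 4;   // a 4-second segment at 44.1 kHz
    private final long frame;
    private final GameState game;
    private final Scheduler scheduler;

    AdvanceEvent(long frame, GameState game, Scheduler scheduler) {
        this.frame = frame;
        this.game = game;
        this.scheduler = scheduler;
    }

    @Override
    public void run() {
        double tension = game.tension();
        // ...schedule chime/bell events for this segment, denser when tension is high...

        // Always schedule the next decision point so scoring keeps advancing.
        long next = frame + SEGMENT_FRAMES;
        scheduler.schedule(next, new AdvanceEvent(next, game, scheduler));
    }
}
[/code]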

My plan is to compose within “minimalist” parameters. In other words, use ostinati and motivic fragments, with a grading system that ranks the material. I’m also considering things like using reharmonization as a more programmable way to get harmonic changes rather than trying to implement a melody over a set of chord changes. Thus, a fragment that is played with an A in the bass that sounds minor (no flats or sharps) would sound more major (lydian mode, actually) if the bass note is changed to an F for a nice contrast. I’m also looking at “layer” pieces: compositions where various layers all work together in any combination but the layers are ranked via some parameter that corresponds to game state. The music playback algorithm would pick which layers to run based on game state.

I’m looking at Laban dance notation as an inspiration for bridging game state and music via parameters. You might consider giving this a look as possible input for designing your API. Rudolf Laban, in his choreography notation system, breaks movement down into effort qualities (force, directness, time), and the resulting scores for these qualities can be mapped to both limited musical material and player movement or game state, imho. I’m definitely thinking of something akin to this as an API starting point, assuming I get far enough to write APIs for the scoring engines I’ve been composing in my mind while working on the programming side.

I think this type of scoring requires that the composer be able to think within certain musical structural constraints that lend themselves to algorithmic composition. Not every trained classical composer is used to thinking this way, and few self-taught or pop composers are. They are more used to making pieces or songs, not designing a set of fragments that can work together in various configurations to evoke a certain time or place or emotion.

What first comes to my mind (in terms of compositional structures) are possibilities developed within the minimalist movement (e.g., Steve Reich influences–someone like film composer Thomas Newman might be good at this, I suspect, after viewing/hearing the credits for “Lemony Snicket’s A Series of Unfortunate Events” or the minimalist score for “American Beauty”), or within a type of atonal scoring (e.g., using stochastic processes, pitch classes, creating “Ligeti” type clusters or a sort of Xenakis pointillism, and managing the degree of dissonance/tension) that is used primarily for horror films. A “late romantic, early 20th century” style classical score (John Williams, for example) would be very difficult to pull off on a generative basis at this point, imho, and is out of reach except in the context of score-branching.

I think the programmer who is NOT a composer needs to be given an API that ranks things according to game state they can readily understand and quantify, and shouldn’t be put in a position of having to make musical decisions, unless we are talking about simple score-branching decisions (if in this room play cue A, if/when moving to that room play cue B, if a fight starts, branch to fight music). A positive example might be monitoring health and strength scores during a fight, calculating a risk or danger value, and using that to drive the level of tension or dissonance in the playback. A programmer should be able to perform that calculation on an ongoing basis, and the composer/sound designer would have to be able to rank the playback material to match the desired levels of tension.
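As a sketch of the programmer’s side of that bargain (all names invented for illustration):

[code]
// Sketch: the programmer boils game state down to a single 0..1 "danger" value;
// the composer/sound designer ranks the playback material against that value.
public final class DangerMeter {

    private DangerMeter() { }

    public static double dangerLevel(double health, double maxHealth,
                                     double enemyStrength, double maxEnemyStrength) {
        double hurt = 1.0 - (health / maxHealth);             // 0 = unhurt, 1 = nearly dead
        double threat = enemyStrength / maxEnemyStrength;     // 0 = no threat, 1 = worst case
        double danger = 0.5 * hurt + 0.5 * threat;            // simple weighted blend
        return Math.max(0.0, Math.min(1.0, danger));          // clamp to [0, 1]
    }
}

// Each frame or so, the game pushes the value into the music system, e.g.:
// music.setParameter("danger", DangerMeter.dangerLevel(hp, maxHp, foe, maxFoe));
[/code]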

For low-budget games, as are made here for the most part, the score branching is approximated by just playing cues and not worrying so much about the continuity or creating smooth transitions between cues. You are not going to find a lot of “cinematic level” story-telling that calls for responsive, smoothly connected, continuous and dynamic underscoring for 2D games or even more advanced 3D projects unless they are already being programmed by a fairly large team.

There doesn’t tend to be a large dramatic range to most low budget games, so sound cues with SFX are pretty much adequate to the task. Hence little demand for generative scores. But maybe if you build it (or I build it, or we do, or someone else does) then game designers, programmers and composers will understand its value and start making new games that take advantage of this potential.

Cool idea! Some thoughts:

  • performance appears to be a big issue. For most games (except some turn-based ones) having a high and constant framerate is key to success. Any kind of stuttering or hiccups breaks the player’s immersion and makes the game basically fail. So procedurally generated music for games must be either fast, or generated beforehand (which would take away the possibility to adapt music to gameplay).
  • for procedural music to work it must fit the experience of the game at a specific point. Modern games sometimes have different tracks (e.g. a militaristic pumped-up themed one and a more relaxed one) and switch between the two for example based on whether combat is taking place. However, this is also a very hard problem to solve: how to determine whether the player is in a situation that suits a specific musical style? This requires some kind of categorisation or model of gameplay contexts (e.g. “battle”, “building”, “scary”, “winning”, “losing”) and a similar categorisation of procedurally generated musical styles. Both must be extremely formalized to be able to actually code it.
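
Just to illustrate the kind of formalization I mean (everything here is made up), the mapping might be as blunt as:

[code]
// Sketch: a blunt mapping from gameplay context to a musical "style" preset.
// The hard parts are deciding which context the game is actually in at any
// moment, and keeping the presets musically coherent.
enum GameContext { EXPLORING, BATTLE, SCARY, WINNING, LOSING }

class StylePreset {
    final int tempoBpm;
    final double dissonance;   // 0 = consonant .. 1 = very dissonant
    final double density;      // 0 = sparse .. 1 = busy

    StylePreset(int tempoBpm, double dissonance, double density) {
        this.tempoBpm = tempoBpm;
        this.dissonance = dissonance;
        this.density = density;
    }

    static StylePreset forContext(GameContext context) {
        switch (context) {
            case BATTLE:  return new StylePreset(140, 0.5, 0.9);
            case SCARY:   return new StylePreset( 60, 0.8, 0.2);
            case WINNING: return new StylePreset(120, 0.1, 0.7);
            case LOSING:  return new StylePreset( 70, 0.6, 0.4);
            default:      return new StylePreset( 80, 0.1, 0.3);   // EXPLORING
        }
    }
}
[/code]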

I’m definitely interested in procedural music generation. However, before I would use such a library I would need:

  • good and consistent performance
  • a simple model to select musical styles and a way to smoothly switch between them
  • of course nice and interesting music with settings that allow a great bit of variety

Thanks for the replies - some good points in both of them.

@philfrei - I’ve already identified in my head a few sets of competing priorities for the project. In terms of the feature list, there is definitely an ease of use vs. versatility axis that I’ll need to try to balance. I guess this is a similar issue for anyone writing an API for any purpose, really. I’d like the API to be flexible enough to suit both people with a musical background, who want to write a piece and have the API modify it according to what happens in game, and non-musical people, who want it to generate cool music without having to give too much input. Doing that without making a messy, oddly or inconsistently defined API is going to be tricky.

There’s also a broad range of things that ‘procedural music’ as a term can cover - algorithmic composition to imitate a style, adaptive music dependent on game events, music strung together from various samples/sections (à la the procedural music in Spore), purely generative music, taking a theme and making variations on it to stop the loop from getting boring. In theory, given broad enough definitions in the interface, you could support all of that within one library by plugging in different algorithms, and use any combination of them in one piece… but at what point does supporting all of those options make it more difficult to use for people who don’t need them all?

The broader skillset side of things is something that’s come up in my research as a related issue - people are taught to compose music that will be played as a sequential thing, whether that’s for a concert or a soundtrack to something. Adaptive audio is a different skillset, even if it does use a lot of the same tools. It’s probably no coincidence that a lot of generative music has tended towards the more abstract, modernist end of things - I’d imagine there’s most crossover in both skills and interest from people who want to do that type of composition rather than big sweeping melodies. I’m guessing you’re aware that people have used algorithms to imitate composers’ styles (try a search for Mozart’s 42nd Symphony; similar has been done for Bach), but for the purposes of a game soundtrack, I’d expect you’re right about such things being out of reach. I don’t think a big score like that is what anyone would come to this expecting.

How are you handling your music playback? Is it via audio samples that are played/triggered, MIDI, something else? Are you using any existing libraries on the way?

@Grunnt - yep, performance is going to be a big thing. As with most systems there are multiple parts to that - making sure the playback system itself isn’t too resource heavy, and how the generation and adaptation methods behave, are two of them. Music that ‘works’ does basically come down to numbers, and algorithmic generation comes down to choosing the right ones. That can be either very quick (give me a random sequence of 30 numbers and we’ll call it a tune!), or pretty damn slow (a long drawn out genetic algorithm over a lot of generations, testing each for best fit against some set of parameters). Then there could be factors that are out of the hands of the programmer to an extent, such as how any particular system handles the sound generation process - dedicated hardware, via the main processor etc. - which could all have an impact too. At the moment, I’ll only really have control over the playback side of things - not being too resource heavy is one of my requirements already, but I still have to figure out a good, testable fit requirement for it.

This is definitely something I would use frequently, something I have not seen before, and (if it was easy enough for a non-composer to use) something I would pay for. Great choice of project! :slight_smile:

It sounds like you have an excellent handle on the subject! The term “procedural music” is indeed very wide and covers a lot of ground. And true to form, I’ve tended to tunnel in on my own priorities rather than spend much time becoming conversant with all the big picture issues.

[quote]How are you handling your music playback? Is it via audio samples that are played/triggered, MIDI, something else? Are you using any existing libraries on the way?[/quote]

I have written my own “audio system.” For basic playback, it is on a scale similar to the TinySound library, by Finn Kuusisto. We were working on parallel tracks, it seems. The only library my code relies on is javax.sound.sampled. It mixes all sounds down to a single output stream (via a SourceDataLine). I’ve written a crude procedure that imports MIDI, and makes use of javax.sound.midi library to a small extent, but all the playback is handled via my own code, not Java’s midi system code. The only thing I can import and translate are note on/off commands.

I hear there are more efficient implementations for sound playback available than javax.sound.sampled, and will look into that eventually.

Audio events can be triggered by a concurrent thread or “scheduled” to occur at an arbitrary sound frame number. The mixer references an event scheduler on a frame-by-frame basis. (44100 fps). I can also schedule notification messages, and thus trigger visual events via the sound score (though in Java, sound processing does not have real-time guarantees–processing runs a varying amount ahead of when the audio playback is actually heard).
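In simplified sketch form (not my actual code), the mixer side of that looks roughly like this:

[code]
// Simplified sketch: the mixer asks the scheduler, per audio frame, whether
// anything is due at or before that frame number.
import java.util.PriorityQueue;

class FrameScheduler {

    private static class Event implements Comparable<Event> {
        final long frame;
        final Runnable action;
        Event(long frame, Runnable action) { this.frame = frame; this.action = action; }
        public int compareTo(Event other) { return Long.compare(frame, other.frame); }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<Event>();

    // May be called from a game thread ("trigger now") or with a future frame number.
    public synchronized void schedule(long frame, Runnable action) {
        queue.add(new Event(frame, action));
    }

    // Called by the mixer as it advances through the output stream, frame by frame.
    public synchronized void tick(long currentFrame) {
        while (!queue.isEmpty() && queue.peek().frame <= currentFrame) {
            queue.poll().action.run();
        }
    }
}
[/code]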

I have created an “improved” Clip equivalent (allowing multiple concurrent playback, stereo panning, variable speed playback). The synthesizers I have created to date are mostly FM (dabbled a bit with wave-table, too), and can be used either to make Clips for playback or can be played directly. (Looking forward: resource manager to determine dynamically which option to use given current cpu/RAM tradeoffs.) With the FM synth, I’ve mostly been replicating patches from my 1980’s Yamaha DX7, e.g., using 6 operators. I’ve also made an assortment of effects: echo, chorus, flange, but no filters or reverbs yet.

The balancing of priorities is a huge task. Since I’m working on my own, I decided to focus on stepping stone projects and to make the system work for me, have it tailored to my own composing ideas and needs, rather than think too hard about creating a generalized system. That can come later, after proving that this actually works!

Current stepping stone project: a tuning tool, drone tool. The notes that are looped are few, but are dynamically generated and released, and make use of the “event system” (somewhat akin to MIDI) that I just wrote. It is a simple implementation of something that I anticipate I can build on to allow the organization of musical motives and fragments, down the line. The tool pretty much works, but I’m now dealing with the interface that allows the selection of the notes to be played, and whether they are tuned via equal temperament or via Pythagorean ratios off of the designated root tone. So, that is a bit of a digression, but hopefully will result in a viable app.

My “plan” is to have a library of procedural Java FM synth-patches (both ambient and “classic” such as string synths, keyboards, basses, brass, bells) with controllable timbral parameters, and an ability to organize motives into “EventCommand” data arrays that are assigned to the different synths. Included should be an ability to assign MIDI tracks to the different patches as part of an import process, allowing me to compose the motives on my DAW, where I can work with the same FM patches that exist on both my DAW’s software synth (NativeInstruments FM7, with data imported and tweaked from my Yamaha DX7S) and are part of the procedural Java FM synth library. Haven’t thought it out much beyond that, except to note ideas on how some of my previous compositions could be rendered via procedural code, and think about different game genres and what could be done to set them that is within reach via these methods. (Example: brass fanfare writing as one element in d&d/fantasy score.)

For an efficient system that also hits the needed dramatic points and sounds decent, for now I think the emphasis is going to have to be on the composition end, where most of the creative work is, and on finding composers able to do this. Programmers can readily figure out algorithms for aspects such as suspense levels, violence/combat levels, game-flow levels, and produce [0…1] streams for these on an ongoing basis, as well as transition markers. But it’s the composers who need to figure out ways to provide material that matches the setting and dramatic needs of the game, AND allows ranking of possible playbacks via those parameters, and forms a satisfying aesthetic whole. The main formal compositional mechanisms I envision being “practical” are track combination (i.e., a cue is actually made up of a looping set of tracks that can be played concurrently in different combinations, and this includes the concept of reharmonization by using different basses for the same material), motive-concatenations of given lines (where motives are similar but graded), and the grading of overall playback characteristics such as tempo, volume, “brightness” (FM is terrific for efficiently altering timbres to be darker or brighter on the fly). Also key: branching between tracks or sets of tracks that form different pieces/settings.

For darker, more dissonant settings (using atonality), one can grade via controlling probabilistic densities and timbral qualities of the material, sort of like a windchime that is more or less active, but the material is decidedly something other than just tinkly bells.

[quote=“EmbraCraig,post:4,topic:48206”]not being too resource heavy is one of my requirements already, but I still have to figure out a good, testable fit requirement for it
[/quote]
Perhaps defining a max processor usage would work okay as a means to measure performance (e.g. max 25% CPU usage on my laptop during game play). There are two sides to this: the amount of performance that is left over for the game (in this example 75% of CPU time), but more importantly, avoiding peaks or stuttering in processor usage.

Do you know how game loops work? One of my favourite articles on this subject is http://gafferongames.com/game-physics/fix-your-timestep/, but there’s plenty of other articles out there. Understanding this may help you define what is needed performance-wise for the generator: if peaks in processor usage make the game update code run very slowly then the player will experience this as stuttering.
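One way around such peaks (just a sketch, not tied to any particular library) is to keep generation on its own thread and always stay a few sections ahead of playback, so the game loop and audio thread never wait on the generator:

[code]
// Sketch: generate music on a background thread, a few sections ahead of
// playback, so spikes in generation cost never stall the game loop.
// generateNextSection() stands in for whatever the generator actually does.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class LookaheadGenerator {

    // A small buffer of pre-generated sections (e.g. a few bars of audio each).
    private final BlockingQueue<float[]> ready = new ArrayBlockingQueue<float[]>(4);

    void start() {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        ready.put(generateNextSection());   // blocks while the buffer is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "music-generator");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the playback side; never does any heavy work itself.
    float[] nextSection() throws InterruptedException {
        return ready.take();
    }

    private float[] generateNextSection() {
        // Placeholder for the expensive part (Markov step, GA iteration, ...).
        return new float[44100 * 4];
    }
}
[/code]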

Anyways, sounds like a very cool project, good luck with it!

Thanks again for the replies folks - I meant to get back to this thread earlier, but have been struggling on with some requirements and beginning design work.

I’m looking for a bit of feedback from anyone who might be a potentially interested user, or just an interested onlooker - I’ve pulled out requirements from my use cases (the ordering of the use cases is the reason for the slightly haphazard numbering). I’d be interested to hear if anyone spots any gaping holes - things they’d need as a user of the system…

[quote]FUNCTIONAL REQUIREMENTS
1.1 The interface shall allow a chord sequence to be defined
1.2 The interface shall allow instruments to be defined
1.3 The system shall allow a piece of music to be generated
1.4 The interface shall allow the instruction to be sent to generate a piece of music
1.5 The system shall provide a way to play back defined music
1.6 The interface shall provide a way for the music playback to be instructed

2.1 The interface shall allow a melody to be defined
2.2 The system shall allow variations of a melody to be generated
2.3 The interface shall allow the instruction to be sent to generate a melody
2.4 The interface shall allow a piece of music to be constructed from variations

3.1 The interface shall allow a piece of music to be defined
3.2 The interface shall allow relationships to be defined between music and external parameters
3.3 The system shall be able to vary music according to the relationships defined

4.1 The system shall allow multiple pieces of music to be defined
4.2 The system shall allow the piece of music being played to immediately be changed
4.3 The interface shall allow a message to be sent to instruct playback to be immediately switched to another piece of music
4.4 The system shall allow music to be returned to the original music when another message is received
4.5 The interface shall allow a message to be sent to instruct playback to revert to the original piece of music

5.1 The interface shall allow instrumentation to be varied during playback

6.1 The system shall allow multiple melodies to be defined for a piece of music
6.2 The interface shall allow additional melodies to be added to a piece of music after its creation

NON-FUNCTIONAL REQUIREMENTS
Usability Requirements
U1 – The system shall be easy to use
U2 – The system shall integrate with code in the same programming language easily
Performance Requirements
P1 – The system shall be able to handle pieces of music made up of multiple sections, with multiple musical parts
P2 – The system shall not use an undue amount of system resources for music playback
Operational Requirements
O1 – The system shall safely operate alongside game code without causing problems
O2 – The system shall be thread-safe
Maintainability Requirements
M1 – The system shall allow additional algorithms to be defined and used alongside the code library
Security Requirements
S1 – The system shall use data hiding and private code appropriately to enable a strong system boundary
[/quote]
This leads me on to start outlining design of classes etc:

[quote] The system needs to support algorithms which will:

-          Generate a melody from nothing
-          Generate a melody using an existing chord sequence as a base
-          Generate a chord sequence
-          Generate a drum track

Then I began thinking about candidate classes - the biggest one being a piece of music. I’m going with Song as a class name, as a fairly unambiguous term for something that is made up of the various parts involved, even if actual vocals aren’t going to be there. There are also going to be two different types of Song needed - a general song that is simply defined once and played (possibly on a loop, or possibly as part of a playlist of songs), and a song that needs to be updated procedurally as time goes on.

A song has:
-          1 time signature
-          1 key
-          1 chord sequence (which could be made up of multiple sections)
-          0 to x harmony tracks
-          0 to x melody tracks
-          Up to 1 rhythm track

An updatable song also has:
-          An ‘update interval’ parameter (in beats or bars rather than time, probably)
-          A defined update method

It’s probably sensible for interaction with the songs and playback to be handled through a central controller/manager class - so moving on to some of the things this will need to handle:

The music manager has:
-          A collection of all of the pieces of music defined
-          A ‘countdown to update’ timer
-          A current piece of music
-          A collection of the algorithms available for use
-          A list of game parameters that have been defined

[/quote]
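To make that a bit more concrete, the very rough shape I have in mind is something like this (all names provisional, nothing settled yet):

[code]
// Very rough shape - just the candidate classes above sketched as code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Song {
    TimeSignature timeSignature;
    Key key;
    ChordSequence chordSequence;                 // possibly made up of multiple sections
    List<Track> harmonyTracks = new ArrayList<Track>();
    List<Track> melodyTracks = new ArrayList<Track>();
    Track rhythmTrack;                           // optional - may be null
}

class UpdatableSong extends Song {
    int updateIntervalBeats;                     // how often the update method runs
    UpdateMethod updateMethod;                   // defined by the programmer
}

interface UpdateMethod {
    // Called every updateIntervalBeats; reads game parameters and alters the song.
    void update(UpdatableSong song, Map<String, Double> gameParameters);
}

interface MusicAlgorithm {
    // Pluggable generation step: melody from nothing, melody over chords,
    // chord sequence, drum track, ...
    void apply(Song song);
}

class MusicManager {
    Map<String, Song> songs = new HashMap<String, Song>();
    Map<String, MusicAlgorithm> algorithms = new HashMap<String, MusicAlgorithm>();
    Map<String, Double> gameParameters = new HashMap<String, Double>();
    Song currentSong;
    int beatsUntilUpdate;                        // the 'countdown to update' timer
}

// Placeholder types so the sketch hangs together:
class TimeSignature { }
class Key { }
class ChordSequence { }
class Track {
    private boolean muted;
    void setMuted(boolean muted) { this.muted = muted; }
}
[/code]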

Is it your intent to omit a very important composing device: the use of layering? Accompanying music can be a composite of a subset of parallel tracks, and the tracks can be graded per some parameter, such as activity, tension, dissonance. Tracks can be added or deleted as per the dynamics of the game play.

This sort of composition is different from a construction via ‘variations’ (your 2.4). Variations are generally considered to occur end-on-end, as in a theme and variations work (Diabelli/Goldberg). The variations aren’t meant to be played in various combinations, at the same time.

Here’s a good example I ran into just this last week: from Season 7 of “Foyle’s War”, 1st episode, approximately 0:55:00 in. A simple theme is given, minor 3rd, major 2nd, major 7th (below root), root, played very slowly. This functions somewhat like a cantus firmus. Then over the course of the dramatic development, there are layers added that are similar in construction to species counterpoint, e.g., one note per cantus firmus note, four notes per cantus firmus note, etc., but also high, held (pedal) tones can come and go as layers. The layers are manipulated to build and recede in tension.

Thus, the composer creates a piece with a selection of counterpoints, the counterpoints almost being like a species series. It is a different sort of composing than is usually done, but well suited for dynamic game accompaniments, imho. (Just thought of another classic example: the main theme used in “The Exorcist”, from Mike Oldfield’s “Tubular Bells” - he also made considerable use of layering as a compositional device.)

In the “Foyle’s War” example, there are also switches to and from this harmonic minor tonality to a dominant pedal, which provides contrast and relief. (Playing any thematic material over much causes it to recede in perception.) You sort of cover this in section 4, but I’d include the option of defining multiple sections within a piece, not just multiple pieces. Stravinsky was the master, here. The sections chosen would be graded.

I’m also wondering about transitions–are bridges nonexistent, prebuilt, or built on the fly? Can the length of the bridge be a parameter? Interesting problem, to me, in terms of how one “tweens” between two pieces. To explore this, I want to make a program that has two buttons: on one side we play Fur Elise, on the other Gradus ad Parnassum, and a slider to control the tweening length which bridges the two melodies, on the fly. This is a “stepping stone” project on my queue.

The layering would be covered under 5.1, 6.1 and 6.2 in the requirements - defining (or generating) multiple melodies, which can then be added and removed during playback. This should be controllable within an update method of the updatable song, checking parameters that are defined by the programmer and set within the music controller class (I could probably make this flexibility clearer by adding extensions to the use case and defining them in the requirements).
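As a rough illustration of what that update method might look like from the programmer’s side (names as provisional as ever, and assuming melody tracks can be muted/unmuted individually during playback):

[code]
// Rough illustration only: an update method that brings melody layers in and
// out based on a 'tension' parameter that game code sets on the music manager.
UpdateMethod layerByTension = new UpdateMethod() {
    public void update(UpdatableSong song, Map<String, Double> params) {
        Double value = params.get("tension");                 // 0..1, set by the game
        double tension = (value == null) ? 0.0 : value;

        int totalLayers = song.melodyTracks.size();
        if (totalLayers == 0) {
            return;
        }
        // Always keep the first layer; bring the rest in as tension rises.
        int audible = 1 + (int) Math.round(tension * (totalLayers - 1));

        for (int i = 0; i < totalLayers; i++) {
            song.melodyTracks.get(i).setMuted(i >= audible);
        }
    }
};
[/code]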

Transitions are something that I haven’t given too much thought to yet. At the moment the requirement simply states that playback is switched from one song to another (which in my head is a simple cut for now), but something smoother would obviously be preferable - an option to fade the existing song out before starting the next, maybe, or an option to play a sting or bridge of some description.

Maybe something along the lines of:

MusicPlayer.startPlaying(landscapeSong);
MusicPlayer.changePlaying(battleSong);

as the default, but with options of:

MusicPlayer.changePlaying(battleSong, changeType.FADE);
or

MusicPlayer.changePlaying(battleSong, bridgeFanfare);

where changeType is a value from an enumerated set of options, and bridgeFanfare is another defined song…
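
Or, roughly, as an API surface (method and enum names still completely up for grabs):

[code]
// Roughly the API surface implied above - names still completely up for grabs.
enum ChangeType { CUT, FADE, CROSSFADE }

public interface MusicPlayer {

    void startPlaying(Song song);

    // Default behaviour: a straight cut to the new song.
    void changePlaying(Song song);

    // Switch using a standard transition type (fade the old song out, crossfade, ...).
    void changePlaying(Song song, ChangeType changeType);

    // Switch via a bridging piece - a sting or fanfare defined like any other song.
    void changePlaying(Song song, Song bridge);

    // Revert to whatever was playing before the last change (requirements 4.4/4.5).
    void revertPlaying();
}
[/code]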

Thanks for pointing this out to me - another thing to think about, which is exactly what I was hoping for by asking here… What would you think of the sort of options above?

Sounds good, seems like a good start.

I also wonder if I should have held back for a couple of days before replying. I’m kind of intense on this subject, being more of a composer than a game writer and having given a lot of thought to this, and may have inhibited other game programmers that might have immediate needs or ideas. I apologize for that and hope others think about their sound/music wish lists and contribute to this thread, as well.

One of the most difficult aspects is figuring out how far to go with specs and options. For example, I think cross-fades may benefit from having some sort of length indication. In DAW’s, there is even the choice of slope. But for simple-level game usage, maybe a single, quickish standard AB roll is good enough, as long as it is volume-balanced.

You are designating the cues via a String identifier. I prefer using interfaces, in general. Maybe there is a base Cue interface, with implementations (SlicedCue, DicedCue - see below - but also implementations for pre-recorded or generated cues, and, for the pre-recorded ones, playback from memory vs playback from file).

I think that you might consider a CrossFadeable Interface. Then you can store the parameters for the fade with the cue. Also, it might be nice to track progress through a cue, and be able to pause when you exit and then pick up where you left off when you return, or backtrack to a “pickup” point rhythmically.

Or include some sort of spec for envelopes/fades in general. (Fadeable interface?)
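
Something like this, very roughly (just to show what I mean by keeping the fade parameters with the cue):

[code]
// Very rough - the point is that transition behaviour lives with the cue itself
// rather than being passed in at the call site. All names are illustrative.
interface Cue {
    void start();
    void stop();
    double progress();          // 0..1 through the cue, for pause/resume or pickups
}

interface Fadeable {
    void fadeTo(double targetVolume, double seconds);
}

interface CrossFadeable extends Fadeable {
    double crossFadeSeconds();  // stored with the cue, used when switching to it
}

// Possible implementations: SlicedCue (one wav per layer), DicedCue (one wav per
// fragment), GeneratedCue, plus pre-recorded cues played from memory or from file.
[/code]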

This is all such a big topic–very ambitious. Sound doesn’t get as much attention as visuals, but I think some of the aspects are just as interesting as some classic AI problems, like finding a best melodic linking path between two generated melodies for a cross-over, compared to finding and generating a best path visually.

Another thought–distinctions may need to be made between pre-recorded cues and generated cues. For example, with generated cues, transitions can be aided by tempo shifts, whereas with prerecorded, one probably isn’t going to get into changing tempos on the fly (though I suppose it could be done–I’ve seen software that stretches out music without pitch-shifting, via granules, for example).

I was also thinking about issues pertaining to the loading of pre-recorded cues: there might be ways to load them as either “sliced” or “diced”. By that, I mean take a DAW and make multiple recordings, one for each layer (“slicing”), and load that set as a “sliced cue”, or take the DAW and record short fragments (“dicing” every X measures or beats) and load that set as a “diced cue”. This is again more depth than one needs for a first iteration, but it is definitely on my wish list. In fact, I’d really like the DAW to be able to spit out, automatically, a set of wavs for a cue, one for each measure, but with all the “tails” intact (reverb tails, note tails), so that when doing a transition or halt, the last sounds heard play out instead of getting truncated.

A couple of nights ago, I was looking at my cue switching ideas and thinking about specifying the piece as an array of units (broken at possible/allowed cue switching points) but also designating things like down-beats (strong beats) and pickups (weak beats) to allow smoother transitions by finding ways to keep the pulses between the cues coordinated for the switch over. Maybe the wrapper for a wav could also allow one to designate switch points, downbeats, pickups, etc. (or a hierarchy of weak/strong beats).
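
In sketch terms, that wrapper might carry something like this alongside the audio data (all of it speculative):

[code]
// Speculative sketch: metadata carried alongside a cue's audio, marking where it
// is musically safe to switch away and how strong each beat is.
import java.util.ArrayList;
import java.util.List;

class CueUnit {
    float[] audio;              // the wav data for this unit (one measure, say)
    boolean switchPoint;        // allowed to branch away at the end of this unit?
    double[] beatStrengths;     // e.g. {1.0, 0.3, 0.6, 0.3} for a 4/4 measure
    int pickupBeats;            // weak beats at the start that lead into the next downbeat
}

class WrappedCue {
    List<CueUnit> units = new ArrayList<CueUnit>();
    double tempoBpm;
    int beatsPerBar;
}
[/code]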