AudioCue, an alternative to Clip, for 2D games

[EDIT: now at github.com/philfrei/AudioCue. Some of the following post will be obsolete due to the addition of AudioMixer]

AudioCue was written for Java 2D game programmers. Its purpose is to make it easier to create soundscapes and to handle common sound needs that arise in 2D games. The code consists of one largish class, AudioCue, plus a supporting class and interface used for line listening. The classes are provided verbatim and can pretty much be dropped into your project. All the code is core Java; no external libraries are used. The code is free and carries a BSD license.

For 2D game programmers, the main option for sound effects has been to use the core Java Clip. Clip has some distinct limitations:

I. Does not support concurrent playback
A Clip cannot be played concurrently with itself. There are two basic options if, for example, you want to have a rapid-fire "bang bang bang" where a new shot is triggered before the sound effect has finished playing:

  1. stop the cue midway, move the "play head" back to the beginning, and restart the cue;
  2. create and manage many Clips of the same sound.

With AudioCue, it remains possible to start, reposition, and stop the "play head" just as you can with a Clip, but you can also fire-and-forget the cue as often as you like (up to a configured maximum) and the playing instances will all be mixed down to a single output line.
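For example, a rapid-fire effect boils down to calling play() repeatedly. Here is a minimal sketch (the asset path is hypothetical; the method calls follow the frog-pond example later in this post):

    import java.net.URL;

    public class RapidFireDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical asset; any short "CD quality" wav will do.
            URL url = RapidFireDemo.class.getResource("res/gunshot.wav");
            AudioCue bang = AudioCue.makeStereoCue(url, 6);   // up to 6 overlapping instances
            bang.open();
            Thread.sleep(100);                                // give open() a moment to finish

            // Fire-and-forget: each play() starts a fresh instance, and all
            // playing instances are mixed down to the cue's single output line.
            for (int i = 0; i < 3; i++) {
                bang.play();
                Thread.sleep(150);   // the next shot starts before the previous one ends
            }

            Thread.sleep(2000);      // let the tails finish sounding
            bang.close();
        }
    }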

II. Spotty real-time volume fading
A Clip requires the use of a Control class for managing volume changes. The availability of these classes is somewhat machine dependent. Also, the changes only take effect between buffer loop iterations, leading to problems with discontinuity-generated clicks when the setting is asked to vary too quickly.

With AudioCue, there is no reliance on machine-specific Control lines. The changes are processed internally, on a per-sample-frame basis. The change requests are received asynchronously via a setter that lies outside the audio processing loop, and can take effect at any point during the buffer loop iteration. Also, a smoothing algorithm eliminates discontinuities by spreading out change requests over individual sample frames. The result is a fast, responsive, and smooth volume fade.

As a bonus, the same mechanism is used for real-time panning and pitch changes! (See the SlidersTest.jar below for an example.)
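The smoothing idea can be illustrated apart from AudioCue. The sketch below is not AudioCue's internal code; it just shows the general technique of spreading a requested gain change over many sample frames so that no audible discontinuity is produced:

    // Illustration of the technique only, not AudioCue's internals: a requested
    // gain change is ramped linearly over RAMP_FRAMES sample frames rather than
    // being applied as a jump.
    public class SmoothedGain {
        private static final int RAMP_FRAMES = 1024;

        private float currentGain = 1f;
        private float targetGain = 1f;
        private float increment = 0f;
        private int stepsRemaining = 0;

        // Called asynchronously, outside the audio loop (e.g., from a slider listener).
        public synchronized void setGain(float newTarget) {
            targetGain = newTarget;
            increment = (targetGain - currentGain) / RAMP_FRAMES;
            stepsRemaining = RAMP_FRAMES;
        }

        // Called once per sample frame, inside the audio loop.
        public synchronized float nextGain() {
            if (stepsRemaining > 0) {
                currentGain += increment;
                if (--stepsRemaining == 0) {
                    currentGain = targetGain;   // land exactly on the requested value
                }
            }
            return currentGain;
        }
    }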

FROG POND (soundscape example)
The following jar is an example of a soundscape: frogpond.jar.
You can also hear what this jar does by listening to this wav file: frogpond.wav.

The jar generates background audio that suggests many frogs croaking in a nearby pond. It runs for 15 seconds. Very little code and a single asset are all that is required to create a fairly complex, ever-varying sound effect.

The jar includes source. Here is the most relevant code:


    URL url = this.getClass().getResource("res/frog.wav");  // 1
    AudioCue cue = AudioCue.makeStereoCue(url, 4);          // 2
    cue.open();                                             // 3   
    Thread.sleep(100);                                      // 4
	
    // Play for 15 seconds.
    long futureStop = System.currentTimeMillis() + 15_000;
	
    while (System.currentTimeMillis() < futureStop)
    {
        cue.play(0.3 + (Math.random() * 0.5),              // 5
            1 - Math.random() * 2,                         // 6 
            1.02 - Math.random() * 0.08,                   // 7
            0);                                            // 8
        Thread.sleep((int)(Math.random() * 750) + 50);     // 9
    }
	
    Thread.sleep(1000);
    cue.close();                                           // 10

  1. You assign a resource in the same way as is done with a Clip or SourceDataLine. Only URLs are supported, though, as they can identify files packed within jars, unlike the File object.

  2. A static "make" method is used rather than a constructor. This was based on suggestions from Joshua Bloch in an article he wrote about API design (and on a desire to make it easy to add, down the road, an additional method for a mono, delay-based panned sound effect). The argument 4 here is the maximum number of instances that can be played at one time.

  3. This code opens an underlying SourceDataLine for output and starts sending blank audio. It is possible to change the size of the buffer, select a specific Mixer for output, or specify the thread priority at this stage, by using a more complex form of the method:

    void open(javax.sound.sampled.Mixer mixer,
            int bufferFrames, int threadPriority)

A larger buffer size may be needed as higher numbers of concurrent instances are allowed, in order to prevent dropouts. But a larger buffer also increases latency. Making this a configurable parameter allows you to balance these needs.
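For example (the chosen Mixer, buffer size, and thread priority below are just placeholder values):

    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Mixer;

    Mixer mixer = AudioSystem.getMixer(null);      // null selects the system default Mixer
    cue.open(mixer, 4096, Thread.MAX_PRIORITY);    // larger buffer, high-priority playback thread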

  4. I'm not clear whether this pause is truly needed, or how long it needs to be. I mostly wanted to ensure that the open method has a chance to complete before executing play commands. [TODO: a detail to figure out]

  5. Here we set the volume parameter. The permitted range (clamped by AudioCue) is from 0 to 1. For soundscape purposes, the different volumes suggest different distances from the listener to the virtual frog that emits a given croak.

  6. Setting the panning. The permitted range (also clamped) is from -1 (100% left) to 1 (100% right), using a volume-based algorithm. There are currently three different volume-based algorithms available; best to consult the API about their specifics and how to select one.

  7. Setting the pitch. A slight amount of speeding up or slowing down of the playback rate helps give the impression that these are many different frogs, some larger, some smaller. The setting is used as a factor: 0.5, for example, halves the playback speed, and 2 doubles it. Permitted values range from 0.125 to 8.

  8. This is the loop parameter. Since we want each croak to play only once and end, we leave it set to zero.

  9. A random time interval is generated to space out the occurrence of the croaks.

  10. The close method releases the underlying SourceDataLine.

REAL-TIME FADING (example program)
The following jar holds a demonstration of real-time fading: SlidersTest.jar

In this program, three instances from a single AudioCue can be sounded at the same time, and sliders affect pitch, volume, and panning for each individual instance, in real time. At the bottom there is a button that plays the same wav file as a Clip, for comparison. I invite you to compare clarity and latency.



In the source code, you will be able to see how the three instances are managed via int handles.
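In outline, the handle-based control looks something like this (paraphrased; treat the exact setter names as assumptions and consult the API):

    int handle = cue.obtainInstance();     // reserve one of the cue's instances
    cue.setVolume(handle, 0.8);            // 0 .. 1
    cue.setPan(handle, -0.5);              // -1 (100% left) .. 1 (100% right)
    cue.setSpeed(handle, 1.25);            // playback-rate factor (affects pitch)
    cue.start(handle);

    // Later, while the instance is playing (e.g., from a slider listener):
    cue.setVolume(handle, 0.4);

    // When finished with the instance:
    cue.releaseInstance(handle);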

A couple of considerations I should not omit:

One way in which Clip is superior to AudioCue is that it can handle a broad range of audio formats. AudioCue only has the standard "CD quality" format (44100 Hz, 16-bit, stereo) enabled.

Why didn't I support more formats?

A couple of reasons. One is that I have yet to figure out a way to implement this that doesn't impact latency, add to processing demands, or create considerable additional coding complexity. Another is that it is a straightforward task to use a tool such as Audacity to convert audio files to this format.

As for supporting compression, I didn't want to involve any external libraries. But for those who do use compression, there is this path: when decompressing your asset, bring it to a state where the data is organized as stereo floats with values ranging from -1 to 1 (a common form for PCM data). AudioCue can use a float array as a parameter in place of a URL.
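So, once your decompression step hands you normalized stereo floats, the cue can be built directly from them. A sketch (the decompressToFloats helper is hypothetical, and the exact overload and its parameters are assumptions; check the API):

    // pcmData: interleaved stereo floats (L, R, L, R, ...) with values in -1..1,
    // produced by your own (hypothetical) decompression helper.
    float[] pcmData = decompressToFloats("bang.ogg");
    AudioCue cue = AudioCue.makeStereoCue(pcmData, "bang", 4);
    cue.open();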

I do not know if the libraries that decompress audio assets allow the option of leaving the data as a PCM float array. AFAIK, JOrbis (Ogg/Vorbis) decompresses the audio into PCM floats before converting them to bytes for output. For my own purposes, I made a small modification to the source provided with JOrbis to intercept the generated buffer of decompressed floats before they are converted back to bytes and output via a SourceDataLine. There may already be a publicly available class or method that accomplishes this.

I should point out that there is another good option for makers of 2D games: the TinySound library. I haven't taken the time to work out the comparisons. AudioCue is more about the capabilities of individual cues; TinySound mixes all cues down to a single output line, is easy to use, and has lots of features, such as Ogg/Vorbis support and concurrent playback. Definitely worth checking out as an option.


This code is to a large extent a way I hope to give back to Java-gaming.org, for all the help this community has provided me over the years! I hope it proves useful, especially for new game programmers trying out Java.

I'll do my best to answer questions and make corrections to the code, when the many typos and questionable design and coding choices are pointed out.

This is great! Now I won't have to write my own sound stuff. A problem with TinySound is that it DOES NOT support adjusting volume during playback, something that this supports!

@cygnus - Cool! Let me know if you have any questions or if I can help in any way.

I have already successfully implemented it in my game :slight_smile: Thanks so much. I'll ask if anything pops up.

Are you planning to post a copy on JGO? I'm really curious! Showcase it, or put it in WIP if it is not done yet.

Also, I want to have a page on my home site that lists games that use AudioCue. Please let me know if yours is ever made available (commercial or free, either way).

Thanks for giving it a try and posting the positive feedback!
8)

I will message you a copy! I don't plan to put it on JGO until I actually have gameplay; right now it's really just a framework. I would again highly recommend this to anybody who wants a better alternative to TinySound. The only flaw is the limit on the number of sounds of one type you can have playing at once. It would be perfect if you could make this dynamic in some way, i.e., have an option that allows an AudioCue to automatically allocate more notes if a sound is used many times. However, I am not sure of the possible consequences of this, and if it means that memory usage would spike, I suppose it's not a good idea. Thanks again for the best solution to a problem that I was not looking forward to dealing with (I'm not too good at audio)!

I will ponder this.

There is hardly any cost at all to having a high polyphony when few instances are running. (Only cursor/pointer objects are generated per instance, and they are very small and only require processing when they are actually running.)

So for most situations, it should be okay to configure a highest-use-case scenario and use that throughout. The main thing is to figure out how big the buffer has to be to play the maximum polyphony and not have dropouts, and then see if the latency that goes with that buffer is acceptable for the lower use cases. If that works, you should be good to go.

One way to code this to be more dynamic would be to allow change requests to the buffer size or the polyphony to be processed at the start of the outer loop of the AudioCuePlayer.run method. When done this way, the granularity of changes would be limited to once per buffer iteration.
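Roughly along these lines (an illustrative sketch only, not the actual AudioCuePlayer code):

    // Illustrative sketch: the audio thread only applies a pending buffer-size
    // change at the top of each iteration of its outer loop, so the change
    // never lands mid-buffer.
    class PlayerLoopSketch implements Runnable {
        private volatile int requestedFrames = -1;      // -1 = no change pending
        private volatile boolean running = true;
        private float[] buffer = new float[1024 * 2];   // stereo frames

        public void requestBufferFrames(int frames) {   // called from outside the audio thread
            requestedFrames = frames;
        }

        @Override
        public void run() {
            while (running) {
                int frames = requestedFrames;
                if (frames > 0) {                        // once-per-iteration granularity
                    buffer = new float[frames * 2];
                    requestedFrames = -1;
                }
                // ... mix the active instances into buffer and write it to the line ...
            }
        }
    }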

If this isn't mixing down to a single line, it's not a better alternative to TinySound! @philfrei how easy would it be, in your architecture, for you to support that?

I agree that 'better' is not exactly accurate. There are tradeoffs. AudioCue reduces the number of output lines required (compared to making a unique Clip for each concurrent instance), but there are still many output lines being run at once.

Actually, it would be fairly straightforward to make a static master loop which iterates through all AudioCues and their individual instances. As I mentioned in an earlier post, handling requests to add/remove AudioCues could occur at the head of the master mixer run method's outer loop.
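The general shape would be something like the sketch below (an illustration of the approach only, not anything from AudioCue itself): sum per-track float buffers into one mix buffer and write that to a single SourceDataLine.

    import java.util.Arrays;
    import java.util.concurrent.CopyOnWriteArrayList;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.SourceDataLine;

    class MasterMixSketch implements Runnable {
        interface Track { void read(float[] buffer); }    // fills buffer with this track's audio

        private final CopyOnWriteArrayList<Track> tracks = new CopyOnWriteArrayList<>();
        private volatile boolean running = true;

        void addTrack(Track t) { tracks.add(t); }         // picked up on the next loop iteration

        @Override
        public void run() {
            try {
                AudioFormat format = new AudioFormat(44100, 16, 2, true, false);  // "CD quality"
                SourceDataLine line = AudioSystem.getSourceDataLine(format);
                line.open(format, 8192);
                line.start();

                float[] mix = new float[1024 * 2];        // stereo frames
                float[] scratch = new float[mix.length];
                byte[] out = new byte[mix.length * 2];    // 16-bit samples

                while (running) {
                    Arrays.fill(mix, 0f);
                    for (Track t : tracks) {              // sum every track into the one mix
                        t.read(scratch);
                        for (int i = 0; i < mix.length; i++) mix[i] += scratch[i];
                    }
                    for (int i = 0; i < mix.length; i++) {  // clamp and convert to 16-bit PCM
                        int val = (int) (Math.max(-1f, Math.min(1f, mix[i])) * 32767);
                        out[i * 2] = (byte) (val & 0xFF);       // little-endian
                        out[i * 2 + 1] = (byte) (val >> 8);
                    }
                    line.write(out, 0, out.length);       // the single shared output line
                }
                line.drain();
                line.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }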

One aspect which I do NOT know the answer to: how are multiple output lines handled by the Java Sound implementation that merges the SourceDataLines? It could be that what lies underneath is already quite efficient. If that is the case, implementing this would be redundant except in situations where only a limited number of lines are allowed, for example on the Raspberry Pi (I think FabulousFellini mentioned running into a limit of 8 audio lines in that case).

I should clarify that 'better' from my un-audio-educated perspective means more functional at a development level. I don't really know the fine points of these tradeoffs :stuck_out_tongue:

EDIT - I looked closely at TinySound. I understand the difference now: TinySound has one separate thread which merges data from all audio sources and sends it out a single line, whereas yours has multiple threads and multiple lines. Do the negative effects of this include significant impacts on performance?

It's usually not managed by the Java implementation at all, it's managed at the OS level. That's why it gets flaky across different systems, or even different drivers. It's also why TinySound got written in the first place. Merging your feature set with the TinySound mixing could be a really useful little library. Everything else out there (including Pipes v2 when I finally extricate it from Praxis LIVE) is a lot bigger and less liberally licensed.

Having a single master output at the static level might eliminate the need for some dynamism in polyphony or buffer size at the cue level (one can add/remove cues to achieve this rather than change the buffer or polyphony of individual AudioCues), but having a master also reduces flexibility in that everything will have to use the same buffer size. I'll have to do some thinking and testing. It's unclear whether I can make this work if different cues use different buffers (or whether having different buffers is even relevant to the situation).

The major OS implementations, I think, are able to handle multiple outputs pretty efficiently. There are some Linux systems, and the Raspberry Pi, that have limits that cause problems. One can manage a large bank of AudioCues and just be sure that a maximum of 8 are open at once (for the Raspberry Pi) via the open and close methods. For a system that allows only one output line (not many of them around any more, AFAIK), something like TinySound does become mandatory.

Hey, so any updates on switching to a single line? Don't mean to nag :D, it would be handy though :slight_smile: And if not, no worries at all; it's already the best choice out there for its class.

By all means, nag! I am far too easily discouraged or distracted. If people ask for things I am much more encouraged and likely to follow through. Character weakness.

Consider it asked :stuck_out_tongue:

Progress report:

Added two changes to AudioCue:

  • it now has an interface that allows it to be a "track" for an AudioCueMixer;
  • it has an alternate open method that takes an AudioCueMixer as an argument.

I wanted to keep changes to the API to a minimum, and the above seems acceptable. I did some refactoring in the process to minimize code duplication between playing the AudioCue in its own thread and playing it as part of a mixer.

Two new objects that are part of the project:

  • AudioCueMixer (class)
  • AudioCueMixerTrack (interface)

Just finished a round of debugging and ran a test with two AudioCues being played via the mixer at the same time, and it worked fine. The CPU usage shown in the Windows Task Manager is staying below 0.5%, as it should.

There is still some more work to do before publishing. I have to expand the API documentation (write doc comments) and do other things like set up demo code and provide instructions.

I am second-guessing the name AudioCueMixer. The only thing that this class does, if you get down to it, is funnel all the audio to a single SourceDataLine output. There are no other standard mixing capabilities if you are looking at this as analogous to a DAW. All that has to be done via the constituent cue classes, as before. Also, I worry about confusion with the javax Mixer class (badly named imho).

It is possible to instantiate the AudioCueMixer with a buffer size and thread priority. The buffer size will be used to set the SourceDataLine buffer size, as well as to impose this buffer size on the member tracks. When no buffer size is specified, I put in what I think is a rational default to override the really large default in the Java API. It seems more reasonable to make all the tracks use the same buffer size than to try to manage individual tracks having their own buffer sizes. If someone wants a different buffer for a given cue, they can run it independently or in a second AudioCueMixer.
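Roughly, usage will look something like this (a sketch only; the argument values shown are placeholders and details may shift before publishing):

    // Rough sketch; exact constructor and method details may change.
    AudioCueMixer mixer = new AudioCueMixer(8192, Thread.MAX_PRIORITY);  // buffer size, thread priority
    AudioCue cue = AudioCue.makeStereoCue(url, 4);
    cue.open(mixer);     // the alternate open(), routing this cue through the mixer's single line
    cue.play();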

AFAIK: while merging audio lines is helpful for situations with a limited number of outputs, I'm not at all clear that doing the merging at the Java level is any better than relying on the JRE or the native code that implements javax.sound.sampled.

I am also considering putting the project on GitHub. I kind of liked the idea that the code base was small enough to be easily loaded from a couple of .java files, and GitHub tends to suggest more elaborate projects. With the AudioCueMixer added, we are now at 5 files (not counting demo and test code). GitHub now has tools that make it much easier to use than before, so that is an argument in its favor.

Blah blah, keeping up my word count per post. This isn't exactly something I get to chat about with local friends and family, nor with any 'employer', so hopefully some slack will be cut.

It mixes the audio together - what else are you going to call it? :wink: It's the name that should make the most sense to someone coming to your codebase from scratch. Unlike javax Mixer, which is badly named given that most of the Mixers don't actually mix anything.

It is massively better! For a start, the JRE does not do mixing. So the mixing will be happening at the OS and/or driver level, which means there are many permutations to test! They all have a limited number of outputs too; it's just that "limited" is bigger in some cases. While some of the native mixing may end up performing similarly during playback, setting up and tearing down lines will add much more overhead than handling the mixing yourself. Oh, and don't assume that your lines will play nicely in sync - controlling timing is much easier in one place in your own code.

TL;DR - now that you've made it, use your own mixing! ;D

Yay! I was gone for some time so I didn't notice this, but now I'm back and I'll test it out!

The code has been posted at GitHub.

I was trying to do more before posting, but that could take forever. There is enough here now for someone willing to jump in and try the code out.

ā€œMoreā€ would be things like tutorials, examples, as well as clear paths to finding same. Maybe also some sort of ā€œdonateā€ button. (Finally set up PayPal account, am waiting for them to verify my bank links and still have to figure out how to get a button and post it.)

Also, as a GitHub project, I need to figure out how to manage it, e.g., to-do lists and how to work with people wanting to contribute or fork the project (if any are interested - also, how do they contact me?).

But at least there is a start, so folks should be able to use the capability of mixing down to a single SourceDataLine now.

I'd suggest you adopt Gradle and take a look at these links: