AudioCue, a starting point for a concurrent Clip

Jar file, with source and three audio samples can be downloaded from audiocue.jar.

The goal of the AudioCue class is to make a control that is relatively easy to use and has capabilities not available with the Java Clip (javax.sound.sampled.Clip).

This is just a starting point. There is a LOT that you can do with a Clip that I haven’t implemented yet. The only things I put in were the ability to set the volume of a play() and to allow an AudioCue to play concurrently with itself. But the framework is there for additional features to be added, such as looping, playing back at various speeds, or altering the volume during playback. I’ll try to find the time to add these capabilities in the next few weeks, but probably won’t be able to do much until at least June.

My strategy for concurrent playback is to have all instances of the cue mixed together and output via a single javax.sound.sampled.SourceDataLine. It is possible to rig up multiple Java Clips of the same cue, to allow the sound to be played over itself, but each Clip opens a new line for audio output. I thought it a good idea to keep all the occurrences grouped in the same output line. NOTE: if you are using audio files that were recorded at the loudest possible volume, you will have to back off on the volume for concurrent playback to avoid distortion.
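The distortion mentioned above happens because summed samples can exceed the valid normalized range. As an illustrative sketch (the class and method names here are mine, not part of AudioCue), one common guard is to hard-clamp the mixed value before converting it to 16-bit output:

```java
// Sketch only: hard-clamping a summed mix to the normalized range [-1, 1].
// The names here are illustrative, not part of AudioCue itself.
public class MixGuard {

    // Clamp a summed sample so it stays within the valid range.
    public static float clamp(float sample) {
        return Math.max(-1f, Math.min(1f, sample));
    }

    public static void main(String[] args) {
        // Two loud cues summed together overflow the range...
        float summed = 0.9f + 0.8f;
        // ...so clamp before the conversion to audio bytes.
        System.out.println(clamp(summed)); // prints 1.0
    }
}
```

Hard clamping itself distorts the waveform, of course, which is why backing off the per-play volume is the better fix; the clamp just keeps overflow from wrapping around into noise.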

Some code reading/review would be great! I’m open to others tinkering and proposing changes or using the code and making their own changes.

The code has some diagnostics in it that can be observed if run from the console. You’ll probably want to comment them out if you use this code yourself.

The “test” class file will first play a normal Java Clip, then three overlapping plays of the AudioCue. The pauses are on the long side, to accommodate the longest of the three audio samples that I included. The included samples are a gunshot, a bell, and a frog croak. You can comment or uncomment lines in the code to select which sound you wish to test.

This first file is the code for testing the class.

package jgo;

import java.io.IOException;
import java.net.URL;

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.UnsupportedAudioFileException;

public class TestAudioCue {

	public static void main(String[] args) throws LineUnavailableException, 
		UnsupportedAudioFileException, IOException, InterruptedException 
	{
		System.out.println("TestAudioCue() start");
		TestAudioCue test = new TestAudioCue();
		URL url;
		url = test.getClass().getResource("GunshotIndoor3.wav");
//		url = test.getClass().getResource("a3.wav");
//		url = test.getClass().getResource("frog.wav");
		
		System.out.println("Test Normal Java Clip");
	    AudioInputStream ais = AudioSystem.getAudioInputStream(url);
	    DataLine.Info info = new DataLine.Info(Clip.class, ais.getFormat());
	    Clip clip = (Clip) AudioSystem.getLine(info);
	    clip.open(ais);
	    
		Thread.sleep(100);
	    // The "play" of the normal Java Clip
		clip.start();
	    Thread.sleep(8000);
	    clip.close();
	    
	    System.out.println("Test new AudioCue");
		AudioCue cue = new AudioCue(url);
		cue.open();
		Thread.sleep(100);
			
		cue.play(0.8f);  // range of volume is a float, 0 to 1
		Thread.sleep(750);
		cue.play(0.8f);
		Thread.sleep(250);
		cue.play(0.8f);
		
		Thread.sleep(8000);
		cue.close();

		System.out.println("TestAudioCue() done");
	}

}

The next file is the AudioCue class. It contains two inner classes. AudioCuePlayer runs in its own thread. [The key “Audio Thread” classes are marked as such with comments. It is important to make sure nothing that blocks playback is added to this code.] Calling open() or close() on the AudioCue handles setting up or tearing down the SourceDataLine and the thread used for playback. It is good to open() the cue a little before you first play it. When the cue is open but not actively being played, it sends out zeros – i.e., silence.

The other inner class is AudioCueCursor. This is a cursor/pointer into the sound data array. Unlike Clip, we have access to the sound data, which opens all sorts of possibilities. The AudioCueCursor currently just iterates through the array, and destroys itself when it is done. However, instead of destroying itself, we could add a flag to have it loop, or we could have it progress through the data by an increment other than 1 (and use linear interpolation to arrive at an output value). I have used both of these techniques successfully in my audio library.

package jgo;

import java.io.IOException;
import java.net.URL;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;
import javax.sound.sampled.Line.Info;
import javax.sound.sampled.LineUnavailableException;

public class AudioCue 
{
	private final float[] cue;
	private final CopyOnWriteArrayList<AudioCueCursor> cursors;
	
	private final int bufferSize = 64;
	private final int outBufferSize = bufferSize * 2;
	private final float[] buffer = new float[bufferSize];
	
	private final AudioFormat audioFormat;
	private final Info info;
	
	private Mixer mixer;
	public void setMixer(Mixer mixer)
	{
		this.mixer = mixer;
	}
	
	public volatile boolean playerRunning;
	
	public AudioCue(URL url) throws UnsupportedAudioFileException, 
		IOException, LineUnavailableException
	{
		cue = loadURL(url);
		cursors = new CopyOnWriteArrayList<AudioCueCursor>();
		
		audioFormat = new AudioFormat(
						AudioFormat.Encoding.PCM_SIGNED, 
						44100, 16, 2, 4, 44100, false);
		
		info = new DataLine.Info(SourceDataLine.class, 
				audioFormat);		
	}
	
	private float[] loadURL(URL url) throws UnsupportedAudioFileException,
	IOException
	{
		AudioInputStream ais = AudioSystem.getAudioInputStream(url);

		int framesCount = 0;
		if (ais.getFrameLength() > Integer.MAX_VALUE >> 1)
		{
			System.out.println(
					"WARNING: Clip is too large to entirely fit!");
			framesCount = Integer.MAX_VALUE >> 1;
		}
		else
		{
			framesCount = (int)ais.getFrameLength();
		}
		
		// stereo output, so two entries per frame
		float[] temp = new float[framesCount * 2];
		long tempCountdown = temp.length;

		int bytesRead = 0;
		int bufferIdx;
		int clipIdx = 0;
		byte[] buffer = new byte[1024];
		while((bytesRead = ais.read(buffer, 0, 1024)) != -1)
		{
			bufferIdx = 0;
			for (int i = 0, n = (bytesRead >> 1); i < n; i ++)
			{
				if (tempCountdown-- > 0)  // '>' not '>=': avoids writing one past the end of temp
				{
					temp[clipIdx++] = 
							( buffer[bufferIdx++] & 0xff )
							| ( buffer[bufferIdx++] << 8 ) ;
				}
			}
		}
		
		// TODO QUESTION: is it better to do following in above loop
		// rather than iterating twice?
		for (int i = 0; i < temp.length; i++)
		{
			temp[i] = temp[i] / 32767f;
		}
		
		return temp;
	}	

	public void open()
	{
		AudioCuePlayer player = new AudioCuePlayer();
		Thread t = new Thread(player);
		
		t.setPriority(Thread.MAX_PRIORITY);
		// Set the flag before starting the thread, so the player's
		// run() loop cannot see a stale 'false' and exit immediately.
		playerRunning = true;
		t.start();
	}

	public void close()
	{
		playerRunning = false;
	}
	
	public void play(float volume)
	{
		System.out.print("AudioCue.play called: " + System.currentTimeMillis());
	
		AudioCueCursor cursor = new AudioCueCursor();
		cursors.add(cursor);
		cursor.play(volume);
		System.out.println(", AudioCueCursor.play called:" 
				+ System.currentTimeMillis());
		System.out.println("(play) playing Cursors:" + cursors.size());
	}	
	
	private class AudioCueCursor
	{
		volatile boolean  isPlaying;
		int idx = 0;  
		float volume = 0;
		
		// TODO: make idx a float if/when we implement 
		// variable speed play back, and add LERP
		void play(float volume)
		{
			idx = 0;
			this.volume = volume;
			isPlaying = true;
		}
		
		// Audio Thread Code
		public void read(float[] buffer)
		{
			if (isPlaying)
			{
				float audioData = 0;
				for (int i = 0, n = buffer.length; i < n; i++)
				{
					audioData = idx < cue.length ? cue[idx] : 0 ;
					buffer[i] += volume * audioData;
					idx++;
				}
			}
			isPlaying = idx < cue.length;
			if (!isPlaying)
			{
				cursors.remove(this);
				System.out.println("(read/remove) playing Cursors:" + cursors.size());
			}
		}
	}
	
	private class AudioCuePlayer implements Runnable
	{
		SourceDataLine sdl;
		
		AudioCuePlayer()
		{
			System.out.println("Opening SDL");
			try {
				sdl = getSDL();
				sdl.open();
				sdl.start();
			} 
			catch (LineUnavailableException e) 
			{
				e.printStackTrace();
			}
		}
		
		// Audio Thread Code
		public void run()
		{
			while(playerRunning)
			{
				// Start with 0-filled buffer, send out silence
				// if nothing playing.
				for (int i = 0; i < bufferSize; i++) buffer[i] = 0;
	
				for (AudioCueCursor acc:cursors)
				{
					if (acc.isPlaying) 
					{
						// sum data into buffer
						acc.read(buffer);
					}
				}
				
				sdl.write(convertToAudioBytes(buffer), 0, outBufferSize);
			}
		
			System.out.println("closing sdl");
			sdl.drain();
			sdl.close();
			sdl = null;
		}
	}

	// Set up Java's output line
	private SourceDataLine getSDL() throws LineUnavailableException
	{
		if (mixer == null)
		{
			// default = Java's selection
			return (SourceDataLine)AudioSystem.getLine(info);
		}
		else
		{
			// an actively chosen Line
			return (SourceDataLine) mixer.getLine(info);
		}
	}

	// Audio Thread Code
	private byte[] convertToAudioBytes(float[] buffer)
	{
		byte[] audioBytes = new byte[outBufferSize];
		
		for (int i = 0; i < bufferSize; i++)
		{
			buffer[i] *= 32767;
			
			audioBytes[i*2] = (byte) buffer[i];
			audioBytes[i*2 + 1] = (byte)((int)buffer[i] >> 8 );
		}
	
		return audioBytes;
	}
}
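The variable-speed idea mentioned for AudioCueCursor – a fractional index advanced by some increment other than 1, with linear interpolation between neighboring samples – can be sketched roughly as follows. This is a standalone illustration under my own naming, not part of the class above:

```java
// Illustrative sketch of the linear-interpolation (LERP) read mentioned
// in the text: a fractional cursor position blends the two nearest samples.
public class LerpDemo {

    // Read one value from 'data' at fractional position 'pos'.
    public static float readLerp(float[] data, float pos) {
        int i = (int) pos;        // integer part: left-hand sample
        float frac = pos - i;     // fractional part: blend weight
        if (i + 1 >= data.length) return data[data.length - 1];
        return data[i] * (1f - frac) + data[i + 1] * frac;
    }

    public static void main(String[] args) {
        float[] data = {0f, 1f, 0f};
        // Halfway between samples 0 and 1 gives the midpoint value.
        System.out.println(readLerp(data, 0.5f)); // prints 0.5
    }
}
```

In a variable-speed cursor, `pos` would advance by a pitch factor each frame (e.g. 0.5 for an octave down, 2.0 for an octave up) instead of by 1.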


Some more notes (I have a reputation for wordiness to maintain), and things to add:

Audio Data form:
I chose to convert the audio bytes to normalized float data (ranging from -1 to 1). We could have converted the bytes to shorts and done our audio mixing in that format. Some people may prefer that form in order to save a multiplication and a division per frame. I think for many algorithms (not yet implemented) it is simply easier to work with normalized floats.
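To make the trade-off concrete, here is a small comparison sketch (names are mine, for illustration): applying a volume factor is a single multiply in normalized float form, while the 16-bit short form needs a widening round trip through int plus an explicit clamp to avoid overflow surprises:

```java
// Illustration of why normalized floats simplify gain math compared
// to mixing directly in 16-bit shorts.
public class FloatVsShort {

    // Float form: one multiply; stays in [-1, 1] if volume <= 1.
    public static float scaleFloat(float sample, float volume) {
        return sample * volume;
    }

    // Short form: widen to int, scale, clamp, narrow back.
    public static short scaleShort(short sample, float volume) {
        int scaled = (int) (sample * volume);
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
        return (short) scaled;
    }

    public static void main(String[] args) {
        System.out.println(scaleFloat(0.5f, 0.5f));           // prints 0.25
        System.out.println(scaleShort((short) 16384, 0.5f));  // prints 8192
    }
}
```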

Volume changes during playback:
For short cues, you can just include the desired volumes with the play() command and taper or grow as you please with successive plays. For longer cues, the question is whether to add a volume control that affects all concurrently playing cues (a “master volume”) or just a single playback instance. For the latter, my plan would be to create a hook in the AudioCueCursor and make the AudioCueCursor the return value of the play() method. Changing volumes abruptly can create discontinuities in the sound data that result in clicks. My strategy has been to take commands to alter volume and incorporate them over the course of 1028 frames, in linear increments. This brings up another consideration: the volume parameter does not have a linear effect. You will probably want to put in something like an x^3 function if you want a semblance of, for example, volume 0.25 being half as loud as 0.5.
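The two ideas in that paragraph – stepping toward a new volume in equal per-frame increments to avoid clicks, and mapping the 0-to-1 setting through a cubic for a more perceptual response – can be sketched like this (the ramp length and helper names here are illustrative choices of mine):

```java
// Hedged sketch of click-free volume ramping plus an x^3 perceptual map.
public class VolumeRamp {

    // Map a 0..1 volume setting through x^3 for a more perceptual feel.
    public static float perceptual(float v) {
        return v * v * v;
    }

    // Per-frame increment needed to move from 'current' to 'target'
    // over 'frames' frames, in linear steps.
    public static float increment(float current, float target, int frames) {
        return (target - current) / frames;
    }

    public static void main(String[] args) {
        float current = 0f;
        float target = 1f;
        int frames = 1024;                     // illustrative ramp length
        float step = increment(current, target, frames);
        for (int i = 0; i < frames; i++) current += step;  // one step per frame
        System.out.println(current);           // ends at the target, ~1.0
        System.out.println(perceptual(0.5f));  // prints 0.125
    }
}
```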

Audio playback buffer:
My choice of buffer size is somewhat arbitrary. Others probably have a better grasp of what values give the best throughput-versus-safety trade-off. Please experiment with the value and let me know what works best for you. To be honest, the audio library I made for myself has, basically, a buffer of 1 (i.e., no buffer; it just goes forward one frame at a time). This has worked out so far and allowed me to write a functional event system with per-frame accuracy. But since the goal here is just to supplement the Java audio controls of Clip and SourceDataLine, it seemed logical to go for the greater throughput afforded by using a buffer. [This is part of the reason it took so long for me to write this – having to convert to using a buffer, which none of my own code uses.]
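One way to reason about the trade-off: each buffer of frames represents a fixed amount of audio time, which bounds how quickly a new play() can be picked up by the mixing loop. A quick back-of-the-envelope check, using the 64-frame buffer and 44100 Hz rate from the class above:

```java
// Convert a buffer size in frames to the audio time it represents.
public class BufferLatency {

    public static double latencyMillis(int frames, float sampleRate) {
        return frames / (double) sampleRate * 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(latencyMillis(64, 44100f));   // ~1.45 ms per buffer
        System.out.println(latencyMillis(1024, 44100f)); // ~23.2 ms per buffer
    }
}
```

So the 64-frame buffer keeps responsiveness well under 2 ms, at the cost of many more write() calls per second than a larger buffer would need.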

Should I have stuck with start() instead of play()? It seems to me that play(), with parameters like volume, panning (to do), and the addition of an AudioCueListener (also to do), is the clearer command name.

We probably should have a mono cue version that can be stereo panned. Panning stereo cues doesn’t make a whole lot of sense to me. The panning algorithm could use either volume or delay.
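For the volume-based approach, a minimal sketch of panning a mono sample into left and right gains might look like this (a simple linear pan; constant-power curves are another common choice, and the names here are mine for illustration):

```java
// Sketch of a volume-based pan for a mono cue: one sample in,
// a left/right pair out. Linear pan law, for illustration.
public class PanDemo {

    // pan in [-1, 1]: -1 = full left, 0 = center, 1 = full right.
    public static float[] pan(float sample, float pan) {
        float leftGain  = (1f - pan) / 2f;
        float rightGain = (1f + pan) / 2f;
        return new float[] { sample * leftGain, sample * rightGain };
    }

    public static void main(String[] args) {
        float[] lr = pan(1f, 0f);  // centered: equal halves to each side
        System.out.println(lr[0] + " " + lr[1]); // prints 0.5 0.5
    }
}
```

A delay-based pan would instead offset one channel by a few frames to exploit the ear's interaural timing cues; that needs a small per-channel delay buffer rather than a gain pair.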

I like doing more than just simple looping. I like having options to overlap the edges, and have done this in my own audio library. Cues with overlapped edges can have a nice smooth continuity, whereas simple loops can have clicks at the break points. It is mostly a matter of setting the overlap size (in frames) and deciding on the blending function (linear, simple addition, or various curves sometimes work best). I haven’t done the following yet in my own library, but I think it would be neat to have a smoothly looping sound that is subject to real-time speed and volume changes.
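The overlapped-edge idea, with a linear blend, can be sketched as follows: over the overlap region, the tail of the cue fades out while the head fades in, so the loop seam has no discontinuity. (This is an illustrative sketch; overlap length and curve are the tunable choices mentioned above.)

```java
// Sketch of a linear crossfade across a loop seam: the cue's tail
// fades out as its head fades back in over 'overlap' frames.
public class CrossfadeLoop {

    // 'pos' runs from 0 (start of the overlap) to overlap - 1 (end).
    public static float blend(float tail, float head, int pos, int overlap) {
        float w = pos / (float) overlap;   // 0 at seam start, approaches 1
        return tail * (1f - w) + head * w; // linear crossfade
    }

    public static void main(String[] args) {
        // Midway through the overlap, the output is an even mix.
        System.out.println(blend(1f, 0f, 50, 100)); // prints 0.5
    }
}
```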

Once one can overlap edges, it is a short distance to extracting ‘macro-granules’ from the body of the sound data and stringing them together. This strategy can be used to create endless, non-repeating sounds such as campfires or brooks.

TIMING:
Okay, the timing of the playbacks is going to be subject to some variance. This is kind of inevitable, as Java does not offer real-time guarantees. In particular, the first play may lag by a millisecond or two relative to the following plays, as the code warms up (going from interpreted to JIT-compiled). This can be observed in the diagnostic printouts I put in the code. Also, if you do a string of plays, all at exactly 100 millis apart, for example, chances are there will be some wobble in the timing. Another contributor is that we can’t predict when Java switches between threads or when it chooses to garbage collect. Java DOES do a good job of maintaining playback steadiness of an existing sound – and CAN provide very solid timing, but doing so requires tapping into the audio thread and counting elapsed frames. At least, that is what I do with the event system I wrote. Even so, the timing is pretty good. AudioCue should be on par with the Java Clip.
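The frame-counting idea can be sketched like this: a running total of frames written by the audio thread gives a clock that moves in lockstep with playback rather than with the wall clock, which is what makes sample-accurate event scheduling possible. (This is an illustrative sketch under my own naming, not the event system described above.)

```java
// Sketch of a frame-based clock: elapsed playback time is derived from
// frames written by the audio thread, not from System.currentTimeMillis().
public class FrameClock {
    private long framesElapsed = 0;
    private final float sampleRate;

    public FrameClock(float sampleRate) {
        this.sampleRate = sampleRate;
    }

    // Call once per buffer from the audio thread.
    public void advance(int frames) {
        framesElapsed += frames;
    }

    // Elapsed playback time in milliseconds.
    public double millis() {
        return framesElapsed / (double) sampleRate * 1000.0;
    }

    public static void main(String[] args) {
        FrameClock clock = new FrameClock(44100f);
        clock.advance(44100);            // one second of frames written
        System.out.println(clock.millis()); // prints 1000.0
    }
}
```

An event scheduled "at frame N" then fires in the buffer whose frame range contains N, giving per-frame accuracy regardless of thread-scheduling jitter.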