AudioMixer

I have written a new class for gathering several audio tracks into a single SourceDataLine. I am trying to write something that is light, fast, and simple enough to be used in Java games. The code is pure Java.

I wouldn’t call it ready for prime time yet. But I thought it was far enough along to display and get feedback, especially given the considerable talent and experience here at JGO. I’ve successfully played 4 .wavs and 2 fm-generated “RayGuns” = 6 tracks total. I don’t know what its true capacity is yet, in terms of performance, especially given that optimization is not my forte.

The two key elements are the AudioMixer class, and the Abstract MixerTrack interface. The AudioMixer class contains and operates on an array of MixerTrack sub-interfaces. [EDIT: At this point (11/8/11) I have started supporting two sub-interfaces: “ContinuousMixerTrack” and “TriggeredMixerTrack”. Continuous uses start() and stop(), Triggered uses play().]

A MixerTrack interface can be used to implement a wrapper for any audio format, as long as the wrapper creates two float arrays, one for stereo left and one for stereo right. The data values created by the wrapper should range from -32767f to 32767f. I may decide to change this range to -1.0f to 1.0f in the future, if integration with the “outside world” merits.

Here is the MixerTrack interface code. [EDIT: updated 11/8/11]

import java.io.IOException;

abstract public interface MixerTrack {

	boolean isRunning();

	void read() throws IOException;
	
	float[] audioValsL();
	float[] audioValsR();
	
	void setVolume(float f);
	float getVolume();
}

The sub-interface for continuously playing tracks follows. I consider a file-read data source “continuous” because one doesn’t normally retrigger such a file, and if you do attempt to, it usually involves opening an entirely new AudioInputStream. Also, I didn’t want to encumber such files with the various triggered-file methods, and vice versa.


public interface ContinuousMixerTrack extends MixerTrack {

	void start();
	void stop();
}

The easiest way to explain it is maybe to show an example of its use. Following is a wrapper for a .wav file. [EDIT: updated 11/11/11]

import java.io.IOException;

import java.net.URL;

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavTrack implements ContinuousMixerTrack 
{
	// use volatile to enforce execution order
	private volatile boolean running;
	private float[] audioValsL;
	private float[] audioValsR;
	
	private int sampleBufferSize;
	private final int bytesToRead;
	private byte[] buffer;
	
	private String fileName;
	
	
	private AudioInputStream ais;
	private volatile float volume;
	
	@Override
	public boolean isRunning() {return running;}

	@Override
	public float[] audioValsL() {return audioValsL;}

	@Override
	public float[] audioValsR() {return audioValsR;}

	@Override
	public float getVolume() {return volume;}
	@Override
	public void setVolume(float volume)
	{
		this.volume = volume;
	}
	
	
	// CONSTRUCTOR
	public WavTrack(AudioMixer audioMixer,
			String fileName) 
		throws UnsupportedAudioFileException, IOException
	{
		this.fileName = fileName;
		URL url = AudioMixer.class.getResource(fileName);		
		ais = AudioSystem.getAudioInputStream(url);
		
		sampleBufferSize = audioMixer.getSampleBufferSize();
		audioValsL = new float[sampleBufferSize];
		audioValsR = new float[sampleBufferSize];
		
		bytesToRead = sampleBufferSize * 4;
		buffer = new byte[bytesToRead];

		audioMixer.addTrack(this);
		
		// default settings
		running = false;
		
	}
	
	@Override
	public void start() 
	{
		System.out.println("start wav track: " + fileName);
		running = true;	
	}
	
	
	@Override
	public void stop() 
	{
		System.out.println("stop wav track: " + fileName);		
		running = false;
	}

	@Override
	public void read() throws IOException 
	{
		if (running)
		{
			int bytesRead = ais.read(buffer, 0, bytesToRead);
			if (bytesRead == -1)
			{
				// end of stream: stop the track, pad the buffer with silence
				running = false;
				bytesRead = 0;
			}
			
			int j = 0;
			int completeSamplesRead = bytesRead / 4;
			for (int i = 0; i < completeSamplesRead; i++)
			{
				audioValsR[i] = (float)
					(( buffer[j++] & 0xff )
					| ( buffer[j++] << 8 ));
				audioValsL[i] = (float)
					(( buffer[j++] & 0xff )
					| ( buffer[j++] << 8 ));
				
				audioValsR[i] *= volume;
				audioValsL[i] *= volume;
			}
			
			// the rest can/should be filled in with zeros
			for (int i = completeSamplesRead;
					i < sampleBufferSize; i++)
			{
				audioValsR[i] = 0;
				audioValsL[i] = 0;
			}	
		}
	}
}

I think the constructor should be straightforward for anyone who has ever coded the playback of a .wav via the javax.sound.sampled library, using a relative file location to open an AudioInputStream.

The most important method of the wrapper is read(). Each call to read() converts the .wav data to the float values that the AudioMixer expects. Because the AudioMixer is continually asking for, and expecting, a full buffer’s worth of data, I have the WavTrack fill in the float arrays with 0’s if the read() returns less than a full buffer of sound data (or none at all).

TODO: put in a quick, dynamic ramp-up from 0 to the volume upon start() and from the volume down to 0 on the stop(). This will help ensure there will be no clicks when starting and stopping.
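The kind of thing I have in mind (untested sketch, not in the class yet; fadeGain, fadeTarget and FADE_STEP are placeholder names) is a per-sample gain that chases a target of 0 or 1 inside read():

// fields for a per-sample fade, to avoid clicks on start()/stop()
private float fadeGain = 0;             // current gain, ramps toward fadeTarget
private volatile float fadeTarget = 0;  // set to 1 in start(), 0 in stop()
private static final float FADE_STEP = 1f / 1024f; // ~23 ms ramp at 44100 fps

// inside the read() conversion loop, in place of the bare volume multiply:
if (fadeGain < fadeTarget) fadeGain = Math.min(fadeTarget, fadeGain + FADE_STEP);
else if (fadeGain > fadeTarget) fadeGain = Math.max(fadeTarget, fadeGain - FADE_STEP);
audioValsR[i] *= volume * fadeGain;
audioValsL[i] *= volume * fadeGain;

stop() would also have to leave running true until fadeGain actually reaches 0, otherwise the ramp never gets a chance to play out.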

I’ve also written a “RayGun” using this interface. It is a silly, simple FM toy, that has multiple cursors reading data from a WaveTable. I mention this because a MixerTrack doesn’t have to read data from disk. It can generate the data on the fly. Or one can be written to function similarly to a Clip, storing the data in whatever form makes most sense. I’m envisioning a “ClipShooter” with a WaveTable for the internal representation of audio, and have this wrapper support multiple cursors so that multiple triggerings can play the sound concurrently. [Done: 11/3/11, see post #3]

In the next post, I’ll display the AudioMixer itself.

Here’s what I have so far, for the AudioMixer. Below this is a test program. Again, feedback and suggestions are much appreciated! [Edit: code updated 11/8/11]

import java.io.IOException;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.DataLine.Info;
/*
 * This iteration allows an indefinite number of 
 * tracks, and CopyOnWriteArrayList to manage
 * the concurrency of add/remove tracks and
 * iteration through the track list.
 */
public class AudioMixer {
	
	private Mixer mixer;
	private CopyOnWriteArrayList<MixerTrack> tracks;
	private float masterVolume;
	
	private AudioFormat audioFmt;
	private boolean playing;
	private final int bufferSizeInBytes;
	private final int bufferSizeInSamples;
	private volatile boolean clipped;
	
	// hook for GUI, to allow client selection
	public void setMixer(Mixer mixer)
	{
		this.mixer = mixer;
	}
	public int getSampleBufferSize() {return bufferSizeInSamples;}
	public boolean getClipped() {return clipped;}
	
	public void addTrack(MixerTrack track)
	{
		tracks.add(track);
	}
	
	public void removeTrack(MixerTrack track)
	{
		tracks.remove(track);
	}
	
	public void setMasterVolume(double volume)
	{
		masterVolume = (float)Math.max(0, Math.min(1, volume));
	}
	public AudioMixer()
	{
		tracks = new CopyOnWriteArrayList<MixerTrack>();

		// hard-coded specs for now...
		audioFmt = new AudioFormat(
				AudioFormat.Encoding.PCM_SIGNED, 
				44100, 16, 2, 4, 44100, false);
		
		bufferSizeInBytes = 8192; // 8192 bytes = 2048 stereo frames (4 bytes per frame)
		bufferSizeInSamples = bufferSizeInBytes/4;
		
		/*
		 *  Assumption, we will eventually make the buffer 
		 *  sizable, in which case screening of size is 
		 *  helpful.
		 */
		if (bufferSizeInBytes % 4 != 0)
		{
			System.out.println("buffer size must be divisible by 4");
		}
		
		masterVolume = 0.75f;
	}
	
	public void play(Mixer mixer)
	{
		// refers to javax.sound.sampled 'Mixer'
		this.mixer = mixer;

		MixerPlayer mixerPlayer = new MixerPlayer();
		Thread t = new Thread(mixerPlayer);
		playing = true;
		t.start();
	}
	
	public void stop()
	{
		playing = false;
	}
	
	class MixerPlayer implements Runnable
	{
		public void run()
		{
			// "worker" arrays, temporary holding areas
			float[] audioValsR = new float[bufferSizeInSamples];
			float[] audioValsL = new float[bufferSizeInSamples];					
			
			try 
			{
				/*
				 * OUTPUT line initialization.
				 */
				SourceDataLine sdl;
				Info info = new DataLine.Info(SourceDataLine.class, 
						audioFmt);
				
				// Set up Java's output line
				if (mixer == null)
				{
					// default = Java's selection
					sdl = (SourceDataLine)AudioSystem.getLine(info);
				}
				else
				{
					// an actively chosen SDL
					sdl = (SourceDataLine) mixer.getLine(info);
				}
				sdl.open(audioFmt, bufferSizeInBytes);
				sdl.start();
				
				byte[] buffer = new byte[bufferSizeInBytes];
				
				while(playing)
				{
					float[] audioSumR = new float[bufferSizeInSamples];
					float[] audioSumL = new float[bufferSizeInSamples];
					
					for (MixerTrack mt : tracks)
					{
						if (mt.isRunning())
						{
							/*
							 *  Individual track volumes to be
							 *  handled internally by the track.
							 */
							try {
								mt.read();
							} catch (IOException e) {
								e.printStackTrace();
							}
										
							audioValsR = mt.audioValsR();
							audioValsL = mt.audioValsL();
							for (int i = 0; i < bufferSizeInSamples; i++)
							{
								audioSumR[i] += audioValsR[i];
								audioSumL[i] += audioValsL[i];
							}
						}
					}
					
					int j = 0;
					clipped = false;
					/*
					 * Per-sample ordering of work allows hook
					 * for DSP.
					 */
					for (int i= 0; i < bufferSizeInSamples; i++)
					{
						// implement RTESmoother? echo/reverb?
						audioSumR[i] *= masterVolume; 
						audioSumL[i] *= masterVolume;
						// clip screening
						if (audioSumR[i] > 32767)
						{
							clipped = true;
							audioSumR[i] = 32767;
						} else if (audioSumR[i] < -32767)
						{
							clipped = true;
							audioSumR[i] = -32767;
						}
						
						if (audioSumL[i] > 32767)
						{
							clipped = true;
							audioSumL[i] = 32767;
						} else if (audioSumL[i] < -32767)
						{
							clipped = true;
							audioSumL[i] = -32767;
						}
							
						buffer[j++] = (byte) audioSumR[i];
						buffer[j++] = (byte)((int)audioSumR[i] >> 8 );
						buffer[j++] = (byte) audioSumL[i];
						buffer[j++] = (byte)((int)audioSumL[i] >> 8 );
						
					}

					sdl.write(buffer, 0, bufferSizeInBytes);
				} // end while
			}
			 
			catch (LineUnavailableException e) // sdl open
			{
				System.out.println("LineUnavailableException");
				e.printStackTrace();
			} 
			finally
			{	
				tracks.clear();
			}
		}
	}
}

The constructor no longer requires any parameters. (An earlier version took a count of the number of tracks one wished to have and filled a fixed array with NoPlayTrack instances, a “dummy” MixerTrack implementation that never plays and always allows itself to be replaced; with the CopyOnWriteArrayList that is no longer needed.)

The constructor sets some invariants that maybe could/should be parameters. In particular, one might prefer a different choice of output. For now, I’m sending to a very common line (stereo, 16-bit, signed PCM, little endian, 44100 fps). But it shouldn’t be hard to set up to select a different one. Also, I did not yet put in much in the way of warnings if such a line is not available on a client. [TODO!]

Along those lines, a client might want to tweak the buffer size, or select a particular Mixer (as defined by javax.sound.sampled) if their OS offers a choice. I’ve made no provision for buffer size changes, yet. But it is not hard to give the client a MenuBar option that lets them pick a Mixer. There is a hook provided, in anticipation of this being implemented in the GUI.
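For the record, the client side of that hook could be as simple as this sketch (a fragment only, assuming the javax.sound.sampled imports; the MenuBar wiring and the chosen index n are up to the client):

// list the Mixers the OS offers, let the user pick one, hand it to the AudioMixer
Mixer.Info[] mixerInfos = AudioSystem.getMixerInfo();
for (Mixer.Info info : mixerInfos)
{
    System.out.println(info.getName() + " -- " + info.getDescription());
}
// suppose the client picked index n from a MenuBar:
int n = 0; // placeholder
audioMixer.setMixer(AudioSystem.getMixer(mixerInfos[n]));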

The addTrack(MixerTrack mixerTrack) method is what one calls to include a new audio source in the mix, and removeTrack() takes it back out. (In the earlier fixed-array version, setTrack() placed a MixerTrack into a MixerTrack[] array named “tracks”, and it was important that an existing MixerTrack which was “playing” not be blown away all of a sudden, as the abrupt loss of audio causes clicks; so the code first checked that the MixerTrack was in a silent state and “replaceable”.)

We now have a hook for masterVolume adjustment (setMasterVolume()).

The play() method fires up a separate thread for the inner Runnable MixerPlayer class, as is necessary for audio playback. The stop() method sets a boolean that allows this thread to expire.

TODO: put in code to ramp up the volume from 0 to the masterVolume level upon play() and back down to 0 upon stop() to prevent clicks.

The “meat” of the program is in the MixerPlayer run() method. Once a SourceDataLine for output is opened, we enter the main loop. Here, the program makes two float arrays (audioSumR & audioSumL) for collecting (summing) the audio track data. QUESTION: is it better to initialize existing arrays by looping through and putting 0’s in them, or to just make new ones as done here? If a track is “playing”, then its audio data is collected and summed into the arrays.

Once collected, we test for overflow and set a “clipping” boolean. I have NOT tested the logic of the “clipped” boolean yet. TODO: make a hook to allow a GUI to read or receive this boolean’s state (there is a getClipped() accessor, but no notification mechanism yet). QUESTION: is there a more efficient way to do these tests?
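One alternative I may try (just a sketch, not measured) is to clamp with Math.min/Math.max and compare:

// per sample, for each channel:
float clampedR = Math.max(-32767f, Math.min(32767f, audioSumR[i]));
if (clampedR != audioSumR[i]) clipped = true;
audioSumR[i] = clampedR;

float clampedL = Math.max(-32767f, Math.min(32767f, audioSumL[i]));
if (clampedL != audioSumL[i]) clipped = true;
audioSumL[i] = clampedL;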

Once the data is clipped down to size (if necessary), we convert it to the bytes needed for the SourceDataLine, and ship it off.

Below is a program for testing purposes, which also demonstrates the use of these classes. I load my audio files as “resources” from a sub folder called “/audio”. You will have to substitute your own sound files. Or, you can get copies of the same exact three bells that I used online at http://hexara.com/Audio/a3.wav, http://hexara.com/Audio/a4.wav, http://hexara.com/Audio/a6.wav.

[EDIT: updated 11/8/11]


import java.io.IOException;

import javax.sound.sampled.UnsupportedAudioFileException;

import com.adonax.pfaudio.AudioMixer;
import com.adonax.pfaudio.ContinuousMixerTrack;
import com.adonax.pfaudio.WavTrack;

public class WavTrackTests {
	AudioMixer am;
	double testVolume = 0.5;

	public static void main(String[] args) {
		WavTrackTests t = new WavTrackTests();
		
		try {
			t.testCase1();
			Thread.sleep(500);
			t.testCase2();
			Thread.sleep(500);
			t.testCase3();
		} 
		catch (InterruptedException e) {
			e.printStackTrace();
		} 
		catch (UnsupportedAudioFileException e) {
			e.printStackTrace();
		} 
		catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	// Does a WAV file play?
	private void testCase1() 
		throws InterruptedException, 
		UnsupportedAudioFileException, IOException
	{
		System.out.println();
		System.out.println("Test case 1: play an audio (.wav) file");
		AudioMixer am = new AudioMixer();
		am.play(null); 
		am.setMasterVolume(testVolume);
		
		
		ContinuousMixerTrack bell = new WavTrack(am, 
				"audio/a3.wav");
		bell.setVolume(0.85f);
		bell.start();
		
		Thread.sleep(5000);
		bell.stop();
	
		am.stop();
	}

	// start and stop one track, then start another cleanly?
	private void testCase2() 
		throws InterruptedException, 
		UnsupportedAudioFileException, IOException
	{
		System.out.println();
		System.out.println("Test case 2: start/stop a track, " +
			" then another track and play that.");
		AudioMixer am = new AudioMixer();
		am.play(null);
		am.setMasterVolume(testVolume);
		
		ContinuousMixerTrack lowBell = new WavTrack(am, 
				"audio/a3.wav");
		
		ContinuousMixerTrack highBell = new WavTrack(am,
				"audio/a6.wav");
		
		
		lowBell.setVolume(0.8f);
		highBell.setVolume(0.8f);
		
		lowBell.start();
		
		Thread.sleep(1000);
		
		Thread.sleep(1000);
		lowBell.stop();
		
		// test pause before starting...
		Thread.sleep(1000);
		highBell.start();
		
		Thread.sleep(4000);
	
		am.stop();
	
	}
	
	// Multiple tracks work?
	private void testCase3() 
		throws InterruptedException, 
		UnsupportedAudioFileException, IOException
	{
		System.out.println();
		System.out.println("Test case 3: multiple tracks in use");
		AudioMixer am = new AudioMixer();
		am.play(null);
		am.setMasterVolume(testVolume);
		
		ContinuousMixerTrack lowBell = new WavTrack(am, 
				"audio/a3.wav");
		
		ContinuousMixerTrack middleBell = new WavTrack(am, 
				"audio/a4.wav");

		ContinuousMixerTrack highBell = new WavTrack(am, 
				"audio/a6.wav");
		
		lowBell.setVolume(0.8f);
		middleBell.setVolume(0.7f);
		highBell.setVolume(0.6f);
		
		lowBell.start();
		Thread.sleep(1000);
		
		middleBell.start();
		Thread.sleep(1000);

		highBell.start();
		Thread.sleep(5000);

		am.stop();	
	}	
}

I’ve added a “ClipTrack”. It is functionally similar to a Java Clip. But I’ve added some features.

  1. It is held in an array of short integers. Theoretically, this is no larger than the internal representation of the Java Clip. But since we have access to the data, we can support multiple cursors and variable rates.
    THUS:
  2. It can be played at varying speeds.
  3. It can optionally be played ‘overlapping’. When retriggered before it has had a chance to finish playing, it starts an additional play while letting the previous plays finish up naturally. I haven’t put an upward limit on the number of plays, as the cost is quite minimal. (a cursor, and two adds per sample).

+++++++++++++++++++++++++

I had to make a new Interface, to support the triggered aspect and the varispeed parameter.

[EDIT: updated 11/8/11]

public interface TriggeredMixerTrack extends MixerTrack {

	void play();
	void play(double speed);
}

To Construct:
TriggeredMixerTrack clip = new ClipTrack(audioMixer, filename, overlapping); // overlapping mode is optional

Examples of use:

clip.play(); // plays the clip data at a normal playback rate

- or -

clip.play(2.5); // plays the audio data at (for example) 2.5 times the recorded speed

Following is the code for the ClipTrack. [Edit: updated 11/11/11]

import java.io.IOException;
import java.net.URL;
import java.util.concurrent.CopyOnWriteArrayList;

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class ClipTrack implements TriggeredMixerTrack {

	// use volatile to enforce execution order
	private volatile boolean playing;
	
	// for output (result of read)
	private float[] audioValsL;
	private float[] audioValsR;
	private final int sampleBufferSize;
	
	// for input 
	private byte[] buffer;
	private int bufferSize;
	
	private CopyOnWriteArrayList<Cursor> cursors;
	private boolean overlap;
	
	// clip data related
	private final String fileName;
	private final short[] clipR;
	private final short[] clipL;
	private final int clipLastRead;
	
	private volatile float volume;
	
	public String getFileName() { return fileName;}
	
	@Override
	public boolean isRunning() {return playing;}

	@Override
	public float[] audioValsL() {return audioValsL;}

	@Override
	public float[] audioValsR() {return audioValsR;}

	@Override
	public float getVolume() {return volume;}
	@Override
	public void setVolume(float volume)
	{
		this.volume = volume;
	}

	// CONSTRUCTOR
	public ClipTrack (AudioMixer audioMixer,
		String fileName) 
		throws UnsupportedAudioFileException, IOException
	{
		this(audioMixer, fileName, false);
	}
	
	public ClipTrack (AudioMixer audioMixer, 
			String fileName, boolean overlap) 
		throws UnsupportedAudioFileException, IOException
	{
		this.fileName = fileName;
		this.overlap = overlap;
		// These initializations for output
		sampleBufferSize = 
			audioMixer.getSampleBufferSize();
		audioValsR = new float[sampleBufferSize];
		audioValsL = new float[sampleBufferSize];
		cursors = new CopyOnWriteArrayList<Cursor>();
		
		// LOAD the Clip Data.
		bufferSize = 1024;
		buffer = new byte[bufferSize];

		URL url = AudioMixer.class.getResource(fileName);		
		AudioInputStream ais = AudioSystem.getAudioInputStream(url);

		long clipLength = ais.getFrameLength();
		System.out.println("clip len: " + clipLength);
		
		
		// TODO: make overflow exception, test before cast.
		if (clipLength > Integer.MAX_VALUE)
		{
			System.out.println(
				"WARNING: Clip is too large to entirely fit!");
		}
		int intClipLength = (int)clipLength;
		clipR = new short[intClipLength];
		clipL = new short[intClipLength];
		clipLastRead = intClipLength - 2;
		
		int bytesRead = 0;
		int clipIdx = 0; // range: 0 to intClipLength
		int bufferIdx = 0; // range: 0 to bufferSize
		while((bytesRead = ais.read(buffer, 0, bufferSize)) != -1)
		{
			bufferIdx = 0;
			for (int i = 0; i < bytesRead; i += 4)
			{
				clipR[clipIdx] = (short)
					(( buffer[bufferIdx++] & 0xff )
					| ( buffer[bufferIdx++] << 8 ));
				clipL[clipIdx++] = (short)
					(( buffer[bufferIdx++] & 0xff )
					| ( buffer[bufferIdx++] << 8 ));
			}
		}
		// Finished loading Clip data.

		audioMixer.addTrack(this);
		
		// default settings
		playing = false;
		volume = 0.75f;
		
	}

	public void play()
	{
		play(1);
	}
	
	public void play(double speed)
	{
		System.out.print("play called");
		if (!overlap)
		{
			cursors.clear();
		}
		Cursor c = new Cursor(speed);
		cursors.add(c);
		playing = true;
		System.out.println(", cursor count: " + cursors.size());
		
	}
	
	@Override
	public void read() throws IOException 
	{
		if (playing)
		{
			for (int i = 0; i < sampleBufferSize ; i++)
			{
				audioValsR[i] = 0;
				audioValsL[i] = 0;
				
				for (Cursor c : cursors)
				{	
					audioValsR[i] += get(c, clipR);
					audioValsL[i] += get(c, clipL);
					
					// advance cursor, and delete if finished
					c.idx += c.incr;
					if (c.idx > clipLastRead)
					{
						cursors.remove(c);
					}
				}
				audioValsR[i] *= volume;
				audioValsL[i] *= volume;
			}
			// all cursors have finished: let the track stop itself
			if (cursors.isEmpty())
			{
				playing = false;
			}
		}
	}
	
	private double get(Cursor c, short[] clip)
	{
		float val1, val2;
		
		int i = (int) c.idx;
		double frac = c.idx - i;
		
		val1 = clip[i];
		val2 = clip[++i]; 
				
		return (val1 * (1 - frac) + val2 * frac);
	}
	
	class Cursor
	{
		double idx;
		final double incr;
		
		Cursor(double incr)
		{
			idx = 0;
			this.incr = incr;
		}
		
		Cursor()
		{
			idx = 0;
			this.incr = 1.0;
		}
	}
}

Next chance I get, I’ll try to make a web demo. And, maybe I’ll post a version that can be used by folks who aren’t interested in using the AudioMixer to play it. [EDIT: tried to make a “stand alone” version (no AudioMixer involved) but ran into some problems with timing unevenness, which probably could be corrected with an endless background Timer. But it is a lower priority, and I am concentrating on other things.]

This looks really cool, especially the way it’s software-only and can do variable speed play-back. Can’t wait to see the demo

Also, what if play(-1) is called, can it play backwards?!
(EDIT: looks like not since it starts at an index of zero)

:stuck_out_tongue:
I am having some concurrency problems with the Cursors in the ClipTrack. Dang.
When I get it fixed, I will repost the code. (Probably Monday…)
I just wrote a version that works without the use of the AudioMixer, but it has the same concurrency problem.

:emo:
[EDIT] Fixed! ;D

Would it really be useful to be able to play a short cue backwards? Maybe so. It wouldn’t make sense for gunshots or explosions, but a lot of boings and slidewhistle type stuff might be good forward or back. But a lot of that can be generated directly rather than using recordings.

I do have forwards and backwards working on a WaveTable that I’ve used for some simple FM synthesis.

Am also pondering separating out the class that holds the data from a class or classes that can play it.

Yeah true, the user can easily enough just reverse the track data.

Hi Phil,

Great job so far! Haven’t had a play yet mind you, but the API and principle seem sound :stuck_out_tongue:

Firstly, I’ll make the same offer as I made in Paul’s thread - if there’s any code in Praxis / JAudioLibs that’s useful for this, and isn’t 3rd party, then let me know - happy to make it available under whatever license suits as long as it’s free for everyone to use around here. My own plan to release the rest of the audio libs from Praxis separately has been held up as the work on getting Praxis to v1 state is requiring some API changes, and I don’t want to release something where the API is fluctuating. These libs offer a lot more features, but also a lot more complexity - a simple API like yours should suit most people’s use around here.

A few comments on what you’ve got here currently - definitely intended as constructive criticism! :slight_smile:

  • Why not use float values normalised between -1:1? The conversion code would be minimal, and it is pretty much the norm. It would allow easier integration with 3rd-party code too.

  • Why the arbitrary limitation on the number of mixer lines? Why not just have a List of playing sounds, an add(mixertrack) method, remove(mixertrack) method, and maybe optional automatic removal of completed sounds? EDIT: actually, why not have each MixerTrack handle adding itself to the mixer on play() and removing itself on manual stop() or when finished? You could get rid of a few methods in your interface then too.

  • The above would need some thought on thread-safety, though I’m not 100% sure from reading how safe all your control methods are. I’d be tempted to put in a method (Swing-like) AudioMixer.invokeLater(Runnable …), post the runnable to a ConcurrentLinkedQueue, and read them out in your while loop. Then you specify that all methods must be invoked in the audio thread - people are used to this if they’ve used Swing. It makes your life easier because you can forget synchronization everywhere else, and it also makes certain things possible - ie. it becomes possible to trigger multiple sounds atomically at the same time.

  • I’m not sure about the Cursor code in ClipTrack. Why not just have a single position field per ClipTrack, and move the actual data into a separate WaveTable class that can be shared across multiple ClipTracks? That way you can easily control all the parameters (volume, etc.) on an individual basis, and you remove complexity from the actual audio processing. One of the most annoying things about the JavaSound Clip object is the way it concatenates mixer line and data.

Best wishes, Neil

Hi Neil!

I am going to single-handedly promote you to “most awesome” at this rate. I really appreciate all your replies and the thought and feedback you are giving my code.

I don’t have time to think about all your questions/suggestions at the moment, am neglecting some other chores scheduled for today. But I can quickly reply to a bit.

[quote]Why not use normalized floats…
[/quote]
Are you referring to the summing which occurs within the AudioMixer?

The data from the wav format is in effect a signed short integer, being two bytes pasted together. I’m thinking that converting the short to a float is “good enough”. Normalizing adds a multiply operation per every sample and reduces precision. And once the math in the mixer is done, another multiply is required for denormalization prior to conversion back to bytes. I haven’t dealt with 3rd party code yet. If integration proves that normalization is justified, it will be trivial to refactor.
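For reference, the extra work being weighed is roughly this (sketch):

// what I’m doing now: the two wav bytes become a float in the range -32768..32767
float sample = (buffer[j++] & 0xff) | (buffer[j++] << 8);   // as in WavTrack.read()
// normalizing would add one multiply per sample on the way in...
float normalized = sample * (1f / 32768f);
// ...and one more on the way out, before converting back to 16-bit bytes
int outSample = (int) (normalized * 32768f);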

The Echo unit I wrote “doesn’t care” if the floats are normalized or not. But I could see that using a 3rd party class would require this, and that if there were more than a couple, then it would make sense to do it up front rather than repeatedly for each unit.

If you are referring to the memory storage of the Clip data, I’m preferring short integers [signed 16-bit PCM] to floats because they are smaller and memory is a concern.

Do you think these are reasonable answers?

I had been preferring a simple array to implement the tracks, which requires initialization to a set size. But if the “tracks” collection is a HashSet, then it could be unlimited. I just figured out how to handle the synchronization issues for adding to and deleting from a HashSet, so maybe this will be a “go”. Basically, it requires locking and making a clone of the MixerTracks collection for each iteration, which has a small cost. I think the existing array solution avoids concurrency errors, but a track add can fail if the track being replaced isn’t “silent”.

Putting the “add” inside the MixerTrack interface makes good sense. I think this implies making the AudioMixer a static class, though. Otherwise, one has to pass a copy of the AudioMixer to the MixerTrack upon instantiation.

Directly related to this is the fact that I am passing the SourceDataLine buffer size (as determined by the AudioMixer) when I make a MixerTrack.

Hmmm…

Very interesting, but too deep for me to answer in this email. I will reply in a few days.

I DO think separating the ClipTrack wavetable from the cursors makes sense.

But I’m not so sure about managing multiple ClipTrack variables, each with its own cursor vs one ClipTrack managing the multiple cursors. If the AudioMixer can gracefully handle purging itself of ClipTrack instances when they expire, then this DOES become feasible. But I suspect the synchronized cloning of iterators might be better off being segmented into subsets. Do you follow that logic?

In other words, we can have lots of adds and deletes into the AudioMixer, which have to make use of a common lock with the clone for the iterator. It might be better to have the Clip instances and their iterator and locking be local to a single ClipTrack.

Sure. The effects of locking are multiplicative! Yes? It is better to keep lock domains as small and isolated as is possible that still maintains concurrency safety.

I could place a volume value in the Cursor to give it an individual volume…

It is definitely an interesting question to ponder carefully. Getting these things worked out is a big part of keeping the api clean and simple and extensible.
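To make the data/player separation concrete for myself, a bare-bones data holder might look something like this (names hypothetical, just a sketch):

// A plain data holder, loaded once, shareable by any number of players.
public class WaveTable
{
    final short[] dataR;
    final short[] dataL;

    public WaveTable(short[] dataR, short[] dataL)
    {
        this.dataR = dataR;
        this.dataL = dataL;
    }

    public int lastReadableFrame() { return dataR.length - 2; }
}

Each player (a ClipTrack-like MixerTrack) would then hold a reference to a shared WaveTable plus its own cursor, speed and volume, and any cursor management would stay local to that player.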

[quote]One of the most annoying things about the JavaSound Clip object is the way it concatenates mixer line and data.
[/quote]
Agreed. Especially if the OS you are on requires funneling everything into a single JavaSound Mixer Line. Also, every iteration needs to open its own Line. Very inefficient if you are playing multiple iterations.

[quote]Praxis / JAudioLibs
[/quote]
Thanks! That is very generous. At some point I have to take a closer look at what is there. Also, at some point I have to figure out the implications of licensing.

DIY has two benefits: one learns by doing, and there is no licensing involved.

I am running so late…dang…much prefer working on this project!

Just happy to be of some help.

Me too! Quick response. :slight_smile:

[quote]Are you referring to the summing which occurs within the AudioMixer?
[/quote]
I think I’m confusing you with the ‘normalized’ word again! :wink: Just mean, having the floats in the range -1:1 instead of -32767:32767.

Some algorithms are a bit easier and/or more accurate to work with in the range -1:1, but it may not be a huge issue for you. However, there is an implicit assumption that all your audio is 16-bit.

btw - there is no difference in precision between -1:1 and -32767:32767 (a float carries the same number of significant digits either way; the scale only moves the exponent) - the multiply and divide may add a little (inaudible) imprecision. However, I have a feeling you’ll get more imprecision from not having the decimal point in the same place when doing multiplication, such as in a gain (1.0 * 1.0 vs 1.0 * 32767.0).

I wasn’t. Personally, I tend to store in floating point format, but you could also convert on the fly. With the size of memory today, I don’t find it a problem unless you’re using really large sound clips, and then you only have to do the conversion code once (and can do it in a background thread as you load your audio).

See my reply in your other ‘HashSet’ thread. You don’t necessarily need a Set for this, and the suggestion for copying arrays could work here too.

Well, unless you’re targeting multiple soundcards, making AudioMixer a singleton with some static utility methods is probably the easiest way to do this.

If all these operations are moved into the audio thread, then no more problem! ;D

It’s worth having a look at the API of Beads. This has a nice auto-kill mechanism where sample players, etc. can remove themselves from the audio chain. I can’t easily use this mechanism in RAPL (audio routing lib for Praxis) because in a graphical patching environment it doesn’t make much sense, but for an audio API designed for coding, I think it’s really neat!

Best wishes, Neil

Lots of things here for me to research and try to understand better. At the risk of “analysis paralysis” I’m going to work through the list of suggestions. Maybe a good place to start is whether to make AudioMixer a Singleton or not. Is it okay to “think out loud” here? (TLDR concerns!)

I was a bit reluctant to make AudioMixer a Singleton due to some very negative reviews Singletons were getting in another thread. There was a lot of discussion about how these introduce testing difficulties, introduce “global state”, and make code brittle and inflexible. http://www.java-gaming.org/topics/loading-multiple-images/24878/msg/212218/view.html#msg212218

A big point of the AudioMixer is to funnel all lines into a single output, so there is a certain logic that there should be just one.

Several things need to occur within the AudioMixer. One is that the Client may specify a sound card. (But I don’t foresee a use where one has two of these talking to two different sound cards.) The second is that the Client or the Programmer may wish to specify a buffer size or latency. In this case, the info is also directly pertinent to the various MixerLines. Differing latencies may be needed if one mix is complex and needs a bit more processing vs one which must run at a lower latency.

The only other thing that could be different between two AudioMixers would be if they were used at two different master volumes. (One could possibly accomplish the same thing by coming up with a way to link “banks” of MixerTracks.) Or, one might want to “cue up” an AudioMixer, shut down the current and immediately start the next one, in order to effect an abrupt and quick transition between two complex audio environments. In the latter case, one WOULD want the ability to specify which AudioMixer when setting up new MixerTracks. So, I have some hesitation about making, for example, the fetch of the buffer size a static method.

So…I am leaning towards keeping the AudioMixer a regular class with no static methods. Until the usage becomes clearer and I see a more definite need or justification, I think it is better to stick with the more flexible option. However, I think I will go ahead and make a change and instantiate the MixerTrack with its parent AudioMixer as a parameter, rather than the AudioMixer’s buffer size as is now being done.

(The next topic I will ponder will be the idea of creating a Sound-Only thread. I’ve not done this with threads before, and will have to weigh the overhead of managing a separate thread vs what it truly costs to lock a single, very simple command, and what it costs to clone a small Set.)

Personally, I think singletons have their place, but are often overused. It could make for a simpler API if you are absolutely sure you only want to support one mixer at a time - RAPL in Praxis doesn’t use singletons for this mind you.

My reason for bringing up singleton, as opposed to what I thought you were suggesting with “static class”, was more in mind of using it with the factory method pattern. The few times I use singletons I tend to do that so that the actual implementation used can be defined at runtime.
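i.e. something roughly like this (a sketch only - DefaultAudioMixer is a made-up concrete class, not a suggestion to change the AudioMixer you posted):

public abstract class AudioMixer
{
    private static AudioMixer instance;

    public static synchronized AudioMixer getInstance()
    {
        if (instance == null)
        {
            // the concrete implementation could be chosen at runtime,
            // e.g. from a system property or a config file
            instance = new DefaultAudioMixer();
        }
        return instance;
    }
}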

The single threaded model (per AudioMixer) will still work whether you go for a single AudioMixer or not though - AudioMixer.invokeLater() becomes mixer.invokeLater()

If you have a direct correlation between your JavaSound audio line and your mixer, you’ll probably find atomically removing all old MixerTracks and adding in the new ones will be an order of magnitude faster than shutting down and restarting an audio line (not to mention that as you found with the theremin, starting a line often has an audible buzz to it).

On use of float range -32767 to 32767 vs -1 to 1.

[quote]btw - there is no difference in precision between -1:1 and -32767:32767 - the multiply and divide may add a little (inaudible) imprecision. However, I have a feeling you’ll get more imprecision from not having the decimal point in the same place doing multiplication, such as in a gain (1.0 * 1.0 vs 1.0 * 32767.0).
[/quote]
Maybe I am misunderstanding what is meant by precision.
Since the range -32767 to 32767 includes -1 to 1, wouldn’t it have a factor of something like 64,000 times more possible values that can be represented within its bounds? I think I also saw a definition of floats that actually had the range -1 to 1 being slightly less densely populated than other parts of the number range. But my memory is bad on this, and I don’t know the exact float definition implemented by Java.

The point about multiplication errors being greater seems plausible, though. It’s been a long time since I studied error propagation. I suppose it could be tested. In any event, I thought it might be okay to leave out the normalization step since it does have a cost. Using precision as justification is maybe moot, since -1 to 1 is commonly held to be sufficient for encoding audio data.

Audio in its own thread

I am still pondering this suggestion.

Audio DOES occur on its own thread. The main audio mixing loop runs in its own thread, via a Runnable.

You (Neil) are saying, if I understand correctly, that any information or commands to be read by any component of the audio should be queued on a thread (or Executor Service?) similar to the Swing EDT? I’m not clear what will be gained this way. I assume this separate thread does NOT include the main audio mixing loop, as that operation will be monopolizing its thread for the duration. Also, if the JVM can concurrently pass several values to the various audio processing routines, isn’t that a good thing? For example, volume changes to both a wav and a couple of clips: why force them to be sequential if they can be concurrent?

As long as the events are passed in a non-blocking fashion, is there any other concern? (I AM attempting to read chapter 15: “Atomic Variables and Nonblocking Synchronization” from Goetz “Java Concurrency in Practice”–though it is a tough read.)

I do think making a programmer have to add invokeLaters is a bit of a nuisance and is to be avoided in a “simple” audio mixer program, unless there is a direct benefit. Maybe I will run into the brick wall as I progress with the project, and find out that this helps avoid some sort of blocking. But the ConcurrentLinkedQueue idea looks very promising, for example, for sending volume updates.

But on that matter, I think there is also this: some of the audio data is read from file locations. Other data might come from someone who wants to play compressed data. (For example, I plan to implement a TDLTrack, so anyone with a TargetDataLine implementation can have it directed into the AudioMixer. This seems like the most realistic possibility to allow someone to get their OGG/Vorbis files played, as I sure as heck am not touching those decoders again if I can help it. Converting an OGG/Vorbis SourceDataLine output to TargetDataLine is something someone else with motivation can tackle.) These things are also going to slow down performance.
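To make the TDLTrack idea a little more concrete, the read() of such a wrapper might look roughly like this (hypothetical sketch - nothing is written yet, and it assumes the TargetDataLine was opened with the same 16-bit stereo little-endian format the AudioMixer uses):

// hypothetical TDLTrack.read(): pull one buffer's worth from a TargetDataLine
// ('tdl' would be a javax.sound.sampled.TargetDataLine field, opened elsewhere)
int bytesRead = tdl.read(buffer, 0, bytesToRead);
int j = 0;
int completeSamplesRead = bytesRead / 4;
for (int i = 0; i < completeSamplesRead; i++)
{
    audioValsR[i] = ((buffer[j++] & 0xff) | (buffer[j++] << 8)) * volume;
    audioValsL[i] = ((buffer[j++] & 0xff) | (buffer[j++] << 8)) * volume;
}
// pad the remainder with silence, as in WavTrack
for (int i = completeSamplesRead; i < sampleBufferSize; i++)
{
    audioValsR[i] = 0;
    audioValsL[i] = 0;
}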

Going back to Riven’s suggestion: something with 5 or 6 tracks could be quite useful. And if some of those lines (like the ClipTrack) allow overlapping play without complicating the api, that can give the illusion of adding quite a few more tracks.

No, that’s where you’re misinterpreting me. The queue should be drained and the various Runnables (if any) run on your main audio thread - the exact same thread that is running your audio mixing loop. It’s similar to the various graphics loops around here that call update() then render().

The amount of work you’ll be doing in each Runnable you pass in is minimal. The overhead of thread switching will vastly outweigh any benefit you get from making it parallel. The other gains you can make are huge too. You remove any need for thread synchronization in any other bit of your code, by having all the thread safe code in one place you can easily swap in and out alternative implementations, and as I’ve mentioned before, it’s the only way you can update multiple things (play multiple sounds, control multiple volumes, etc.) in one go.

Because these things don’t make much sense happening concurrently - they need to happen at a particular time in the audio stream. It would be like a MIDI sequencer being incapable of playing chords! (actually, if you know how MIDI works, that might not be the best analogy :wink: , but hopefully you get my drift).

There is a rather weird reversal of meaning in here, whereby coding this sequentially is what actually allows the sounds themselves to be (audibly) concurrent.

To turn your question on its head - why force it to be concurrent if it can be sequential? Concurrency is more complicated for both the computer and for you. Use it only where there’s a proven need, such as not loading sound files in your audio thread.

It’s funny, I’ve been giving this a bit of thought myself over the last few weeks - I was planning on creating a simple mixer API on top of RAPL to share around here at some point. I personally wondered about the problem of making programmers call invokeLater() too, but then when you use Swing you have to do that all the time. As mentioned above, the invokeLater() is the best way of enabling events to happen at the same moment in the audio stream.

If you wanted to give the option of calling from another thread, you could also use the principle in the first piece of code I posted in HashSet reply#9. I’ve seen that in a few Swing related libraries.

Best wishes, Neil

[quote]No, that’s where you’re misinterpreting me. The queue should be drained and the various Runnables (if any) run on your main audio thread - the exact same thread that is running your audio mixing loop. It’s similar to the various graphics loops around here that call update() then render().
[/quote]
OMG, really?
But…but…

This limits the granularity of events to the size of the buffer, yes? Well, I guess that is not entirely true. The whole point of the RTESmoother was to dole out events per sample, even though received on an occasional basis. I suppose that “occasion” can be in between mixer reads.

But…but…

I’ll have to separate out the methods that open and close the SourceDataLine from the buffer reads. I guess that should be possible. I’ve never seen one implementation of this, so there is a bit of newness/strangeness inhibition. But OK.

Doesn’t this go against the whole “time waits for nobody” gist, to put pauses in the audio thread? It IS putting pauses, because there will be mixer tracks ready and waiting for the next read. So the question is if the savings in contention avoided compensate.

This will limit events to the granularity of the buffer, which is a good reason to get your buffer size down, or to split the external soundcard buffersize into multiple internal buffers. There are (complicated) ways of achieving sample accuracy, but they won’t lower your latency from what I’m suggesting. Let me try and explain with some bad ascii art

|------- B1 -------|------- B2 -------|------- B3 -------|
         e

Assume ‘e’ is your audio event. It arrives at a moment in the middle of the playback of buffer B1. B1 has already been processed and sent to the soundcard. Whichever way you use - your locking code or my queuing idea - the earliest time that ‘e’ can occur in the audio stream is at the beginning of B2. To get ‘e’ to actually occur where it is marked, you would have to queue it up prior to B1 being processed.

Say ‘e’ is meant to happen 100 samples into B1, then to get it to occur there you would have to queue it up prior to B1 being processed. Though sample accurate, this is now 100 samples later than the earliest time it could have occurred - at B1 sample 1.

All in all, this is a complicated way of saying that the suggestion I made doesn’t lead to any extra delay - events occur at the earliest possible moment they could occur.

For the purpose you have, I wouldn’t even consider sample accuracy. Lots of audio systems, particularly realtime ones, don’t work with sample accuracy. If you really want to pursue this then you can achieve this either in the individual mixer tracks (events take a timecode and the component starts the audio the relevant number of samples in) or some systems use a variable internal buffer size, working out how many samples there are until the next event, and processing a buffer of that size. All of these add an extra delay though!
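To put a rough number on the buffer granularity, with the hard-coded settings in your AudioMixer (8192-byte buffer, 4 bytes per frame, 44100 fps):

8192 bytes / 4 bytes per frame = 2048 frames per buffer
2048 frames / 44100 frames per second ≈ 46 ms

So with the queue drained once per buffer pass, an event can land at worst about one buffer (~46 ms) later than the moment it was posted, plus whatever the soundcard itself buffers. Halving the buffer size halves that worst case.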

no … lost me there … can you explain what you mean.

No, it ISN’T putting pauses. :slight_smile: As I hope I’ve explained above, all your events will happen at the earliest possible moment. This is all what “time waits for nobody” is about.

Best wishes, Neil

I’ve put in updates to the AudioMixer, making use of the CopyOnWriteArrayList to try and minimize contention time. I’m still debating the idea of making a single audio thread!

The api is simpler. When you create a track, you pass a reference to the audioMixer to the Track. Then, all you have to do is call track.start() if it is reading a file (a ContinuousMixerTrack), or track.play() if it is a clip-based track (a TriggeredMixerTrack). [The different verbs are a little annoying, though. I will continue to ponder this detail. Maybe eliminate start() and just have a play() for both.]

@snigma: Agreed. The buffer latency is there with my RTESmoother approach as well.

[quote]Quote from: philfrei on 2 hours ago
“I’ll have to separate out the methods that open and close the SourceDataLine from the buffer reads. I guess that should be possible. I’ve never seen one implementation of this, so there is a bit of newness/strangeness inhibition. But OK.”

no … lost me there … can you explain what you mean.
[/quote]
In the AudioMixer is a Runnable class. The run() of MixerPlayer (maybe I should rename it to AudioMixerPlayer or AudioMixerRunnable) creates a new SourceDataLine and opens it. This same run() method has the audio loop that does the mixing. And there is a bit of cleanup at the end. The change you suggest would be to make a method that only does a single buffer load, yes?

[quote]No, it ISN’T putting pauses. Smiley As I hope I’ve explained above, all your events will happen at the earliest possible moment. This is all what “time waits for nobody” is about.
[/quote]
I understand that events such as volume changes or wavetable read-rate changes will occur as soon as they possibly can. You are putting CPU time in between the single-buffer method calls to handle these events. What I have to verify for myself is whether the time spent is gained back because the single-buffer method calls run more quickly than they would if they had to deal with the event processing themselves (even if done in some sort of non-blocking manner, if that is possible). :stuck_out_tongue:

@nsigma -

I’m still looking into the single audio thread idea. It IS looking like a better idea as I learn more about it.

Would you implement it via “newSingleThreadExecutor”?

Is it possible to limit the extent the programmer needs to use invokeLaters in the api by having methods that handle this for them?

Worry: the JVM sees the audio loop calling a series of single-buffer-load methods, and decides to get a little ahead of the game and schedules several of them in a row. Is this something that could cause the events that we want to schedule to have varying amounts of lag? Is this something that can be avoided by NOT using thread confinement?

Another “unclear” notion: a ConcurrentLinkedQueue is “weakly consistent”. Does that mean some of the add or removes might make it into an iteration even after it has started? Is that a potential pickup in processing? (And would only occur under the concurrent scenario, not the thread confinement scenario?)

I AM thinking or planning to make my events arrive via a single client, regardless. This would be an intermediate form of thread confinement. Not sure if you considered intermediate possibilities.

IF you ARE doing thread confinement, doesn’t that mean you no longer need to use ConcurrentLinkedQueue or LinkedBlockingQueue for your event handling? One downside of these methods is that they require Objects, they can’t hold primitives. But if there is full thread confinement, maybe you can get away with passing primitive parameters. Yes? There IS a cost of making and destroying objects just to be able to use a fancy Queue when an array or two with a start and end pointer might suffice. I was starting to explore this in the RTESmoother.


Post #7 uses the LinkedBlockingQueue and an object I call RealTimeEvent. #8 ditches both and just uses two arrays and a couple indexes, and passes primitives. It avoids the overhead of creating and destroying RealTimeEvents. BUT I have to say, I have not gotten into this far enough to do metrics to determine what is actually being saved or not. With the JTheremin, there is a steady stream of them being created by the MouseMotionListener.

One more thought about a thread confinement implementation. It seems to me that there would be two kinds of events on the single audio thread: call them “A events” and “B events”. “A events” consist of the audioloop processing a single buffer. “B events” consist of all the updating of parameters like volume or cursor rates or other controls. “A-events” clearly benefit from thread restriction, but “B-events” are presumably all heading to different mixer tracks or different mixer track controls. There could be a possibility of a huge forking of the “B events” to try and pick up some cpus in between the “A-events”. I don’t know how hard that would be to program, though. But it seems like an intriguing idea.

Just some musing as I try to grapple with new concepts…

Quick response - busy day! Actually, having got to the end - not so quick! :slight_smile:

Yes! There’s a reason the vast majority of audio software is written like that. :wink:

No. You need to manage the scheduling yourself, which is a lot easier anyway.

In your AudioMixer class (actually in the MixerPlayer inner class) you want to insert something similar to the following at the top of the while(playing) loop


while(playing) {

  Runnable r;
  while ( (r = queue.poll()) != null ) {
    r.run();     
  }

  float[] audioSumR = new float[bufferSizeInSamples];
  float[] audioSumL = new float[bufferSizeInSamples];

  // ......
}

See, easy! ;D

And while I think about it, try to pull those array allocations out of your while loop too - you don’t want to be allocating buffers on every pass through.
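Concretely, something like this (sketch):

// allocate once, before the while(playing) loop
float[] audioSumR = new float[bufferSizeInSamples];
float[] audioSumL = new float[bufferSizeInSamples];

while (playing)
{
    // zero the sums instead of allocating fresh arrays each pass
    java.util.Arrays.fill(audioSumR, 0f);
    java.util.Arrays.fill(audioSumL, 0f);

    // ... drain the queue, mix, clip, convert to bytes, sdl.write(...) as before
}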

Yes, as I suggested in HashSet reply #9 linked above. However, don’t hide the invokeLater() if you do that. As I’ve already said - if you used the following code from your GUI thread -


sound1.play();
sound2.play();
sound3.play();
sound4.play();

Then those play events will be queued separately, and may not play in sync. To get them to happen together you need to wrap them in a single Runnable and post to invokeLater(). Therefore, you need to keep that ability in for more complex use cases. The code from #9 handles the ability to call from in or out of the audio thread - you’ll need to include code to check Thread.currentThread() against your audio thread in the mixer.
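Something like this (sketch, assuming the invokeLater() method and the ConcurrentLinkedQueue from above):

// from the GUI thread: queue ONE Runnable so all four sounds start in the same buffer
mixer.invokeLater(new Runnable() {
    public void run() {
        sound1.play();
        sound2.play();
        sound3.play();
        sound4.play();
    }
});

// and inside AudioMixer, invokeLater() can run the task immediately if we're already
// on the audio thread, otherwise post it for the next pass of the mixing loop
public void invokeLater(Runnable task)
{
    if (Thread.currentThread() == audioThread)  // audioThread saved when play() starts it
    {
        task.run();
    }
    else
    {
        queue.offer(task);  // queue is a ConcurrentLinkedQueue<Runnable>
    }
}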

Firstly, stop worrying and get coding! :stuck_out_tongue: The poll() loop above will ensure the queue is drained on every audio buffer anyway. There is the potential for events to be late because the GUI thread is held up by the higher priority audio thread, but that is the behaviour you want otherwise the audio will break up.

The part of ConcurrentLinkedQueue that is weakly consistent is the iterator. You do not want to use the iterator with a queue - you want to loop through using poll() as above.

Not entirely sure what you mean by thread confinement in this context???

I really wouldn’t worry about the object creation here. Object creation in Java is many times faster than malloc in C, and it’s all happening on non-audio threads anyway - the audio thread is just draining the queue. Trying to write your own FIFO code that uses primitives is not only premature optimization, but also remember that the code in JDK has been written and optimized by a number of threading experts - you could easily end up writing something that actually performs worse.

Best wishes, Neil

Thanks again, Neil. BTW, I have been busy coding. Just not this aspect!

In working with a practical application of this code, I decided to make the MixerTrack abstract and to make a major split between implementations which are continuous and those which are triggered. The respective interfaces are “ContinuousMixerTrack” and “TriggeredMixerTrack”.

I feel like the differences are big enough and clear enough to merit the split. Triggered sounds just need a “play” and will stop themselves. There is no need to encumber them with the obligation to override start() and stop(). I’m also looking forward to using Envelopes with Triggered sounds. A continuous sound shouldn’t have to override triggering methods or anything involving envelopes.

It’s all posted in the first three posts above, if anyone cares to jump in and give it a go. Meanwhile, I continue to work towards a web demo, and there will probably be more tweaks along the way.

The “single audio thread” seems like a very good idea, and I really do appreciate the suggestion! But I am going to put off implementing it for two reasons: (1) the response seems pretty good already, so I’m going to make a judgment call that this is “optimization” and thus a lower priority than moving other aspects of the api forward; (2) it seems like it shouldn’t be too difficult to refactor to the single-audio-thread form when the time comes to work on optimization.