simple audio mixer, 2nd pass

[EDIT: jar with code is here:
Notes on usage are on post #5 of this thread.]

I’ve restarted the project of putting together a simple stereo audio mixer. So far, it supports volume and panning settings for wavs and clips. Output is to a single stereo SourceDataLine.

My hope is that if this java code sits on top of something like libgdx, it can route this line to whatever it is that Android supports for playback, making it look to Android like a single outgoing wav or something. (Haven’t tested this yet. Am in process of learning my way around Linux–have Ubuntu now–and plan to set up an Android emulator there in the next week or two.)

Unlike the previous version, this one never processes a frame (sample) of sound from a track unless it is the current frame. I iterate across the tracks and only read one sound sample from each, rather than a buffer’s worth. There were doubts expressed that this would cause all sorts of performance problems, but tests seem to indicate all is fine. Last night, I ran 16 wav files simultaneously, and in another, 32 simultaneous clips all at different pan positions. No problems to report.
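For anyone curious what "one frame per track per iteration" looks like, here is a minimal sketch of the idea. The class and method names are purely illustrative, not the actual API:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a per-frame mixer core: each iteration pulls exactly one
// stereo frame from every running track and sums it into the output.
public class PerFrameMixSketch {

    // A track answers one question per frame: "what are your left/right
    // samples right now?" Returning null means "not running" (skipped).
    public interface Track {
        float[] nextFrame(); // {left, right}, each roughly -1..1, or null
    }

    private final List<Track> tracks = new CopyOnWriteArrayList<>();

    public void add(Track t) { tracks.add(t); }

    // Mix a single stereo frame across all tracks.
    public float[] mixOneFrame() {
        float left = 0f, right = 0f;
        for (Track t : tracks) {
            float[] f = t.nextFrame();
            if (f == null) continue; // track not "running", skip it
            left += f[0];
            right += f[1];
        }
        return new float[] { left, right };
    }

    public static void main(String[] args) {
        PerFrameMixSketch mixer = new PerFrameMixSketch();
        mixer.add(() -> new float[] { 0.25f, 0.5f });
        mixer.add(() -> new float[] { 0.25f, -0.25f });
        float[] frame = mixer.mixOneFrame();
        System.out.println(frame[0] + " " + frame[1]); // 0.5 0.25
    }
}
```

The mixed frames would then be accumulated into the byte array that gets written to the SourceDataLine.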

I basically have a wav wrapper and a clip-type wrapper supported so far. The clip is in two parts: a class that stores the clip data in RAM, and another that manages a set of cursors for multiple playback. There’s a nifty non-blocking queue used for storing cursors that are ready to play (when they finish, they are re-entered into the queue automatically). A trigger call takes speed, volume, and pan arguments; of course, there is some setup first.
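The cursor-pool idea might be sketched like this, assuming a ConcurrentLinkedQueue as the non-blocking queue (all names here are my own placeholders, not the actual source):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Rough sketch of a pool of playback cursors: idle cursors wait in a
// non-blocking queue; play() grabs one, and a finished cursor re-enters
// the queue so it can be reused. Names are illustrative.
public class CursorPoolSketch {

    static class Cursor {
        double position;   // playback position in the clip data, in frames
        double speed;      // playback rate multiplier
        boolean running;
    }

    private final ConcurrentLinkedQueue<Cursor> idle = new ConcurrentLinkedQueue<>();

    public CursorPoolSketch(int maxConcurrentPlays) {
        for (int i = 0; i < maxConcurrentPlays; i++) {
            idle.add(new Cursor());
        }
    }

    // Non-blocking: returns false if all cursors are already playing.
    public boolean play(double speed) {
        Cursor c = idle.poll();
        if (c == null) return false;
        c.position = 0;
        c.speed = speed;
        c.running = true;
        return true;
    }

    // Called by the mixer when a cursor reaches the end of the clip.
    void recycle(Cursor c) {
        c.running = false;
        idle.add(c);
    }
}
```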

speed is a multiplier: for example, 2 will play the sound twice as fast, and 0.75 will slow it down some. (No, I didn’t support negatives, but it should be quite easy to add; it’s actually just a matter of adjusting the start point to the end of the clip!)
volume goes from 0 to 1, a multiplier. (I plan to add a VolumeMapping function so that 0.5 actually sounds like it is at half volume.)
pan goes from -1 to 1, with 0 as center.
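A rough sketch of how these three parameters could be applied per sample: linear interpolation for the fractional read positions that a non-integer speed produces, and a simple linear pan. This is my own illustration, not the actual source:

```java
// Illustrative sketch of per-sample speed/volume/pan handling.
public class FrameParamsSketch {

    // Read one sample at a fractional position using linear interpolation,
    // which is what lets "speed" be a non-integer multiplier. The caller
    // advances position by `speed` each output frame.
    static float interpolate(float[] data, double position) {
        int i = (int) position;
        float frac = (float) (position - i);
        float a = data[i];
        float b = (i + 1 < data.length) ? data[i + 1] : 0f;
        return a + (b - a) * frac;
    }

    // Turn a mono sample + volume + pan into a stereo frame.
    // volume: 0..1 multiplier; pan: -1 (left) .. 0 (center) .. 1 (right).
    // Simple linear pan: center plays both channels at full gain.
    static float[] toStereo(float sample, float volume, float pan) {
        float leftGain  = pan <= 0 ? 1f : 1f - pan;
        float rightGain = pan >= 0 ? 1f : 1f + pan;
        return new float[] { sample * volume * leftGain,
                             sample * volume * rightGain };
    }

    public static void main(String[] args) {
        float[] s = toStereo(1f, 0.5f, 0f);
        System.out.println(s[0] + " " + s[1]); // 0.5 0.5
    }
}
```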

All the tests have been using Thread.sleep() increments to space things out and the response is pretty good, probably fine for most game applications. There is a bit of variability that is probably directly related to the size of the cpu slices. I wouldn’t want to use it for reading a musical score that requires playing a series of clips in perfect time. I just started working on an event-reader that is accurate to the frame (e.g., 1/44100th of a second). Will report progress on that. It makes use of nsigma’s advice to handle these events in a single audio thread shared with the mixer, to avoid blocking problems.

No support for ogg or mp3. If you want to load an ogg or mp3 into RAM though, for clip playback, that should be easy to add.

Biggest drawback is perhaps that I haven’t yet written in the ability to add or drop tracks while the mixer is running. Currently, the mixer iterates through all tracks for each frame, skipping those that are not “running”. But once the audio-event-reader works, it should be doable to add this as part of that process.

Reinventing the wheel, again. There are a lot of great audio tools already in existence! But I want something for my games and that can play my Java FM synth sounds, and I want to learn about audio programming.

Can you remind me where your source code is?


Not posted yet. I only got it working last night, and probably should do a bit of cleanup to make it more presentable. Also, is it sufficient to be useful as it stands? Right now there are just the wav & clip wrappers, no volume maps, and you need to turn the mixer off to add or delete tracks (though you can stop and start existing tracks dynamically).

It would be awesome if you were interested in testing/trying out what I have. I could maybe put together a jar by Monday. (I work all day tomorrow, and have a concert to attend in the evening where a new composition of mine is going to be played!)

Also, no audio-event-handler yet. First step: I now have a PriorityBlockingQueue that holds crude AudioMixerEvents, but the only thing that happens so far is that if an event’s frameTime matches the frame being processed, it prints “click” and the time and the frame #. It’s not set up to, say, play a clip track yet. Have to figure out how to set that up. Probably next week unless a simple solution pops into my head out of nowhere.
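A minimal sketch of the kind of frame-accurate queue described here, built on a PriorityBlockingQueue ordered by frame time. The event and method names are placeholders, not the actual work in progress:

```java
import java.util.concurrent.PriorityBlockingQueue;

// Sketch of a frame-accurate event queue: each event carries the audio
// frame number at which it should fire; the mixer thread checks the head
// of the queue once per frame. Names are illustrative.
public class AudioEventQueueSketch {

    static class AudioMixerEvent implements Comparable<AudioMixerEvent> {
        final long frameTime;   // audio frame at which to fire
        final Runnable action;
        AudioMixerEvent(long frameTime, Runnable action) {
            this.frameTime = frameTime;
            this.action = action;
        }
        public int compareTo(AudioMixerEvent o) {
            return Long.compare(frameTime, o.frameTime);
        }
    }

    private final PriorityBlockingQueue<AudioMixerEvent> queue =
            new PriorityBlockingQueue<>();

    // Safe to call from any thread (GUI, game loop, etc.).
    public void post(long frameTime, Runnable action) {
        queue.add(new AudioMixerEvent(frameTime, action));
    }

    // Called by the mixer once per audio frame: fire everything due.
    public void dispatchUpTo(long currentFrame) {
        AudioMixerEvent e;
        while ((e = queue.peek()) != null && e.frameTime <= currentFrame) {
            queue.poll();
            e.action.run();
        }
    }
}
```

The “click” test above amounts to posting an event whose action is a println, and letting the mixer loop call dispatchUpTo on each frame.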

So all cueing right now is done via real-time commands to the clip, independent of the mixer thread.

Have you tried it with OpenJDK?

I haven’t loaded OpenJDK yet, nor LibGDX or LWJGL.

Which do you suggest as a best first try for integration with the new mixer?

As for progress on the mixer, I was working on the javadoc and had one more class to build, to allow clips to loop, when my six-month-old graphics card blew. Just got the computer back from the shop. Hoping to get the jar up for tryouts soon.

I’ve posted a jar with the “Per Frame Audio Mixer” at the following location:

The source is included, under GPL for now, and also includes a single .wav file of a bell that I have used in several test/demo examples (and takes up 99% of the jar). I made an attempt at creating JavaDoc for all the public methods–hopefully it will be helpful.

“Per frame” means per audio frame, or audio sample. The mixer only reads one frame (sample) at a time from the mixer tracks.

Click the jar to hear the sample tests:

  1. single PFWavTrack playback of the a4.wav (a bell)

  2. backwards PFClipShooter playback of the same bell

  3. 32 bells spread out, left to right, with pitch variation, spaced by repeated Thread.sleep(100) commands. The timing is not shabby, though not dead accurate enough for a real music system. That will come once the “per frame audio event queue” is implemented.

  4. random bells (some backwards) via one ClipShooter track, two low bells via a second ClipShooter track.

  5. Combo of (a) ClipLooper of the bell (pan center), where the ends of the file are overlapped and cross-faded (creating a potentially infinitely long ringing) with (b) ClipShooter of a low bell (left) and some high bells that iterate from right to left, each with a smaller volume. At the end, the volume of the ClipLooper is brought down via a series of 0.005 volume increments spaced 10 millis apart.

How to use the code:


PFAudioMixer audioMixer = new PFAudioMixer();  // initializes the mixer

PFWavTrack wavTrack = new PFWavTrack( fileName ); // initializes the wav wrapper track

PFClipData clipData = new PFClipData( fileName ); // loads the clip data 
PFClipShooter clipShooter = new PFClipShooter( clipData, maxShots); // initializes a multiple clip playback track
PFClipLooper clipLooper = new PFClipLooper( clipData ); // initializes a clip-looping playback track 

// load tracks into mixer

audioMixer.add( wavTrack );
audioMixer.add( clipShooter );
audioMixer.add( clipLooper );

// start the mixer

audioMixer.start();

// trigger when ready

wavTrack.play( volume, pan );  // NOTE: can only play once
clipShooter.play( speed, volume, pan );
clipLooper.play( volume, pan );

// adjust volume or pan

whateverTrack.setVolume( newVolume );  // range: 0 to 1f
whateverTrack.setPan( newPan );  // -1f to 1f, 0 is center

// if you want to delete a track:

audioMixer.remove( whateverTrack );


Only one format currently supported: .wav, 44100fps, 16-bit, stereo, little endian signed PCM

Mono NOT supported–you have to mix your mono to stereo for use in this system. Mono for some clips would be a good thing, and I hope to get to it before too long.
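For reference, the supported format corresponds to the following standard javax.sound.sampled.AudioFormat (the wrapper class name here is just for illustration):

```java
import javax.sound.sampled.AudioFormat;

public class SupportedFormat {
    // The one supported format: 44100 fps, 16-bit, stereo, signed PCM,
    // little endian. Frame size works out to 4 bytes (2 bytes x 2 channels).
    public static AudioFormat get() {
        return new AudioFormat(44100f, // sample rate
                               16,     // bits per sample
                               2,      // channels (stereo only)
                               true,   // signed
                               false); // little endian
    }

    public static void main(String[] args) {
        System.out.println(get().getFrameSize()); // 4
    }
}
```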

Caution: No checking for overflow!! If you exceed 16-bit capacity, you will hear a really ugly loud noise. When that happens, figure out which of your files to play at a lower volume. This can be worked out prior to shipping your game.
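One possible guard would be a simple clamp before the mixed float result is narrowed to 16 bits. The mixer does not do this yet, so treat this as a sketch of a fix, not the current behavior:

```java
// Sketch: clamp a mixed sample (in raw 16-bit units) to the legal range
// before casting, so overflow distorts gracefully instead of wrapping.
public class ClampSketch {
    static short clamp16(float mixed) {
        if (mixed > Short.MAX_VALUE) return Short.MAX_VALUE;
        if (mixed < Short.MIN_VALUE) return Short.MIN_VALUE;
        return (short) mixed;
    }

    public static void main(String[] args) {
        System.out.println(clamp16(40000f)); // 32767
    }
}
```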

Volume and pan changes that are “too large” will cause clicks. I recommend making a max volume change of maybe 0.005 at a time. Not sure what the pan tolerances are yet.
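One way to honor that 0.005 guideline is to ramp toward a target volume in small steps instead of jumping, calling a step function once per update. A minimal sketch (names made up):

```java
// Sketch: move the current volume at most MAX_STEP per call toward a
// target, so large user-requested jumps become click-free ramps.
public class VolumeRampSketch {
    static final float MAX_STEP = 0.005f;

    static float step(float current, float target) {
        float diff = target - current;
        if (diff > MAX_STEP) return current + MAX_STEP;
        if (diff < -MAX_STEP) return current - MAX_STEP;
        return target; // close enough: land exactly on the target
    }

    public static void main(String[] args) {
        System.out.println(step(0f, 1f)); // 0.005
    }
}
```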

Volume does NOT map well to dynamics as we hear them. Coming: some volume maps so that volume changes are more evenly distributed between 0 and 1. Pan seems to behave better, so I’m not sure if/when I’ll bother with working out a mapping for it.

For some reason, I can play 16 PFWavTracks within the Eclipse IDE, but I can’t play even 2 as a stand-alone jar. I do not understand this, but perhaps it has something to do with the audio files being compressed, and thus slower to read? For now, I recommend sticking with sounds you can load into memory (clips).

I haven’t tested how many PFClip… tracks can run concurrently yet.

COMING: an event queue within the audio mixer thread, for audio-frame-specific events. That means there will be accuracy up to 1/44100th of a second, if you know precisely during which audio frame you want the event to occur. Also coming: various tools for constructing sound environments using parameterized randomness (e.g., windchimes), and some nice FM Synth sounds, procedurally generated. (Those are my ambitions.)

Looks interesting. For your volume control, this is quite an interesting article - Or to summarise, volume should be logarithmic not linear to better suit how we hear. A usable approximation is to use the power of 4 of the volume setting passed into the setVolume() method. You’ll probably need to relook at your panning algorithm too - google for equal power panning.
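A sketch of those two suggestions together, the power-of-4 volume approximation and a sin/cos equal-power pan (class and method names here are illustrative only):

```java
// Sketch: perceptual volume via volume^4, and equal-power panning,
// where the squares of the left/right gains always sum to 1.
public class PerceptualMappingSketch {

    // A usable approximation of logarithmic loudness.
    static float mapVolume(float setting) {       // setting in 0..1
        return setting * setting * setting * setting;
    }

    // Equal-power pan: map pan -1..1 onto an angle 0..PI/2.
    static float[] equalPowerPan(float pan) {     // pan in -1..1
        double angle = (pan + 1) * Math.PI / 4;
        return new float[] { (float) Math.cos(angle),   // left gain
                             (float) Math.sin(angle) }; // right gain
    }

    public static void main(String[] args) {
        System.out.println(mapVolume(0.5f)); // 0.0625
    }
}
```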

I’m not sure I understand your reasoning for processing this on a per-sample basis. It might work OK, but it is still an inefficient way to do this. You’ve basically got the JavaSound buffer being filled as quickly as possible every 1/20 of a second (your 1/5 in the comments is wrong - each sample frame is 4 bytes). Therefore, you can only post events into your sample-accurate queue every 1/20 second anyway, really. Internally you can use arrays of floats as buffers, but you don’t have to process all the buffer in one go - at the beginning of each cycle, order the events, work out how many samples there are to the first one, process that many samples, then go on to the next event, etc. Hope that makes sense.
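The event-splitting idea in that last paragraph might look roughly like this (all names are placeholders):

```java
import java.util.List;

// Sketch: within one buffer cycle, process samples in runs that stop at
// each pending event, so events still land on exact sample boundaries
// without paying per-sample overhead.
public class EventSplitSketch {

    interface FrameProcessor { void process(int frames); }

    static class Event {
        final long frame;      // absolute audio frame at which to fire
        final Runnable action;
        Event(long frame, Runnable action) { this.frame = frame; this.action = action; }
    }

    // events must be sorted by frame; bufferStart/bufferFrames define the cycle.
    static void processCycle(long bufferStart, int bufferFrames,
                             List<Event> events, FrameProcessor p) {
        long pos = bufferStart;
        long end = bufferStart + bufferFrames;
        for (Event e : events) {
            if (e.frame < pos || e.frame >= end) continue;
            p.process((int) (e.frame - pos)); // render up to the event...
            e.action.run();                   // ...then fire it, sample-accurately
            pos = e.frame;
        }
        p.process((int) (end - pos)); // render the remainder of the buffer
    }
}
```

For a 100-frame cycle with events at frames 10 and 30, this renders runs of 10, 20, and 70 frames, firing each event between runs.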

@nsigma writes:

[quote]Looks interesting. For your volume control, this is quite an interesting article - Or to summarise, volume should be logarithmic not linear to better suit how we hear. A usable approximation is to use the power of 4 of the volume setting passed into the setVolume() method.
I’ll check it out. I’ve been experimenting with exponential/logarithmic/trig volume mapping algorithms, and used them for synth envelopes, e.g., the FM “SpiderBell” and found them quite helpful. It just didn’t make this iteration because I didn’t want to complicate the api, and it is something that can well be implemented at the ‘trigger’ end–external to the mixer.

[quote]You’ll probably need to relook at your panning algorithm too - google for equal power panning.
Yes, this is something I haven’t addressed yet, and having the search terms is very helpful. But I still think that the simple algorithm being used may often suffice. To my ear, the perceptual distortion is not as in-your-face as the distortion of the volumes.

[quote]I’m not sure I understand your reasoning for processing this on a per-sample basis. It might work OK, but it is still an inefficient way to do this. You’ve basically got the JavaSound buffer being filled as quickly as possible every 1/20 of a second (your 1/5 in the comments is wrong - each sample frame is 4 bytes). Therefore, you can only post events into your sample accurate queue every 1/20 second anyway, really.
Ahh, got it. The output buffer is set to 8192 bytes, but yes, that does come to a much smaller latency given 4 bytes per frame. My oops.

OK, here is my reasoning, and I welcome your giving it a going over, shooting it down if it is faulty!

I’m seeing latency as a multiple-stage problem, the sum of various contributors. I’m taking the view that there are three latencies in a mixer: the read, the processing, and the write. There is also the inherent latency, or real-time variability/unpredictability, caused by JVM thread switching.

I don’t see any way around the write latency–that’s currently set by the 8192-byte arrays being sent to the SourceDataLine.

However, by making the read and the processing be single sound frames, the data reaching the audio thread is getting there at the earliest possible frame, limited to the JVM thread switching timing constraints.

Example: suppose we have 4 tracks and an input read buffer the same size as our write buffer (1/20th of a second). I am assuming the audio thread can be interrupted by the JVM at any time. Suppose a “play” command originates from the GUI thread and the JVM switches after processing two of the four mixer tracks, making a volatile “running” boolean change to TRUE. This will now be visible to the mixer track as soon as the JVM switches back. If the mixer track in question is one of the two that have already been processed, the earliest the “running = TRUE” will take effect is after the end of this block of time being processed and the next block is initiated. But if the “block” being processed is a single frame, it will be processed during the very next frame.

Yes? No?
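The hand-off in that example boils down to something like this (names are illustrative):

```java
// Sketch: the GUI/game thread flips a volatile flag; the mixer thread,
// checking once per frame, picks the change up at the very next frame it
// processes rather than at the next buffer boundary.
public class RunningFlagSketch {
    private volatile boolean running = false;

    // called from the GUI/game thread
    public void play() { running = true; }

    // called by the mixer once per frame; returns whether this track
    // contributed a frame this time around
    public boolean processFrame() {
        if (!running) return false;
        // ... read and mix one frame of this track here ...
        return true;
    }
}
```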

The relevant question, it seems to me, is just how much we slow down the efficiency of the audio processing by using this admittedly odd method. (Golden rule being violated: It is always better to grab and use contiguous blocks of data.) I am not able to measure that, and I’ve been repeatedly finding Java making a fool of me when I try to optimize. So I took the plunge and tried this method to see/hear what would happen. It seems to me that it kind of works and is worth further investigation.

I’m willing to sacrifice a bit of processing efficiency if it makes the “real-time” behavior a little tighter. The question is how much.

[quote] Internally you can use arrays of floats as buffers, but you don’t have to process all the buffer in one go - at the beginning of each cycle, order the events, work out how many sample to the first one, and process that many samples, then to the next event, etc. Hope that makes sense.
This makes perfect sense and is one of the plans I have been considering for audio event-queue processing. But before locking in an event-queue cycle as a component of the built-in latency, as well as adding the complications and costs of processing varying blocks of data in the read and processing stages, I wanted to try this “per frame” algorithm.

Possible danger of using the event queue cycle: the events don’t necessarily materialize in the real-time order of occurrence. A late-arriving event could have its “real-time” mismatch compounded by just missing an event-queue cycle.

There’s a fair bit to be done to test if this api is practical or not. I’m not clear yet if it works to require the mixer be “off” when adding or deleting tracks. Also, have to test what happens if you attempt 20 or 30 tracks for a complicated level. Even if only a half dozen are playing at a given time, touching each one via a per-frame algorithm might balloon the inherent inefficiencies and cpu cost of this method.

Thanks as always for the feedback!

The major problem I see with your approach is that you’re assuming the buffer processing is more spread out than it probably is, whereas it is more likely being processed in one quick burst. I think the majority of the time, your buffer will be processed completely before a context shift (particularly as your audio thread should be highest priority), though on a multicore system you’ll be adding in more variability.

Therefore, most of the time your latency is going to be your buffer size. What your method doesn’t account for, and possibly makes worse, is jitter. As you’re probably aware from using MIDI, a small but constant delay in triggering feels more natural than a varying one. One approach you could take is to timestamp your events with System.nanoTime(). Then process the events in your audio thread roughly 1/20th of a second behind - the aim is to schedule them sample-accurately, as close as possible to one buffer time after the event was triggered. You could look into the source code of projects like Frinika or Gervill, which take a somewhat similar approach to sample accuracy (from recollection).
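The timestamp-and-delay idea could be sketched as follows, assuming the stream’s start time is recorded as an epoch. The class name and buffer math are illustrative, not from any existing project:

```java
// Sketch: stamp each event with System.nanoTime() when triggered, then
// convert that stamp into the audio frame at which it should fire,
// deliberately one buffer length in the future to absorb the latency.
public class TimestampSchedulingSketch {
    static final double SAMPLE_RATE = 44100.0;

    // eventNanos: the System.nanoTime() stamp taken when the event fired;
    // epochNanos: the stamp taken when the stream produced its first frame.
    static long scheduleFrame(long eventNanos, long epochNanos, int bufferFrames) {
        long elapsedFrames =
                Math.round((eventNanos - epochNanos) * SAMPLE_RATE / 1e9);
        return elapsedFrames + bufferFrames; // constant delay, not a varying one
    }

    public static void main(String[] args) {
        // An event stamped 1 second after the epoch, with a 2048-frame buffer:
        System.out.println(scheduleFrame(1_000_000_000L, 0L, 2048)); // 46148
    }
}
```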

Well, I just wanted to throw out any assumptions about buffer processing.

I recall testing two buffer sizes by printing nano times with each call, and surprise, surprise, the JVM simply allowed the version with a smaller buffer to be called more times in sequence before it switched–the switching durations were fairly consistent between the two versions.

I didn’t think to put a second call at the end of the method, to determine whether the JVM ever switched at a point in the middle of this method! (One can peruse the gaps in the time stamps.)

Jitter is a good term. When I referred to the buffer processing loop as having three stages, perhaps the combination of stages contributes more to jitter than latency.

Big yes on time-stamping! That is an essential element of the event-queue that I wrote for the Theremin and have in progress for this mixing system. Also, one can use the moment that the audio mixer first goes on and creates its first audio frame as “The Epoch”, so to speak, and use it to cross-calculate between animation frames, sound frames/samples, and real time.

Yes. Though, as I said earlier, you have to run the audio queue at a time somewhat behind system time to compensate for the buffer latency, otherwise every event you process will be in the past! Also beware of another cause of jitter in calculating between system time and sample time - the sound card clock and system clock will drift from each other, and there will also be variance between the system time when processing is called and the expected sample time. I’ve recently been playing with a port of a DLL (delay locked loop) to filter variance between system time and sample time. The code is here if you’re interested -


I had to run it in command line as Ark tried to open it when I double-clicked on it (this is the expected behaviour).

It works fine but I have a little bit of crackling or sizzling.

Thanks for trying it. I am assuming you are running OpenJDK, is that right?

Today I learned about $PATH in Linux, and learned how to make a link to allow calls to javac. That shows what a beginner I am with Linux–but one has to start somewhere. I also now have both OpenJDK 7 and Oracle’s Java 7 on the Linux partition.

But I can’t test my sound programs yet, because Linux is not recognizing the sound card that I have installed (which works fine under Windows)…so that has to be solved soon.

I can’t recall what “crackle” tends to imply diagnostically. Does it happen on all the sound tests?

It could be defaulting to a less efficient java audio implementation than what you normally use. (Is the Java Sound Audio Engine involved in any way in your setup?) At this point, I call standard library sound code, e.g., javax.sound.sampled.SourceDataLine.

I would think if the problem were the execution speed of the code, we’d more likely hear dropouts. Perhaps my volume settings on the test code are too high. There also might be some artifacts related to the pitch shifting–that could cause some sizzle. If that were true, it would only sizzle on the tests where the playback speed (pitch of the bell) is altered.

Ah, another thing to try might be making the buffer for the SourceDataLine writes larger. I can’t be of much help until I get my Linux sound solved. But you are welcome to change the source, to try a larger buffer, for example. The source was included in the jar.

I use OpenJDK 1.7 update 6 and there is no Java Sound Audio Engine involved because there is no Java Sound Audio Engine in OpenJDK 1.7. I don’t know how to explain this crackling.

I’m with you - not easy to explain in words! :slight_smile:

I get a similar issue with this code, and in Praxis LIVE, during the first few seconds of opening a line. Because each of the tests opens a new line, this recurs during the demo. It seems to be the fault of the PulseAudio mixer in OpenJDK / IcedTea, which seems to be set as default in most Linux distros. The standard mixers from Java actually work much better, and because in Java 7 they’re now fixed to use the default ALSA device, they play through PulseAudio anyway - go figure!

@gouessej - does launching the JAR with the following line work better for you? Seems to for me.

java -Djavax.sound.sampled.SourceDataLine=com.sun.media.sound.DirectAudioDeviceProvider -jar pfaudio121206.jar


Well this is new to me!

I spent a few minutes looking through the tutorials and via search to find the parameter “-D” for jars, and could find nothing. Can you explain what is happening here, or where I might find the spec or api for it?

Is there a way to do this in the source code, so that we don’t have to do it in the jar?

When I reference DirectAudioDeviceProvider in my Eclipse IDE, I get the message that it is not available due to a restriction on rt.jar.

-D is a standard command line option for Java to set system properties. See here, and it’s the same on Windows. You can do the same in code by calling System.setProperty().


System.setProperty("javax.sound.sampled.SourceDataLine", "com.sun.media.sound.DirectAudioDeviceProvider");

This way just saves having to recompile your JAR to test it - I’m lazy! ;D

The system properties supported by JavaSound are documented here - using system properties overrides whatever is set in the sound.properties file in the JRE, which on a lot of OpenJDK installs seems to default to org.classpath.icedtea.pulseaudio.PulseAudioMixerProvider. You can also override these defaults system-wide by editing that file.


Well, perhaps this will eliminate the mystery crackle.

Doh! I was looking for -D under the jar command, not the java command. Thanks for including the line of code already written. Looking at the AudioSystem spec, I had been trying to figure out how to use their example with Properties.load rather than the System.setProperty method.

I think I’m going to go ahead and install this in my own game, to get a better idea of its practicality, but am debating whether to implement a “simple” audio event queue first.

But I’m also thinking: what would be a good way to test the performance cost of not using a larger input/processing buffer? It would be nice to have some sort of measure of the tradeoff I am making by processing single audio frames.

It is easy to write a second version with a buffer. But how can they be compared meaningfully? I’m concerned that the accuracy will be distorted by the fact that writing to a SourceDataLine blocks.

Ah, I could write to some array (sized to the buffer size, and nonblocking) instead of a SourceDataLine! Then, let both run unhindered as fast as they can go. Will report back on this when I get a chance to run it.
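The array-sink idea might be harnessed like this: both strategies write into a plain float array instead of a blocking SourceDataLine, and we time them over the same amount of audio. The per-frame “work” here is a trivial stand-in for real mixing, so treat this as a harness sketch only:

```java
// Sketch of a non-blocking throughput comparison: per-frame processing
// versus buffered processing, each writing into an array sink.
public class ThroughputTestSketch {

    static long timePerFrame(int totalFrames, float[] sink) {
        long start = System.nanoTime();
        for (int i = 0; i < totalFrames; i++) {
            sink[i % sink.length] = i * 0.001f; // stand-in for mixing one frame
        }
        return System.nanoTime() - start;
    }

    // Processes whole buffers (rounds up to a whole number of buffers).
    static long timeBuffered(int totalFrames, float[] sink, int bufferFrames) {
        long start = System.nanoTime();
        for (int i = 0; i < totalFrames; i += bufferFrames) {
            for (int j = 0; j < bufferFrames; j++) {
                sink[(i + j) % sink.length] = (i + j) * 0.001f;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        float[] sink = new float[2048];
        int frames = 44100 * 10; // ten seconds of audio, as fast as possible
        System.out.println("per-frame: " + timePerFrame(frames, sink) + " ns");
        System.out.println("buffered:  " + timeBuffered(frames, sink, 2048) + " ns");
    }
}
```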

Alternate suggestions for testing happily accepted!

I was hoping Julien might get back with a report on that option. I wouldn’t hard-wire it into your code until you know whether it’s a better option across the board!

Well, you could have a look at the JavaSound implementation in the JAudioLibs code - This offers two alternative timing mechanisms that don’t block on the write to the SDL - instead using a large output buffer but never writing to all of it, and using either System.nanoTime() or getFramePosition() to control writes. NB. the getFramePosition() option doesn’t really work on Windows.

In Praxis I further split buffers from the server into 64 sample chunks to process through the audio pipeline. This seems to be solid everywhere I’ve tested so far, and offers a fairly good compromise to sample accurate processing.

If by “let both run unhindered as fast as they can go” you mean running multiple threads then forget it. Thread contention will just interfere. Run everything off the primary audio thread, and make sure it’s set to maximum priority too!

It doesn’t fix the crackle.