Is a Binaural Sound Engine in Java possible?

BlueMustache · December 26, 2015, 6:38am

Hello Community,
Hope you had a great Christmas!
I have an oddball question for y’all today.
So, I plan on making like a hostage simulator / storyline kinda thing.
This would be more of a pet project of mine. I have a kinect, and I am gonna track my head movements with it.
I want to emulate being blindfolded with headphones on, sitting in a chair next to someone.
Say I am sitting there, and I hear someone uttering dialogue to my left, then someone to my right, and then footsteps panning in front of me.
I also of course could adjust this pan to match my head movement.
Anyways, my big question is how could I implement this in Java?

https://ircam-rnd.github.io/binauralFIR/examples/

I have seen a couple Unity examples, but that is a plugin. I don’t know C++ or C#. I want to implement this in LWJGL or just a plain swing panel.
I want to be able to tell a story, much like old radio programs used to use tons of effects, sounds, and dialog.
Please help, anything is useful.
Thanks Guys!
-Blue

Riven · December 26, 2015, 9:10am

The secret of positional sound is not so much playing the audio louder in one speaker, but playing it earlier. Just like IRL, the sound waves reach one ear earlier than the other (usually with a difference of less than a millisecond). Your brain will pick up on that and do the ‘math’ for you. Additionally you can play with the volumes of both streams, but that should take a backseat.
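To put a rough number on "earlier", here is a minimal sketch. It assumes a very simple model where the extra path to the far ear is the ear spacing times the sine of the source angle. The 0.18 m ear spacing, the 343 m/s speed of sound, and the names `ItdSketch`, `itdSeconds` and `itdSamples` are all assumptions for illustration, not anything from a library.

```java
/** Rough interaural time difference (ITD) estimate; a sketch, not a full HRTF model. */
final class ItdSketch {
    static final double SPEED_OF_SOUND_M_S = 343.0; // assumed: air at roughly 20 degrees C
    static final double EAR_SPACING_M = 0.18;       // assumed ear-to-ear distance

    /** Seconds by which the far ear lags; azimuth 0 = straight ahead, 90 degrees = fully to one side. */
    static double itdSeconds(double azimuthRadians) {
        return EAR_SPACING_M * Math.abs(Math.sin(azimuthRadians)) / SPEED_OF_SOUND_M_S;
    }

    /** The same lag expressed in (fractional) sample frames at the given sample rate. */
    static double itdSamples(double azimuthRadians, double sampleRate) {
        return itdSeconds(azimuthRadians) * sampleRate;
    }

    public static void main(String[] args) {
        double side = Math.toRadians(90);
        System.out.printf("ITD at 90 degrees: %.3f ms, %.1f frames at 44.1 kHz%n",
                itdSeconds(side) * 1000.0, itdSamples(side, 44100.0));
    }
}
```

Under these assumptions a sound directly to one side arrives about half a millisecond later at the far ear, which is roughly 23 sample frames at 44.1 kHz.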

Catharsis · December 26, 2015, 9:12am

Sure… Indirectly at least. I’d bundle your game with SuperCollider and punt all audio processing to an instance of SC Server running and interface with it over OSC / the network essentially. This is how TyphonRT started in '03 to create significantly more expressive GUIs for SC, but I also experimented with creating a game engine and having SC run surround sound via Ambisonics (which can also be delivered as binaural simultaneously) a while back. You could even get super fancy and get a Soundfield microphone and blow people away by mixing real life 3D audio w/ synthesized aspects as well. Since SC has stable versions of the audio server for all desktop OSes this is a realistic option. Java itself… Yah no… Sort of I guess… At least that is how I’d go about doing something detailed with audio for a real time game.

BlueMustache · December 26, 2015, 6:26pm

Thanks, that’s helpful!
Yea, I might switch to C++ or something. Of course I think I saw that SuperCollider is js. It’s somewhat easy to port code from js to Java. I did it once with a sonic communication library. Got the encoder working. The decoder will be another story.
Thanks guys!

Spasi · December 26, 2015, 6:48pm

LWJGL 3 comes with OpenAL Soft which supports binaural audio rendering. The nightly build also has support for the SOFT_HRTF extension.

KevinWorkman · December 26, 2015, 8:01pm

Short answer: Yup, and google is your friend. Have you looked into any Java sound libraries at all yet?

I know that the minim sound library supports this. I’m pretty sure libGDX does as well.

philfrei · December 27, 2015, 2:56am

I think Riven is mostly, but not completely, right about the timing being a source of binaural location. This is the most important component for low to middle frequencies. As you get to wavelengths that are smaller than the size of your head, though, amplitude becomes increasingly important, especially for steady-state high sounds. This is what I remember from when I worked as a work-study lab assistant at a binaural lab at UC Berkeley back in the 1980’s.

Another consideration is the frequency content. As sounds travel larger distances through air, the high frequency components die out quicker than the low components. You can significantly enhance the effect of distance by doing some low pass filtering.
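As a sketch of the distance idea, a one-pole low-pass filter is about the cheapest way to dull the highs as a source moves away. The class name `DistanceLowPass` and the distance-to-cutoff mapping are made up for illustration; the mapping is not a physical model of air absorption.

```java
/** One-pole low-pass whose cutoff drops as the source gets farther away (illustrative only). */
final class DistanceLowPass {
    private final double sampleRate;
    private double coeff;  // smoothing coefficient in (0, 1]
    private double state;  // previous output sample

    DistanceLowPass(double sampleRate) {
        this.sampleRate = sampleRate;
        setCutoffHz(20000.0);
    }

    /** Standard one-pole coefficient for the given cutoff frequency. */
    void setCutoffHz(double cutoffHz) {
        coeff = 1.0 - Math.exp(-2.0 * Math.PI * cutoffHz / sampleRate);
    }

    /** Hypothetical mapping: 20 kHz at 0 m, halving at 10 m, and so on. */
    void setDistanceMeters(double distance) {
        setCutoffHz(20000.0 / (1.0 + Math.max(0.0, distance) / 10.0));
    }

    /** Filters one sample: y[n] = y[n-1] + coeff * (x[n] - y[n-1]). */
    double process(double in) {
        state += coeff * (in - state);
        return state;
    }
}
```

One filter instance per sound source (per channel) is enough, and the cost is one multiply and two adds per sample, so it should not be a CPU concern even with many sources.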

Yet another consideration is that our ear shapes tend to “color” sound in a subtle way, depending on the direction of approach, and this can also help with correlating an incoming sound with its source.

Can Java handle this? I think so. I am trying to do so. On the “easy” side, when mixing sounds, one can make use of stereo PCM coding. It’s not at all hard to take a sound value from some source and multiply it by, say 0.4 for the right and 0.6 for the left and have it sound like it is some degree towards the left. If you want to play with the timings, then it is mostly a matter of creating an array to use as a holding area and cursor through it with linear interpolation if you want to get smooth variations (smoother than that which can be done at 44100 fps increments). I’ve created arrays such as this and used them for echo and/or flanging effects. Easy to do, relatively.
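A minimal sketch of both ideas from the paragraph above: a volume pan, and a "holding area" array read back with linear interpolation so the delay can be a fractional number of frames. The class and method names (`PanAndDelay`, `pan`, `write`, `read`) are hypothetical, and this is one way to do it rather than the code from my mixer.

```java
/** Circular delay buffer with fractional (linearly interpolated) reads, plus a simple volume pan. */
final class PanAndDelay {
    private final float[] buffer;
    private int writeIndex; // slot the next sample will be written to

    PanAndDelay(int maxDelayFrames) {
        buffer = new float[maxDelayFrames + 2]; // spare slots so interpolation never reads the write slot's future
    }

    /** Splits a mono sample into {left, right}; leftGain 0.6 gives 0.6 left and 0.4 right. */
    static float[] pan(float sample, float leftGain) {
        return new float[] { sample * leftGain, sample * (1f - leftGain) };
    }

    /** Stores one mono sample in the holding area. */
    void write(float sample) {
        buffer[writeIndex] = sample;
        writeIndex = (writeIndex + 1) % buffer.length;
    }

    /** Returns the signal as it was delayFrames ago; delayFrames may be fractional, up to maxDelayFrames. */
    float read(double delayFrames) {
        double pos = writeIndex - 1 - delayFrames; // position of the delayed sample
        while (pos < 0) pos += buffer.length;      // wrap into the circular buffer
        int older = (int) pos;                     // sample just before the requested point
        int newer = (older + 1) % buffer.length;   // sample just after it
        double frac = pos - older;
        return (float) (buffer[older] * (1.0 - frac) + buffer[newer] * frac);
    }
}
```

For a timing-based effect you would write each incoming mono sample once, then read it with delay 0 for the near ear and a small delay (a few to a couple dozen frames) for the far ear.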

I have a thread where I am showing java audio programs as I write them, with a couple sample programs which you can download and hear for yourself. In the first (CirclesDemo) there are synthesized musical motifs that are played with a panning setting that correlates to a ball’s location off of the center axis, and a volume that correlates to the distance from the center. Six sound sources (all generated in real time) are shown, moving about the screen, where the position data is sent in real time to the mixer. In that demo, I’m only using volumes to create the binaural effect. I guess I should consider putting a little timing adjustment as well. Hmmm. Interesting idea. (Might be necessary to filter out the high components before the timing adjustment in order to prevent some comb-filtering artifacts. Worth a test when I get a chance.)

I haven’t done more than the crudest of filtering so far. It seems costly in terms of CPU, but a lot of that is probably my ignorance and trepidation. Someone like Neil Smith (nsigma) has done this (and much, much, more – check out his Praxis site!) and can be of more help with that. Just give him a day or three to notice this thread.

philfrei · December 27, 2015, 3:22am

Interesting concept, using ray tracing to get real-time audio information.

Comments indicate that the effort wasn’t a complete success. Though it still seems impressive to me.

Catharsis · December 27, 2015, 4:12am

A quick note: SuperCollider is not JS; the server and the language (sclang) are written in C/C++. The nice thing, though, is that you can just use the server as a headless real-time audio engine and interface with it via OSC (Open Sound Control), i.e. UDP networking to localhost from the Java game engine. This has the benefit of running heavy audio loads in a separate process from any game engine, which is great in the multi-core world of today. I can totally vouch that this is a solution for high quality real-time audio w/ low-overhead control from Java.
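To give a feel for how little is needed on the Java side, here is a rough sketch that hand-encodes an OSC message and sends it over UDP to localhost using only the JDK. `OscSketch`, `encode` and `send` are hypothetical names, only int and float arguments are handled, and a real project would more likely use an existing OSC library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Minimal OSC 1.0 message encoder and UDP sender (int and float arguments only). */
final class OscSketch {
    /** Writes an OSC string: ASCII bytes, a NUL terminator, then NUL padding to a multiple of 4. */
    private static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        int pad = 4 - (bytes.length % 4); // 1 to 4 NULs, so there is always a terminator
        for (int i = 0; i < pad; i++) out.write(0);
    }

    /** Builds address + type tag string + big-endian arguments. */
    static byte[] encode(String address, Object... args) {
        StringBuilder tags = new StringBuilder(",");
        ByteBuffer body = ByteBuffer.allocate(4 * args.length); // ByteBuffer is big-endian by default
        for (Object arg : args) {
            if (arg instanceof Integer) {
                tags.append('i');
                body.putInt((Integer) arg);
            } else if (arg instanceof Float) {
                tags.append('f');
                body.putFloat((Float) arg);
            } else {
                throw new IllegalArgumentException("this sketch only supports int and float");
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, address);
        writeString(out, tags.toString());
        out.write(body.array(), 0, body.position());
        return out.toByteArray();
    }

    /** Fire-and-forget send to a server on this machine; replies are not read here. */
    static void send(byte[] packet, int port) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(packet, packet.length,
                    InetAddress.getLoopbackAddress(), port));
        }
    }
}
```

Assuming a SuperCollider server is already running on its default UDP port 57110, `OscSketch.send(OscSketch.encode("/status"), 57110)` would ask it for a status report.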

For simplicity though try OpenAL first especially if using LWJGL and if that doesn’t give you the result you’re looking for then consider SuperCollider. Anything beyond that will start moving down the sub-optimum direction of using Java for a complete game engine + simultaneous audio processing which is especially undesirable if sharing a single JVM instance for it. Could it work… Sure… But…

I’m sure there are other native audio engines with Java bindings to choose from as well. FMOD is the historically popular one for a lot of game dev, and various Java bindings for it have been around over the years.

nsigma · December 28, 2015, 12:03pm

@philfrei - you rang!

Despite what some others have said, Java itself is great for coding real-time audio processing, learning DSP, etc. I’ve been doing real-time Java audio work for many years now. However, unless you have a particular desire to learn about the principles underlying binaural sound, I’d agree with @Spasi about using OpenAL for this.

If you do want to learn principles, then there’s a huge range of other options, both pure-Java and native (by pure-Java there I mean the DSP code itself is in Java and can be hacked on). Having a look at libraries used with Processing is interesting, including Minim (really not sold on its per-sample UGen processing!) and Beads. Interestingly, Processing v3 has switched to a binding to the native Methcla library, which looks interesting even if the way it’s bound is a bit dubious. Alongside SuperCollider there are actual Java bindings for both libPD and CSound - JCollider unfortunately seems defunct, although there’s Overtone and ScalaCollider - why the hell anyone would prefer OSC I don’t know! :persecutioncomplex:

Trying to work out if that sentence makes any sense?! Praxis LIVE supports running audio in a separate VM, so an independent process, but this is to do with managing GC pauses when doing very low latency audio. I don’t see why this statement has anything to do with multi-core? It’s perfectly possible to run pure-Java or native audio DSP and a game engine in the same process.

Why do so many people think Java is slow?? :emo:

Catharsis · December 28, 2015, 1:13pm

Despite what some others have said, Java itself is great for coding real-time audio processing, learning DSP, etc.

Uh… I think you misunderstood what I was specifically getting at above. If you want to play around with Java for audio go ahead. I had fun making VST plugins with Java back in the day and of course we’re talking ~'05 - '08 for that and zippy it was… Totally possible. Run wild in fact.

If you’re making a game engine especially one that is taxing in graphics, physics, AI, etc. then it’s clear splitting off audio to a separate process is potentially beneficial definitely if one is also doing heavy audio processing; like anything else measure and adjust if necessary.

why the hell anyone would prefer OSC I don’t know!

A zillion times better than MIDI, having its start ~15+ years after the last MIDI spec and obviously much faster… Floating point resolution for control values and much more. It’s just a well defined network protocol. What do you think is better than OSC for audio control data and such?

Yeah… Too bad JCollider is GPL and in a seemingly defunct state. Indeed I have a whole Java OSC / SuperCollider framework; will actually be dusting it off soon. The JCollider fellow contacted me back in the day to ask if I was going to release my efforts before he got started. Glad to see the fellow continue w/ ScalaCollider, but all of his stuff looks to be under GPL.

Trying to work out if that sentence makes any sense?! Wink

I think you’re just pulling my leg…

I don’t see why this statement has anything to do with multi-core?

Whether you launch a separate VM or have an audio engine running in a separate process the benefit is reducing the work done in main game engine / VM process. Let the OS figure it out.

nsigma · December 28, 2015, 2:42pm

No, I’m not pulling your leg. I’m trying to understand why you said running audio in a separate process is to do with multi-core? There are reasons for running audio in a separate process if it’s written in Java and you want very low latency because you can better control GC effects, etc. My environment is not a game engine, but it is that taxing, and it will run Java audio fine in process at higher but usable (in this context) latency. I’m not sure having a native bound audio engine in or out of process would make much difference?

Incidentally, the reason I personally work with Java is to support JIT coded / compiled DSP. Nothing against working with native libraries for this.

OSC is great - I currently use it for inter-process / inter-device communication. I am questioning why you’d want to code with it when you can make use of a native library with proper Java bindings. Nothing against things using it under the hood, but that wasn’t really how it came across above.

JCollider is LGPL!

Why? This is the bit I’m confused about in your statement - leaving aside Java-side GC or other VM locking, why is running code in multiple processes better than multiple threads?

Catharsis · December 28, 2015, 3:32pm

I’m not sure having a native bound audio engine in or out of process would make much difference?

Historically audio hasn’t been a deep experience for games. There is quite a bit possible pushing things forward where it would be handy as long as there is no cache thrashing; re: measure. Granted my interest is rather involved having a 32 speaker hemisphere to work with for high order Ambisonics with a bunch of multichannel convolution reverb and real time synthesis potentially running in addition to loops / one off samples. www.egrsoftware.com for picts. Now if only audio / music-tech could actually pay SF rents; :: sigh ::

If you’re doing status quo audio for a typical game scenario then there is no real benefit.

I’m trying to understand why you said running audio in a separate process is to do with multi-core?

Let’s flip this around… How is it not?

I simply was pointing out that working with SuperCollider server as a headless audio engine grants this out of the box. Take it or leave it…

I mean why do anything such as switch to Vulkan if not for the multi-core architecture possibilities (why yes with threads!).

RE: OSC - I am questioning why you’d want to code with it when you can make use of a native library with proper Java bindings.

SuperCollider… It really has been “the” most advanced open source audio engine for many years now.

JCollider is LGPL!

Well… http://www.sciss.de/jcollider/ says it’s GPL, but indeed LGPL is listed on the Github page. Most of his other work is GPL… Of course one is always free to negotiate a different license with any GPL author if there is a benefit.

why is running code in multiple processes better than multiple threads?

So the squabble is over T_O_mato / Tom_A_to? In the use case mentioned (SuperCollider) it’s just how things are…

Would you agree libraries like nanomsg are pointless given that scaling is done through sockets and multiple processes coordinating?

nsigma · December 28, 2015, 5:25pm

Yes, I’d had a look at your website already. Interesting stuff! In fact, reading your TyphonRT synopsis shows we have some fairly similar shared concerns.

Our audio interests are actually not that different either, particularly multi-channel (done a variety of projects there, and I’m the developer of the JACK bindings for Java) and real-time synthesis (eg. https://youtu.be/lK94qu1iObo?list=PL_0Ig7oegPsdCPQifT52YHXqcvFWm03Cn )

No issue with that! You seemed to be saying that running audio in a separate process was beneficial / the only way to make best use of multi-core, without really saying why processes rather than threads. It’s a more complicated way of working which can offer benefits in certain cases, sure, but not sure I’d advocate it as the go-to solution in all cases. I am genuinely interested in your reasoning.

Incidentally, I guess working with JACK is in some ways midway between those two options.

Possibly! :persecutioncomplex: I’m not sold on it being the best solution in all cases. I’m definitely not sold on it being the OP’s solution!

No, why?

philfrei · December 29, 2015, 5:43am

Uh…Mike, I’d like you to meet Neil. Neil, Mike. I believe you two have some interests in common…

Catharsis · January 22, 2016, 7:24am

Small necro here… I took a week off around the new year and didn’t get back to this… :

@nsigma Why yes… Will check out your efforts more soon! I’ll probably take a look at your gstreamer integration, which I’ll likely use for initial testing of my video engine efforts w/ Vulkan on the desktop. Glad to hear that someone capable is dealing with the Java bindings for JACK! ;D

I was just pointing out that by using SuperCollider you get audio in a separate process “for free”. Indeed a bit more complicated per se, but not really bad. In moving my library / framework / engine work towards being highly multithreaded, I generally favor protocols over standard APIs.

In regard to the OP and the desire to have realistic audio: with SuperCollider it would be darn neat to work with Ambisonics, particularly for “I also of course could adjust this pan to match my head movement.” One of the fantastic properties of Ambisonics is that while things are in the encoded state, all one has to do is multiply the encoded signal by the rotation matrix for the player’s head, and this automatically rotates the entire audio scene. No need to track individual sounds and move them around. Also, Ambisonics can be decoded to binaural audio or to discrete speaker arrangements.
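A small sketch of that rotation property, restricted to first-order Ambisonics and to yaw (turning the head left or right). The names `AmbiRotate`, `encode` and `rotateForHeadYaw` are hypothetical, the W channel weighting follows one common convention and is an assumption, and decoding to binaural is left out entirely.

```java
/** First-order Ambisonic encode and yaw rotation; channels are ordered {W, X, Y, Z}. */
final class AmbiRotate {
    /** Encodes one mono sample; azimuth is counter-clockwise from straight ahead, in radians. */
    static double[] encode(double sample, double azimuth, double elevation) {
        double cosEl = Math.cos(elevation);
        return new double[] {
            sample * Math.sqrt(0.5),            // W: omnidirectional (assumed -3 dB weighting)
            sample * Math.cos(azimuth) * cosEl, // X: front-back
            sample * Math.sin(azimuth) * cosEl, // Y: left-right
            sample * Math.sin(elevation)        // Z: up-down
        };
    }

    /** Rotates the whole encoded scene to compensate for a head turned by headYaw (counter-clockwise). */
    static double[] rotateForHeadYaw(double[] b, double headYaw) {
        double c = Math.cos(-headYaw); // the scene turns opposite to the head
        double s = Math.sin(-headYaw);
        return new double[] { b[0], c * b[1] - s * b[2], s * b[1] + c * b[2], b[3] };
    }
}
```

Because encoding and rotation are both linear, any number of sources can be summed into one four-channel signal first and rotated once per head-tracking update, which is the "no need to track individual sounds" part.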

It actually was neat to read that OpenAL Soft uses Ambisonics internally to provide better results, but this is an internal implementation detail. On a quick review of the HRTF example code it shows manually moving a sound around. I’m not sure if you can provide a rotation matrix as described above to apply to the frame of reference which would manipulate the internal Ambisonic implementation rotating the entire audio scene. It would be neat if that is possible with OpenAL Soft.

Ahh… Was just a rhetorical question as I didn’t understand where you were coming from with the multi-thread / process angle. ;D

Original poster! Have you made any progress?

DarkCart · January 22, 2016, 2:00pm

I saw something related to this on Reddit a few weeks ago. http://www.devdungeon.com/content/binaural-beats-java.

Catharsis · January 23, 2016, 9:48am

A slight further derailment ahead… ;D In this case ‘binaural’ as with two ears is the only connection per se and not connected with binaural spatialization. But this does stoke another passion of mine which is tuning systems. Beating insofar as acoustics is concerned is something we hear daily with Western temperament (12 TET). I’m a big fan of just intonation and microtonal music in general. A convenient example at this time of the beating that we are faced with daily is this Youtube clip w/ nice visual display… :

philfrei · April 12, 2016, 4:06am

This thread has my first hack at 3D sound, using pure Java. Since the original question was about whether this is possible or not, it seemed useful to have a link here.