3D audio test using JavaFX and procedural java sound

File is here.

Am in a bit of a rush–so quickly:

I am working on a 3D audio system.
I made a quickie 3D chess board with JavaFX. You can tool around using the arrow keys.
The purple towers that come and go each emit a sound generated via a procedural FM synthesizer. There are six towers; five have pretty much the same frequency (slight variations in the amounts and rates of LFO and other tweaks), and one is considerably deeper in pitch.

The main thing: I’m using TIME to create a pan effect.

  1. put the audio output of each tower onto a delay line
  2. calculate the distance of each tower to the “ears” of the camera (the ears are separated by 32 pixels)
  3. use the difference in distance to control two variable pointers into the delay line, one for each ear.

There is NO volume panning being used. Both ears are getting the exact same data, just with a time offset.
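As a minimal sketch of the idea, assuming a simple circular mono delay line per source; the class name, constants, and world scale here are illustrative, not the actual code:

```java
/**
 * Sketch of time-based (ITD) panning: one mono delay line per source, with two
 * read taps whose offsets come from each ear's distance to the source. Both
 * ears read identical data, just offset in time.
 */
public class ItdDelayLine {

    private static final float SAMPLE_RATE = 44100f;
    private static final float SPEED_OF_SOUND = 343f;     // meters per second
    private static final float METERS_PER_PIXEL = 0.01f;  // assumed world scale

    private final float[] buffer = new float[4096];       // circular delay line
    private int writeIndex = 0;

    /** The source pushes one mono sample per audio frame. */
    public void write(float sample) {
        buffer[writeIndex] = sample;
        writeIndex = (writeIndex + 1) % buffer.length;
    }

    /** Read the sample heard by one ear at the given distance (in pixels). */
    public float readForEar(float distanceInPixels) {
        float seconds = distanceInPixels * METERS_PER_PIXEL / SPEED_OF_SOUND;
        int delaySamples = Math.round(seconds * SAMPLE_RATE);
        int readIndex = Math.floorMod(writeIndex - 1 - delaySamples, buffer.length);
        return buffer[readIndex];
    }
}
```

Per audio frame the source writes once and each ear reads with its own distance; only the difference between the two distances shows up as the pan, while the common delay is just overall propagation latency.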

I had a pretty good volume pan going, but here’s the problem with that: if you get up close to a tower, with one ear next to it and the other facing away, the tower is pretty much silent in the far ear, allowing one to hear other stuff on the board.

In reality, if you are standing next to a jackhammer, the volume of the jackhammer masks both ears. It doesn’t create an audio shadow that allows the sound to disappear in the far ear. With the temporal pan, the masking is much more correct.

I haven’t yet tried putting in any filters to replicate front-back cone-of-confusion disambiguation. Nor even tried a slight front-back volume difference, which might be helpful. I do plan to get to this.

There is a bunch to do, in that first it has to work, then you improve performance and design. I seem to have to do things wrong a few ways before I can figure out a more reasonable way to do things.

Patting self on back: the changes in pan and volume are smooth, no zippering. I might try making my camera ears the equivalent of 35 centimeters apart instead of 32. There are undoubtedly other things to fix, too.

I checked things out.

Nice little test setup; you shouldn’t need to do much more for a basic environment to navigate around for now.

I’d recommend changing to voice samples, as speech will provide considerably better testing when generators are close together.

[quote]There is a bunch to do, in that first it has to work, then you improve performance and design. I seem to have to do things wrong a few ways before I can figure out a more reasonable way to do things.
[/quote]

I still recommend prototyping Ambisonics directly if you can get four or more speakers together, and/or decoding to binaural for headphones, with SuperCollider and the same JavaFX setup, and working backwards instead. That will provide a “gold standard” of accuracy to compare any implementation you come up with separately. Of course, there is also the much simpler step of working backward with all the DSP code on hand.

If I had any free time I’d set this up for you, but alas I can’t help out at the moment.

The time you’d spend getting SuperCollider up and running within your test framework, so you can compare results against it, will pay off big time.

@Catharsis Thanks for your thoughts.

How would SuperCollider integrate with Java or JavaFX? My impression is that it is very powerful, and large. The latter fact is the main reason I’ve been avoiding it. I’m not as concerned with achieving that level of power. I’m mostly interested in bringing a useful subset of procedural functionality to Java game programming.

It would be good to tool around and compare, though. Having an actual 3D environment really makes a big difference. For example, when I was using volume panning, it was really easy to hear how using a 0…1 pan setting and Left = n, Right = 1-n is very uneven. Spinning around in place next to a sound source, it is quite clear that left = 1, right = 0 or left = 0, right = 1 is considerably louder than left = 0.5 and right = 0.5. I found that an equation that beefs up the center gives a much more even sense of volume while spinning, e.g., left = sin(n * PI/2), right = sin((1-n) * PI/2). In this case the 50% point gives (approximately) left = 0.7, right = 0.7, which is a better match for the all-at-one-side cases.
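Spelled out as a tiny sketch (the class and method names are just for illustration), the two mappings from that paragraph look like this:

```java
/** Pan laws from the post above; n runs 0..1 from one side to the other. */
public final class PanLaws {

    /** Linear pan: the center (n = 0.5) carries only half the power of a hard pan. */
    public static float[] linear(float n) {
        return new float[] { n, 1f - n };                 // { left, right }
    }

    /** Equal-power pan: the center gives ~0.707 per side, matching the hard pans. */
    public static float[] equalPower(float n) {
        double half = Math.PI / 2.0;
        return new float[] { (float) Math.sin(n * half),
                             (float) Math.sin((1.0 - n) * half) };
    }
}
```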

Voices would be interesting to hear.

I want to learn a bit more about filtering. The setup I’m using with a dedicated delay pipeline should work well with a low pass filter. I think a common simple design is to have a bell-shaped distribution of the sound value, e.g., 10%, 20%, 40%, 20%, 10% over five adjacent sound frames. But I’m not clear on how to derive the particular frequency effect of a given filter from the distributions. Presumably, as the percentage gets higher in the middle, there is less rolloff. But beyond that I’m still a novice.
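As a concrete (if naive) sketch of that kind of five-tap weighted average, using the 10/20/40/20/10 numbers above (illustrative only, not something already in the library):

```java
/** Five-tap FIR smoothing filter using the bell-shaped weights from the post above. */
public class FiveTapSmoother {

    private static final float[] WEIGHTS = { 0.1f, 0.2f, 0.4f, 0.2f, 0.1f };
    private final float[] history = new float[WEIGHTS.length];

    /** Feed one sample in, get one low-pass-filtered sample out. */
    public float process(float input) {
        // Shift the older samples down and insert the newest at the front.
        System.arraycopy(history, 0, history, 1, history.length - 1);
        history[0] = input;

        float out = 0f;
        for (int i = 0; i < WEIGHTS.length; i++) {
            out += WEIGHTS[i] * history[i];
        }
        return out;
    }
}
```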

If I manage to get code in place to play around with filters, I may only take things to the point of getting one or two decent occlusion algos and something for front/back. My sense for the front/back stuff is that the frequency area involved is principally the “presence” area (around 5 kHz) and maybe also a bit of the sibilants. That is the sense I’ve gotten from some car listening early this past Sunday morning (few cars going by, so I could listen carefully, comparing from behind and in front).

This is really cool. I experimented with the same thing just over a year ago, using a max/msp patch with my game. The results were promising, and it’s since been my plan to revisit this approach more seriously, but my inexperience with sound programming makes the prospect seem kind of intimidating.

Is this meant as a new feature for PFAudio? Do you see that becoming a more general-purpose game audio library?

@boxsmith – Thanks! :slight_smile:
I don’t know much about how to patch things in. Is patching in max/msp similar to how one might patch in SuperCollider? How are games packaged in that scenario? Aren’t those code bases rather large, and don’t they come with restrictions?

Yes, this is meant to be part of PFAudio.

I am kind of overwhelmed with the task of adding the features and capabilities that push this library towards my vision of being a choice for procedural audio. I’d like to have a free version available. I have some objections to, and difficulties with, spending the time to make it wholly general purpose. There is a lot to implement and support. For that sort of thing, I strongly recommend TinySound, which really is good for standard sound effects and music processing: cues can be loaded and played, and mp3, ogg, etc. are accommodated in addition to wav files.

I don’t want to get into any backwards compatibility issues while I am still designing things. Things keep being radically turned around and experimented with. However, I’m open to the possibility of helping out with a custom version for a specific game, on an individual basis. Message me if you wish to discuss this some more! I am very sympathetic to DIY development.

I don’t know much about SuperCollider, but maybe there’s some confusion in terms here. A “patch” is a Max program. The game and the patch ran as separate programs, with the game sending OSC messages to the patch (to update listener position/orientation, sound source parameters, etc.). This worked ok, and it did nearly everything I wanted it to, but it was only good as a prototype. Really, I just wanted to know if I could do better than volume panning, and Max was a good way to experiment with that and other ideas.

As far as I know, there’s no way to package a game with a Max patch without it turning into an embarrassing mess of duct tape, and anyway, I can’t see why any sane human would want to. Max is fun and easy, but Java->OSC->Max isn’t exactly a high-performance solution. (Incidentally, a quick google search suggests that OSC might be the way to go for SuperCollider, too.) If I were to try this again “for real,” with the expectation of shipping a game with it, I definitely wouldn’t reach for Max. Whatever legal restrictions there might be, they don’t seem terribly relevant in light of the practical ones.

Thank you for your offer. When I’m further along my roadmap, I might take you up on it. :slight_smile:

I’m assuming Max doesn’t have an equivalent to libPD then? That might be an alternative to look at based around the open-source “equivalent”, Pure Data.

There’s no reason in principle the Java->OSC->[something] approach couldn’t be high-performance. OSC was originally designed for real-time audio control. Lots of systems use it to communicate between the audio process and the front-end (incl. SuperCollider, as you said). In fact, I’m using a Java->OSC->Java mechanism to offload Java audio into a separate process, and this performs better than using a single VM because the audio process can be tailored to have a low memory heap, minimal garbage collection, etc.

Indeed, as @nsigma mentions… I was going to bring up Pure Data / pd as an option. OSC is as fast as anything you can send over the wire. Even 13 years ago, when I first got things working via NIO, it was blazing fast. Simply create a UDP packet / OSC packet, index it, and hold on to it, overwriting just the necessary data sections. In the case of a slider / single parameter, that is just a single float; then punt it off repeatedly and, well, super fast… I’ll see if I can clean up that old code and release it along with an example of getting SC up and running.
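A rough sketch of that pattern over NIO, without any OSC library: build the packet once, then overwrite only the float payload before each send. The address path /demo/amp and the host/port used here are placeholders (57110 is the usual scsynth port).

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

/**
 * Hand-rolled OSC-over-UDP sketch: the packet is built once, and only the
 * single float argument is rewritten before each send.
 */
public class OscFloatSender implements AutoCloseable {

    private final DatagramChannel channel;
    private final ByteBuffer packet;
    private final int floatOffset;

    public OscFloatSender(String host, int port, String address) throws Exception {
        channel = DatagramChannel.open();
        channel.connect(new InetSocketAddress(host, port));

        byte[] addr = pad(address);   // OSC address pattern, null-padded to 4 bytes
        byte[] tags = pad(",f");      // type tag string: one float argument
        floatOffset = addr.length + tags.length;

        packet = ByteBuffer.allocate(floatOffset + 4);    // big-endian by default
        packet.put(addr).put(tags).putFloat(0f);
    }

    /** Overwrite just the float payload and punt the packet off again. */
    public void send(float value) throws Exception {
        packet.putFloat(floatOffset, value);
        packet.rewind();
        channel.write(packet);
    }

    @Override
    public void close() throws Exception {
        channel.close();
    }

    /** Null-terminate and pad an OSC string to a multiple of 4 bytes. */
    private static byte[] pad(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[(raw.length / 4 + 1) * 4];
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }
}
```

Usage would be something like new OscFloatSender("127.0.0.1", 57110, "/demo/amp") followed by send(value) in the control loop.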

As boxsmith mentions, whether it’s Max / pd / SC, you effectively get a patch loaded in any of these environments. At that point, in the case of your demo, about all you have to do is let the audio engine know the volume coefficients (inverse square law, basically) calculated from world coordinates as they change / the user moves, plus a 2D / 3D transform matrix for the orientation of the user / listener. In the case of ambisonics, that matrix is simply multiplied against the entire encoded, mixed scene of audio, and auto-magically everything rotates perfectly.

In short, the biggest concept that SuperCollider 3 introduced is that the audio / DSP engine is a standalone network-based server. It is programmed / configured only via OSC. Any language can interface with it. You do have a concept of a patch, called a “synthdef”, in SuperCollider. It’s like the audio equivalent of a SPIR-V encoded shader for Vulkan.

Looks like a nice video tutorial / #3 covers synthdefs:

Remember though in any tutorial they aren’t necessarily going to point out that SC3 server can be used headless via any language and purely configured by OSC.

@philfrei Creating a procedural audio engine, let alone a really well-working spatial library, is very difficult. Not saying that you don’t have the passion for it; from everything I’ve seen / read, you do. If anything, use pd / SuperCollider as a strong reference point, especially since they are open source. Once the light bulb clicks on getting things to work from Java, it will be quite something. If anything, use them as a rapid prototyping environment… then take what you learn about audio DSP back and create your own library for a particular purpose. A whole procedural engine would, no matter what, be quite the task…

It’s kind of hilarious as if you look back to the first message posted on JGO all those years ago for me it was SC related… :stuck_out_tongue_winking_eye:

As far as making money with an audio engine, I’m afraid that gig has been up for a while, as it has for most developer tools. That market finally crashed with Unreal / CryEngine / newcomer Lumberyard (CryEngine reskinned). The trick at this point seems to be to put out enough open source to entice people, hope that catches the zeitgeist and developers flood in, then create value with paid components or other associated cloud-based services that can sit behind a paywall… As you can see, though, that still relies on luck. It’s no way to pay SF Bay Area rent, I’m afraid, or at least that’s a nut I could never crack… So contracting in general remains the best way to continue working on the really cool stuff.

@Catharsis – Thanks again for lots of good info and feedback!

OSC is something I plan to look into in more depth. It ranks high on the to-study queue. All I know about it is that as a spec, it allows much more to be controlled than MIDI does. I’m thinking, in particular, once the updated theremin is built, I might draw from OSC methodology to control various expressive parameters of the theremin synth. Or, I might stick with something more like defining a control line as MIDI does and hook into the synth in this fashion. To be determined. One consideration: making synths that implement OSC or MIDI with any pretense to completeness is out of scope.

The notion of drawing ideas and algorithms from open specs of PD and SuperCollider is spot on. @nsigma has also provided links to implementations that are very helpful. I’m thinking of the source code, in particular, of a reverb unit that I was recently looking at.

Inverse square law for relating loudness to distance! Doh! Of course. I tried out several different power equations and settled on Math.pow(x, 6) as sounding the best, where x is a normalized fraction of a given audible distance range. I will try inverse square and compare. Theoretically, inverse square makes more sense. Ears and art don’t always agree with physics, so I like to verify this sort of thing. (Who should I believe, experienced audio engineers or my own lying ears?)

For the 3D movement (not counting the Y-axis as of yet), I’m relying on functionality built into JavaFX. I don’t fully follow your example. I have what I’m calling an EarSet object. Its job is solely to manage the location of two “ears”. To update it, I give it the X, Z and camera angle from the JavaFX PerspectiveCamera, once per game loop. To get pan and loudness, I use the EarSet’s left and right ear coordinates at one end and the coordinates of the sounding object at the other. No matrices involved, beyond what JavaFX manages behind the scenes via the PerspectiveCamera. This seems to be working pretty well.
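Roughly the shape of it, as a sketch (not the actual EarSet source; the sign conventions for the yaw depend entirely on how the scene and camera are set up):

```java
/**
 * Sketch of an EarSet-style helper: given the camera's X, Z and Y-axis rotation
 * (degrees), place two "ears" on a line through the camera, perpendicular to
 * the facing direction.
 */
public class EarSet {

    private final double halfSpacing;   // e.g., 16 for ears 32 pixels apart
    private double leftX, leftZ, rightX, rightZ;

    public EarSet(double earSpacing) {
        this.halfSpacing = earSpacing / 2.0;
    }

    /** Called once per game loop with the PerspectiveCamera's position and yaw. */
    public void update(double camX, double camZ, double yawDegrees) {
        double yaw = Math.toRadians(yawDegrees);
        // Unit vector pointing from the left ear toward the right ear
        // (sign/axis conventions depend on the scene setup).
        double rightDirX = Math.cos(yaw);
        double rightDirZ = -Math.sin(yaw);

        leftX  = camX - rightDirX * halfSpacing;
        leftZ  = camZ - rightDirZ * halfSpacing;
        rightX = camX + rightDirX * halfSpacing;
        rightZ = camZ + rightDirZ * halfSpacing;
    }

    public double distanceToLeftEar(double x, double z)  { return Math.hypot(x - leftX,  z - leftZ); }
    public double distanceToRightEar(double x, double z) { return Math.hypot(x - rightX, z - rightZ); }
}
```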

Today, I put time into a Theremin front end rewrite, as I am hot on the trail with new things to implement based on how well the volume and pan smoothing worked on this demo. Maybe in a few days I’ll get to the PD, SuperCollider or OSC research. But it seems like a good thing to build stuff and learn from the experience. No strategy is always right.

Money, money, money. Whatever. I fully understand the thinking of independents who don’t want to spend anything. We build our own games, and already expect them to cost more than any income they might ever generate. I have been doing this for years. I understand dreaming. I think of dreams as “Black Swan” events, after the book I read on the subject. I am musing on the idea of sharing in the risk rather than asking for money up front. IF a game is a “Black Swan” success, the game maker can afford to pay something reasonable. Otherwise, no charge. It is a crazy business model, perhaps, trying to maximize the chance of participating in a black swan rather than up-front income. But this is all very nebulous, and entirely moot, until the library shows it adds value, and that is still a ways into the future.

Indeed, a transport-independent messaging protocol. Keep in mind that SuperCollider doesn’t follow the OSC spec exactly, but it is essentially similar. You could send MIDI over OSC, for instance.

Essentially you’d want your synth to know nothing about OSC or MIDI. You’d marshal OSC or MIDI data into whatever internal representation is understood by the synth. Ideally this internal implementation is event driven.
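A tiny sketch of that separation, with Synth, SynthEvent, and OscFrontEnd as hypothetical names; the front end only translates into the internal representation:

```java
/** Internal, transport-agnostic representation of a control change. */
final class SynthEvent {
    final String parameter;
    final double value;

    SynthEvent(String parameter, double value) {
        this.parameter = parameter;
        this.value = value;
    }
}

/** The synth knows nothing about OSC or MIDI, only about SynthEvents. */
interface Synth {
    void handle(SynthEvent event);
}

/** One possible front end: marshals an OSC address + float into an internal event. */
class OscFrontEnd {
    private final Synth synth;

    OscFrontEnd(Synth synth) {
        this.synth = synth;
    }

    void onOscMessage(String address, float value) {
        // e.g. "/theremin/vibratoDepth" -> parameter "vibratoDepth"
        String parameter = address.substring(address.lastIndexOf('/') + 1);
        synth.handle(new SynthEvent(parameter, value));
    }
}
```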

In this case, for spherical wavefronts, physics matches perception. I.e., a sound source with no walls / reflections adding to the direct sound. You may potentially futz with the coefficients to change things somewhat, but still follow an inverse square relationship.

Matrices are involved under everything in JavaFX. Each Node, which PerspectiveCamera inherits from, has a transform. The rotation of the PerspectiveCamera is all you need for ambisonics.

I haven’t used JavaFX myself, but you’d likely call:
https://docs.oracle.com/javase/8/javafx/api/javafx/scene/Node.html#getLocalToSceneTransform--

then:
https://docs.oracle.com/javase/8/javafx/api/javafx/scene/transform/Transform.html#toArray-javafx.scene.transform.MatrixType-double:A-

And that is what you’d send to SuperCollider or what have you to manipulate ambisonic rotation.
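Putting those two calls together, roughly (an untested sketch on my part; how the resulting array gets sent over OSC is up to whatever mechanism you settle on):

```java
import javafx.scene.PerspectiveCamera;
import javafx.scene.transform.MatrixType;
import javafx.scene.transform.Transform;

public final class CameraMatrix {

    /**
     * Grab the camera's local-to-scene transform once per frame and flatten it
     * to a row-major double[] (16 values for MT_3D_4x4), ready to pack into an
     * OSC message for the audio server.
     */
    public static double[] snapshot(PerspectiveCamera camera) {
        Transform toScene = camera.getLocalToSceneTransform();
        return toScene.toArray(MatrixType.MT_3D_4x4, new double[16]);
    }
}
```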

Like anything, it may take some time to explore. It definitely doesn’t hurt to know what has come before, and depending on goals one might find that an existing solution is solid enough; the task then becomes filling in the gaps. Way back when, I thought I’d have to spend a bunch of time creating a DSP engine; then SC3 dropped, and the split audio DSP server architecture fit my needs.

Developers are fickle, even considerably more fickle than consumers in many respects. You’ll find plenty of developers, independent or otherwise, that will not be willing to pay for X developer tool. Rather than single tools, developers will pay for platforms / ecosystems. The trick then becomes providing more value than one captures, creating a symbiosis that brings in more developers while extracting enough funds, preferably on an ongoing basis (the capture angle), to make it all viable. And at that as mentioned a free platform with some services that can be kept behind a paywall fits that pattern. I.e., Lumberyard with its integration with AWS and other for-pay services is a big example: tools for free, but hey, isn’t it sure easy to integrate with our other for-pay products. As things go, an interactive audio engine for games is not an easy platform to deliver as a service.

At the end of the day rent has to be paid, food needs to get on the table, and health of all involved needs to be maintained.

IMHO that relies on expecting that the other party is honest and will keep things on the up and up. Collecting anything even from a moderate success will rely on the ethics of the other party which is a risk. If it was a truly black swan event it’d be easy for the other party to simply not pay and make it a legal situation. Anything over values that could be collected in small claims court could get locked up in a costly legal battle.

I’m not saying don’t take this approach; just consider that other parties may not play by any agreed-upon rules, as sad as that may be.

@Catharsis – More good and interesting thoughts, thank you.

[quote]Collecting anything even from a moderate success will rely on the ethics of the other party which is a risk. If it was a truly black swan event it’d be easy for the other party to simply not pay and make it a legal situation. Anything over values that could be collected in small claims court could get locked up in a costly legal battle.
[/quote]
I think at the end of the day, working with people, doing business, requires trust, and written documentation of any agreements made, if only as a way to help make all assumptions explicit and thus eliminate misunderstanding.

My knowledge of people who have made it big (e.g., Notch, or that fellow who made the viral game where the little bird gets pounded by pistons) is that they have plenty of integrity. I’m sure there are counterexamples.

[quote]And at that as mentioned a free platform with some services that can be kept behind a paywall fits that pattern.
[/quote]
Kind of what I’m thinking. There can be “free” versions (various jars with api’s) that provide different services. But the biggie isn’t trying to make something general purpose, but to give me the tools to make the best, most dynamic game audio I can, or to have lots to bring to the party upon collaborating with other musicians/programmers towards making awesome and unique games.

[quote]In this case, for spherical wavefronts, physics matches perception. I.e., a sound source with no walls / reflections adding to the direct sound. You may potentially futz with the coefficients to change things somewhat, but still follow an inverse square relationship.
[/quote]
I started experimenting with this. My initial (current jar) algo was to define a max audio distance, get the % of this (expressed as a normalized value N), and apply an N^6 mapping (via LUT). This seemed to create an acceptable drop-off.

I tried a couple more things last night: scaling the distance into “attenuation units” and then deriving the volume factor by putting this value in the inverse form: 1/N. Also tried 1/(N*N). I can’t say that it is all that clear to me, from a listener’s perspective, that one is superior to the other yet.

More futzing and tweaking required. Come to think of it, I may have forgotten to compensate for the N^6 mapping done at the final stage. Hmmm. Have to work today–won’t get a chance to think more on this for a while, except in the cracks/breaks.

[quote]I started experimenting with this. My initial (current jar) algo was to define a max audio distance, get the % of this (expressed as a normalized value N), and apply an N^6 mapping (via LUT). This seemed to create an acceptable drop-off.

I tried a couple more things last night: scaling the distance into “attenuation units” and then deriving the volume factor by putting this value in the inverse form: 1/N. Also tried 1/(N*N). I can’t say that it is all that clear to me, from a listener’s perspective, that one is superior to the other yet.

More futzing and tweaking required. Come to think of it, I may have forgotten to compensate for the N^6 mapping done at the final stage. Hmmm. Have to work today–won’t get a chance to think more on this for a while, except in the cracks/breaks.
[/quote]
I kind of meant to continue the earlier discussion regarding, I suppose, a final fun detail to try when dealing with proper scaling / attenuation. It’d be interesting to see a plot of inverse square versus your other efforts. It could be similar. As you know, in games it’s often a game of approximation for the audio or graphics effects being presented.

One kind of fun thing to do is apply a bias parameter to whatever formula one uses for attenuation. This is more or less the story of the motorcycle and the fly. For ease and clarity of sound generation, your samples or procedural audio should be loud, perhaps just a few dB below max for each source. The way you handle loud sounds like a motorcycle is to have a wide bias that extends the center of the attenuation curve. So your distance (attenuation units / whatever) is much wider for the center of the motorcycle, say 30' / AU, before the attenuation curve starts to kick in, and it extends much further from the source. For a buzzing fly, though, one sets a really small bias of perhaps a 2" center, or whatever “AU” that is, and by 2' away the sound drops off entirely.

I’d have to go search around to find the inverse square attenuation + bias formula I use in my efforts, but that is how you handle really loud objects and really soft ones with a natural attenuation curve if one uses the inverse square law.

This biased attenuation curve also is a good first pass for culling sound sources from the larger scene dynamically. If you can’t hear it don’t render it!
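Something with roughly this shape, as an illustrative sketch only and not the actual formula I use: hold the gain flat out to a per-source bias radius, let inverse square take over beyond it, and cull anything that drops below an audibility threshold.

```java
/**
 * Sketch of biased inverse-square attenuation plus a cheap audibility cull.
 * One plausible shape: full volume out to `bias` units, inverse-square roll-off
 * beyond that.
 */
public final class Attenuation {

    /** Gain in 0..1 for a source heard at `distance`, with a per-source bias radius. */
    public static double gain(double distance, double bias) {
        if (distance <= bias) {
            return 1.0;                     // inside the "loud center" of the source
        }
        double ratio = bias / distance;     // 1 at the bias radius, falling off beyond
        return ratio * ratio;               // inverse-square roll-off
    }

    /** First-pass cull: if you can't hear it, don't render it. */
    public static boolean audible(double distance, double bias, double threshold) {
        return gain(distance, bias) >= threshold;
    }
}
```

With a wide bias for the motorcycle and a tiny one for the fly, both start at full volume up close, but the fly drops below the cull threshold within a couple of units.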