Comparing binaural methods

Not too long ago, we were having a discussion about binaural sound effects, and Riven wrote about the technique of introducing a time difference between the left and right tracks to create a pan effect.

I wanted to investigate this further, so I wrote a program to compare two ways of creating stereo:

  1. by giving a mono signal a different volume in each ear
  2. by giving a mono signal a time difference between the two ears

Jar file, with source code, is here

I put in a couple different source sounds:

  1. sine wave, with the ability to specify the frequency in Hertz and the length of time in seconds (can test specific ranges – supposedly the time effect works best with low notes and the volume effect works best with high notes)
  2. sine sweep, from low to high (about 4 seconds long)
  3. frog sample (recorded from a local creek, saved as a mono array of PCM floats in the range -1 to 1)

There are two sliders:

  1. volume panning, uses the following formula to convert the slider value (“volumePan”) to channel factors:

		// volumePan runs from -1 (full left) to 1 (full right);
		// map it to a pan factor in the range 0..1
		float pan = (volumePan + 1) / 2;

		for (int i = 0; i < size; i++)
		{
			// interleaved stereo: even indices are left, odd are right
			stereoPcmSample[i * 2] = pcmSample[i] * (1 - pan);
			stereoPcmSample[i * 2 + 1] = pcmSample[i] * pan;
		}

  2. sound frame differences, which can range from -128 to 128 (if negative, the left channel leads by the given number of frames; if positive, the right channel leads by the given number of frames). A sketch of how this offset could be applied follows just below.
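
For concreteness, here is a minimal sketch of how such a frame offset could be applied. This is my own illustration, not the applet’s actual code; it assumes the same pcmSample and size variables as the snippet above, plus a frameDiff value taken from the slider:

		int lag = Math.abs(frameDiff);
		// pad the stereo buffer so the delayed channel fits
		float[] stereoPcmSample = new float[(size + lag) * 2];

		for (int i = 0; i < size; i++)
		{
			// the leading channel starts at i, the lagging one at i + lag
			int left = (frameDiff < 0) ? i : i + lag;
			int right = (frameDiff > 0) ? i : i + lag;
			stereoPcmSample[left * 2] = pcmSample[i];
			stereoPcmSample[right * 2 + 1] = pcmSample[i];
		}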

For reference, a sound frame here is 1/44100th of a second, or approximately 0.0000227 seconds (about 23 microseconds) in length.

Another reference: the speed of sound in air is typically about 343.59 meters per second. So, in one frame, sound travels 343.59 / 44100 meters, or approximately 0.78 centimeters.

Thus, how many frames should the distance from one ear to the other consume? (jokes about fat heads not needed)
I get about 30 cm if I roll a ruler around my head. What does that come to? 30 / 0.78 ≈ 38 frames for the biggest time difference between the two ears?
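
As a sanity check, here is the same back-of-the-envelope arithmetic in code form (the variable names are just for illustration):

		// frames of delay implied by the ear-to-ear distance
		double speedOfSound = 343.59;  // meters per second
		double sampleRate = 44100.0;   // frames per second
		double cmPerFrame = speedOfSound / sampleRate * 100.0;  // ~0.78 cm
		double earToEar = 30.0;        // cm, from rolling a ruler around the head
		System.out.printf("max delay: %.1f frames%n", earToEar / cmPerFrame);  // ~38.5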

In fact, when I was listening to this with headphones, I was putting the left-most or right-most setting of the frame-diff slider at between 32 and 36 (or -32 and -36). So I am pleased this calculation seems to come out roughly. Making the difference larger than that didn’t seem to push the location any further toward the sides.

I found the “frog croak” to be the clearest example. With the sweeping tone, the location seems to move around a bit as a function of the pitch with the time-difference method. I’m not sure how to account for this.

Is it my imagination or wishful thinking that, when I put the frog at a location to the side and match it as closely as possible with the volume pan, the time-difference method sounds a bit cleaner somehow?

Feedback much desired. I made the code available in the jar so folks can verify what is being presented. As always, coming from a musician who taught himself Java, any suggestions on the coding itself are also appreciated (either on this thread or as personal messages).

I’d say this is just amplitude panning; not actually binaural yet. You’re more likely to end up with something that sounds like a ping-pong or tremolo effect if you alternate things quickly enough with what you are doing. Perhaps try a square or saw wave if you want things to stand out a bit more.

Actual binaural audio refers to applying an HRTF (head-related transfer function) to filter audio frequencies, which alters more than just the onset of a sound event between each ear. From there you can also apply the inverse-square law for simulated distance.
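
To illustrate the inverse-square point with a minimal sketch (my own illustration, not SuperCollider code; the method name and reference distance are made up for the example): intensity falls off as 1/d^2, so the amplitude you scale the PCM samples by falls off as 1/d.

		// hypothetical sketch of distance attenuation
		// intensity ~ 1/d^2, so amplitude (the PCM scale factor) ~ 1/d
		// clamped so sources inside the reference distance are not boosted
		static float distanceGain(float distance, float referenceDistance)
		{
			return referenceDistance / Math.max(distance, referenceDistance);
		}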

I’d say, to continue your investigation, look into SuperCollider UGen / DSP code:

general links:

https://ccrma.stanford.edu/workshops/gaffta2010/spatialsound/

Looks like the last one might have a bunch of video from the workshop available.

Source code for binaural / HRTF loading which uses the Convolution2 UGen of SC:

Convolution UGens from SC:

Stepping through both Binaural.sc (for loading the HRTF) and then Convolution.cpp will get you on the right track to implementing something yourself.

[quote]I’d say this is just amplitude panning; not actually binaural yet. You’re more likely to end up with something that sounds like a ping-pong or tremolo effect if you alternate things quickly enough with what you are doing. Perhaps try a square or saw wave if you want things to stand out a bit more.

Actual binaural audio refers to applying an HRTF (head-related transfer function) to filter audio frequencies, which alters more than just the onset of a sound event between each ear. From there you can also apply the inverse-square law for simulated distance.
[/quote]
@Catharsis
Thanks for the links in the second post. I will look into this further. I can see the point that “binaural” might more precisely refer to a combination of timing and filtering, taking into account the audio shadow created by the head itself as well as the timing differences. In any event, I just specifically wanted to hear for myself whether adding a timing offset would also contribute to the panning effect, and what it would sound like.

The demo I wrote doesn’t do anything fancy with the amplitude/volume panning; the math is very simplistic. To make this into a 3D system, yes, the proper scalings would have to be employed, as well as trig for the corresponding angles and some sort of low-pass filter for the unequal timbral degradation of sound over distance.

But the second slider and the “frame diff” play button do nothing except change the onset times. Whatever it is, it is not amplitude panning: the amplitudes are identical. You can see this by dumping the DSP data (right button); the left and right data streams are identical, just offset by the given number of frames.

I’m not sure I understand your point about ping-ponging. That won’t occur until the two sounds no longer fuse perceptually. At 30 frames or under, the audio event still seems to me to be heard as a single event. If you go over that, then some weirdness (such as ping-pong) would start to happen. The scale of the second slider is far too large for the range where the effect is most relevant, but again, I was doing this to hear for myself rather than to rely on what theory says one should hear, so I wanted to test larger ranges as well and hear how it degrades or not.

Just posted a redo of the above jar. Here it is again. I meant to do more with it, but only got the following done: a bit more info about the “distance” and time implied by the difference in frames for the second panning method. Also, I got a standard sort of calculation relating values to dBs. But the pan that uses volumes could still be improved. In particular, the energy at the ends is greater than the perceived volume in the middle. Something like a sine function that has the middle be 0.7 + 0.7 (left and right) instead of 0.5 + 0.5 turns out to sound better.
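
For anyone curious, a minimal sketch of that kind of constant-power pan, assuming the same volumePan slider in [-1, 1] and the interleaved buffers from the first snippet (the cosine/sine pair gives each channel sqrt(0.5), about 0.707, at the center):

		double angle = (volumePan + 1) * Math.PI / 4;   // 0 = full left, PI/2 = full right
		float leftFactor = (float) Math.cos(angle);     // 1.0 left, ~0.707 center, 0.0 right
		float rightFactor = (float) Math.sin(angle);    // 0.0 left, ~0.707 center, 1.0 right

		for (int i = 0; i < size; i++)
		{
			stereoPcmSample[i * 2] = pcmSample[i] * leftFactor;
			stereoPcmSample[i * 2 + 1] = pcmSample[i] * rightFactor;
		}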