Vari-Speed Looper

philfrei · June 27, 2011, 9:26am

I am posting a Variable-Speed-Loop Playback program I wrote this week.
The jar download (http://www.hexara.com/VariSpeedLooper.jar) has been removed.

[EDIT: Now runnable as an Applet! http://www.hexara.com/VSL/VSL2.htm
There’s a big diagnostic field in pink that can be ignored, and the control pad is a bit deformed compared with the jarred version, but I’m just happy to have worked through BOTH learning the JNLP API and dealing with InputStreams that have to be converted to ByteArrayOutputStreams and then to ByteArrayInputStreams in order to get the AudioFileFormat, blah, blah. It’s unclear whether this code works on a Mac: one friend with a Mac reported non-looping behavior, but now he says it is working fine.]

You can load a .wav file (only format implemented: 16-bit, 44100 fps, stereo, little-endian). (It’s not too hard to convert .wavs in Audacity.)
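Since only one input format is implemented, a loader could reject anything else before decoding. A minimal sketch using javax.sound.sampled.AudioFormat; the class name FormatCheck is hypothetical and not part of the original program:

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper: accepts only the one format the looper implements,
// i.e. 16-bit signed PCM, 44100 frames per second, stereo, little-endian.
final class FormatCheck {
    static boolean isSupported(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 44100f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 2
                && !f.isBigEndian();   // WAV data is little-endian
    }
}
```

In practice the format to check would come from something like AudioSystem.getAudioFileFormat(...).getFormat().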

This was inspired by hearing the clicks in the sound for the demo movie of “Opposite Lock Racing”. I’m thinking: put in a motor sound and do dopplers and volume changes based upon the speed of the car. The goal was to show that one CAN do volume and doppler changes to a sound file without clicks. I think both (in this app) are fairly smooth despite some clunkiness in the GUI. [EDIT: turns out the clicks in “Opposite Lock Racing” were due to another cause!]

I’ve set it up so that the GUI only interacts with the custom VariSpeedPlayer object, which in turn reads data from a VariSpeedTargetDataLine object. When the wav is loaded, it is converted into two float arrays in the VariSpeedTDL. Playback is done via linear interpolation into these arrays, with the resulting data sent to a SourceDataLine. Volume changes are handled by a built-in volume controller that changes incrementally with every frame.
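The playback path described here (position += speed, linear interpolation into the float arrays, 16-bit output for a SourceDataLine) can be sketched roughly as follows. This is an illustration under assumptions, not the VariSpeedPlayer/VariSpeedTDL code: the class and method names are hypothetical, speed is assumed positive and smaller than the loop length, and the loop end is joined to the start with a simple wrap rather than the A/B cross-fade mentioned later in the thread.

```java
// Hypothetical render pass: variable-speed loop read with linear interpolation,
// a gain, and packing into 16-bit little-endian stereo bytes.
final class VariSpeedRender {
    double position;   // fractional read position, in frames

    /** Fills out[] with frames; left/right hold the loop as floats in [-1, 1). */
    int render(float[] left, float[] right, double speed, float gain, byte[] out) {
        int frames = out.length / 4;               // 2 channels * 2 bytes per frame
        int loopLength = left.length;
        for (int i = 0; i < frames; i++) {
            int i0 = (int) position;
            int i1 = (i0 + 1) % loopLength;        // last frame interpolates toward the first
            double frac = position - i0;
            double l = left[i0] * (1 - frac) + left[i1] * frac;
            double r = right[i0] * (1 - frac) + right[i1] * frac;
            writeSample(out, i * 4, l * gain);
            writeSample(out, i * 4 + 2, r * gain);
            position += speed;
            while (position >= loopLength) position -= loopLength;   // loop back
        }
        return frames * 4;                         // number of bytes filled
    }

    private static void writeSample(byte[] out, int offset, double value) {
        long s = Math.round(value * 32767);
        s = Math.max(-32768, Math.min(32767, s));  // clamp instead of wrapping
        out[offset] = (byte) s;                    // low byte first (little-endian)
        out[offset + 1] = (byte) (s >> 8);
    }
}
```

The returned byte count would be passed to SourceDataLine.write(out, 0, count) on a line opened with the 16-bit, 44100 fps, stereo, little-endian format.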

[EDIT: following was solved by using a double to track my position “in between” the frames rather than a float.]
[s]A couple of things are a bit disconcerting, and I don’t know whether they are expected artifacts or the result of buggy coding. (1) The pitch wobbles back and forth between two frequencies every few seconds. The degree to which this occurs varies in a way I can’t predict yet. It is subtle, though, and best heard with a sound that has a continuous pitch. (2) There is a bit of fuzziness or distortion when the overlap occurs. I put in some IF statements to check for aliasing, but that doesn’t seem to be the cause. It is pretty soft.

Any suggestions or explanations about these two artifacts?[/s]

krasse · June 27, 2011, 11:46am

If you want to get a good insight into sound programming, you can look at the source code for Gervill.
It contains resampling code of different quality (which can be used for modifying pitch) and a lot of other useful stuff.

philfrei · June 27, 2011, 7:08pm

Interesting and cool resource! At first glance, it mostly emphasizes javax.sound.midi (I confess to being an anti-MIDI snob), with the exception of the additive synthesizer that creates sounds from scratch. Nice! The chorus implementation might be worth looking into, if they are programming it themselves and not using a prebuilt unit.

I think, IF you want to do a Doppler control on a sound effect of an engine, for example, or build a background ambience (soundscape: think frogs & crickets, or city noises, or war environment) that is not just a prerecording but is responsive to game state (this is my goal), there might not be a lot here that is directly usable for game programmers.

I probably need to get the Vari-speed tool up as an Applet in order to encourage folks to try it out…[EDIT: the Applet displays, but I need to get the code working to allow file reading. Sigh–haven’t done this before. Have to learn about signing and stuff. Coming soon, I hope.]

@krasse I’ll keep looking for the resampling code you mentioned. [EDIT: Thanks!]

krasse · June 27, 2011, 9:23pm

The resamplers can be found in the following classes:

SoftLanczosResampler
SoftCubicResampler
SoftSincResampler
… all classes that extend SoftAbstractResampler

I think that Gervill is quite useful as an SFX player. Doppler effects can be achieved with key-based tuning messages etc.

Gervill is also supposed to replace the Midi player in the java distribution (or is it perhaps only for OpenJDK?).

A class that you absolutely want to take a look at is the EmergencySoundbank. It creates a lot of instrument samples procedurally.
Also, if you want code for handling a lot of different sound formats, the Gervill source provides this as well!

nsigma · June 28, 2011, 7:01am

Could you post some or all of the code you’re using? I’ve done a fair bit of Java audio coding - I’ve never had a symptom like (1) doing linear interpolation. It can be a bit rough and ready, but the effect tends to be consistent.

You might find the same / similar code in RasmusDSP under BSD license as well - not sure how much overlap there is. RasmusDSP was written by the same guy prior to him writing Gervill for OpenJDK. RasmusDSP is a much bigger project mind you, and seems to now be abandoned - it’s a full javax.scripting language for audio DSP.

Both Gervill and RasmusDSP are in Frinika too - that’s worth looking at for what you can do with Java audio!

I personally ended up porting some C code from Pure Data (tabread4 component) for interpolation - bit of a simpler API. You can find it in the getSample() method (last but one) here http://code.google.com/p/praxis/source/browse/rapl/src/net/neilcsmith/rapl/components/SamplePlayer.java (ignore all the commented out crap in there - it’s a live sampling component so I was experimenting with a variety of approaches for smoothing discontinuities).
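Pure Data’s tabread4~ uses 4-point cubic interpolation. The sketch below is a transcription of that formula from memory, not code from Praxis or Pd, so treat it as an assumption and check it against the Pd source; the class and method names are hypothetical. The formula reproduces any cubic exactly through the four points, which makes for an easy sanity test.

```java
// Sketch of 4-point cubic interpolation in the style of Pd's tabread4~.
final class FourPointInterp {
    /** Interpolates between b (frac = 0) and c (frac = 1); a and d are the outer neighbours. */
    static double interpolate(double a, double b, double c, double d, double frac) {
        double cMinusB = c - b;
        return b + frac * (cMinusB - (1.0 / 6.0) * (1.0 - frac)
                * ((d - a - 3.0 * cMinusB) * frac + (d + 2.0 * a - 3.0 * b)));
    }

    /** Reads a table at a fractional index, clamping the outer neighbours at the ends. */
    static double read(float[] table, double position) {
        int i = (int) position;
        double frac = position - i;
        int last = table.length - 1;
        double a = table[Math.max(i - 1, 0)];
        double b = table[i];
        double c = table[Math.min(i + 1, last)];
        double d = table[Math.min(i + 2, last)];
        return interpolate(a, b, c, d, frac);
    }
}
```

Compared with linear interpolation, it costs four reads per output sample instead of two, in exchange for a smoother curve between frames.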

The chorus is all Java / Gervill code, as is the reverb, filters, etc. I’ve been working on porting various bits of Java DSP stuff to this common interface http://code.google.com/p/praxis/source/browse/audio.ops/src/net/neilcsmith/audioops/AudioOp.java - probably about as simple as it can get. I’ve done the Gervill filters, but not the chorus unit yet. The idea is to make these available as standalone JARs - GPL with and without CPE depending on where the code is from. (They’re available as separate JARs in the Praxis download already but that’s a bit big just to get these(!), and they need JavaDoc’ing too).

It’s high time there was some sort of simple API for Java DSP (a simple LADSPA / VST-like standard would be great). Whether that interface is the answer I don’t know, but there’s some great code out there and it’s all a bit too shackled to the project it’s part of.

Hope all that’s helpful. Best wishes, Neil

philfrei · June 28, 2011, 10:29pm

This is where I do the linear interpolation. stereoL & stereoR are two float arrays, one value per frame, normalized to between -1 and 1. When the arrays are built, I assemble the wav bytes and divide by 32768. I’ve been messing around a bit trying things like using a double instead of a float for the fractional part. I confess to not fully thinking through the implications of float/double interactions. (There’s some slightly more involved code for when we are in the “overlap” section, that accomplishes the “A B roll” cross-fade. But the pitch change artifact can be heard when only the below is called.)

	public float[] get(double moment) // [EDIT: parameter changed from float to double]
	{
		// moment is the fractional read position, in frames
		// use linear interpolation between frame fr1 and frame fr1 + 1
		int fr1 = (int) moment;
		double frac = moment - fr1;
		
		stereoVals[0] = (float)(( stereoL[fr1] * (1.0 - frac) ) 
			+ ( stereoL[fr1+1] * frac )); 
		stereoVals[1] = (float)(( stereoR[fr1] * (1.0 - frac) ) 
			+ ( stereoR[fr1+1] * frac )); 

		// stereoVals is a reused field: callers must copy the values before the next call
		return stereoVals;
	}
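The array-building step described before get() (assemble each 16-bit little-endian sample from its two bytes, then divide by 32768) might look like this. A sketch only; PcmDecoder and decode are illustrative names, and the input is assumed to be raw interleaved stereo PCM with the WAV header already stripped.

```java
// Sketch: decode 16-bit signed little-endian stereo PCM into two float arrays.
final class PcmDecoder {
    /** Returns {left, right}; each frame is 4 bytes: L low, L high, R low, R high. */
    static float[][] decode(byte[] pcm) {
        int frames = pcm.length / 4;
        float[] left = new float[frames];
        float[] right = new float[frames];
        for (int f = 0; f < frames; f++) {
            int o = f * 4;
            // low byte is masked to unsigned; the high byte carries the sign
            short l = (short) ((pcm[o] & 0xFF) | (pcm[o + 1] << 8));
            short r = (short) ((pcm[o + 2] & 0xFF) | (pcm[o + 3] << 8));
            left[f] = l / 32768f;    // result lies in [-1, 1)
            right[f] = r / 32768f;
        }
        return new float[][] {left, right};
    }
}
```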

I haven’t tried making the index that tracks the “moment” a double. I will try that. [EDIT: It worked!] I had this notion that “floats” are fine for audio data, but that doesn’t mean that some of the calculations won’t benefit from the use of doubles.

Hmm. Now I am wondering if there is some loss when “normalizing,” as floats have a limited number of decimal places. The data requires the equivalent of 15 bits of encoding (the 16th being used for + or -, I am assuming). Is there a plain-English statement somewhere about how precise a floating-point value can be expected to be? (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.3) I can’t figure out what this article is saying. Is the decimal point at a fixed bit in the 32-bit float encoding? (e.g., 16 bits on either side? or a 24/8 split?) It doesn’t sound to me like there is any degradation comparing the original wav playback to the app playback.
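In plain terms, a Java float is IEEE 754 single precision: 1 sign bit, 8 exponent bits and 23 stored fraction bits (24 bits of precision counting the implicit leading 1), roughly 7 significant decimal digits. The binary point is not at a fixed bit; it moves with the exponent. Because dividing a 16-bit integer by 32768 (a power of two) only changes the exponent, the conversion should be lossless, and a brute-force check over every sample value is cheap (the class name is hypothetical):

```java
// Counts 16-bit sample values that do not survive a round trip through float
// scaled by 1/32768. Expected to be zero: every short fits in a float's 24-bit
// significand, and scaling by a power of two is exact.
final class FloatPrecisionCheck {
    static int countLossySamples() {
        int lossy = 0;
        for (int s = Short.MIN_VALUE; s <= Short.MAX_VALUE; s++) {
            float normalized = s / 32768f;
            int back = (int) (normalized * 32768f);
            if (back != s) lossy++;
        }
        return lossy;
    }
}
```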

nsigma · June 29, 2011, 9:59am

Well, if it works, it works!

I’ve just tried the link you posted earlier (nice interface btw) and I don’t get the effect you mention, so I assume that’s the fixed one?!

Assuming you’ve got the variables position (“moment?”) and speed and are doing the equivalent of position+=speed every sample, then you shouldn’t get the effect you mention using floats. Or are you somehow feeding back the value of position to calculate the speed every block you process? Using doubles tends to be useful when you have values feeding back into each other, for both audio and control values, e.g. a lot of software uses float buffers but will use doubles internally in filters where the values are constantly multiplied against each other.

What exactly do you mean by normalizing? Usually this means increasing the amplitude to the maximum possible. Or do you mean the conversion back to bytes? Either way you shouldn’t lose anything - the float representation has far more information than the bytes do. You might want to normalize (in the usual meaning) the file when loaded, but you definitely don’t want to be doing this to every output buffer.

The other thing I’d suggest (if you want a smoother effect) is to linearly interpolate your speed and volume values every sample, or to reduce the buffer size you’re processing at a time to a maximum of maybe 1024 samples, ideally less.
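A sketch of the first suggestion applied to a gain value: ramp linearly from the previous value to the new target across a buffer instead of jumping at the buffer boundary. RampedGain is a hypothetical name, the buffer is assumed to be mono floats, and the same idea applies to speed.

```java
// Hypothetical per-sample gain ramp across one buffer.
final class RampedGain {
    private float current;   // gain reached at the end of the previous buffer

    RampedGain(float initial) {
        current = initial;
    }

    /** Multiplies buf in place, moving from the previous gain to target over buf.length samples. */
    void apply(float[] buf, float target) {
        int n = buf.length;
        float step = (target - current) / n;   // unused if the buffer is empty
        for (int i = 0; i < n; i++) {
            current += step;
            buf[i] *= current;
        }
        current = target;   // discard any accumulated rounding error
    }
}
```

For interleaved stereo, the step would be computed per frame and the same gain applied to both samples of that frame.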

philfrei · June 29, 2011, 8:40pm

@nsigma Thanks for giving it a look and the thoughtful replies! Glad you liked the GUI. I was just conceiving of it as a throwaway “demo”–the designed usage of the underlying VariSpeedPlayer in a game would have controls with finer granularity than the MouseMotionListener, which I find makes things jump around a bit much. (But I WAS pleased to finally figure out how to map an exponential function to the x-location.)

I am using the “position += speed” algorithm. The improvement in sound quality came when the “position” variable, the argument called “moment” in the get() method, was changed to a double rather than a float. I am going to speculate that the significant digits of a float aren’t quite adequate to maintain what we perceive as a steady pitch with the interpolation method used, that rounding in either direction can be heard as two slightly different pitches. But doubles are fine–the difference in pitch due to roundoff is less than what we can normally discriminate.
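One plausible mechanism behind this speculation: with a float position, each position += speed is rounded to the float spacing at the current position, so the step actually taken (and therefore the pitch) is quantized, and the quantization changes as the position crosses powers of two. For example, at position 600000 (about 13.6 seconds at 44100 fps) floats are spaced 0.0625 apart, so a speed of 0.7 becomes a step of 0.6875. A small sketch with hypothetical names:

```java
// Shows the step actually taken when a speed is added to a float vs. a double position.
final class PositionPrecision {
    static float effectiveFloatStep(float position, float speed) {
        return (position + speed) - position;   // float arithmetic throughout
    }

    static double effectiveDoubleStep(double position, double speed) {
        return (position + speed) - position;
    }
}
```

With doubles the spacing near 600000 is about 1e-10, far below anything audible.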

I was borrowing the “normalizing” term from vectors. I guess that is not standard usage. I was referring to converting the wav value (signed short) to a signed float between -1 and 1. This happens when the wav file is loaded. And, yes, the process is updating the volume and “speed” every frame (every sample) and the buffer size does not come into play. Any jumps heard are more due to the lack of response in the gui than the VariSpeedPlayer object. In fact I deliberately put in a limit as to how much the volume can change in a single frame in order to smooth out volume discontinuities.
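A per-frame limit on volume change, as described here, can be sketched as a slew limiter. The class name and the maxStep value are assumptions; with maxStep = 1/4410, a full 0-to-1 swing would take 4410 frames, i.e. 0.1 s at 44100 fps.

```java
// Hypothetical slew limiter: the gain moves toward the requested target
// by at most maxStep per frame.
final class VolumeSlewLimiter {
    private final float maxStep;
    private float gain;

    VolumeSlewLimiter(float initialGain, float maxStep) {
        this.gain = initialGain;
        this.maxStep = maxStep;
    }

    /** Call once per frame; returns the gain to use for that frame. */
    float next(float target) {
        float delta = target - gain;
        if (delta > maxStep) delta = maxStep;
        else if (delta < -maxStep) delta = -maxStep;
        gain += delta;
        return gain;
    }
}
```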

Still looking at how to make this into an Applet that can read a file from the client. Found some user-made JNLP tutorials that might help. I found the Java Tutorials and Core Java (Horstmann) a bit sketchy.

nsigma · June 30, 2011, 12:42pm

No problem. It’s the one topic I know vaguely more on, as opposed to most topics around here that I know a hell of a lot less!

As for the GUI - I’m just a sucker for an XY pad. At least you bothered and didn’t resort to sliders!

Without hearing what you heard it’s hard to say, but I wouldn’t have expected to hear the effect you describe - a noisier signal maybe, but not audible changes in pitch. I definitely can’t reproduce it in a number of things I know use that strategy with a float. Maybe my ears are wobbly. ;D

Pretty close to the same really, though you’d be dividing by the maximum level in the file rather than always by 32768. Incidentally, every audio library I know divides by 32767 (though you’d expect this might result in a value outside of -1:1).

Would be good - I generally wouldn’t have tried a JAR posted on a forum, but as I’m doing a system update anyway today …

philfrei · June 30, 2011, 11:44pm

Interesting. That really is kind of bizarre, and it puts one on the spot. Shall we follow the crowd (they must know something) or leave it at 32768? The only wav value that might cause an aliasing event is -32768. But the way I am doing it, I have to be careful not to arrive at 1.0 for a float-audio value and overflow when converting back.

Neat idea about normalization. One should be able to scan the entire file for max and min values, and use that before scaling to the “normalized” format. I will keep that in mind. Very nice to have your feedback @nsigma and willingness to look at the jar! I hope I can return the favor sometime.
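The scan-then-scale idea, sketched for two float channel arrays (names hypothetical). Leaving a little headroom, e.g. targetPeak = 0.99f, is an assumption rather than a rule.

```java
// Hypothetical peak normalization at load time.
final class PeakNormalizer {
    /** Largest absolute sample value across both channels. */
    static float peak(float[] left, float[] right) {
        float max = 0f;
        for (float v : left) max = Math.max(max, Math.abs(v));
        for (float v : right) max = Math.max(max, Math.abs(v));
        return max;
    }

    /** Scales both arrays in place so the peak becomes targetPeak; silence is left alone. */
    static float normalize(float[] left, float[] right, float targetPeak) {
        float p = peak(left, right);
        if (p == 0f) return 1f;
        float gain = targetPeak / p;
        for (int i = 0; i < left.length; i++) left[i] *= gain;
        for (int i = 0; i < right.length; i++) right[i] *= gain;
        return gain;
    }
}
```

Using one gain for both channels keeps the stereo balance intact.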

It’s now up as an Applet! See the first post for the link. Two days of headaches getting this to work as an Applet. Had to learn JNLP and figure out how to take an InputStream of WAV data and make it into something that can be “Marked”. Some of the diagnostics I wrote (big pink field) are still on the GUI. Will probably get rid of them later.
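The InputStream to ByteArrayOutputStream to ByteArrayInputStream conversion is needed because AudioSystem.getAudioFileFormat(InputStream) may have to mark and reset the stream, and a stream obtained through JNLP may not support that. A minimal sketch; MarkableWav is a hypothetical name, and the whole file is held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: copy a possibly unmarkable stream into memory so that
// AudioSystem's file readers can mark/reset it while probing the format.
final class MarkableWav {
    static ByteArrayInputStream buffer(InputStream raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = raw.read(chunk)) != -1) {
            bytes.write(chunk, 0, n);
        }
        return new ByteArrayInputStream(bytes.toByteArray());   // supports mark/reset
    }

    static AudioFileFormat readFormat(InputStream raw)
            throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioFileFormat(buffer(raw));
    }
}
```

For large files, wrapping the stream in a java.io.BufferedInputStream also provides mark/reset without loading everything at once.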

namrog84 · July 1, 2011, 12:06am

I know this makes me sound like a lazy slacker, but I think it would be awesome if you included a download of at least one small .wav somewhere on the site, in case people don’t have a .wav lying around in the right format. I also don’t remember where the Windows sounds are, or whether they meet the criteria for the proper wav format.

philfrei · July 1, 2011, 2:52am

@namrog84 - I feel like I’m the lazy slacker, here. You are not the first to suggest this. Another friend tried out the program but only had a 10-minute long WAV file on hand, so he never got to the looping part.

When I get a chance, I will try to do it right. For example, will include sounds that were actually designed for continuous sf/x usage. AND, will fix some of the things I left undone. Eventually, the GUI might evolve into a tool for helping design sound for games.

Current priority: getting an Ogg to Wav converter working. Purpose: download resources as Ogg, but playback as Wav in the game. I’m doing this based on an idea from the exceedingly awesome princec.

Meanwhile, a friend just discovered THIS if you want to have some fun:
http://audiencesounds.com/

nsigma · July 1, 2011, 9:10am

Actually, giving it some more thought, I think I know the reason. In audio you usually want symmetry around the zero line, which floats between -1 and 1 obviously give you. From a bit of Googling it seems like a lot of software will export by multiplying by 32767 (so -32768 should probably never occur anyway). This makes some sense, as if you multiplied a full-power positive signal (i.e. 1 * 32768) you would get overflow (as, I just noticed, you wrote).
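A sketch of the conversion back to 16 bits with the 32767 convention, clamping rather than overflowing. Without a clamp, (short) (int) (1.0f * 32768f) wraps around to -32768, which is the overflow mentioned above. FloatToPcm16 is a hypothetical name, and clamping symmetrically to ±32767 is one choice among several.

```java
// Hypothetical float-to-16-bit conversion: scale by 32767, clamp to +/-32767, round.
final class FloatToPcm16 {
    static short toShort(float sample) {
        float scaled = sample * 32767f;
        if (scaled > 32767f) scaled = 32767f;
        else if (scaled < -32767f) scaled = -32767f;   // keep the range symmetric
        return (short) Math.round(scaled);
    }
}
```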

Incidentally, while Googling this I came across this quite interesting page http://wiki.audacityteam.org/wiki/Dither, which covers a slightly tangential issue to converting back. The only Java library I’ve seen bother to do this is Tritonus (I think).
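For the dither issue, a common approach is TPDF (triangular) dither: add the difference of two uniform random numbers, each one LSB wide, before rounding to 16 bits. This is a minimal sketch of that general technique, not code from Tritonus or Audacity; the class name and the use of java.util.Random are assumptions.

```java
import java.util.Random;

// Hypothetical TPDF-dithered conversion from float to 16-bit.
final class DitheredPcm16 {
    private final Random random;

    DitheredPcm16(long seed) {
        random = new Random(seed);
    }

    short toShort(float sample) {
        double scaled = sample * 32767.0;
        // difference of two uniform values in [0, 1): triangular noise in (-1, 1) LSB
        double noise = random.nextDouble() - random.nextDouble();
        long rounded = Math.round(scaled + noise);
        long clamped = Math.max(-32767L, Math.min(32767L, rounded));
        return (short) clamped;
    }
}
```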

Classic! Bit OTT for the theatre, but uncannily reminiscent of watching our houses of parliament. ;D