Simple Line Mixer

– This slot is reserved for the most recent working releases –

Project Overview

This project aims to develop an easy-to-use interface for mixing audio data for playback through a single output data line using Java Sound. This is essential for systems where the Java Sound Audio Engine or an alternative software mixer is not available, where the existing hardware mixer only supports a single output line, or where a pure-Java option is needed. It will be for 2D audio only, and will not support panning or pitch changes. It will implement a per-line gain-changing capability if that doesn’t introduce too much latency.
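For context, the single output line here is just a plain Java Sound SourceDataLine. A minimal sketch of that setup (the format and buffer size are picked only for illustration, not what the project will use):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class SingleOutputLine {
    public static void main(String[] args) throws LineUnavailableException {
        // 44.1 kHz, 16-bit, stereo, signed, little-endian PCM
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);

        // One output line; the mixer would write all pre-mixed audio through it.
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();

        byte[] mixed = new byte[4096]; // filled by the software mixer
        line.write(mixed, 0, mixed.length);

        line.drain();
        line.close();
    }
}
```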

Hi

Sorry, English is not my mother tongue. No pitch, no pan, does it mean that it will be impossible to modify the loudness of a sound and that the sound won’t be “spatial”?

Pitch is another word for frequency (a bass drum is low-pitched, and a bird chirping is high-pitched). Pitch changes are typically used in 3D (spatial) sound to simulate changes in velocity and the Doppler effect.

Pan refers to the volume difference between the left and right speakers. Pan changes are typically used in 3D sound to simulate a sound source coming from a particular position in 3D space.

Gain refers to the overall volume (both speakers).

This project will not have pitch or pan capability (because adding the ability to change those introduces significant latency). It will be designed for use in applications that do not require 3D audio (for folks who just want an easy-to-use, reliable interface for playing sounds, regardless of Java version or target OS).

I am continuing work on my 3D sound mixer as well (as a separate project), and will attempt to further optimize it to reduce the latency problems I’m experiencing with it. I’ll post any progress related to it on my 3D Sound System thread.

While I understand this statement for pitch change using some FFT-based algorithm, I don’t see latency introduced by panning, since it is easily achieved by changing the volume of the left or right channel.

Also, you can simulate pitch changes by resampling the source data for playback on the fly, without needing a buffer.

Having said that, this will of course only work for uncompressed samples, so maybe you are targeting a wider variety of source material…
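Roughly what I mean, assuming the samples are already decoded to floats (the method names are just for illustration):

```java
class PanAndPitchSketch {

    // Pan: scale each channel before mixing (interleaved stereo samples).
    static void applyPan(float[] stereo, float leftGain, float rightGain) {
        for (int i = 0; i < stereo.length; i += 2) {
            stereo[i]     *= leftGain;   // left sample
            stereo[i + 1] *= rightGain;  // right sample
        }
    }

    // Naive "on the fly" pitch change: read the source at a fractional step.
    // step > 1 raises the pitch (and shortens playback), step < 1 lowers it.
    static float[] resample(float[] mono, double step) {
        int outLen = (int) (mono.length / step);
        float[] out = new float[outLen];
        double pos = 0.0;
        for (int i = 0; i < outLen; i++) {
            int idx = (int) pos;
            double frac = pos - idx;
            float next = (idx + 1 < mono.length) ? mono[idx + 1] : mono[idx];
            out[i] = (float) (mono[idx] * (1.0 - frac) + next * frac); // linear interpolation
            pos += step;
        }
        return out;
    }
}
```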

[quote=“cylab,post:5,topic:37165”]
Right, but those are achieved by applying additional math to the data from each line being mixed. Each feature by itself isn’t much, but the more you do and the more lines you mix, the more resources it requires, adding up to latency (which is why I also mentioned I won’t even implement per-line gain if it takes too many resources). This is meant to be quick and efficient, at the cost of features. My 3D mixer takes the other route (less efficient in favor of more features).

I still don’t get why this adds to latency? It might use CPU cycles, but as long as you don’t need buffers in your algorithms, there will be no latency at all.

Is there something I don’t get here?

Java is not that slow… :slight_smile:

[quote=“cylab,post:7,topic:37165”]
For example, if it takes 20 ms to mix all the lines, then any line that was queued up to play 20 ms ago has experienced 20 ms of latency (and any external synchronization that relied on a millisecond playback position is now off).

I agree I’m being a bit of a [German fascist party controlling Germany from 1933 to 1945] with this, but do you understand the reasoning for wanting an efficient and precise method for mixing and playback of multiple lines? (Like I said, my other mixer has all the features - that’s not the goal with this project.)

Yeah I know what latency is, but you still refuse to tell me why you would have latency at all for mixing, volume change and panning.

mixing is: (value1 + value2 + … + valueN) / N
volume change is: value * volume
panning is: valueL * volumeL, valueR * volumeR
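In code, that per-sample math is just a handful of multiplies and adds (a sketch assuming float samples in the range [-1, 1]; all names are made up):

```java
class PerSampleMathSketch {
    // One output frame mixed from N lines, each with its own gain and pan.
    // values[i] holds line i's current {left, right} pair.
    static float[] mixFrame(float[][] values, float[] gain,
                            float[] panLeft, float[] panRight) {
        float left = 0f, right = 0f;
        int n = values.length;
        for (int i = 0; i < n; i++) {
            left  += values[i][0] * gain[i] * panLeft[i];
            right += values[i][1] * gain[i] * panRight[i];
        }
        // Average so the sum stays in range (clamping would work too).
        return new float[] { left / n, right / n };
    }
}
```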

so what is it I don’t get here?

Granted I never did anything with JavaSound, but I did an Amiga module player on an Atari ST - and it didn’t even have a DAC :wink:

I’m really looking forward to using this paul, great initiative.

I think what paul is saying is that any additional calculations on the data being written to the buffer slow down the process of writing to it, and that creates latency.

Sound is written to a buffer, discussed here:
http://www.soundonsound.com/sos/jan05/articles/pcmusician.htm

[quote]No computer operating system can do everything at once, so a multitasking operating system such as Windows or Mac OS works by running lots of separate programs or tasks in turns, each one consuming a share of the available CPU (processor) and I/O (Input/Output) cycles. To maintain a continuous audio stream, small amounts of system RAM (buffers) are used to temporarily store a chunk of audio at a time.

[Figure caption from the article: Some plug-ins add latency to the audio path, as revealed by the Plug-In Information window in Cubase SX, which shows which plug-ins exhibit additional latency and whether to automatically compensate for it.]
For playback, the soundcard continues accessing the data within these buffers while Windows goes off to perform its other tasks, and hopefully Windows will get back soon enough to drop the next chunk of audio data into the buffers before the existing data has been used up. Similarly, during audio recording the incoming data slowly fills up a second set of buffers, and Windows comes back every so often to grab a chunk of this and save it to your hard drive.

If the buffers are too small and the data runs out before Windows can get back to top them up (playback) or empty them (recording) you’ll get a gap in the audio stream that sounds like a click or pop in the waveform and is often referred to as a ‘glitch’. If the buffers are far too small, these glitches occur more often, firstly giving rise to occasional crackles and eventually to almost continuous interruptions that sound like distortion as the audio starts to break up regularly.

Making the buffers a lot bigger immediately solves the vast majority of problems with clicks and pops, but has an unfortunate side effect: any change that you make to the audio from your audio software doesn’t take effect until the next buffer is accessed. This is latency, and is most obvious in two situations: when playing a soft synth or soft sampler in ‘real time’, or when recording a performance. In the first case you may be pressing notes on your MIDI keyboard in real time, but the generated waveforms won’t be heard until the next buffer is passed to the soundcard. You may not even be aware of a slight time lag at all (see ‘Acceptable Latency Values’ box), but as it gets longer it will eventually become noticeable, then annoying, and finally unmanageable.
[/quote]

Sorry, cylab, let me explain it a little better.

Scenario:
You want to mix multiple input lines. They each have byte data that needs to be combined into a single output stream. The mixer has an interface which lets the user attach the lines, and signal when they should start or stop. A thread running within the mixer does all the mixing of the byte data from each line.

Inner workings:
The thread which is doing the mixing cannot process every line at the same time (obviously), so it must process them in a linear fashion. It starts with the first line, grabs a block of data [i.e. a buffer] to mix, applies any transformations to that data (gain changes, pan changes, sample-rate changes, etc.), and moves on to the next line. Once it reaches the end, after combining the data from all the lines, it pushes the result out to the hardware to be played.
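Very roughly, one pass of that thread looks like this (a sketch with made-up names - the real code also handles drain, flush, stopped lines, and so on):

```java
import java.util.List;
import javax.sound.sampled.SourceDataLine;

class MixLoopSketch {

    /** Hypothetical per-line handle; the real project's class will look different. */
    interface InputLine {
        /** Fills left/right with the next block of samples (silence if stopped). */
        void read(float[] left, float[] right);
        float gain();
    }

    /** One pass of the mixing thread: pull a block from every line, then push the sum out. */
    static void mixOnePass(List<InputLine> lines, SourceDataLine output, int blockFrames) {
        float[] mixL = new float[blockFrames], mixR = new float[blockFrames];
        float[] l = new float[blockFrames], r = new float[blockFrames];

        for (InputLine line : lines) {            // lines are processed one after another
            line.read(l, r);                      // grab this line's block
            float g = line.gain();                // per-line transformation (gain only here)
            for (int i = 0; i < blockFrames; i++) {
                mixL[i] += l[i] * g;
                mixR[i] += r[i] * g;
            }
        }

        // Convert to 16-bit signed little-endian stereo PCM and push to the hardware.
        byte[] pcm = new byte[blockFrames * 4];
        for (int i = 0, o = 0; i < blockFrames; i++) {
            int sl = clamp(mixL[i]), sr = clamp(mixR[i]);
            pcm[o++] = (byte) sl; pcm[o++] = (byte) (sl >> 8);
            pcm[o++] = (byte) sr; pcm[o++] = (byte) (sr >> 8);
        }
        output.write(pcm, 0, pcm.length);         // only now does the audio reach the speaker
    }

    static int clamp(float s) {
        return Math.max(-32768, Math.min(32767, Math.round(s * 32767f)));
    }
}
```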

Where latency originates:
The calculation for each line takes some amount of time, however minuscule that might be. Let’s say for the sake of argument that to grab the data, do a gain calculation, do a pan calculation, do a sample-rate change, and move on to the next line takes a total of 100 µs (this is a totally made-up number). The sum of that per-line time across all the lines is the amount of latency experienced between the moment the user told the mixer “hey, start playing!” and the moment the audio data is pushed to the speaker. It is also important to point out that the [average] per-line time will increase depending on how heavily the user is loading the CPU with other processes.

How much latency is there? Well, it depends on the number of lines you are mixing and how much time you are spending per line. In the above scenario with my totally arbitrary number, the latency isn’t too bad - for 250 lines it would be 25 ms; for 1000 lines, it would be 0.1 seconds. But it is there. As I’ve mentioned, the point of this project is efficiency, not features. The more processing I can remove from the equation, the better. I assume the main concern is not having a per-line gain control? That may be an important enough feature that it should be included in the mixer, rather than removed simply for the sake of efficiency. Pan and pitch, on the other hand, are not important features for most people, and therefore should be eliminated. I’ve written a lot of “bloatware” in my time, but that doesn’t mean I can’t be a [German fascist party… yeah, you get the picture] once in a while when it suits my goals :smiley:

Bah, premature optimisation.

Who needs 250 lines? Most of us would be more than happy with 5.

Thanks for taking the time :slight_smile:

Since I could do pitch and volume change for 4 lines on a 7.81 MHz CPU using a 4-byte buffer (a 32-bit register :P) and still have cycles left over, this boils down to Riven’s comment:

(OK, it was in assembler, and ~6-bit 8 kHz playback…)

I totally get your points, but when I set my mind to efficiency I stick to it. All the “bloat” of this project will be in the surrounding easy-to-use interface. The core of the mixer where all the heavy lifting is being done is going to be as efficient as I can make it while still accomplishing its purpose: mixing of inputs into a single output.

You code, you decide :slight_smile: Will it be open, so we can fork it? :stuck_out_tongue:

Yes, it will be open source (as will my other mixer with the additional features).

Progress update:

I’ve got the basic project put together. Attaching lines, starting, stopping, and mixing works nicely. I’ve been borrowing most of the code from my other mixer project, so nothing much new as of yet. There is a little more to complete on this part, such as drain and flush, as well as gain changes.

The next big component is the format conversion code. Currently, the mixer assumes a particular input (and output) audio format, which of course is not all that adaptable. I need to be able to let the user specify the output format, and to mix a variety of input formats. Obviously format conversion impacts performance, but this is certainly a feature that people will need. Folks who need ultra optimization just have to make sure all their input data is in the same audio format, and use that same format for the output as well (thus bypassing all the format conversion steps).
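For reference, Java Sound can already do some of this conversion itself, though support for any given conversion varies by platform and installed service providers. A rough sketch, with the target format picked just as an example:

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class ConvertToMixFormat {
    public static void main(String[] args) throws Exception {
        AudioFormat mixFormat = new AudioFormat(44100f, 16, 2, true, false);

        AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]));
        // Ask Java Sound for a converted stream; this throws IllegalArgumentException
        // if the particular conversion isn't supported on the platform.
        AudioInputStream converted = AudioSystem.getAudioInputStream(mixFormat, in);

        byte[] block = new byte[4096];
        int read;
        while ((read = converted.read(block)) != -1) {
            // feed 'block' (now in mixFormat) to the mixer...
        }
        converted.close();
        in.close();
    }
}
```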

What about this?
http://java.net/projects/gervill/sources/Mercurial/content/src/com/sun/media/sound/AudioFloatConverter.java

There is no license problem, it uses GPL v2 with Classpath exception, it does not force you to use GPL (even though I personally prefer this one).
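If I remember that class correctly, usage is roughly like the sketch below. com.sun.media.sound is an internal package, so double-check the linked source for the exact signatures before depending on it:

```java
import javax.sound.sampled.AudioFormat;
import com.sun.media.sound.AudioFloatConverter;

public class FloatConverterSketch {
    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
        // From memory: a static factory picks the right converter for the format.
        AudioFloatConverter conv = AudioFloatConverter.getConverter(format);

        byte[] pcm = new byte[4096];                 // raw bytes in 'format'
        float[] samples = new float[pcm.length / 2]; // 16-bit -> one float per sample

        conv.toFloatArray(pcm, samples);  // decode bytes to normalized floats
        // ... mix in float space here ...
        conv.toByteArray(samples, pcm);   // encode back to bytes for the output line
    }
}
```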

I’m looking forward to seeing and hearing what you come up with and applaud your efforts!

I have some apprehensions about the algorithm you describe here:

[quote]Inner workings:
The thread which is doing the mixing cannot process every line at the same time (obviously), so it must process them in a linear fashion. It starts with the first line, grabs a block of data [i.e. a buffer] to mix, applies any transformations to that data (gain changes, pan changes, sample-rate changes, etc.), and moves on to the next line. Once it reaches the end, after combining the data from all the lines, it pushes the result out to the hardware to be played.
[/quote]
It would seem to me you would want to bring in some defined amount of data into the inner loop from each line, at the “same time,” then do a single operation that includes each buffer’s i-th element. In other words, progress by frames, not by tracks or channels.

Maybe that is what you are doing?

Or maybe it isn’t. I noticed for example that a free, tutorial OggVorbis Player iterates once for each stereo track of the innermost byte buffer in turn, which is a lot less efficient than processing both channels at the same time, especially if you are going to apply any sort of processing to the individual frames.
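For what it’s worth, here is a sketch of the frame-major inner loop I have in mind (made-up names, assuming every line’s block is already decoded to per-channel floats):

```java
class FrameMajorSketch {
    // Frame-major mixing: the outer loop walks frames, the inner loop walks lines,
    // so each frame is fetched, processed, and summed for both channels in one pass.
    static void mixByFrames(float[][][] blocks, float[] outLeft, float[] outRight) {
        for (int f = 0; f < outLeft.length; f++) {  // progress by frames...
            float left = 0f, right = 0f;
            for (float[][] block : blocks) {        // ...not by tracks or channels
                left  += block[0][f];               // block[0] = this line's left channel
                right += block[1][f];               // block[1] = this line's right channel
            }
            outLeft[f]  = left;
            outRight[f] = right;
        }
    }
}
```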