Not so long ago I claimed that I found fixed point arithmetic to be faster than floating point when doing linear interpolation of audio in Java. This has been bugging me ever since, as the floating-point code behind that claim wasn’t really written with speed in mind. So I’ve conducted a little test to see if floating point arithmetic really is slower.
Here’s the code I used. It’s a simple class that generates a sine wave at the specified frequency (given as input samples per output sample) by linearly interpolating a 1024-entry wavetable. It’s not the most efficient algorithm, as there is an end-of-waveform check in the inner loop which could be hoisted out with some clever arithmetic (there’s a sketch of this after the listing), but it’s relatively easy to understand. The algorithm is implemented for fixed point with 15 fractional bits, 32 bit float, and 64 bit double types.
I had stated that you need a modulo-divide to get the fractional part when using floating point arithmetic. Actually this is not true: you can simply subtract the integer part instead.
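For example, for a non-negative phase both of these give the same result (just an illustration, not part of the test below):

/* Fractional part via modulo-divide. */
double frac_mod = phase % 1.0;
/* Fractional part by subtracting the truncated integer part. */
double frac_sub = phase - ( int ) phase;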
public class Interpolator {
    private static final int
        FP_SHIFT = 15,
        FP_ONE = 1 << FP_SHIFT,
        FP_MASK = FP_ONE - 1,
        BUF_LEN = 1024,
        FP_BUF_LEN = BUF_LEN << FP_SHIFT;

    private short[] wav_buf;
    public Interpolator() {
        int idx;
        /* Initialise a sine wave as the input. */
        wav_buf = new short[ BUF_LEN + 1 ];
        for( idx = 0; idx < BUF_LEN; idx++ ) {
            wav_buf[ idx ] = ( short ) ( Math.sin( Math.PI * 2 * idx / BUF_LEN ) * 32767 );
        }
        /* Guard point, so the inner loops can read one sample past the end. */
        wav_buf[ BUF_LEN ] = wav_buf[ 0 ];
    }
    /* Mix length interpolated samples into mix_buf and return the updated phase. */
    public double interpolate_double( int[] mix_buf, int length, double step, double phase ) {
        int idx, i;
        double c, m, x;
        for( idx = 0; idx < length; idx++ ) {
            i = ( int ) phase;
            c = wav_buf[ i ];
            m = wav_buf[ i + 1 ] - c;
            x = phase - i;
            mix_buf[ idx ] += ( int ) ( m * x + c );
            phase += step;
            if( phase >= BUF_LEN ) {
                phase -= BUF_LEN;
            }
        }
        return phase;
    }
    public float interpolate_float( int[] mix_buf, int length, float step, float phase ) {
        int idx, i;
        float c, m, x;
        for( idx = 0; idx < length; idx++ ) {
            i = ( int ) phase;
            c = wav_buf[ i ];
            m = wav_buf[ i + 1 ] - c;
            x = phase - i;
            mix_buf[ idx ] += ( int ) ( m * x + c );
            phase += step;
            if( phase >= BUF_LEN ) {
                phase -= BUF_LEN;
            }
        }
        return phase;
    }
    /* As above, but using fixed point with 15 fractional bits for phase and step. */
    public int interpolate_int( int[] mix_buf, int length, int step, int phase ) {
        int idx, i, c, m, x;
        for( idx = 0; idx < length; idx++ ) {
            i = phase >> FP_SHIFT;
            c = wav_buf[ i ];
            m = wav_buf[ i + 1 ] - c;
            x = phase & FP_MASK;
            mix_buf[ idx ] += ( m * x >> FP_SHIFT ) + c;
            phase += step;
            if( phase >= FP_BUF_LEN ) {
                phase -= FP_BUF_LEN;
            }
        }
        return phase;
    }
    public static void main( String[] args ) throws Exception {
        int idx, num_cycles;
        int phase_i, pi_i;
        float phase_f, pi_f;
        double phase_d;
        long time;
        Interpolator interpolator;
        int[] mix_buf;
        num_cycles = 100000;
        interpolator = new Interpolator();
        mix_buf = new int[ BUF_LEN ];
        phase_d = phase_f = phase_i = 0;
        pi_i = ( int ) ( Math.PI * FP_ONE );
        pi_f = ( float ) Math.PI;
        time = System.currentTimeMillis();
        for( idx = 0; idx < num_cycles; idx++ ) {
            phase_i = interpolator.interpolate_int( mix_buf, BUF_LEN, pi_i, phase_i );
        }
        System.out.println( "INT: " + ( System.currentTimeMillis() - time ) );
        time = System.currentTimeMillis();
        for( idx = 0; idx < num_cycles; idx++ ) {
            phase_f = interpolator.interpolate_float( mix_buf, BUF_LEN, pi_f, phase_f );
        }
        System.out.println( "FLOAT: " + ( System.currentTimeMillis() - time ) );
        time = System.currentTimeMillis();
        for( idx = 0; idx < num_cycles; idx++ ) {
            phase_d = interpolator.interpolate_double( mix_buf, BUF_LEN, Math.PI, phase_d );
        }
        System.out.println( "DOUBLE: " + ( System.currentTimeMillis() - time ) );
    }
}
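For what it’s worth, here’s roughly what I meant by hoisting the end-of-waveform check, sketched for the fixed point version. The method name is my own, it assumes step is positive, and it isn’t part of the timings below: the idea is to work out how many samples can be produced before the phase passes the end of the table, run those without the check, then wrap once per run.

    /* Sketch: process the output in runs that cannot cross the end of the
       wavetable, so the wrap check runs once per run instead of per sample.
       Assumes step > 0. */
    public int interpolate_int_hoisted( int[] mix_buf, int length, int step, int phase ) {
        int idx = 0;
        while( idx < length ) {
            /* Samples remaining before phase reaches FP_BUF_LEN, rounded up. */
            int run = ( FP_BUF_LEN - phase + step - 1 ) / step;
            if( run > length - idx ) {
                run = length - idx;
            }
            int end = idx + run;
            while( idx < end ) {
                int i = phase >> FP_SHIFT;
                int c = wav_buf[ i ];
                mix_buf[ idx++ ] += ( ( wav_buf[ i + 1 ] - c ) * ( phase & FP_MASK ) >> FP_SHIFT ) + c;
                phase += step;
            }
            if( phase >= FP_BUF_LEN ) {
                phase -= FP_BUF_LEN;
            }
        }
        return phase;
    }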
And here are the results from my 1.5GHz AMD Sempron (basically an Athlon XP with the same amount of cache as an Athlon Thunderbird) using Java 1.6.0. Values are in milliseconds:
INT: 1453
FLOAT: 2641
DOUBLE: 5922
So it appears that, in Java at least, integer arithmetic is faster than floating point arithmetic on my machine. I’m a bit confused as to why 32 bit floating point is so much faster than 64 bit. I only counted 6 floating point values in the inner loop, which should all fit in the 8 x87 registers. Perhaps there is still some unavoidable register swapping that could account for it, or maybe there are still improvements to be made to the JVM’s floating point support.
Even so, the slowest result is still about 6 seconds for over 100 MILLION samples of output. That’s enough for over 300 waveforms generated in real time at 48kHz. There are roughly 10 operations in the inner loop, which I work out to be about 170 megaflops (the arithmetic is spelled out below). That’s some way off the numbers given for my machine (SiSoft Sandra suggests 2700 megaflops), but for real-world performance I’ve got no complaints.
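To spell the arithmetic out: 100,000 buffers × 1,024 samples is about 102.4 million samples per test, so at roughly 10 operations per sample that’s about 1.02 billion operations, and 1.02 billion / 5.922 seconds comes to roughly 173 megaflops for the double case.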
Any thoughts?
Cheers,
Martin