Hi guys,
I was trying to make speed improvements in some code and got unexpected results. First look at some code:
// int version: iv[i-1] is read back from the array on every iteration
int[] iv = new int[1000];
iv[0] = 8;
iv[1] = 9;
long before = System.currentTimeMillis();
for (int ii=0; ii<100000; ii++)
{
    for (int i=2; i<iv.length; i++)
    {
        iv[i] = (5 * iv[i-1] - 10 * iv[i-2]) / 3;
        iv[i] *= iv[i];
    }
}
long after = System.currentTimeMillis();
System.out.println(after-before);
// int version, "aliased": the previous element is kept in a local variable
before = System.currentTimeMillis();
for (int ii=0; ii<100000; ii++)
{
    int previous = iv[1];
    for (int i=2; i<iv.length; i++)
    {
        int v = (5 * previous - 10 * iv[i-2]) / 3;
        v *= v;
        iv[i] = v;
        previous = v;
    }
}
after = System.currentTimeMillis();
System.out.println(after-before);
// float version: fv[i-1] is read back from the array on every iteration
float[] fv = new float[1000];
fv[0] = 8.0f;
fv[1] = 9.0f;
before = System.currentTimeMillis();
for (int ii=0; ii<100000; ii++)
{
    for (int i=2; i<fv.length; i++)
    {
        fv[i] = (5.0f * fv[i-1] - 10.0f * fv[i-2]) * 0.333333333f;
        fv[i] *= fv[i];
    }
}
after = System.currentTimeMillis();
System.out.println(after-before);
// float version, "aliased": the previous element is kept in a local variable
before = System.currentTimeMillis();
for (int ii=0; ii<100000; ii++)
{
    float previous = fv[1];
    for (int i=2; i<fv.length; i++)
    {
        float f = (5.0f * previous - 10.0f * fv[i-2]) * 0.333333333f;
        f *= f;
        fv[i] = f;
        previous = f;
    }
}
after = System.currentTimeMillis();
System.out.println(after-before);
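OK, basically it is the same math computation inside nested loops, done four ways: on ints and on floats, each with and without keeping the previous element in a local variable (that is what I call "aliased" below).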
I ran these tests several times in the same process and got the following results (in milliseconds, as printed by the code above):
int : 5203
int aliased : 4750
float : 4172
float aliased : 5031
The absolute values by themselves are not really important but they reveal a pattern.
First conclusion: the plain float loop is faster than the plain int loop on my platform (WinXP, Athlon XP, Java 1.4.2beta), 4172 vs 5203! Incredible, isn’t it?
Naively, I expected the opposite. I guess the SSE instructions now used by the JVM kick in and dramatically boost float performance (note that the Athlon XP supports SSE but not SSE2). It would be interesting to see the results on other platforms (SSE or no SSE). That is good news for OpenGL in Java.
Second conclusion: in this test, aliasing helps with ints (4750 vs 5203) but is counterproductive with floats (5031 vs 4172), which was also not obvious to me.
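Comparing the plain and aliased timings only makes sense if both loops compute the same values. Here is a minimal sketch of a sanity check for the int case. The class and method names (AliasCheck, plain, aliased) are made up for illustration, and each method does a single pass over a fresh array rather than 100000 passes.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original test code.
class AliasCheck {
    // Plain version: iv[i-1] is read back from the array. Assumes n >= 2.
    static int[] plain(int n) {
        int[] iv = new int[n];
        iv[0] = 8;
        iv[1] = 9;
        for (int i = 2; i < iv.length; i++) {
            iv[i] = (5 * iv[i - 1] - 10 * iv[i - 2]) / 3;
            iv[i] *= iv[i];
        }
        return iv;
    }

    // Aliased version: the previous element is kept in a local. Assumes n >= 2.
    static int[] aliased(int n) {
        int[] iv = new int[n];
        iv[0] = 8;
        iv[1] = 9;
        int previous = iv[1];
        for (int i = 2; i < iv.length; i++) {
            int v = (5 * previous - 10 * iv[i - 2]) / 3;
            v *= v;
            iv[i] = v;
            previous = v;
        }
        return iv;
    }

    public static void main(String[] args) {
        // Prints whether the two variants filled the array identically.
        System.out.println(Arrays.equals(plain(1000), aliased(1000)));
    }
}
```

Int overflow wraps around the same way in both variants, so it does not affect the comparison.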
I am perfectly aware that these results should be taken with a grain of salt (if you slightly modify the code inside the loop you may end up with other conclusions).
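One more grain of salt: it may be worth looking at what the float loop actually computes. Because each element is squared, the values grow very fast. Working it through by hand, the array overflows to Infinity after a handful of elements and the rest becomes NaN (Infinity minus Infinity). Whether arithmetic on NaN runs at the same speed as arithmetic on ordinary values depends on the hardware and the JVM, so take this as a caveat, not an explanation. A small sketch to inspect the values (FloatValues, fill and countFinite are made-up names):

```java
// Hypothetical helper, not part of the original test code.
class FloatValues {
    // One pass of the plain float loop over a fresh array. Assumes n >= 2.
    static float[] fill(int n) {
        float[] fv = new float[n];
        fv[0] = 8.0f;
        fv[1] = 9.0f;
        for (int i = 2; i < fv.length; i++) {
            fv[i] = (5.0f * fv[i - 1] - 10.0f * fv[i - 2]) * 0.333333333f;
            fv[i] *= fv[i];
        }
        return fv;
    }

    // Counts the elements that are neither NaN nor infinite.
    static int countFinite(float[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (!Float.isNaN(a[i]) && !Float.isInfinite(a[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        float[] fv = fill(1000);
        System.out.println("finite elements: " + countFinite(fv) + " of " + fv.length);
        System.out.println("last element: " + fv[fv.length - 1]);
    }
}
```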
The bottom line: it is going to be very difficult to optimize code, since an optimization on one platform may end up decreasing performance on another. The only way to know is to test!
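To make such tests a bit more repeatable, here is a rough sketch of a tiny timing helper that runs a task a few times untimed (to let the JIT compile it) and then reports the best of several timed runs. It assumes a newer JVM than the one I used: System.nanoTime needs Java 5 and the lambda needs Java 8. MiniBench, bestMillis and sink are made-up names, and the pass count of 10000 is arbitrary.

```java
// Hypothetical timing helper, a sketch rather than a rigorous benchmark.
class MiniBench {
    // Written to by the task so the JIT cannot discard the work as dead code.
    static int sink;

    // Runs task `warmup` times untimed, then `runs` times timed,
    // and returns the fastest timed run in milliseconds. Assumes runs >= 1.
    static long bestMillis(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (elapsed < best) {
                best = elapsed;
            }
        }
        return best / 1_000_000L;
    }

    public static void main(String[] args) {
        final int[] iv = new int[1000];
        iv[0] = 8;
        iv[1] = 9;
        // The plain int loop from the test, with fewer outer passes.
        Runnable plainInt = () -> {
            for (int pass = 0; pass < 10000; pass++) {
                for (int i = 2; i < iv.length; i++) {
                    iv[i] = (5 * iv[i - 1] - 10 * iv[i - 2]) / 3;
                    iv[i] *= iv[i];
                }
            }
            sink += iv[iv.length - 1];
        };
        System.out.println("int plain: " + bestMillis(plainInt, 3, 5) + " ms");
    }
}
```

The other three variants can be wrapped in a Runnable the same way and passed to bestMillis.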
I’d be interested in other people testing with other hardware, OS and JVM versions.
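If you post results, it helps to include what you ran them on. A small sketch that prints the JVM and OS details from the standard system properties (PlatformInfo and describe are made-up names; the CPU model is not available this way, so please add it by hand):

```java
// Hypothetical helper for reporting the test platform alongside the timings.
class PlatformInfo {
    // Builds a one-line description from standard system properties.
    static String describe() {
        return System.getProperty("java.vm.name")
            + " " + System.getProperty("java.version")
            + " on " + System.getProperty("os.name")
            + " / " + System.getProperty("os.arch")
            + ", " + Runtime.getRuntime().availableProcessors() + " cpu(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```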