Hi all,
I was having a look at the source code to Quake 2 yesterday and was nosing around the math functions defined there. I came across the vector normalisation function:
vec_t VectorNormalize (vec3_t v)
{
    float length, ilength;

    length = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
    length = sqrt (length); // FIXME

    if (length)
    {
        ilength = 1/length;
        v[0] *= ilength;
        v[1] *= ilength;
        v[2] *= ilength;
    }

    return length;
}
I was initially confused by the single division followed by three multiplications, rather than three straight divisions. I suspected that multiplying floats is significantly faster than dividing them, so 1 division and 3 multiplications would be faster than 3 divisions. Looks like I was right:
JOML currently does 3 divisions (I wrote it before I knew about this), so I thought it might be a small performance win to go back and precompute the reciprocal in every function that can benefit from it. However, when I wrote my own benchmark I found the opposite to be true: the version with 3 divisions was around 10x faster than the version with 1 division and 3 multiplications!
Here are the two functions I tested:
public void normaliseFast() {
    float length, ilength;
    length = x * x + y * y + z * z;
    length = (float) Math.sqrt(length);
    if (length != 0) {
        ilength = 1.0f / length;
        x *= ilength;
        y *= ilength;
        z *= ilength;
    }
}

public void normaliseSlow() {
    float length;
    length = x * x + y * y + z * z;
    length = (float) Math.sqrt(length);
    if (length != 0) {
        x /= length;
        y /= length;
        z /= length;
    }
}
And here is my benchmark:
Vector3f start = new Vector3f(7.f, 10.25f, 3.f);
Vector3f end = new Vector3f(12.0f, 0.5f, 1.5f);
Vector3f fastdir = new Vector3f();
Vector3f.sub(end, start, fastdir);
Vector3f slowdir = new Vector3f(fastdir);

// Time a single call to the reciprocal-multiply version.
long fastStart = System.nanoTime();
fastdir.normaliseFast();
long fastEnd = System.nanoTime();
long fastTime = fastEnd - fastStart;

// Time a single call to the three-division version.
long slowStart = System.nanoTime();
slowdir.normaliseSlow();
long slowEnd = System.nanoTime();
long slowTime = slowEnd - slowStart;

System.out.println("Slow: " + slowTime + ", Fast: " + fastTime);
My results are as follows (5 runs, times in nanoseconds):
Slow: 6462, Fast: 225802 (computer was still booting up…)
Slow: 2661, Fast: 25849
Slow: 2661, Fast: 27370
Slow: 4562, Fast: 38774
Slow: 2661, Fast: 26610
Am I missing something here? I was expecting it to be the other way around! Obviously, if improvements to the JVM have made this trick unnecessary, then I won't implement it in JOML.
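One thing I have not ruled out: each version above is timed on a single call, so the numbers may mostly reflect one-off costs (class loading, code running interpreted before the JIT compiles it) rather than the arithmetic itself, and whichever version is timed first would pay most of that. Below is a minimal sketch of a fairer comparison, not a definitive benchmark. The assumptions are mine: `Vec3` is a hypothetical stand-in for my `Vector3f` holding only the two methods, the iteration counts are arbitrary, and the checksum exists only so the JIT cannot discard the work.

```java
public class NormaliseBench {
    // Hypothetical stand-in for Vector3f, holding just the two methods under test.
    static final class Vec3 {
        float x, y, z;

        void set(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        // One division, then three multiplications by the reciprocal.
        void normaliseFast() {
            float length = (float) Math.sqrt(x * x + y * y + z * z);
            if (length != 0) {
                float ilength = 1.0f / length;
                x *= ilength;
                y *= ilength;
                z *= ilength;
            }
        }

        // Three divisions by the length.
        void normaliseSlow() {
            float length = (float) Math.sqrt(x * x + y * y + z * z);
            if (length != 0) {
                x /= length;
                y /= length;
                z /= length;
            }
        }
    }

    // Runs `iterations` calls of one version and returns the average nanoseconds per call.
    // The running sum is written to sink[0] so the normalisation cannot be optimised away.
    static double time(boolean fast, int iterations, float[] sink) {
        Vec3 v = new Vec3();
        float sum = 0f;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Same direction as in my test (end - start), with x varied slightly per iteration.
            v.set(5.0f + i % 7, -9.75f, -1.5f);
            if (fast) {
                v.normaliseFast();
            } else {
                v.normaliseSlow();
            }
            sum += v.x;
        }
        long end = System.nanoTime();
        sink[0] += sum;
        return (end - start) / (double) iterations;
    }

    public static void main(String[] args) {
        float[] sink = new float[1];
        int warmup = 200_000;      // arbitrary: untimed calls so the JIT can compile both paths
        int measured = 5_000_000;  // arbitrary: timed calls per round

        time(true, warmup, sink);
        time(false, warmup, sink);

        for (int round = 0; round < 5; round++) {
            double slow = time(false, measured, sink);
            double fast = time(true, measured, sink);
            System.out.println("Slow: " + slow + " ns/call, Fast: " + fast + " ns/call");
        }
        System.out.println("(checksum, ignore: " + sink[0] + ")");
    }
}
```

Both versions get an untimed warm-up first, and each timed figure is an average over many calls, so neither one is penalised for running first. A hand-rolled loop like this can still mislead, so a dedicated harness such as JMH would probably be more trustworthy before changing anything in JOML.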