Float Division vs Multiplication

Hi all,

I was having a look at the source code to Quake 2 yesterday and was nosing around the math functions defined there. I came across the vector normalisation function:

vec_t VectorNormalize (vec3_t v)
{
	float	length, ilength;

	length = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
	length = sqrt (length);		// FIXME

	if (length)
	{
		ilength = 1/length;
		v[0] *= ilength;
		v[1] *= ilength;
		v[2] *= ilength;
	}

	return length;
}


I was initially confused by the single division followed by multiplications, rather than a straight-up division per component. I suspected that floating-point multiplication is significantly faster than division, so one division plus three multiplications beats three divisions. Looks like I was right.

So, I remembered that JOML is currently doing 3 divisions (since I wrote it before I knew about this) and thought it might be a small performance win to go back and precompute the reciprocal in every function that could benefit from it. However, when I tried writing my own benchmark, I found the opposite to be true: the 3 divides were around 10x faster than the reciprocal-multiply version!

Here are the two functions I tested:

    public void normaliseFast() {
        float length, ilength;

        length = x * x + y * y + z * z;
        length = (float) Math.sqrt(length);

        if (length != 0) {
            ilength = 1.0f / length;
            x *= ilength;
            y *= ilength;
            z *= ilength;
        }
    }

    public void normaliseSlow() {
        float length;

        length = x * x + y * y + z * z;
        length = (float) Math.sqrt(length);

        if (length != 0) {
            x /= length;
            y /= length;
            z /= length;
        }
    }
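As a quick sanity check that the two variants agree, here is a sketch using a minimal stand-alone Vector3f (an assumption for illustration, not JOML's actual class; the field and method names are taken from the snippets above):

```java
public class NormaliseCheck {
    static class Vector3f {
        float x, y, z;
        Vector3f(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

        // 1 division + 3 multiplications
        void normaliseFast() {
            float length = (float) Math.sqrt(x * x + y * y + z * z);
            if (length != 0) {
                float ilength = 1.0f / length;
                x *= ilength; y *= ilength; z *= ilength;
            }
        }

        // 3 divisions
        void normaliseSlow() {
            float length = (float) Math.sqrt(x * x + y * y + z * z);
            if (length != 0) {
                x /= length; y /= length; z /= length;
            }
        }
    }

    public static void main(String[] args) {
        Vector3f a = new Vector3f(5.0f, -9.75f, -1.5f);
        Vector3f b = new Vector3f(5.0f, -9.75f, -1.5f);
        a.normaliseFast();
        b.normaliseSlow();
        // Both should be unit length; results may differ in the last bit,
        // because 1/length rounds once before the three multiplies.
        System.out.println(a.x + " " + a.y + " " + a.z);
        System.out.println(b.x + " " + b.y + " " + b.z);
        double lenA = Math.sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
        System.out.println(Math.abs(lenA - 1.0) < 1e-5);
    }
}
```

Note the rounding caveat: the reciprocal-multiply version is not bit-identical to the divide version, which is usually acceptable for game math but worth knowing.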

And here is my benchmark:

        Vector3f start = new Vector3f(7.f, 10.25f, 3.f);
        Vector3f end = new Vector3f(12.0f, 0.5f, 1.5f);
        Vector3f fastdir = new Vector3f();
        Vector3f.sub(end, start, fastdir);
        Vector3f slowdir = new Vector3f(fastdir);
        long fastStart = System.nanoTime();
        fastdir.normaliseFast();
        long fastEnd = System.nanoTime();
        long fastTime = fastEnd - fastStart;
        long slowStart = System.nanoTime();
        slowdir.normaliseSlow();
        long slowEnd = System.nanoTime();
        long slowTime = slowEnd - slowStart;
        System.out.println("Slow: " + slowTime + ", Fast: " + fastTime);

My results are as follows (5 tests):

Slow: 6462, Fast: 225802 (computer was still booting up…)
Slow: 2661, Fast: 25849
Slow: 2661, Fast: 27370
Slow: 4562, Fast: 38774
Slow: 2661, Fast: 26610

Am I missing something here? I was expecting it to be the other way around! Obviously, if the JVM has improved to the point where division is no longer slower, I won’t bother implementing this in JOML.

Hi Neoptolemus,
you seem not to have had a look into JOML for some time. :slight_smile:
All such normalizations were changed to 3 multiplications.
Cheers, Kai

Factor 10 difference…? Something else is amiss.

Cas :slight_smile:

Ah! I realise I was looking at the wrong repository, as one of my laptops has an outdated shortcut. Oops!

Still weird how I get completely wrong results though, not sure what I’ve done wrong…

I am still subscribed to JOML so I get email notifications. It’s amazing how far you’ve taken it. I think I had pushed it as far as I could when you took over, so it was definitely the right decision :slight_smile:

You are always welcome to join me again on our journey to full Java-math-library-world-domination! :smiley:

Latency on division is about 40 cycles, multiply about 2.
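That latency gap is easy to make visible with a chain of dependent operations, where each iteration waits on the previous result, so the loop time is dominated by latency rather than throughput. A rough sketch (illustrative only; the constants are arbitrary, and a real measurement needs a proper harness like JMH):

```java
public class DivMulLatency {
    public static void main(String[] args) {
        final int N = 100_000_000;

        float x = 1.0f;
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) x = x / 1.0000001f;   // dependent divides
        long t1 = System.nanoTime();

        float y = 1.0f;
        long t2 = System.nanoTime();
        for (int i = 0; i < N; i++) y = y * 0.9999999f;   // dependent multiplies
        long t3 = System.nanoTime();

        System.out.println("div: " + (t1 - t0) / 1e6 + " ms, mul: " + (t3 - t2) / 1e6 + " ms");
        // Print the results so the JIT cannot eliminate the loops as dead code.
        System.out.println(x + " " + y);
    }
}
```

The constants are chosen so the values shrink slowly (roughly e^-10 after 10^8 iterations) and never hit denormals, which would distort the timing.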


[quote]Slow: 8.641 ms
Fast: 4.76 ms[/quote]

I see this in my benchmarks too.

The longer you run it, the more it converges to the average time.
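That convergence is the key point: a single timed call mostly measures JIT compilation and class-loading noise, which is why the one-shot nanoTime benchmark above gives a 10x anomaly. A sketch of a fairer version that warms up both paths first and then times many iterations (using an assumed minimal stand-alone Vector3f with the names from the snippets earlier in the thread; for serious numbers, use JMH):

```java
public class NormaliseBench {
    static class Vector3f {
        float x = 7.0f, y = 10.25f, z = 3.0f;

        void normaliseFast() {
            float len = (float) Math.sqrt(x * x + y * y + z * z);
            if (len != 0) { float il = 1.0f / len; x *= il; y *= il; z *= il; }
        }

        void normaliseSlow() {
            float len = (float) Math.sqrt(x * x + y * y + z * z);
            if (len != 0) { x /= len; y /= len; z /= len; }
        }
    }

    // Time `iters` invocations of r and return elapsed nanoseconds.
    static long time(Runnable r, int iters) {
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) r.run();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        Vector3f v = new Vector3f();
        // Warm-up: give the JIT a chance to compile both methods.
        time(v::normaliseFast, 1_000_000);
        time(v::normaliseSlow, 1_000_000);
        // Measured runs.
        long fast = time(v::normaliseFast, 10_000_000);
        long slow = time(v::normaliseSlow, 10_000_000);
        System.out.println("Slow: " + slow / 1e6 + " ms, Fast: " + fast / 1e6 + " ms");
    }
}
```

Even this is only a sketch: it still doesn't control for on-stack replacement or loop unrolling, which is exactly why harnesses like JMH exist.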