Native sqrt() faster than Math.sqrt() ?

So… I always wondered whether Java’s Math.sqrt() would be faster than calling C’s sqrt() from <math.h> through JNI.

If anyone else wondered, here is the result, printed by my program:

The Java code is the following (REALLY trivial):

package org.matheusdev.jni;

import java.util.Random;

public class SQRTNative {
	
	static {
		System.loadLibrary("SQRTNative");
	}
	
	private native static double sqrt(double d);
	
	public static void main(String[] args) {
		final long repeat = 2330000000L; // 2.330.000.000 times sqrt().
		final int values = 1000000;  //  1.000.000 doubles, a.k.a. 8 MB (8 bytes each).
		long time; // local for measuring time.
		
		// Initialize random values:
		System.out.println("Starting to initialize values.");
		double[] vals = new double[values];
		Random rand = new Random();
		for (int i = 0; i < values; i++) {
			vals[i] = rand.nextDouble()*1000;
		}
		
		// Java's Math.sqrt:
		System.out.println("Initializing finished. Starting Java's Math.sqrt():");
		time = System.currentTimeMillis();
		for (long i = 0; i < repeat; i++) {
			Math.sqrt(vals[(int)(i%values)]);
		}
		System.out.println("Time taken with Math.sqrt(): " + (System.currentTimeMillis()-time));
		
		// Native sqrt() from C++:
		System.out.println("Starting native C++'s sqrt():");
		time = System.currentTimeMillis();
		for (long i = 0; i < repeat; i++) {
			sqrt(vals[(int)(i%values)]);
		}
		System.out.println("Time taken with native sqrt(): " + (System.currentTimeMillis()-time));
	}

}

And the code inside the “libSQRTNative.so” (running 64-bit Linux here :wink: ) is simple too:

#include "jni.h"
#include <math.h>
#include "SQRTNative.h"

JNIEXPORT jdouble JNICALL
Java_org_matheusdev_jni_SQRTNative_sqrt(JNIEnv *env, jclass cls, jdouble d)
{
    return sqrt(d);
}

Built with “QtCreator’s build lib for release”, aka GCC. Haven’t messed around with any build flags :wink:

One more thing to add: I use “only” 1 million values, because I can’t even create an array of 2.330.000.000 doubles (Java array lengths are ints, so the limit is about 2.147.483.647), and besides, it would crash with an OutOfMemoryError anyway.

I used exactly 2.330.000.000 because my CPU was running at 2.33 GHz at that point, so you can easily see that Java’s sqrt takes about 8-9 clock ticks per computation, and the C++ one takes about 77 CPU clock ticks.
These values are not perfect, because the system needed resources too: Chromium with 13 tabs, Eclipse, one Dolphin and a QtCreator were all running.

I assume that most time spent in the native version is the JNI overhead.

Also, this was a JNI-learning-and-have-fun-with-performance-tests-test-project, so keep that in mind, and correct me if I could do something better :wink:

This is because the JVM inlines the call to Math.sqrt. Also, the assumption that 1 call to sqrt = 1 clock tick is ludicrous :wink:

I didn’t assume something like that ???

You did, by assuming that it takes 8-9 clock ticks per computation if it took 8-9 seconds to compute your CPU’s clock rate in values. There are numerous other variables involved, including stepping through that for loop and also the FLOPS of your CPU :wink:

I run my calculation 2.330.000.000 times, which is the number of ticks, my cpu does in 1 second (2.33 GHz).
So if the program takes 8.7 seconds, it takes 8-9 CLOCK TICKS per computation. I said 8-9 clock ticks… I never said anything about 1 tick ???

Also, let’s assume it takes 1 clock tick:
Then the time the program runs would be 1 * 2.330.000.000 / 2.33 GHz = 1 sec.

You’re right about the flops, but this is not really precise anyways.
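For anyone following the arithmetic, here is a minimal sketch of it. The class and method names are made up for illustration, and it assumes the CPU really runs at a constant 2.33 GHz for the whole run, which may not hold with power saving. Note that the result is cycles per loop iteration (array load, modulo and loop overhead included), not per sqrt alone.

```java
public class CyclesPerCall {

    // Average clock cycles per loop iteration, assuming a constant clock rate.
    // cycles = elapsed seconds * cycles per second / number of iterations
    static double cyclesPerCall(double elapsedSeconds, double clockHz, long calls) {
        return elapsedSeconds * clockHz / calls;
    }

    public static void main(String[] args) {
        double clockHz = 2.33e9;          // nominal clock rate from the post
        long calls = 2_330_000_000L;      // iteration count from the post

        // 8.7 seconds is the run time quoted above for the Math.sqrt() loop.
        System.out.println("Cycles per call at 8.7 s: " + cyclesPerCall(8.7, clockHz, calls));

        // A run of exactly 1 second would mean 1 cycle per iteration.
        System.out.println("Cycles per call at 1.0 s: " + cyclesPerCall(1.0, clockHz, calls));
    }
}
```

Because the iteration count equals the clock rate, the run time in seconds and the cycles per iteration come out as the same number.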

EDIT:

computing the clock rate?

This is a bit of a worrying microbenchmark because on the server VM for example that entire Java sqrt loop should be optimised away to nothing. I’d at least change both loops to sum the result of the sqrt and output it to stdout or you risk getting totally random looking results running this elsewhere.
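A rough sketch of what summing the results could look like for the Java loop. The class name, the fixed seed and the smaller repeat count are assumptions for illustration, not the original program; the native loop would need the same treatment.

```java
import java.util.Random;

public class SqrtSumBenchmark {

    // Cycles through vals 'repeat' times in total and sums the square roots.
    // Returning the sum keeps the results "used", so the JIT cannot simply
    // discard the sqrt calls as dead code.
    static double sumSqrt(double[] vals, long repeat) {
        double sum = 0.0;
        int idx = 0;
        for (long i = 0; i < repeat; i++) {
            sum += Math.sqrt(vals[idx]);
            if (++idx == vals.length) {
                idx = 0; // wrap around without a modulo per iteration
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] vals = new double[1_000_000];
        Random rand = new Random(42); // fixed seed (assumption) for repeatable input
        for (int i = 0; i < vals.length; i++) {
            vals[i] = rand.nextDouble() * 1000;
        }

        long repeat = 100_000_000L; // hypothetical, smaller than in the original post
        long start = System.nanoTime();
        double sum = sumSqrt(vals, repeat);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Printing the sum is what makes the computation observable.
        System.out.println("Sum: " + sum + ", time taken with Math.sqrt(): " + elapsedMs + " ms");
    }
}
```

The wrap-around index replaces the `i % values` from the original loop, so the timing is less dominated by a 64-bit modulo.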

Cas :slight_smile:

Oh crap!.. You’re right ;D

results:

That really looks like it got optimized away… I wonder why it didn’t just optimize away the whole loop? So it actually does… nothing? :smiley:
Anyways, Java seems to be about 2 times faster than C++ with JNI.

EDIT: To the sum: they either share the exactly same algorithm, or java inlines the Math.sqrt() function with C++'s sqrt() func :DD

I was thinking that too. I originally wrote that in my first post, but quickly edited it because I noticed that it took 8 whole seconds for the first loop!

Ah you’re not a native English speaker then huh? :wink: (unless I can’t explain myself :persecutioncomplex:)

The 8 seconds might be due to the JVM not optimizing that piece of code the first time it gets called… only around the 1.000.000.000th time or so…
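One way to check that guess is to run the same work several times in one JVM and time each pass separately. This is only a sketch with made-up names and sizes; how many passes are needed before the JIT has compiled the loop is an assumption and depends on the VM and its settings.

```java
public class WarmupTiming {

    // Sums sqrt over the whole array 'rounds' times; the sum is returned so
    // the work stays observable.
    static double sumSqrt(double[] vals, int rounds) {
        double sum = 0.0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < vals.length; i++) {
                sum += Math.sqrt(vals[i]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] vals = new double[1_000_000];
        for (int i = 0; i < vals.length; i++) {
            vals[i] = i * 0.001; // deterministic input, hypothetical values
        }

        // Time several identical passes. If the first pass is much slower than
        // the later ones, that points to interpretation/compilation at start-up.
        for (int pass = 1; pass <= 5; pass++) {
            long start = System.nanoTime();
            double sum = sumSqrt(vals, 100);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Pass " + pass + ": " + elapsedMs + " ms (sum " + sum + ")");
        }
    }
}
```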

Nope, I’m not a native speaker, but I don’t compute any clock rate anywhere, so I don’t understand what you wanted to say :wink:

matheus simply knows how fast the clock speed is on his CPU. Or at least the speed it claims to be running at out of the box, which may not quite match what it is actually doing at the time, depending on power saving tech that may be interfering…

Cas :slight_smile:

The whole reason for my HotSpot optimization list post is so that people don’t need to experiment to determine known features. Look in the intrinsic list and you’ll see that Math.sqrt is indeed an intrinsic. So no function is ever called…a native SSE instruction will be issued and scheduled with surrounding code.