In the past I have found an elegant and reliable way to do microbenchmarking that compensates for loop overhead: time one loop without the function under test and one with it, then divide the work done by the difference.
public static void main(String[] args)
{
    int samples = 1000000;
    long l1;
    long l2;
    int n = 0;
    for (int tries = 0; tries < 10; tries++)
    {
        // empty loop: measures loop overhead only
        l1 = System.nanoTime();
        for (int i = 0; i < samples; i++)
        {}
        l1 = System.nanoTime() - l1;
        // same loop with the operation under test
        l2 = System.nanoTime();
        for (int i = 0; i < samples; i++)
        {
            n++;
        }
        l2 = System.nanoTime() - l2;
        // compensated speed: samples operations per (l2 - l1) nanoseconds
        System.out.println("t0 = " + l1 + "\tt=" + l2 + "\tspeed=" + samples * 1e9f / (l2 - l1));
    }
}
Result:
t0 = 1251556	t=779708	speed=-2.11932659E9
t0 = 548115	t=838654	speed=3.44187878E9
t0 = 546159	t=752889	speed=4.837227E9
t0 = 531632	t=784457	speed=3.95530496E9
t0 = 523251	t=883911	speed=2.77269453E9
t0 = 546159	t=807924	speed=3.82022042E9
t0 = 543924	t=804013	speed=3.84483763E9
t0 = 545879	t=803454	speed=3.88236442E9
t0 = 543365	t=804013	speed=3.83659187E9
t0 = 546159	t=804012	speed=3.87817856E9
My 2.5 GHz Intel Core Duo can do about 3.88 billion n++ operations per second in a single thread.
When n++ is removed (making both loops identical), the result is:
t0 = 1259099	t=602311	speed=-1.52256128E9
t0 = 544762	t=556216	speed=8.730574E10
t0 = 551467	t=540851	speed=-9.4197441E10
t0 = 666007	t=552864	speed=-8.8383724E9
t0 = 545041	t=553982	speed=1.11844311E11
t0 = 545600	t=556496	speed=9.1776795E10
t0 = 546159	t=603708	speed=1.73764956E10
t0 = 561245	t=540571	speed=-4.8369934E10
t0 = 553701	t=542527	speed=-8.9493463E10
t0 = 551467	t=607619	speed=1.78088038E10
These "speeds" are ridiculously high in magnitude (roughly 30 times the n++ result) and sometimes negative, because l2 - l1 is now pure timing noise near zero. In other words, the real n++ measurement sits well above the noise floor, so this overhead compensation is fairly reliable. Increasing samples increases accuracy further.
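To make the blow-up concrete: the compensation formula is speed = samples * 1e9 / (l2 - l1), and when both loops do the same work the denominator is just noise around zero, so the quotient explodes or flips sign. A minimal sketch, using illustrative timings copied from the runs above rather than new measurements:

```java
public class Compensation
{
    public static void main(String[] args)
    {
        int samples = 1000000;
        // illustrative timings in ns, taken from the n++ runs above
        long emptyLoop = 546159; // t0: loop overhead only
        long workLoop = 804012;  // t:  overhead plus samples increments
        // compensated speed: samples ops per (t - t0) nanoseconds
        float speed = samples * 1e9f / (workLoop - emptyLoop);
        System.out.println("speed = " + speed); // about 3.88E9, as above
        // with identical loops the denominator is noise near zero and can go
        // negative, e.g. a "work" time of 540571 ns against the same baseline:
        System.out.println("noise = " + samples * 1e9f / (540571 - emptyLoop));
    }
}
```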
Now on to atan2.
public static void main(String[] args)
{
    Random RND = new Random();
    int samples = 1000000;
    long l1;
    long l2;
    long l3;
    float x;
    float y;
    for (int tries = 0; tries < 10; tries++)
    {
        // baseline: random generation and loop overhead only
        l1 = System.nanoTime();
        for (int i = 0; i < samples; i++)
        {
            x = (RND.nextFloat() - 0.5f) * 200;
            y = (RND.nextFloat() - 0.5f) * 200;
        }
        l1 = System.nanoTime() - l1;
        // same loop plus FastMath.atan2
        l2 = System.nanoTime();
        for (int i = 0; i < samples; i++)
        {
            x = (RND.nextFloat() - 0.5f) * 200;
            y = (RND.nextFloat() - 0.5f) * 200;
            FastMath.atan2(x, y);
        }
        l2 = System.nanoTime() - l2;
        // same loop plus StrictMath.atan2
        l3 = System.nanoTime();
        for (int i = 0; i < samples; i++)
        {
            x = (RND.nextFloat() - 0.5f) * 200;
            y = (RND.nextFloat() - 0.5f) * 200;
            StrictMath.atan2(x, y);
        }
        l3 = System.nanoTime() - l3;
        // subtract the baseline from both measurements
        float fast = samples * 1e9f / (l2 - l1);
        float slow = samples * 1e9f / (l3 - l1);
        System.out.println("speed fast=" + fast + "\tspeed slow=" + slow + "\tfactor=" + fast / slow);
    }
}
Result:
speed fast=2.5838188E7	speed slow=3102130.2	factor=8.329176
speed fast=4.2304908E7	speed slow=6922386.0	factor=6.111319
speed fast=3.6428208E7	speed slow=6051220.5	factor=6.019977
speed fast=3.3007322E7	speed slow=5712438.0	factor=5.7781496
speed fast=3.6745696E7	speed slow=6234034.0	factor=5.8943686
speed fast=4.6704136E7	speed slow=6152503.5	factor=7.5910783
speed fast=3.9707864E7	speed slow=6488173.5	factor=6.1200376
speed fast=5.0430332E7	speed slow=6682681.5	factor=7.5464215
speed fast=4.770246E7	speed slow=6730517.0	factor=7.087488
speed fast=4.857506E7	speed slow=6979997.5	factor=6.9591804
I’ve also invented another, even more accurate but more limited, microbenchmarking technique: call the function once per iteration in the first loop and ten times per iteration in the second, then solve the two equations for the run time of a single call (the loop overhead cancels out). This is not applicable to atan2, because you can’t call it several times in a row and expect the performance to stay flat, and interleaving calls to random defeats the technique.
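A sketch of that solve-for-one-call technique, assuming a function whose per-call cost is flat across repeated calls (the function g below is a hypothetical stand-in, not from the benchmarks above): with t1 = overhead + samples·c and t10 = overhead + 10·samples·c, subtracting cancels the overhead and c = (t10 - t1) / (9·samples).

```java
public class SolveBench
{
    // hypothetical stand-in with flat per-call cost; atan2 would not qualify
    static double g(double x)
    {
        return x * x + 1.0;
    }

    public static void main(String[] args)
    {
        int samples = 1000000;
        double sink = 0; // consume results so the JIT cannot discard the calls
        for (int tries = 0; tries < 10; tries++)
        {
            // one call per iteration: t1 = overhead + samples * c
            long t1 = System.nanoTime();
            for (int i = 0; i < samples; i++)
            {
                sink += g(i);
            }
            t1 = System.nanoTime() - t1;
            // ten calls per iteration: t10 = overhead + 10 * samples * c
            long t10 = System.nanoTime();
            for (int i = 0; i < samples; i++)
            {
                sink += g(i); sink += g(i); sink += g(i); sink += g(i); sink += g(i);
                sink += g(i); sink += g(i); sink += g(i); sink += g(i); sink += g(i);
            }
            t10 = System.nanoTime() - t10;
            // solve the two equations: c = (t10 - t1) / 9, per samples calls
            System.out.println("ns per call = " + (t10 - t1) / 9.0 / samples);
        }
        System.out.println("sink = " + sink); // keep the results live
    }
}
```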