Micro benchmarks

Hi guys,

I was trying to make speed improvements in some code and got unexpected results. First, have a look at this code:

  public class Test
  {
    public static void main(String[] args)
    {
      // Note: each element is squared, so the values overflow within the
      // first few elements (ints wrap around silently; floats become
      // Infinity and then NaN).
      int[] iv = new int[1000];
      iv[0] = 8;
      iv[1] = 9;

      // int: reads iv[i-1] back from the array
      long before = System.currentTimeMillis();

      for (int ii=0; ii<100000; ii++)
      {
        for (int i=2; i<iv.length; i++)
        {
          iv[i] = (5 * iv[i-1] - 10 * iv[i-2]) / 3;
          iv[i] *= iv[i];
        }
      }

      long after = System.currentTimeMillis();
      System.out.println(after-before);

      // int aliased: keeps the previous element in a local variable
      before = System.currentTimeMillis();

      for (int ii=0; ii<100000; ii++)
      {
        int previous = iv[1];

        for (int i=2; i<iv.length; i++)
        {
          int v = (5 * previous - 10 * iv[i-2]) / 3;
          v *= v;
          iv[i] = v;
          previous = v;
        }
      }

      after = System.currentTimeMillis();
      System.out.println(after-before);

      float[] fv = new float[1000];
      fv[0] = 8.0f;
      fv[1] = 9.0f;

      // float: same recurrence, dividing by 3 via a multiplication by 0.333333333f
      before = System.currentTimeMillis();

      for (int ii=0; ii<100000; ii++)
      {
        for (int i=2; i<fv.length; i++)
        {
          fv[i] = (5.0f * fv[i-1] - 10.0f * fv[i-2]) * 0.333333333f;
          fv[i] *= fv[i];
        }
      }

      after = System.currentTimeMillis();
      System.out.println(after-before);

      // float aliased: keeps the previous element in a local variable
      before = System.currentTimeMillis();

      for (int ii=0; ii<100000; ii++)
      {
        float previous = fv[1];

        for (int i=2; i<fv.length; i++)
        {
          float f = (5.0f * previous - 10.0f * fv[i-2]) * 0.333333333f;
          f *= f;
          fv[i] = f;
          previous = f;
        }
      }

      after = System.currentTimeMillis();
      System.out.println(after-before);
    }
  }

OK, basically it is some math computation inside loops.
I ran these tests several times in the same process and got the following results (times in milliseconds):

int : 5203
int aliased : 4750
float : 4172
float aliased : 5031

The absolute values by themselves are not really important but they reveal a pattern.

First conclusion: floats are faster than ints on my platform (WinXP, Athlon XP, Java 1.4.2 beta)!!! Incredible, isn’t it?

Naively, I thought otherwise. I guess the SSE instructions now used by the JVM kick in and dramatically boost float performance (the Athlon XP supports SSE, though not SSE2). It would be interesting to see the results on other platforms (SSE or no SSE). That is good news for OpenGL in Java :wink:

Second conclusion: aliasing (keeping the previous element in a local variable instead of reading it back from the array) is always better with ints but counterproductive with floats, which was not obvious to me either.

I am perfectly aware that these results should be taken with a grain of salt (if you slightly modify the code inside the loop you may end up with other conclusions).
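One thing worth checking before drawing conclusions from the float numbers is what values the loop actually computes. Each element is squared, so the values grow very quickly. The sketch below (the class and method names are hypothetical) runs one pass of the float recurrence from the code above and counts finite, infinite and NaN elements. On a current JVM only the first few elements stay finite and the rest of the array is NaN; since every outer iteration recomputes the same array, most of the float timing measures arithmetic on NaN. Some processors are known to handle such special values on a much slower path, but whether that explains the numbers here is an assumption, not something this benchmark shows.

```java
public class OverflowCheck {
    // One pass of the float recurrence from the benchmark above.
    static float[] onePass(int n) {
        float[] fv = new float[n];
        fv[0] = 8.0f;
        fv[1] = 9.0f;
        for (int i = 2; i < fv.length; i++) {
            fv[i] = (5.0f * fv[i - 1] - 10.0f * fv[i - 2]) * 0.333333333f;
            fv[i] *= fv[i];
        }
        return fv;
    }

    // Returns {finite, infinite, NaN} counts for the array.
    static int[] classify(float[] fv) {
        int[] counts = new int[3];
        for (float f : fv) {
            if (Float.isNaN(f)) counts[2]++;
            else if (Float.isInfinite(f)) counts[1]++;
            else counts[0]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = classify(onePass(1000));
        System.out.println("finite=" + c[0] + " infinite=" + c[1] + " NaN=" + c[2]);
    }
}
```

Keeping the values bounded (for example, by not squaring) would make the float and int loops measure ordinary arithmetic, at the cost of no longer being the same benchmark.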

The bottom line: it is going to be very difficult to optimize code, since an optimization on one platform may end up decreasing performance on another platform. The only way to know is to test!

I’d be interested in results from other people testing with other hardware, OS and JVM versions.
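When posting results, it helps to state exactly which VM and platform produced them. A minimal sketch using standard system properties (the class and method names are hypothetical); `java.vm.name` typically also tells you whether the client or server VM was actually used:

```java
public class PlatformInfo {
    // Builds a one-line description of the JVM, OS and processor count.
    static String describe() {
        return System.getProperty("java.vm.name") + " "
            + System.getProperty("java.version") + ", "
            + System.getProperty("os.name") + " "
            + System.getProperty("os.version") + " ("
            + System.getProperty("os.arch") + "), "
            + Runtime.getRuntime().availableProcessors() + " CPU(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```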

Tried this test on my machine (adding some usual repeat stuff)

java 1.4.2 beta, RedHat 9.0, athlon 900:

int: 7769
int aliased: 7409
float: 4250
float aliased: 5273

int: 8080
int aliased: 7172
float: 4325
float aliased: 5263

int: 8075
int aliased: 7160
float: 4322
float aliased: 5268

int: 8072
int aliased: 7173
float: 4312
float aliased: 5282

java -server:
int: 3530
int aliased: 2501
float: 8488
float aliased: 7738

int: 1937
int aliased: 1779
float: 8154
float aliased: 7718

int: 1983
int aliased: 1764
float: 8171
float aliased: 7722

int: 1945
int aliased: 1767
float: 8189
float aliased: 7712

??? server versus client… ???
ROFL - benchmarks are fun :slight_smile:

For even more laughs :o I tried it with gcj 3.2.2:
int: 7893
int aliased: 4297
float: 9014
float aliased: 5995

int: 9787
int aliased: 4584
float: 9833
float aliased: 6528

Pretty bad ???, but wait, there is more. Compiled with “gcj -msse2 -m3dnow -O2 -o test --main=Test Test.java”:
int: 2698
int aliased: 1798
float: 3997
float aliased: 1892

int: 2684
int aliased: 1790
float: 3987
float aliased: 1900

gcj rocks on this test ;D Is it cheating, or is Sun’s Java just being slow? Maybe I should try it with IBM’s JVM, as they (?) are faster at number crunching :wink:

Geeeeee !
Your results are puzzling because in the end you do not know what to shoot for.
With both integers and floats, the timings span (roughly) from 2 to 9 seconds! And the floats can be faster or slower than the ints.
I note that with the new JVM, the floats ‘can be’ REALLY fast …
How can you be sure that the optimizations you made on your platform are relevant at all?
The only constant thing for sure is that aliased integers are always faster than non-aliased ones.

Food for thought, people …

I did some microbenchmarking too (with J2SE 1.4.2 + WinXP) and found that floating-point division was faster than integer division. However, floating-point multiplication was slower than integer multiplication.

By the way, you can find an interesting table comparing the relative performance of common operations on PIII-733 using C++ here (appendix B):
http://www.tantalon.com/pete/cppopt/appendix.htm

It states that “In fact, floating-point division is as fast or faster than integer division”. So this is not a Java only “feature”. Somebody should do a table like that for Java and post it here ;).
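A rough sketch of how such a table could be started in Java. It times int and float division and multiplication in separate methods, repeats the whole set a few times (treating the first run as warm-up), and returns an accumulator so the JIT cannot discard the loops. The loop count, operand values and names are all assumptions. Each iteration also does an array load and an addition, so the numbers only compare the operations relative to each other on one machine and VM. A modern JIT may also vectorize some of these loops, which would skew the comparison.

```java
public class OpTable {
    static final int N = 20000000;          // iterations per measurement (arbitrary)
    static final int[] IDIV = new int[1024];
    static final float[] FDIV = new float[1024];

    static {
        // Operands 1..1024, so integer division never divides by zero.
        for (int k = 0; k < 1024; k++) {
            IDIV[k] = k + 1;
            FDIV[k] = k + 1;
        }
    }

    // Each method returns its accumulator so the loop result is used.
    static int intDiv(int x) {
        int acc = 0;
        for (int i = 0; i < N; i++) acc += x / IDIV[i & 1023];
        return acc;
    }

    static int intMul(int x) {
        int acc = 0;
        for (int i = 0; i < N; i++) acc += x * IDIV[i & 1023];
        return acc;
    }

    static float floatDiv(float x) {
        float acc = 0f;
        for (int i = 0; i < N; i++) acc += x / FDIV[i & 1023];
        return acc;
    }

    static float floatMul(float x) {
        float acc = 0f;
        for (int i = 0; i < N; i++) acc += x * FDIV[i & 1023];
        return acc;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 3; run++) {        // treat run 1 as warm-up
            long t0 = System.currentTimeMillis();
            int a = intDiv(123456789);
            long t1 = System.currentTimeMillis();
            int b = intMul(12345);
            long t2 = System.currentTimeMillis();
            float c = floatDiv(123456789f);
            long t3 = System.currentTimeMillis();
            float d = floatMul(12345f);
            long t4 = System.currentTimeMillis();
            System.out.println("run " + run + ": int / " + (t1 - t0) + " ms, int * " + (t2 - t1)
                + " ms, float / " + (t3 - t2) + " ms, float * " + (t4 - t3) + " ms");
            // Printing the results keeps them live.
            System.out.println("  checksums: " + a + " " + b + " " + c + " " + d);
        }
    }
}
```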

Isn’t it normal that floating point is better for division/multiplication and integer is better for addition/negation?

I thought it was common knowledge ???

[quote]Isn’t it normal that floating point is better for division/multiplication and integer is better for addition/negation?

I thought it was common knowledge ???
[/quote]
Read the earlier posts. It depends. For example, I just said on my earlier post that “floating point multiplication is SLOWER than integer multiplication”.

oh yeah, soz :smiley:

well… you’ve proved 1 thing.

Micro-optimisations are a waste of time :smiley:

and attempting to benchmark micro-optimisations is an even larger waste of time :wink:

If I may, I think the real reason that floats appear to be so much faster has less to do with SIMD/SSE and more to do with floating-point coprocessors. The idea that only integers should be used in games comes from back in the days of 386 machines, where floating point had to be emulated in software. Unfortunately, it was one of those ideas that the development community never got past. I remember only one game that ever advertised the fact that it used floating-point math. It was some sort of 3D car/shoot’em up game that ran on 486s and Pentiums. It actually ran quite well, but the market (to my knowledge) never picked up on this little tidbit.

The fact that the server VM did float operations so much slower than the client VM is a performance issue that I would file a bug report on.

Performance in that area should be the same or better.

GCC on intel is known to suck. But for floats MSVC 6.0 is also known to suck. The Intel compiler produces much better code, although I hear that MS has caught up with the .net compiler.

I like that the server compiler outperformed GCJ on ints… this sort of information is helpful in dispelling the myth that Java is just all around “slow”.

Here are some numbers from the above code for Mac OS X 1GHz PowerPC G4
client VM

test repeated 4 times… these are the typical results…

int: 3809
int a: 2276
float: 4653
float a: 3410

int: 3803
int a: 2268
float: 4619
float a: 3424

With -server ALL numbers are HIGHER by about 100-200.

swpalmer,

On Mac, the floats are somewhat slower than the ints, but all in all, the JVM/Mac OS X/G4 combination kicks butt !!!

Abuse,

micro-optimizations may be a waste of time, but benchmarking them is not IMO: you are always safe aliasing int arrays, and floats are ‘usually’ pretty fast (at least on the 3 configs tested so far).

The -server option seems to dramatically decrease float speed (unless the micro benchmark was too short for the JVM optimizations to kick in).

I would not have bet on these facts before.

Well, my take on this issue would be that java is really fast on even this microbenchmark!

As we all know (? ;D ), any Java VM becomes more competitive the more complex the problem it faces. Since this test is a very easy one, the gcj native compiler should IMHO outperform the Sun and IBM Java VMs (as it is able to compile with all the “easy to calculate” optimizations). As I (and you) noticed, Sun’s Java did quite well, despite the big difference between java -server and client! As Altair said, I believe this big server/client difference to be a Java 1.4.2 bug. Could anyone test this with another Java version…?

Anyway, Altair, I don’t really understand what you meant when you said I didn’t know what I was testing. I was just running your test on a different architecture and trying to gather as much data as possible; then it’s up to the experts to get something worth mentioning out of this data! 8)
I.e. I’m not “shooting” for anything except more data!

[quote]On Mac, the floats are somewhat slower than the ints, but all in all, the JVM/Mac OS X/G4 combination kicks butt !!!
[/quote]
I think this is more because the PowerPC is likely much easier to compile decent code for, since it has a much better design than intel (despite being behind in terms of clock speed). I mean, when you don’t have to worry so much about what you will keep in registers and what you need to shove on the stack because you actually have a decent amount of general purpose registers… well it just seems like optimizing on that architecture would be easier.

AndersDahlberg

I did not make myself clear.
" Your results are puzzling because in the end ‘we’ do not know what to shoot for ".

I meant that it is not obvious from your results what to target to achieve the best speed: use floats or integers? It depends on the platform (hardware, JVM, OS). Alias or not alias? It all depends on the code inside the loop AND the platform.

altair: Ok, then we understand each other :slight_smile:

…for my part I don’t really care which one is faster; it will probably never become a big issue for me anyway (1x or 2x slower on a test like this is “almost nothing” :wink:)

On my 3.06GHz P4 (Windows XP) I get

int: 2374
int aliased: 1421
float: 843
float aliased: 828
float (div): 1374
float aliased (div): 1343

The extra pair of float results use /3f instead of *0.333…
These results are with the server VM, for the client VM the results are:

int: 4716
int aliased: 4372
float: 153369
float aliased: 154689
float (div): 155082
float aliased (div): 156864

Ouch!

The results for double are essentially the same as for float (both client and server).
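Note that the /3f variant is not only a speed question: the literal 0.333333333f is rounded to the nearest float (0.33333334f), so multiplying by it does not always give the same result as dividing by 3. For example, 5 * 0.333333333f and 5 / 3f differ in the last bit. A small sketch (the names are hypothetical) that counts how often the two disagree over a range of integer-valued inputs:

```java
public class ReciprocalCheck {
    // Counts inputs k in [1, n] for which k * 0.333333333f differs from k / 3f.
    static int mismatches(int n) {
        int count = 0;
        for (int k = 1; k <= n; k++) {
            float x = k;
            if (x * 0.333333333f != x / 3f) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("mismatches in 1..100000: " + mismatches(100000));
    }
}
```

In most code this one-ulp difference does not matter, but it does mean the two float variants are not computing exactly the same thing.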

When you guys are using the -server option, are you putting code in there to “warm it up” before actually running it?

Maybe run the tests twice and throw out the first results?
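A minimal sketch of that idea, assuming the int variants are moved into their own static methods (all names here are hypothetical). Each variant runs once untimed so the JIT has a chance to compile it, then several further runs are timed. Putting each loop in its own method also gives the JIT whole methods to compile rather than one long main method. The plain and aliased loops compute identical arrays, so any timing difference comes from the generated code. The float variants would follow the same pattern. It uses System.nanoTime, which needs Java 5 or later.

```java
public class WarmupBench {
    // int: reads iv[i-1] back from the array.
    static void intPlain(int[] iv, int passes) {
        for (int ii = 0; ii < passes; ii++) {
            for (int i = 2; i < iv.length; i++) {
                iv[i] = (5 * iv[i - 1] - 10 * iv[i - 2]) / 3;
                iv[i] *= iv[i];
            }
        }
    }

    // int aliased: keeps the previous element in a local variable.
    static void intAliased(int[] iv, int passes) {
        for (int ii = 0; ii < passes; ii++) {
            int previous = iv[1];
            for (int i = 2; i < iv.length; i++) {
                int v = (5 * previous - 10 * iv[i - 2]) / 3;
                v *= v;
                iv[i] = v;
                previous = v;
            }
        }
    }

    // Times one call of the chosen variant, in milliseconds.
    static long timeMs(boolean aliased, int[] iv, int passes) {
        long t0 = System.nanoTime();
        if (aliased) intAliased(iv, passes); else intPlain(iv, passes);
        return (System.nanoTime() - t0) / 1000000L;
    }

    public static void main(String[] args) {
        int[] iv = new int[1000];
        iv[0] = 8;
        iv[1] = 9;
        // Warm-up: run both variants once and throw the timings away.
        timeMs(false, iv, 100000);
        timeMs(true, iv, 100000);
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run + ": int " + timeMs(false, iv, 100000)
                + " ms, int aliased " + timeMs(true, iv, 100000) + " ms");
        }
    }
}
```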

[quote]When you guys are using the -server option, are you putting code in there to “warm it up” before actually running it?

Maybe run the tests twice and throw out the first results?
[/quote]
Yes, but the length of the loop is sufficiently long that the changes aren’t large.

[quote]Yes, but the length of the loop is sufficiently long that the changes aren’t large.
[/quote]
There’s not much to warm up in a micro benchmark so that makes sense :slight_smile:

Erik

Okay, just had to give it a try on my work machine:
P3 1Ghz
512MB mem
WinNT

java 1.4.1_01 -client
Run #1
int: 6409
int alias: 5939
float: 66125
float alias: 64413

Run #2
int: 6819
int alias: 5799
float: 64873
float alias: 64703

Run #3
int: 6850
int alias: 5858
float: 64834
float alias: 65914

-server
Run #1
int: 5668
int alias: 5658
float: 62330
float alias: 62229

Run #2
int: 5598
int alias: 6259
float: 66416
float alias: 66996

Run #3
int: 5828
int alias: 6079
float: 62210
float alias: 62239

Looks like under WinNT the SSE instructions aren’t being used? Float math is horrible! Maybe that’s why some of the demo games run very slow and jerky on my system. Hopefully we’re upgrading to WinXP by the end of the year!

Unlike previous results, the results on the P3/NT would be enough to ban floats (or remove this platform from the targets). You did not give the version of the JVM though (upgrading could help improve the score).

Consider a game heavily using floats: it would fly on the Mac and on fast P4s (with XP) but would be unplayable on a ‘slow’ PC with NT. Less so with integers.

“Write once run anywhere” seems really not to be an easy task as far as performance is concerned …