Micro benchmarks

swpalmer · May 31, 2003, 2:00am

[quote]You did not give the version of the JVM though (upgrading could help improve the score).
[/quote]
Look again, it says 1.4.1_01.

mthornton · May 31, 2003, 9:59am

[quote]Okay, just had to give it a try on my work machine:
P3 1GHz
512MB mem
WinNT

java 1.4.1_01 -client
…
Looks like under WinNT the SSE instructions aren’t being used? Float math is horrible! Maybe that’s why some of the demo games run very slow and jerky on my system. Hopefully we’re upgrading to WinXP by the end of the year!
[/quote]
The use of SSE is new in 1.4.2 beta and even then only in the server version.

princec · May 31, 2003, 11:26am

[quote]only in the server version
[/quote]

Fools. Java gaming once again takes a poke in the eye.

Cas

mthornton · May 31, 2003, 1:20pm

One serious problem with this benchmark is that the floating point calculation overflows (and then becomes NaN). The processor timing for the subsequent operations may not be very representative of normal calculation.
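
A minimal sketch of the effect, assuming a made-up recurrence (x = x*x - x) rather than the benchmark's actual calculation, which isn't reproduced in this thread. Once the value overflows to Infinity, the next step computes Infinity - Infinity and everything after that is NaN. The class and method names are hypothetical.

```java
public class NanOverflowDemo {

    /**
     * Iterates x = x*x - x (an illustrative recurrence, not the benchmark's)
     * and returns the index of the first iteration whose result is NaN,
     * or -1 if no NaN appears within maxIterations.
     */
    static int firstNaNIteration(float start, int maxIterations) {
        float x = start;
        for (int i = 0; i < maxIterations; i++) {
            x = x * x - x;
            if (Float.isNaN(x)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A start value above 2 grows without bound and ends up as NaN.
        System.out.println("start=3.0 -> first NaN at iteration "
                + firstNaNIteration(3.0f, 1000));
        // A start value between 0 and 1 stays bounded, so no NaN shows up.
        System.out.println("start=0.5 -> first NaN at iteration "
                + firstNaNIteration(0.5f, 1000));
    }
}
```

If a benchmark loop behaves like the first case, most of its iterations are timing NaN arithmetic rather than ordinary values.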

mthornton · May 31, 2003, 1:29pm

I’ve now changed the constants slightly and found that on my P2/400 the floating point time changes from 155 seconds down to 13 seconds.
Curiously, that 155-second time for the original benchmark is almost the same as for the 3.06GHz P4 I have at work when SSE is not used. Evidently the SSE path is much faster at handling NaN values than the ordinary case, but this doesn’t tell us much about real-life floating point speed.

genepi · May 31, 2003, 10:30pm

And with benchmarks, ALWAYS check the results when you compare the timing! http://developer.java.sun.com/developer/bugParade/bugs/4860749.html is my experience with JDK 1.4.2 Beta… :o
Though speedy, the double calculations were sometimes wrong on the Windows platform…
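
A rough sketch of what checking the result could look like, assuming a stand-in workload (the partial sum of 1/k², which should approach π²/6) instead of the benchmark from this thread. The class name, method names and tolerance are all hypothetical.

```java
public class CheckedBenchmark {

    /** Stand-in workload: partial sum of 1/k^2, which converges to pi^2/6. */
    static double baselSum(int terms) {
        double sum = 0.0;
        for (int k = 1; k <= terms; k++) {
            sum += 1.0 / ((double) k * k);
        }
        return sum;
    }

    /** True only if actual is a real number within tol of expected. */
    static boolean closeTo(double actual, double expected, double tol) {
        return !Double.isNaN(actual) && Math.abs(actual - expected) <= tol;
    }

    public static void main(String[] args) {
        int terms = 1_000_000;
        long startNanos = System.nanoTime();
        double result = baselSum(terms);
        long elapsedNanos = System.nanoTime() - startNanos;

        // Only report the timing if the answer is actually right.
        double expected = Math.PI * Math.PI / 6.0;
        if (closeTo(result, expected, 1e-5)) {
            System.out.println("OK, took " + (elapsedNanos / 1_000_000.0) + " ms");
        } else {
            System.out.println("WRONG RESULT " + result + ", timing is meaningless");
        }
    }
}
```

The NaN test in closeTo matters because any comparison against NaN is false, so a careless check written the other way round would let a NaN result through.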

mthornton · June 2, 2003, 7:54am

[quote]> only in the server version

Fools. Java gaming once again takes a poke in the eye.

Cas
[/quote]
The standard fp performance is reasonable provided that you avoid the NaN case. Of course the -server version is faster (wouldn’t be much point otherwise), but that is also true of the integer results.
So perhaps we should look for a benchmark which does something realistic, with the values finite and preferably nonzero. My slight variation in this benchmark results in the values converging to zero, which isn’t ideal either (too easy). In this case on my P4 the floating point calculation is faster than the integer method.

Any suggestions for a relevant calculation which has both integer and fp forms?

princec · June 2, 2003, 9:20am

Mandelbrot?
And of course, vertex transformation is where it’s at these days. And the reason why the client FP performance isn’t reasonable at all, just slow :-[
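
For what it’s worth, a rough sketch of how Mandelbrot could give both forms: the same escape-time iteration, once in double and once in fixed-point long arithmetic. The class name, the FRAC_BITS constant and the choice of 24 fractional bits are assumptions made for illustration, not anything taken from the benchmark under discussion.

```java
public class MandelbrotBothWays {

    // Assumed fixed-point format: 24 fractional bits held in a long.
    static final int FRAC_BITS = 24;

    /** Converts a double to the assumed fixed-point representation. */
    static long toFixed(double v) {
        return Math.round(v * (1L << FRAC_BITS));
    }

    /** Floating point form: iterations before |z| exceeds 2, capped at maxIter. */
    static int iterationsDouble(double cr, double ci, int maxIter) {
        double zr = 0.0, zi = 0.0;
        int i = 0;
        while (i < maxIter && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            i++;
        }
        return i;
    }

    /** Integer form: same iteration, with cr and ci already in fixed point. */
    static int iterationsFixed(long cr, long ci, int maxIter) {
        final long four = 4L << FRAC_BITS;
        long zr = 0, zi = 0;
        int i = 0;
        while (i < maxIter) {
            // Each product carries 2*FRAC_BITS fractional bits, so shift back.
            long zr2 = (zr * zr) >> FRAC_BITS;
            long zi2 = (zi * zi) >> FRAC_BITS;
            if (zr2 + zi2 > four) {
                break;
            }
            long t = zr2 - zi2 + cr;
            zi = ((2 * zr * zi) >> FRAC_BITS) + ci;
            zr = t;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        double cr = -0.75, ci = 0.1; // arbitrary sample point
        System.out.println("double: " + iterationsDouble(cr, ci, 1000));
        System.out.println("fixed:  " + iterationsFixed(toFixed(cr), toFixed(ci), 1000));
    }
}
```

The values stay finite and mostly nonzero for as long as the loop runs, since it stops once |z| passes 2. Near the edge of the set the two forms can report different counts because of the limited fixed-point precision, so any result check should allow for that.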

Cas

mthornton · June 2, 2003, 10:35am

[quote]Mandelbrot?
And of course, vertex transformation is where it’s at these days.
[/quote]
Probably a bad example as you would really like the transformations to be done by all those transform pipelines on the graphics card.

I wonder if the graphics card FP could usefully be used for general purpose fp — I seem to recall that the Sony game systems being hooked up as a ‘supercomputer’ were doing something like that.

cfmdobbie · June 2, 2003, 12:47pm

Coincidentally, there was a thread about that a couple of days ago!

The conclusion was “technically yes, but getting the results out again is too slow”, I believe.

Jeff · June 24, 2003, 9:28pm

The problem with microbenchmarks is that you really need to understand what it is you are measuring.

As someone has already pointed out, the latest Sun VM does SIMD in the compiler. So if it sees a set of bytecodes that can be folded down to SIMD, the result will be much faster than if it doesn’t and does them with discrete operations.

In English, this means that doing things like matrix multiplies is MUCH faster than doing the same number of unrelated mults.

Jeff