[quote]OK ! so with the Athlon64s - with SSE2 support - can I take it that the JVM will (indeed) use SSE2-style registers so that the double performance of the Athlon64s will be comparable to P4s ?
In general, is the SSE2 performance of Athlon64 as good as the P4 ? And specifically, when running Java apps with the -server option ?
And for the couple of reasons mentioned earlier 1) some CPUs use extended 80-bit precision and 2) some do all the computations in doubles and reconvert to floats (the IBM RS6000 workstation, IIRC, used to do that), is it worth the trouble to stick to floats for speed benefits if memory size is not a consideration ?
[/quote]
First a little intro on how the VM works:
Basically, the JVM has two kinds of code: platform-independent and platform-dependent. The platform-independent stuff is everything that operates on the bytecodes and the IR, plus the optimizations (parsing, constant folding, loop opts, register allocation, etc).
The platform-dependent stuff is basically match rules for instructions. So if the VM requires a multiply node (MulNode), it matches that node to the appropriate rule for the particular architecture. This matching part is where AMD64 hasn’t been fully optimized. It’s mostly there, but there are parts missing, things we don’t do, etc. So yes, the VM uses SSE2 on AMD64 machines, but we might be doing a few things suboptimally.
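To make the match-rule idea a bit more concrete, here is a toy sketch in Java. It is not how the VM actually implements matching; the class, enum, and method names (`MatchRuleSketch`, `IrNode`, `Arch`, `match`) are made up for illustration. The only real things in it are the instruction mnemonics: `fadd`/`fmul` are x87 instructions and `addsd`/`mulsd` are the SSE2 scalar-double equivalents.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy illustration only: a platform-independent IR node is mapped to a
// platform-specific instruction through a per-architecture rule table.
// All names here are hypothetical, not taken from the VM sources.
public class MatchRuleSketch {
    enum IrNode { ADD_D, MUL_D }        // platform-independent double add / multiply
    enum Arch { X86_X87, AMD64_SSE2 }   // two example back ends

    private static final Map<Arch, Map<IrNode, String>> RULES = new EnumMap<>(Arch.class);

    static {
        Map<IrNode, String> x87 = new EnumMap<>(IrNode.class);
        x87.put(IrNode.ADD_D, "fadd");   // x87 floating-point add
        x87.put(IrNode.MUL_D, "fmul");   // x87 floating-point multiply
        RULES.put(Arch.X86_X87, x87);

        Map<IrNode, String> sse2 = new EnumMap<>(IrNode.class);
        sse2.put(IrNode.ADD_D, "addsd"); // SSE2 scalar double add
        sse2.put(IrNode.MUL_D, "mulsd"); // SSE2 scalar double multiply
        RULES.put(Arch.AMD64_SSE2, sse2);
    }

    // Look up the instruction for a node on a given architecture.
    // A missing entry stands in for a rule that hasn't been written yet.
    static String match(Arch arch, IrNode node) {
        String insn = RULES.get(arch).get(node);
        if (insn == null) {
            throw new IllegalStateException("no match rule for " + node + " on " + arch);
        }
        return insn;
    }

    public static void main(String[] args) {
        for (Arch arch : Arch.values()) {
            System.out.println(arch + ": MUL_D -> " + match(arch, IrNode.MUL_D));
        }
    }
}
```

The point of the split is that everything above the table (parsing, folding, loop opts) never needs to know which column it will end up in; only the rules differ per chip.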
I’ve also heard that the Athlons (XP and 64) have slower SSE performance than the P4s. That may no longer be true in later revisions of the chip. Heck, I may have heard incorrectly as well. But anyway, as far as the JVM is concerned, the AMD64 is just another chip; most of the optimizations are platform independent.
Oh, and don’t forget that the AMD64 VM is a 64-bit VM, while the x86 VM is a 32-bit VM. Internally that means the 64-bit VM has to handle larger pointers, etc. On the other hand, the VM gains 8 registers on AMD64, so overall there is a win in performance.
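If you are not sure which of the two VMs you are actually running on a given box, a quick check like the one below should tell you. This is just a sketch: `VmInfo` and `describe` are names I made up, `os.arch` and `java.vm.name` are standard system properties, and `sun.arch.data.model` is a VM-specific property that may be missing on some VMs, so the code treats it as optional.

```java
// Prints which architecture and data model the running VM reports.
// "sun.arch.data.model" is VM-specific and may be absent, hence the null check.
public class VmInfo {
    static String describe() {
        String arch = System.getProperty("os.arch");
        String vmName = System.getProperty("java.vm.name");
        String model = System.getProperty("sun.arch.data.model");
        String modelText = (model == null) ? "unknown" : model + "-bit";
        return "os.arch=" + arch + ", java.vm.name=" + vmName + ", data model=" + modelText;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same `java` binary and flags (e.g. `-server`) that you use for your benchmarks, since a 32-bit VM can be installed on a 64-bit machine.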