[quote]
[quote]also x*2 is much slower than x<<1
[/quote]
Again, not on today’s VMs, especially if the 2 is written as a compile time constant. This is one of the easiest optimizations for a compiler to make, so I would be shocked if the JVM wasn’t doing it. If I’m correct it’s even done in most C/C++ compilers these days, though I haven’t checked that.
[/quote]
No no no no! (no. :)) Indeed C compilers have done this forever, but the JVM never has.
If you don’t see a difference, that means your bottleneck (like memory I/O) is somewhere else.
You might have read my thread about MappedObjects I created yesterday. I added code to handle the case where the stride is a power-of-two and replace a single *16 by a <<4 (or *32 by <<5, for that matter), and saw the performance double. That was because the bottleneck was actually there. As you saw, in another situation I could squeeze in a 'massive' swap64 method without a speed penalty. So if you're not seeing your optimisation pay off, that doesn't mean your code hasn't become faster, just that the CPU is waiting on something else. Once you remove the other bottleneck, you'll notice that your earlier optimisation did speed up your code.
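To show the kind of replacement I mean, here's a minimal sketch (made-up names, not the actual MappedObjects code) of computing a byte offset from an element index, once with a multiply and once with a shift when the stride happens to be a power of two:

[code]
// Minimal sketch, not the actual MappedObjects code.
public class StrideOffset {
    // General case: works for any stride, but emits a multiply.
    static long offsetMul(long index, long stride) {
        return index * stride;
    }

    // Power-of-two case: precompute the shift once, then reuse it.
    // index << 4 == index * 16, index << 5 == index * 32, etc.
    static long offsetShift(long index, int shift) {
        return index << shift;
    }

    public static void main(String[] args) {
        int stride = 16;
        // Valid only because 16 is a power of two.
        int shift = Integer.numberOfTrailingZeros(stride); // 4
        long index = 12345L;
        System.out.println(offsetMul(index, stride));  // 197520
        System.out.println(offsetShift(index, shift)); // 197520
    }
}
[/code]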
The VM is really not that smart, even though replacing a constant multiplication by a bitshift is trivial.
I’ll do some actual tests with your benchmark (yesterday I didn’t have the time). There are so many more tricks to make that code faster (that’s why I said it was rather poor). Your main slowdown is in the loop overhead. AFAIK the JVM does some loop unrolling, but doing it yourself (copy the body 2-4 times inside the loop, see the sketch below) is much faster.
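Here's roughly what I mean by unrolling it yourself; this is a made-up example, not your benchmark code:

[code]
// Plain loop: one increment, bounds check and branch per element.
static long sumRolled(int[] data) {
    long sum = 0;
    for (int i = 0; i < data.length; i++) {
        sum += data[i];
    }
    return sum;
}

// Manually unrolled: four elements per iteration, so the loop
// overhead is paid roughly a quarter as often.
static long sumUnrolled(int[] data) {
    long sum = 0;
    int i = 0;
    int limit = data.length - 3;
    for (; i < limit; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }
    // Handle the remaining 0-3 elements.
    for (; i < data.length; i++) {
        sum += data[i];
    }
    return sum;
}
[/code]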
Again, it’s hard to do micro-benchmarking, not only because the JVM rewrites most of your code, but also because you might think the bottleneck is somewhere else, or may even be unaware of certain types of bottlenecks.
Last, you have that giant method that runs all the benchmarks. The JIT stops optimising methods that are longer than a certain number of bytecode instructions. That ‘magic number’ is rather low, and your method will certainly not be turned into the most efficient native code. You can take a look at the output of the Debug VMs (release candidates with debug command-line options, IIRC) from Sun, which will tell you when the JIT gives up.
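The easy fix is to split the giant method so each benchmark lives in its own small method that the JIT can compile on its own. A minimal sketch with made-up names:

[code]
// Made-up names: each benchmark gets its own small method instead of
// one huge method holding every loop.
static void runAll(int[] data) {
    long t0 = System.nanoTime();
    long r1 = benchMultiply(data);
    long t1 = System.nanoTime();
    long r2 = benchShift(data);
    long t2 = System.nanoTime();
    System.out.println("multiply: " + (t1 - t0) + " ns (" + r1 + ")");
    System.out.println("shift:    " + (t2 - t1) + " ns (" + r2 + ")");
}

static long benchMultiply(int[] data) {
    long sum = 0;
    for (int i = 0; i < data.length; i++) {
        sum += data[i] * 16;
    }
    return sum;
}

static long benchShift(int[] data) {
    long sum = 0;
    for (int i = 0; i < data.length; i++) {
        sum += data[i] << 4;
    }
    return sum;
}
[/code]

Even on a regular (non-debug) VM, running with -XX:+PrintCompilation will at least show you which of these methods actually get compiled.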
I’ll post some tests later today.