Thanks for everyone’s comments so far. They have helped a bit, as has cross-referencing with a few outside sources and the JavaOne papers/presentations.
I will post our new math utils as soon as I have one that I believe is WORTH using…
Every attempt at a faster square root or inverse square root, float or double, has failed to beat the built-in. The functions are correct (though approximate) and speedy in themselves. However, the fastest techniques rely on bit twiddling of the floating-point format, which I have implemented. Unfortunately, in Java this requires a method call, the float (double) to int (long) conversion calls, where in C this is a cast. That call surely does little more than a cast and a copy, but it is enough to make the gains not pan out. Even the Quake3 inverse sqrt (used for vector normalizing, among other things) is still slower than the 1.4.2 java.lang.Math.sqrt, I suspect simply because of the cost of method calls in Java.
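For anyone who has not seen the trick, here is a minimal sketch of the Quake3-style inverse square root ported to Java. It is not the code from our utils. The class name FastInvSqrt is made up for illustration, and I am assuming a single Newton-Raphson refinement step, as in the well-known C version. It shows where the method calls come in: Float.floatToIntBits and Float.intBitsToFloat stand in for what C does with a cast.

```java
// Hypothetical illustration only: not the actual math utils discussed in this thread.
final class FastInvSqrt {
    private FastInvSqrt() {}

    // Approximates 1 / sqrt(x) for positive, finite x.
    static float invSqrt(float x) {
        float half = 0.5f * x;
        // In C this reinterpretation is a cast; in Java it is a method call.
        int bits = Float.floatToIntBits(x);
        // Magic constant from the widely circulated Quake3 source.
        bits = 0x5f3759df - (bits >> 1);
        float y = Float.intBitsToFloat(bits);
        // One Newton-Raphson step (assumed here) to tighten the first guess.
        y = y * (1.5f - half * y * y);
        return y;
    }
}
```

With one refinement step the result is only good to a fraction of a percent, which is fine for normalizing vectors but not a drop-in replacement for 1.0 / Math.sqrt(x). Whether it is actually faster depends on the VM, so benchmark before using it.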
I did manage to get some gains on sin and cos, using fast functions and look-up tables. With my sin/cos look-up functions I get about 3.15 times faster sin/cos when working in degrees, because the table is indexed in degrees and the java.lang.Math tests had a toRadians call in there (this is somewhat fair, since many times we work in degrees and have to convert at run-time). In the no-toRadians version, i.e. a straight Math.sin(angle) versus FastMath.sin(angle) test, I get a > 60% speed-up. However, this is on VERY focused tests. When placed in a more realistic test such as Matrix.rot(x), we get around a 2.05 times speed-up. Anyway, when I have a nice comparison chart I’ll post the results in the source.
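To make the look-up idea concrete, here is a rough sketch of a degree-indexed sin/cos table. This is not our FastMath code. The class name DegreeSinTable, the whole-degree resolution, and the int-degree signature are all assumptions for the example. It shows why working in degrees skips the toRadians call: the conversion is paid once per table entry at class-load time.

```java
// Hypothetical illustration only: a whole-degree table, not the FastMath from this thread.
final class DegreeSinTable {
    // One entry per whole degree, 0..359 (resolution is an assumption for this sketch).
    private static final float[] SIN = new float[360];

    static {
        for (int i = 0; i < 360; i++) {
            // toRadians is paid once here instead of on every call.
            SIN[i] = (float) Math.sin(Math.toRadians(i));
        }
    }

    private DegreeSinTable() {}

    // Sine of an angle given in whole degrees; negative angles wrap around.
    static float sin(int degrees) {
        int idx = degrees % 360;
        if (idx < 0) {
            idx += 360;
        }
        return SIN[idx];
    }

    // cos(a) = sin(a + 90 degrees), so the same table serves both.
    static float cos(int degrees) {
        return sin(degrees + 90);
    }
}
```

A finer table (or interpolating between entries) trades memory for accuracy. Whole degrees are only good enough when the angles really are integers.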
On the up side, against StrictMath, this stuff rocks: up to 10 times faster in some cases. So if you are using an older VM, or are in a different execution mode, such as -Xcompile or interpreted, you get great gains.
Of course, the really good news is that Java math is now super-fast in the Java world. I imagine that java.lang.Math will be the fastest possible way to do floating-point work that isn’t batched and sent to the native side, and that is really only practical in very specialized cases. I have a few; does anyone have any others? I would like to compile a list of these specialized cases and try to find the batch sizes at which they do better than break even.