Who got what idea? I don’t see anything in this thread even remotely related to rainbow tables.
I didn’t know that. It isn’t flagged as native so I never expected the JVM would generate exceptions for those.
I did some quick tests, and indeed the trig functions are on the order of 80 times faster when calling Math. :cranky:
On the other hand, some functions like min/max are 35% faster when calling StrictMath (I haven’t tested others yet).
Edit: A more advanced test (using my second test type, and preventing the code from being optimized away) now shows that StrictMath is about 10% faster than Math. ???
Edit2: (StrictMath compared to Math is…)
sin,cos: 50% slower
tan: 10% faster
atan: about the same
sqrt: 9000% slower
cbrt: about the same
floor: about 10% slower
ceil: about the same
max,min: about the same
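For anyone who wants to try reproducing numbers like these, here is a crude timing sketch (class and method names are mine, not from the tests above; a real benchmark should use JMH, but this at least warms up the JIT and keeps results live so the loop isn't optimized away):

```java
public class TrigBench {
    static double sink; // keeps results live so the JIT can't eliminate the loop

    static long benchNanos(java.util.function.DoubleUnaryOperator f, int n) {
        double s = 0.0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            s += f.applyAsDouble(i * 0.001);
        }
        sink = s;
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 2_000_000;
        // Warm-up runs so HotSpot compiles both call paths before timing.
        benchNanos(Math::sin, n);
        benchNanos(StrictMath::sin, n);
        System.out.printf("Math.sin:       %d ms%n", benchNanos(Math::sin, n) / 1_000_000);
        System.out.printf("StrictMath.sin: %d ms%n", benchNanos(StrictMath::sin, n) / 1_000_000);
    }
}
```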
And I think we’re drifting too much off topic. Shouldn’t someone make a new topic for this?
Or we could stop repeating the same thing over and over…and simply use Riven’s methods for fast sin/cos/tan/atan (@Riven do you have fast implementations for asin and acos?)
It always bothers me a bit that Math.sqrt is such a slow operation. A quick Google search suggests there is indeed a fast integer square root using techniques similar to the ones above. It should make a nice addition to the collection of fast hacky maths methods.
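For what it’s worth, the usual integer square root via Newton’s iteration looks like this (a sketch, not a tuned version; whether it beats Math.sqrt plus a cast depends on the JVM and hardware):

```java
// Integer square root: the largest r such that r * r <= n.
// Newton's iteration on integers converges in a handful of steps.
static int isqrt(int n) {
    if (n < 0) throw new IllegalArgumentException("negative input");
    if (n < 2) return n;
    int x = n;
    int y = (x + 1) >>> 1; // unsigned shift avoids overflow on n + 1
    while (y < x) {
        x = y;
        y = (x + n / x) >>> 1;
    }
    return x;
}
```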
The issue is that the optimal approximation depends on a lot of factors, and most people want one version of each function that “just works”. That’s simply not the way to maximize performance. As I’ve mentioned before, I’ve never seen a table-based method that out-performs numeric methods (on non-embedded systems). But that’s “my” use-case, YMMV. If you really want fast, you have to know your input range, what kind of error matters (absolute or relative), the bound on that error, and potentially the access patterns.
WRT: trig & inverse trig. Remember what the man said: “The best optimization is to not need to call anything.” If you’re calling inverse trig functions, my question is: “do you really want the angle? I mean really, really want the angle?” For example, if you’re storing your objects’ orientations as angles, therein lies a problem. Another is converting to polar, manipulating the angle, then converting back.
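To illustrate the point about not storing orientations as angles, here is a hypothetical sketch: keep a unit direction vector and rotate it complex-number style, so the per-update path needs no trig calls at all (cos/sin of the fixed step are computed once up front):

```java
// Orientation stored as a unit vector (cos θ, sin θ) instead of the angle θ.
static final class Orientation {
    double x = 1.0, y = 0.0; // initially facing along +X

    // Rotate by a fixed step; cosStep/sinStep are precomputed once,
    // so the hot path is four multiplies and two adds, no trig.
    void rotate(double cosStep, double sinStep) {
        double nx = x * cosStep - y * sinStep;
        double ny = x * sinStep + y * cosStep;
        x = nx;
        y = ny;
    }
}
```

The angle is only needed if something external demands it, at which point a single atan2 at the boundary is usually cheap enough.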
WRT: sqrt. All roots are a pain because the terms of the Taylor series decrease very slowly, so they are very hard to approximate with a polynomial or a ratio of polynomials. If you almost know the answer, though, they become very easy (e.g. via Newton’s method).
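As a sketch of the “almost know the answer” case (helper name is mine): one Newton-Raphson step for 1/sqrt, which is plenty when renormalizing a vector whose length has only drifted slightly from 1:

```java
// One Newton-Raphson step toward y = 1/sqrt(a):
//   y' = y * (1.5 - 0.5 * a * y * y)
// Error shrinks quadratically, so a guess that's already close
// (e.g. a ≈ 1 after accumulated drift, guess = 1.0) converges in one step.
static double rsqrtStep(double a, double guess) {
    return guess * (1.5 - 0.5 * a * guess * guess);
}
```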
Fast/slow: it’s all relative. To put some numbers on it, here is instruction latency/throughput (both in cycles) for modern(ish) Intel; the ranges cover a reasonable set of hardware.
SQRTSD: 20-39/14-39 (one compliant double sqrt). On unspecified input ranges these can drop to ~6/~10.
SQRTSS: 23-32/23-32 (one compliant single sqrt). On unspecified input ranges these can drop to ~6/~10.
RSQRTSS: 3-6/1-4 (approximate 1/sqrt for singles. We can’t do this from Java…big fat sadface)
So what I intended to say about sqrt being “hard” is that, unlike many other functions, reducing the range or precision to get a faster version is much more difficult. About the only way you can succeed is if you know the input is very near a specific value (as when renormalizing). That old SGI trick (popularized by id) should always be slower than simply using the hardware.
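For reference, that trick translated to Java looks something like this (shown for comparison only; as said above, on modern hardware plain 1.0 / Math.sqrt(x) should beat it in both speed and accuracy):

```java
// Classic SGI/Quake-style approximate 1/sqrt(x) via bit-level manipulation,
// followed by one Newton-Raphson refinement step.
static float fastInvSqrt(float x) {
    float half = 0.5f * x;
    int i = Float.floatToRawIntBits(x);
    i = 0x5f3759df - (i >> 1);        // magic-constant initial guess
    float y = Float.intBitsToFloat(i);
    y = y * (1.5f - half * y * y);    // one Newton-Raphson step (~0.2% rel. error)
    return y;
}
```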
After some googling I found out these are SSE instructions, and Java can’t use those afaik. Normal math happens using x87 FPU instructions. I copied this from the “Intel Architecture Optimization Reference Manual”:
“Additionally, while the multi-cycle floating-point instructions, fdiv and fsqrt, execute in the floating-point unit pipe, integer instructions can be executed in parallel.”
So it pays off to mix lots of non-FP work into inner loops that use Math.sqrt and division; separating them into different inner loops would throw away that parallelism.
And from “How to optimize for the Pentium family of microprocessors” I read that FSQRT takes 70 cycles and cannot overlap integer multiplication instructions (they block the pipeline). No idea about modern processors, however; they should be much more efficient at this, especially the new Core iX and Sandy Bridge Celeron CPUs.
I’m pretty sure that HotSpot does not use RSQRTSS, but even if it did, it wouldn’t approach that speed, as it would have to perform some Newton steps to restore the accuracy (since in Java you can’t relax precision like this). It would need one step for single precision and two for double (don’t quote me on this).
HotSpot does use SSE instructions for a fair amount of FP computation…it simply uses the SISD versions (*SS, *SD). All basic operations are performed in SSE.