Disabling floating point denormals

[quote=“erikd,post:17,topic:55196”]
Thanks, I think I’ll add this to LWJGL 3.

That would be great!

The latest LWJGL build (3.0.0b #9) includes bindings to the SSE control/status register (MXCSR) macros. Example code:

import static org.lwjgl.system.simd.SSE.*;
import static org.lwjgl.system.simd.SSE3.*;

// ...

_MM_SET_EXCEPTION_STATE(0); // clear the sticky exception flags
// ...fp math here...
if ( (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DENORM) != 0 ) {
	// the code above used denormal operands (e.g. after underflowing to a denormal)
}

// enable flush-to-zero and denormals-are-zero modes
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

Nice :)
Does it already work on Linux & Mac too?

It works on Linux. It compiled on Mac (on Travis CI), but I haven’t tested it yet.

Can you clarify a little what this is useful for?

[quote=“theagentd,post:26,topic:55196”]
erikd explained it in his first post. Basically, your fp math code can take a severe performance hit if it encounters too many denormals, because on many x86 CPUs an operation involving a denormal is handled by a slow microcode assist. In his case it’s quite common, but other code might not be affected at all. You can use this snippet to test for it:

_MM_SET_EXCEPTION_STATE(0); // clear the sticky exception flags
// ...fp math here...
if ( (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DENORM) != 0 ) {
   // the code above used denormal operands (e.g. after underflowing to a denormal)
}
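The snippet above reads the CPU’s sticky denormal flag. To make the term itself concrete, here is a small plain-Java sketch that tells whether a given float value is denormal (subnormal). The class and method names are hypothetical, not part of LWJGL; it only uses Float.MIN_NORMAL and Math.getExponent from the JDK, and it inspects values rather than touching MXCSR.

```java
// Hypothetical helper, not part of LWJGL: classifies float values in plain Java.
final class DenormalCheck {

    // A float is denormal (subnormal) when it is non-zero but smaller in
    // magnitude than Float.MIN_NORMAL (2^-126), the smallest normal float.
    static boolean isDenormal(float x) {
        return x != 0.0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    // Same test via the exponent: Math.getExponent returns
    // Float.MIN_EXPONENT - 1 for zero and for subnormal arguments.
    static boolean isDenormalByExponent(float x) {
        return x != 0.0f && Math.getExponent(x) == Float.MIN_EXPONENT - 1;
    }

    public static void main(String[] args) {
        float smallestNormal = Float.MIN_NORMAL;
        float half = smallestNormal / 2.0f; // gradual underflow: non-zero, but denormal
        System.out.println(isDenormal(smallestNormal));            // false
        System.out.println(isDenormal(half));                      // true
        System.out.println(isDenormalByExponent(Float.MIN_VALUE)); // true (smallest denormal)
        System.out.println(isDenormal(0.0f));                      // false
    }
}
```

Under standard Java semantics, dividing Float.MIN_NORMAL by 2 yields a non-zero denormal; code running with FTZ enabled is exactly where such results would become 0 instead.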

I suppose for games it’s normally a non-issue. In some cases, for example physics calculations where ‘ground level’ is usually at 0, denormals can hit you, but I suppose that’s a corner case (and easy to work around).
But as soon as you start doing real-time audio processing in your games, being able to enable ‘flush-to-zero’ mode is really nice to have: decaying signals such as filter feedback or reverb tails approach zero gradually and can spend a long time in the denormal range.
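To illustrate the audio case, here is a minimal sketch, assuming a one-pole feedback loop with zero input (the class FeedbackDecay and its methods are hypothetical). With a feedback of 0.5f the tail passes through 23 denormal values before reaching zero, but with 0.999f rounding in the denormal range makes the value get stuck at a tiny non-zero number, so every later sample is denormal. The flushToZero variant is one common software workaround that stays within Java semantics, at the cost of an extra compare per sample.

```java
// Hypothetical demo, not part of LWJGL or the JDK.
final class FeedbackDecay {

    // Result of a run: number of denormal samples and the final sample value.
    static final class Result {
        final int denormalSamples;
        final float lastSample;

        Result(int denormalSamples, float lastSample) {
            this.denormalSamples = denormalSamples;
            this.lastSample = lastSample;
        }
    }

    // Software flush-to-zero: replaces denormal values with 0.0f. Unlike
    // setting FTZ in MXCSR, this stays within Java's floating-point semantics.
    static float flushToZero(float x) {
        return Math.abs(x) < Float.MIN_NORMAL ? 0.0f : x;
    }

    // Runs y = y * feedback with zero input (like a reverb tail or a one-pole
    // filter after the input goes silent) and counts the denormal outputs.
    static Result run(float start, float feedback, int samples, boolean flush) {
        float y = start;
        int denormals = 0;
        for (int i = 0; i < samples; i++) {
            y *= feedback;
            if (flush) {
                y = flushToZero(y);
            }
            if (y != 0.0f && Math.abs(y) < Float.MIN_NORMAL) {
                denormals++;
            }
        }
        return new Result(denormals, y);
    }

    public static void main(String[] args) {
        Result halving = run(1.0f, 0.5f, 200, false);
        Result slow = run(1.0f, 0.999f, 200_000, false);
        Result flushed = run(1.0f, 0.999f, 200_000, true);
        System.out.println("0.5f:           " + halving.denormalSamples + " denormal samples, last = " + halving.lastSample);
        System.out.println("0.999f:         " + slow.denormalSamples + " denormal samples, last = " + slow.lastSample);
        System.out.println("0.999f + flush: " + flushed.denormalSamples + " denormal samples, last = " + flushed.lastSample);
    }
}
```

The 0.999f case is the nasty one for DSP: without some form of flushing, a silent channel can keep producing denormals indefinitely.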

and the test failed. On Linux, it crashed (on a very different JVM version, though). I searched a bit, and the problem turned out to be the [icode]-Xcheck:jni[/icode] option used by the test suite. Without it, the test passes on all OSes. A good description of why this happens can be found here.

That’s interesting.
As I understand it, the JVM actively discourages changing such flags; is that correct?

Yes, you’re allowed to change MXCSR inside a JNI call, but you have to change it back before returning. Leaving FTZ and DAZ enabled after the call, as above, goes against the JVM specs (Java floating-point semantics require gradual underflow, i.e. denormals) and is likely to cause issues. But I don’t think it’s really that dangerous. The register is per-thread, so it won’t affect anything other than the thread in which you enable FTZ/DAZ.

But the whole point is that it stays enabled when you return, right?
So just for my (perhaps slow) understanding, does this mean you might have to set some non-standard flags on the JVM to stop it from enforcing default settings there? Or is it just a testing issue?

[quote=“erikd,post:32,topic:55196”]
Yes, the point is to have it stay enabled so that it works with the fp code the JVM runs, not just inside the JNI call.

[quote=“erikd,post:32,topic:55196”]
No, you don’t have to set a flag; there’s no such flag anyway. At some point around Java 5, the JVM was restoring MXCSR to the value it expects after every single JNI call. Unfortunately, changing MXCSR is a very expensive, serializing operation and it totally killed performance, so the JVM doesn’t do it anymore. You can re-enable that behavior with [icode]-XX:+RestoreMXCSROnJNICalls[/icode]. AFAICT, this flag resets MXCSR without even checking whether it changed. Using [icode]-Xcheck:jni[/icode], on the other hand, does check whether the value changed, prints a warning, then resets MXCSR (or crashes on Linux :)). This is what was happening with the LWJGL tests.
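For reference, the bits involved are architectural: MXCSR defaults to 0x1F80 (all exceptions masked, round-to-nearest, FTZ and DAZ off), FTZ is bit 15, DAZ is bit 6, and bits 0 to 5 are the sticky exception flags that _MM_GET_EXCEPTION_STATE returns (DE, the denormal flag, is bit 1). The sketch below decodes a raw MXCSR value in plain Java. The class name and the controlBitsChanged check are hypothetical; the check only mirrors the idea of comparing the control bits against the default while ignoring the sticky flags. It is not HotSpot’s actual code, and it doesn’t read the real register.

```java
// Hypothetical decoder for x86 MXCSR values; it does not read the real register.
final class MxcsrBits {
    static final int DE_FLAG = 1 << 1;       // denormal-operand flag (sticky)
    static final int DAZ = 1 << 6;           // denormals-are-zero
    static final int FTZ = 1 << 15;          // flush-to-zero
    static final int STATUS_FLAGS = 0x3F;    // bits 0-5: sticky exception flags
    static final int DEFAULT = 0x1F80;       // exceptions masked, round-to-nearest, FTZ/DAZ off

    static boolean ftzEnabled(int mxcsr) { return (mxcsr & FTZ) != 0; }
    static boolean dazEnabled(int mxcsr) { return (mxcsr & DAZ) != 0; }
    static boolean denormalSeen(int mxcsr) { return (mxcsr & DE_FLAG) != 0; }

    // True if the control bits (ignoring the sticky status flags) differ from the default.
    static boolean controlBitsChanged(int mxcsr) {
        return (mxcsr & ~STATUS_FLAGS) != DEFAULT;
    }

    public static void main(String[] args) {
        int afterFtzDaz = DEFAULT | FTZ | DAZ;
        System.out.printf("0x%04X%n", afterFtzDaz);                // 0x9FC0
        System.out.println(controlBitsChanged(afterFtzDaz));       // true
        System.out.println(controlBitsChanged(DEFAULT | DE_FLAG)); // false: only a sticky flag set
    }
}
```

This is why FTZ/DAZ trips a check like the one described above while merely encountering a denormal does not: enabling the modes changes control bits, whereas a denormal only sets a status flag.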

So, indeed it’s a testing issue. But people should be aware that they’re going behind the JVM’s back with this trick and only use it if absolutely necessary.

Oh, and I keep forgetting: these modes only affect SSE operations (the XMM registers). Any ops for which HotSpot ends up using x87 will not be affected. I don’t expect there to be much usage of x87, but I haven’t examined this in a long time.

Also, denormals can occur quite frequently outside of DSP-like functionality. In games, the most likely candidate is simulation.

[quote=“Roquen,post:34,topic:55196”]
I did a bit of testing: x87 is only used by the 32-bit x86 JVM with [icode]-XX:UseSSE=0[/icode]. The x64 JVM always uses SSE.

Wait, the JVM automatically uses SSE? In what situations?

It uses scalar SSE instructions for all floating-point arithmetic. SSE does not automatically mean SIMD.

I should have mentioned that the places where it might use x87 are operations that SSE does not provide, notably intrinsic methods such as the forward and inverse trig functions in Math.

The JVM has been using scalar SSE ops for a very long time now. A few months ago, someone (IBM, maybe) contributed some code for basic autovectorization. It is not in release builds yet.