Disabling floating point denormals

Spasi · August 1, 2015, 9:56pm

Thanks, I think I’ll add this to LWJGL 3.

erikd · August 1, 2015, 10:17pm

That would be great!

Spasi · August 2, 2015, 11:28am

The latest LWJGL build (3.0.0b #9) includes bindings to the SSE control register macros. Example code:

import static org.lwjgl.system.simd.SSE.*;
import static org.lwjgl.system.simd.SSE3.*;

// ...

_MM_SET_EXCEPTION_STATE(0);
// ...fp math here...
if ( (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DENORM) != 0 ) {
	// the code above operated on denormal fp values
}

// enable flush-to-zero and denormals-are-zero modes
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
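
For reference, here is a rough sketch of which MXCSR bits those macros touch. The values are the ones from the C intrinsics headers; I'm assuming the bindings mirror them. The class `MxcsrBits` and its methods are made up for illustration and only model the register as a plain int, they never touch the real MXCSR.

```java
// Sketch only: models MXCSR as an int, it does not read or write the real register.
final class MxcsrBits {
    static final int EXCEPT_DENORM     = 0x0002; // denormal-operand exception flag (bit 1)
    static final int DENORMALS_ZERO_ON = 0x0040; // DAZ: denormal inputs are read as zero (bit 6)
    static final int FLUSH_ZERO_ON     = 0x8000; // FTZ: denormal results are written as zero (bit 15)
    static final int DEFAULT_MXCSR     = 0x1F80; // power-on default: all exceptions masked, FTZ/DAZ off

    // Returns the given MXCSR value with both FTZ and DAZ switched on.
    static int enableFtzDaz(int mxcsr) {
        return mxcsr | FLUSH_ZERO_ON | DENORMALS_ZERO_ON;
    }

    // True if the sticky denormal flag is set in the given MXCSR value.
    static boolean sawDenormal(int mxcsr) {
        return (mxcsr & EXCEPT_DENORM) != 0;
    }

    public static void main(String[] args) {
        int modified = enableFtzDaz(DEFAULT_MXCSR);
        System.out.printf("default=0x%04X modified=0x%04X%n", DEFAULT_MXCSR, modified);
    }
}
```

The exception flags are sticky, which is why the snippet clears them before the fp math and reads them afterwards.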

erikd · August 2, 2015, 8:27pm

Nice
Does it already work on Linux & Mac too?

Spasi · August 2, 2015, 8:29pm

It works on Linux. It compiled on Mac (on Travis CI) but I haven’t tested yet.

theagentd · August 2, 2015, 8:32pm

Can you clarify a little what this is useful for?

Spasi · August 2, 2015, 8:37pm

erikd explained it in his first post. Basically you can get hit by really bad performance if your fp math code encounters too many denormals. In his case it’s quite common, but it might not be affecting other code at all. You can use this snippet to test for it:

_MM_SET_EXCEPTION_STATE(0);
// ...fp math here...
if ( (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DENORM) != 0 ) {
   // the code above operated on denormal fp values
}
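
If you want to see how quickly ordinary code ends up in denormal territory without touching any control register, something like the following plain-Java sketch works. `DenormalDecay` and its method names are made up for illustration, and it assumes `0 < decay < 1` and a nonzero `start`, otherwise the loop will not terminate.

```java
// Sketch: a value that keeps decaying (think of a filter tail or a damped velocity)
// eventually drops below Float.MIN_NORMAL and becomes denormal.
final class DenormalDecay {
    // True for nonzero values smaller in magnitude than the smallest normal float.
    static boolean isDenormal(float x) {
        return x != 0.0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    // Counts the multiplications by `decay` needed until `start` falls below
    // Float.MIN_NORMAL. Assumes 0 < decay < 1 and start != 0.
    static int stepsUntilDenormal(float start, float decay) {
        float x = start;
        int steps = 0;
        while (Math.abs(x) >= Float.MIN_NORMAL) {
            x *= decay;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("steps for 1.0 * 0.5^n: " + stepsUntilDenormal(1.0f, 0.5f));
        System.out.println("MIN_VALUE denormal? " + isDenormal(Float.MIN_VALUE));
    }
}
```

With a decay factor close to 1, as is typical in audio feedback paths, it takes longer to get there, but the value then lingers in the denormal range for many more iterations.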

erikd · August 2, 2015, 9:17pm

I suppose for games it’s normally a non-issue. There may be cases, for example physics calculations where ‘ground-level’ is usually at 0, where denormals can hit you, but I suppose that’s a corner case (and easy to work around).
But as soon as you start doing real-time audio processing in your games, being able to enable ‘flush-to-zero’ mode is really nice to have.
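
One common kind of workaround, when you can't or don't want to enable flush-to-zero, is to flush the state of your filters by hand. A minimal sketch, assuming a simple one-pole low-pass; the class `OnePoleLowPass` and the helper `flushDenormal` are made-up names, not part of any library:

```java
// Sketch: manual flush-to-zero on the feedback state of a one-pole low-pass filter.
final class OnePoleLowPass {
    private final float coeff; // feedback coefficient, assumed to be in [0, 1)
    private float state;       // previous output

    OnePoleLowPass(float coeff) {
        this.coeff = coeff;
    }

    // Replaces denormal values with zero; normal values (and NaN) pass through unchanged.
    static float flushDenormal(float x) {
        return Math.abs(x) < Float.MIN_NORMAL ? 0.0f : x;
    }

    // y[n] = x[n] + coeff * (y[n-1] - x[n]), with the stored state flushed by hand.
    float process(float in) {
        state = flushDenormal(in + coeff * (state - in));
        return state;
    }
}
```

The extra compare costs a little on every sample, which is the price for not depending on the register trick.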

Spasi · August 2, 2015, 9:17pm

I tested it on Mac and the test failed. On Linux, it crashed (on a very different JVM version though). I searched a bit and the problem turned out to be [icode]-Xcheck:jni[/icode], which the test suite uses. Without it, the test passes on all OSes. A good description of why this happens can be found here.

erikd · August 2, 2015, 9:30pm

That’s interesting.
What I understand of it is that the JVM actively discourages changing such flags, is that correct?

Spasi · August 2, 2015, 10:30pm

Yes, you’re allowed to change MXCSR inside a JNI call, but you have to change it back before returning. Enabling FTZ and DAZ like above is against the JVM specs and likely to cause issues. But I don’t think it’s really that dangerous. The register has thread scope, it won’t affect anything other than the thread in which you enable FTZ/DAZ.

erikd · August 2, 2015, 10:53pm

But the whole point is that it stays enabled when you return, right?
So just for my (perhaps slow) understanding, does this mean you might have to set some non-standard flags on the JVM to stop it from enforcing default settings there? Or is it just a testing issue?

Spasi · August 3, 2015, 6:13am

Yes, the point is to have it stay enabled so that it works with the fp code the JVM runs, not just inside the JNI call.

No, you don’t have to set a flag, there’s no such flag anyway. The JVM, at some point around Java 5, was restoring MXCSR with the value it expects after every single JNI call. Unfortunately changing MXCSR is a very expensive, serializing operation and it totally killed performance, so the JVM doesn’t do it anymore. You can enable that behavior with [icode]-XX:+RestoreMXCSROnJNICall[/icode]. Afaict, this flag resets MXCSR without even checking if it changed. Using [icode]-Xcheck:jni[/icode] on the other hand does in fact check if the value changed, prints a warning, then resets MXCSR (or crashes on Linux :)). This is what was happening with the LWJGL tests.

So, indeed it’s a testing issue. But people should be aware that they’re going behind the JVM’s back with this trick and only use it if absolutely necessary.
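
If you want to guard against this at runtime, one option is to look at the JVM's own arguments before enabling FTZ/DAZ. This is only a sketch: `MxcsrFlagCheck` is a made-up name, it only looks for the two options mentioned above, and it won't catch options set through other channels.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: detect JVM options that are known to reset (or complain about) a modified MXCSR.
final class MxcsrFlagCheck {
    // True if the given JVM arguments contain -Xcheck:jni or -XX:+RestoreMXCSROnJNICall.
    static boolean jvmMayResetMxcsr(List<String> jvmArgs) {
        return jvmArgs.contains("-Xcheck:jni")
            || jvmArgs.contains("-XX:+RestoreMXCSROnJNICall");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (jvmMayResetMxcsr(jvmArgs)) {
            System.out.println("FTZ/DAZ would likely be reset by this JVM configuration");
        } else {
            System.out.println("No MXCSR-restoring option found on the command line");
        }
    }
}
```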

Roquen · August 6, 2015, 8:58am

Oh and I keep forgetting. These modes only affect operations in SIMD registers. Any ops for which HotSpot ends up using x87 will not be affected. I don’t expect there to be much usage of x87, but I haven’t examined this in a long time.

Also, denormals can occur quite frequently outside of DSP-like functionality. In games the most likely candidate is simulation.

Spasi · August 6, 2015, 2:18pm

Did a bit of testing, x87 is only used on the x86 JVM with [icode]-XX:UseSSE=0[/icode]. The x64 JVM always uses SSE.

theagentd · August 6, 2015, 8:20pm

Wait, the JVM automatically uses SSE? In what situations?

KaiHH · August 6, 2015, 8:40pm

It uses scalar SSE instructions for all floating point arithmetic. SSE does not automatically mean SIMD.
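
Since ordinary Java float math goes through those scalar SSE instructions, a plain-Java probe can tell whether flush-to-zero is currently in effect on the calling thread. A rough sketch, with `FlushToZeroProbe` as a made-up name; the volatile fields are only there to discourage constant folding, and code paths that use x87 would not be covered:

```java
// Sketch: checks whether a multiplication with a denormal result comes back as zero.
final class FlushToZeroProbe {
    // volatile so neither javac nor the JIT can precompute the product
    private static volatile float tiny = Float.MIN_NORMAL;
    private static volatile float half = 0.5f;

    // With default IEEE 754 behavior the product is a nonzero denormal;
    // with flush-to-zero active on this thread it is expected to be 0.
    static boolean denormalsAreFlushed() {
        float result = tiny * half;
        return result == 0.0f;
    }

    public static void main(String[] args) {
        System.out.println("denormals flushed on this thread: " + denormalsAreFlushed());
    }
}
```

Because MXCSR has thread scope, the probe only says something about the thread it runs on.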

Roquen · August 6, 2015, 8:45pm

I should have mentioned that the places where it might be using x87 are ops that SSE does not have, notably intrinsic methods like Math.whatever, with forward and inverse trig as specific examples.

The JVM has been using scalar SSE ops for a very long time now. A few months ago someone (IBM maybe) contributed some code for basic autovectorization. It is not in release builds yet.