Disabling floating point denormals

erikd · July 30, 2015, 6:39pm

I’m working on a project where float denormals have a big impact on performance.
To clarify, float denormals are floating point numbers so close to 0 that the CPU no longer handles their format efficiently, which leads to incredibly slow performance.
Any floating point calculations that tend to gradually go towards 0 are potentially impacted. In my case, that’s audio DSP stuff, but I can imagine that things like physics calculations are potentially affected too.
(For reference: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/ and also the javadoc of Float.MIN_NORMAL).

Currently, I either add a small offset or add a check to ‘nudge’ these values to 0 (the latter is surprisingly often faster than adding an offset), but that is both impractical in a lot of cases and has a performance impact in itself.
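For illustration, the ‘check’ variant could look roughly like the sketch below. The class and method names (DenormalNudge, nudge) are made up for this example and are not taken from any project; the only library pieces used are Math.abs and Float.MIN_NORMAL.

```java
final class DenormalNudge {
    // Hypothetical helper: returns 0 for denormal inputs and x otherwise.
    static float nudge(float x) {
        // Any magnitude below the smallest normal float is denormal (or already zero).
        // NaN fails the comparison and is passed through unchanged.
        return Math.abs(x) < Float.MIN_NORMAL ? 0.0f : x;
    }

    public static void main(String[] args) {
        System.out.println(nudge(Float.MIN_VALUE)); // smallest denormal float
        System.out.println(nudge(0.5f));            // ordinary normal value
    }
}
```

A check like this has to be applied at every point where a value can decay towards 0, which is where the extra cost and clutter come from.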

It seems it’s possible to disable float denormals on the CPU so that such numbers simply become 0 (the linked article touches upon this), so I’m thinking of creating a little dll and JNI library to do that in java.
I think it would help my project tremendously, and I guess it could be a nice exercise for me.

Now my question is: Is it actually possible, especially within the context of a JVM? I mean I’m quite out of the loop of native programming, so maybe I’m unaware of something that might make this a no-go?
Or maybe something like this already exists somewhere? (I’ve googled, but I couldn’t find anything myself).

Roquen · July 30, 2015, 6:47pm

You’d need a native method to set denormals-as-zero or flush to zero mode for the thread(s) in question. Call once and then not worry about it again.

erikd · July 30, 2015, 6:54pm

Thanks for your reply!
That was exactly my understanding of it, but I wasn’t sure if it would actually work in the context of a JVM.

Roquen · July 30, 2015, 8:11pm

It’s officially a no-no to muck with flags like this. It worked the last time I checked.

erikd · July 30, 2015, 8:15pm

In what sense is it a no-no?
I mean I understand why floating point denormals exist and why they are a good thing, but I just want to change the behavior for my particular case.

ags1 · July 30, 2015, 8:16pm

Can’t you change the scale of your calculations?

Roquen · July 30, 2015, 8:29pm

Because all FP computations performed while these modes are active ignore denormals. That’s outside of the JVM’s spec. Additionally, any routine that depends on the behavior of denormals will be affected. The CPU and OS take care of limiting the mode changes to the thread(s) in question.

erikd · July 31, 2015, 9:02am

That’s what I’m often doing now, but it is a workaround that comes at a cost.

erikd · July 31, 2015, 4:32pm

Ok I see.
There’s just one thread where all these DSP calculations take place, but that’s also the same thread as the Asio driver’s, so to be safe I could enable the ‘flush-to-zero’ mode just before my DSP stuff takes place and re-enable the default ‘denormal’ mode afterwards.

I had some trouble getting it to work with MinGW (it seems it doesn’t support ‘_controlfp_s’?), so I’m trying my luck with VS.

theagentd · July 31, 2015, 4:51pm

Shouldn’t there be a library for this? It sounds like it’d be useful. Maybe someone feels obliged to make one? =P

erikd · July 31, 2015, 5:46pm

If I could find one, I probably wouldn’t be nerding around with JNI and C compilers right now.
So yes, if someone knows of a library that can do this, I’d happily use that!

EDIT:
In my benchmarks, hitting denormal floats makes things on the order of 25 times slower than usual, or worse!
Obviously those are huge spikes that really hurt real-time applications.
Working around it by adding tests or adding an offset helps, but still degrades performance significantly in itself.
And those work-arounds need to be applied almost everywhere in my case, which not only makes everything noticeably slower, it’s also a big pain in the behind to have to litter your code with all that stuff almost everywhere.

Roquen · July 31, 2015, 7:02pm

I think both VC and GCC support the macros:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

which wrap the intrinsics:

_mm_setcsr(xxx)
_mm_getcsr()

erikd · August 1, 2015, 7:39am

Thanks Roquen, that worked!

erikd · August 1, 2015, 8:04am

So the results are now as follows, using this code:

		float f = 1;

		for (int count = 0; count < 8; count++) {
			long start = System.nanoTime();
			for (int i = 10000000; i > 0; i--) {
				f *= 0.999998f;
			}
			System.out.println(count + ": " + (System.nanoTime() - start) + "   " + f + " \t" + (f < Float.MIN_NORMAL));
		}

Default behaviour:
0: 19782589 1.5903542E-9 false
1: 21886645 2.5291527E-18 false
2: 21601492 4.0223933E-27 false
3: 20395286 6.397437E-36 false
4: 384876680 3.45733E-40 true
5: 544109759 3.45733E-40 true
6: 544310570 3.45733E-40 true
7: 545460996 3.45733E-40 true

With flush-to-zero enabled:
0: 19982061 1.5903542E-9 false
1: 19439424 2.5291527E-18 false
2: 19738856 4.0223933E-27 false
3: 19458167 6.397437E-36 false
4: 19633095 0.0 true
5: 19759830 0.0 true
6: 19560804 0.0 true
7: 19418897 0.0 true

This will certainly make performance a lot more stable in my project, so I’m a happy camper.

EDIT: I just tested it with my DSP project, and it absolutely works.
Even though I had already prevented denormals in the most obvious cases, I hadn’t done so everywhere. As a result, where I used to get enormous performance spikes without flush-to-zero, those spikes are now all gone and performance is generally much better.
Result!

Roquen · August 1, 2015, 9:20am

For completeness: in other use cases (not DSP-like), or if you’re unwilling to call native code and/or muck with FP behavior flags:


public static final strictfp float flushDenormal(float x) { return (x+1.f)-1.f; }
public static final double flushDenormal(double x) { return (x+1.0)-1.0; }

Both return input ‘x’ unless x is a denormal or negative zero in which case they return zero. The strictfp on the single version is paranoia. It disallows improving the precision of the computation. Not needed in the double case. (EDIT: Actually there’s a range where a rounding will occur.)
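To make the rounding caveat concrete, here is a small sketch that feeds a few inputs through the same expressions. The class name FlushDenormalDemo and the chosen test values are only for illustration. As far as I can tell, the affected range is wider than just denormals: any float much smaller than ulp(1.0f) is absorbed by the addition and comes back as zero, and values below 1 can lose their low bits.

```java
final class FlushDenormalDemo {
    // Same expressions as above. strictfp is left out here: since Java 17 all
    // floating point arithmetic is strict by default, so the modifier is redundant.
    static float flushDenormal(float x) { return (x + 1.0f) - 1.0f; }
    static double flushDenormal(double x) { return (x + 1.0) - 1.0; }

    public static void main(String[] args) {
        System.out.println(flushDenormal(Float.MIN_VALUE));  // denormal input
        System.out.println(flushDenormal(-0.0f));            // negative zero input
        System.out.println(flushDenormal(0.75f));            // 1.75f is exact, so 0.75f survives
        System.out.println(flushDenormal(1.0e-10f));         // normal float, but far below ulp(1.0f)
        System.out.println(flushDenormal(0.1f));             // low bits are rounded away by the add
        System.out.println(flushDenormal(Double.MIN_VALUE)); // double overload, denormal input
    }
}
```

So the trick suits signals where anything that tiny can be treated as silence anyway; it is not a bit-exact identity for all normal inputs.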

erikd · August 1, 2015, 9:41am

Heh, clever

Floating point never fails to surprise.
I mean, after learning that 1.0f + 1.0e-8f == 1.0f I basically stopped caring about denormal values having a purpose for correctness.

erikd · August 1, 2015, 9:39pm

If anyone is interested, I’ve put the pre-compiled library here:
http://sourceforge.net/p/jmodsyn/code/HEAD/tree/trunk/LibAbnormal/build/
Sorry, Windows only for now…

To enable flush-to-zero mode, call

org.modsyn.abnormal.Abnormal.setDenormals(false);

to restore normal behaviour, call

org.modsyn.abnormal.Abnormal.setDenormals(true);

The source code is there as well (as much as there is any); feel free to use it any way you like. If it’s useful enough, maybe this should become a proper cross-platform library.
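If the mode should only be active around the DSP code (for example because the thread is shared with a driver), one way to pair the two calls is a try/finally wrapper. This is only a sketch: the DenormalSwitch interface and the runFlushed method are hypothetical names invented here so the example runs without the native library, and a logging stub stands in for the real call.

```java
final class FlushToZeroScope {
    /** Hypothetical stand-in for a native toggle such as Abnormal.setDenormals(boolean). */
    interface DenormalSwitch {
        void setDenormals(boolean enabled);
    }

    /** Runs the given work with denormals disabled, then restores the default mode. */
    static void runFlushed(DenormalSwitch sw, Runnable work) {
        sw.setDenormals(false);      // enter flush-to-zero mode
        try {
            work.run();
        } finally {
            sw.setDenormals(true);   // always restore, even if work throws
        }
    }

    public static void main(String[] args) {
        // A logging stub instead of the real native call, so this runs anywhere.
        DenormalSwitch stub = enabled -> System.out.println("setDenormals(" + enabled + ")");
        runFlushed(stub, () -> System.out.println("DSP work here"));
    }
}
```

With the real library, the stub would presumably be replaced by a method reference like Abnormal::setDenormals, assuming it is a static method taking a boolean as the calls above suggest. Since the mode is per thread, the wrapper has to run on the thread that does the DSP work.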

Roquen · August 1, 2015, 9:53pm

As a general note: x+k==x occurs everywhere with nonzero x and k, so it isn’t a denormal thing.
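A quick sketch of that point, with the same effect showing up at two very different magnitudes. The class and method names (AbsorptionDemo, absorbed) are made up for this example; Math.ulp is the standard library call that gives the spacing between adjacent floats at a given value.

```java
final class AbsorptionDemo {
    // True when adding k to x leaves x unchanged in float arithmetic.
    static boolean absorbed(float x, float k) {
        return x + k == x;
    }

    public static void main(String[] args) {
        System.out.println(absorbed(1.0f, 1.0e-8f)); // k is far below the float spacing at 1.0f
        System.out.println(absorbed(1.0e8f, 1.0f));  // same effect with large, perfectly normal values
        System.out.println(Math.ulp(1.0f));          // float spacing at 1.0f
        System.out.println(Math.ulp(1.0e8f));        // float spacing at 1.0e8f
    }
}
```

Whenever k is well under half the spacing at x, the sum rounds back to x, regardless of whether either value is anywhere near the denormal range.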

erikd · August 1, 2015, 9:55pm

Yes I know; that was just a remark about floating point in general.

Roquen · August 1, 2015, 9:56pm

Making it obvious for everyone.