Disabling floating point denormals

I’m working on a project where float denormals have a big impact on performance.
To clarify, float denormals are floating point numbers that are so close to 0 that their format isn’t handled well by the CPU anymore, which makes calculations involving them incredibly slow.
Any floating point calculations that tend to gradually go towards 0 are potentially impacted. In my case, that’s audio DSP stuff, but I can imagine that things like physics calculations are potentially affected too.
(For reference: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/ and also the javadoc of Float.MIN_NORMAL).
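
As a small illustration of that definition (a sketch added for clarity, not code from my project), a float can be classified as denormal by comparing its magnitude against Float.MIN_NORMAL. The class and method names here are made up for the example.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class DenormalCheck {

    // True if x is a denormal (subnormal) float: non-zero, but smaller in
    // magnitude than the smallest normal float. NaN fails the comparison
    // and is therefore reported as not denormal.
    static boolean isDenormal(float x) {
        return x != 0.0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    public static void main(String[] args) {
        System.out.println(isDenormal(1.0f));             // false
        System.out.println(isDenormal(Float.MIN_NORMAL)); // false
        System.out.println(isDenormal(Float.MIN_VALUE));  // true
        System.out.println(isDenormal(0.0f));             // false
    }
}
```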

Currently, I either add a small offset or add a check to ‘nudge’ these values to 0 (the latter is surprisingly often faster than adding an offset), but that is both impractical in a lot of cases and has a performance impact in itself.
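
Roughly, the two workarounds look like the sketch below. The threshold value and all names are assumptions chosen for illustration; the real code is spread through the DSP routines.

```java
// Illustrative sketch only: TINY and the method names are assumptions.
final class DenormalWorkarounds {

    // Assumed threshold: far above Float.MIN_NORMAL (about 1.18e-38),
    // yet far too small to matter for audio samples.
    static final float TINY = 1.0e-20f;

    // Workaround 1: check the value and nudge it to exactly 0 before it
    // can decay into the denormal range.
    static float nudgeToZero(float x) {
        return Math.abs(x) < TINY ? 0.0f : x;
    }

    // Workaround 2: add a small offset so a decaying value settles at a
    // tiny normal number instead of becoming denormal.
    static float addOffset(float x) {
        return x + TINY;
    }

    public static void main(String[] args) {
        float checked = 1.0f;
        float offset = 1.0f;
        // Halve both values repeatedly; unprotected, 0.5^200 would pass
        // through the denormal range on its way to zero.
        for (int i = 0; i < 200; i++) {
            checked = nudgeToZero(checked * 0.5f);
            offset = addOffset(offset * 0.5f);
        }
        System.out.println("with check:  " + checked);
        System.out.println("with offset: " + offset);
    }
}
```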

It seems it’s possible to disable float denormals on the CPU so that such numbers simply become 0 (the linked article touches upon this), so I’m thinking of creating a little DLL and JNI library to do that in Java.
I think it would help my project tremendously, and I guess it could be a nice exercise for me.

Now my question is: Is it actually possible, especially within the context of a JVM? I mean I’m quite out of the loop of native programming, so maybe I’m unaware of something that might make this a no-go?
Or maybe something like this already exists somewhere? (I’ve googled, but I couldn’t find anything myself).

You’d need a native method to set denormals-as-zero or flush-to-zero mode for the thread(s) in question. Call it once and then you don’t have to worry about it again.

Thanks for your reply! :slight_smile:
That was exactly my understanding of it, but I wasn’t sure if it would actually work in the context of a JVM.

It’s officially a no-no to muck with flags like this. It worked the last time I checked.

In what sense is it a no-no?
I mean I understand why floating point denormals exist and why they are a good thing, but I just want to change the behavior for my particular case.

Can’t you change the scale of your calculations?

Because all FP computations performed while these modes are active ignore denormals. That’s outside of the JVM’s spec. Additionally, any routine that depends on the behavior of denormals will be affected. The CPU and OS take care of limiting the mode changes to the thread(s) in question.

That’s what I’m often doing now, but it is a workaround that comes at a cost.

Ok I see.
There’s just one thread where all these DSP calculations take place, but that’s also the ASIO driver’s thread, so to be safe I could enable the ‘flush-to-zero’ mode just before my DSP stuff takes place and restore the default ‘denormal’ mode afterwards.
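
On the Java side, that enable/restore pattern could be sketched like this. The DenormalSwitch interface is a hypothetical stand-in for whatever native binding ends up doing the actual mode change; nothing here touches the CPU flags by itself.

```java
// Illustrative sketch only: DenormalSwitch is a hypothetical stand-in for
// a JNI-backed call that changes the FP mode of the current thread.
final class FlushToZeroScope {

    interface DenormalSwitch {
        // true = default behaviour (denormals enabled), false = flush-to-zero.
        void setDenormals(boolean enabled);
    }

    // Runs the DSP work with denormals disabled and always restores the
    // default mode afterwards, even if the work throws.
    static void runFlushed(DenormalSwitch sw, Runnable dsp) {
        sw.setDenormals(false);
        try {
            dsp.run();
        } finally {
            sw.setDenormals(true);
        }
    }

    public static void main(String[] args) {
        // A fake switch that only logs, so the sketch runs without native code.
        DenormalSwitch fake = enabled -> System.out.println("denormals enabled: " + enabled);
        runFlushed(fake, () -> System.out.println("processing audio block"));
    }
}
```

The try/finally is there so that the driver’s thread gets the default mode back even when the DSP code throws.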

I had some trouble getting it to work with MinGW (it seems it doesn’t support ‘_controlfp_s’?), so I’m trying my luck with VS.

Shouldn’t there be a library for this? It sounds like it’d be useful. Maybe someone feels obliged to make one? =P

If I could find one, I probably wouldn’t be nerding around with JNI and C compilers right now :wink:
So yes, if someone knows of a library that can do this, I’d happily use that!

EDIT:
In my benchmarks, hitting denormal floats makes calculations on the order of 25 times slower than usual, or worse!
Obviously, those are huge spikes that really hurt real-time applications.
Working around it by adding tests or adding an offset helps, but still degrades performance significantly in itself.
And those work-arounds need to be applied almost everywhere in my case, which not only makes everything noticeably slower, it’s also a big pain in the behind to have to litter your code with all that stuff almost everywhere.

I think both VC and GCC support the macros:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

which wrap the intrinsics:

_mm_setcsr(xxx)
_mm_getcsr()

Thanks Roquen, that worked!

So the results are now as follows, using this code:

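public class DenormalBench {
	public static void main(String[] args) {
		float f = 1;

		for (int count = 0; count < 8; count++) {
			long start = System.nanoTime();
			// Shrink f a little on every iteration so it eventually drops below Float.MIN_NORMAL.
			for (int i = 10000000; i > 0; i--) {
				f *= 0.999998f;
			}
			// Prints: pass number, elapsed nanoseconds, current f, and whether f is below the normal range.
			System.out.println(count + ": " + (System.nanoTime() - start) + "   " + f + " \t" + (f < Float.MIN_NORMAL));
		}
	}
}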

Default behaviour:
0: 19782589 1.5903542E-9 false
1: 21886645 2.5291527E-18 false
2: 21601492 4.0223933E-27 false
3: 20395286 6.397437E-36 false
4: 384876680 3.45733E-40 true
5: 544109759 3.45733E-40 true
6: 544310570 3.45733E-40 true
7: 545460996 3.45733E-40 true

With flush-to-zero enabled:
0: 19982061 1.5903542E-9 false
1: 19439424 2.5291527E-18 false
2: 19738856 4.0223933E-27 false
3: 19458167 6.397437E-36 false
4: 19633095 0.0 true
5: 19759830 0.0 true
6: 19560804 0.0 true
7: 19418897 0.0 true

This will certainly make performance a lot more stable in my project, so I’m a happy camper :slight_smile:

EDIT: I just tested it with my DSP project, and it absolutely works.
Even though I had already prevented denormals in the most obvious cases, I hadn’t done so everywhere. As a result, where I used to get enormous performance spikes without flush-to-zero enabled, those spikes are now all gone and performance is generally much better.
Result! :smiley:

For completeness: in other use cases (not DSP-like), or if you’re unwilling to call native code and/or muck with FP behavior flags:


public static final strictfp float flushDenormal(float x) { return (x+1.f)-1.f; }
public static final double flushDenormal(double x) { return (x+1.0)-1.0; }

Both return zero when x is a denormal or negative zero. The strictfp on the single version is paranoia: it disallows improving the precision of the computation. It’s not needed in the double case. (EDIT: Actually, other inputs don’t always come back unchanged. There’s a range where rounding will occur, because x+1 can’t keep all the low-order bits of a small x.)
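
To make the rounding caveat concrete, here is a small sketch that runs the single-precision trick next to a comparison-based flush. The comparison variant and the class name are additions for illustration, not part of the suggestion above, and strictfp is left out because floating point evaluation is always strict from Java 17 on.

```java
// Illustrative sketch only: flushByCompare and the class name are assumptions.
final class FlushDemo {

    // The add/subtract trick (single-precision version).
    static float flushDenormal(float x) {
        return (x + 1.f) - 1.f;
    }

    // Comparison-based alternative: returns every normal input unchanged,
    // at the cost of a branch.
    static float flushByCompare(float x) {
        return Math.abs(x) < Float.MIN_NORMAL ? 0.0f : x;
    }

    public static void main(String[] args) {
        float[] samples = { Float.MIN_VALUE, -0.0f, 0.5f, 0.1f, 1.0e-10f };
        for (float x : samples) {
            System.out.println(x + " -> trick: " + flushDenormal(x)
                    + ", compare: " + flushByCompare(x));
        }
    }
}
```

In this sketch 0.5f survives the trick unchanged, 0.1f comes back slightly rounded, and 1.0e-10f (a normal number) is flushed to zero along with the denormals.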

Heh, clever :slight_smile:

Floating point never fails to surprise.
I mean, after learning that 1.0f + 1.0e-8f == 1.0f :expressionless: I basically stopped caring about denormal values having a purpose for correctness.

If anyone is interested, I’ve put the pre-compiled library here:
http://sourceforge.net/p/jmodsyn/code/HEAD/tree/trunk/LibAbnormal/build/
Sorry, Windows only for now…

To enable flush-to-zero mode, call

org.modsyn.abnormal.Abnormal.setDenormals(false);

to restore normal behaviour, call

org.modsyn.abnormal.Abnormal.setDenormals(true);

The source code is there as well (as much as there is any); feel free to use it any way you like. If it’s useful enough, maybe this should become a proper cross-platform library.

As a general note: x+k==x can occur anywhere with non-zero x and k, so it isn’t a denormal thing.
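
A quick sketch of that point, with a made-up class name: the small term is absorbed whenever it is less than half the spacing between x and its neighbouring floats, whatever the magnitude of x.

```java
// Illustrative sketch only: the class and method names are hypothetical.
final class Absorption {

    // True if adding k to x leaves x unchanged in single precision.
    static boolean absorbed(float x, float k) {
        return x + k == x;
    }

    public static void main(String[] args) {
        System.out.println(absorbed(1.0f, 1.0e-8f));        // true
        System.out.println(absorbed(1.0e10f, 1.0f));        // true
        System.out.println(absorbed(1.0f, Math.ulp(1.0f))); // false
    }
}
```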

Yes I know; that was just a remark about floating point in general.

Making it obvious for everyone.