[JOODE] Contribution

Not according to my benchmarks. 1f/Math.sqrt(…); is multiple times slower than invSqrt. It has also resulted in a substantial FPS increase with CPU based skeletal character animation.

DP

Results:

1/Math.sqrt(…); is nearly twice as slow as invSqrt. That’s definitely not something to be laughed at, since sqrt is one of the more expensive math operations on the CPU (along with trig).

Benchmark:


public class TestInvSqrt {

	public static void main( String args[] ) {
		// warm up loops
		int loops = 10000000;
		for ( int i = 0; i < loops; i++ ) {
			float v1 = 1 / (float)Math.sqrt( i );
			float v2 = invSqrt( i );

			float v3 = v2 * v1;
		}

		// do the proper loops now
		float sum1 = 0f;
		long nano = System.nanoTime();
		for ( int i = 0; i < loops; i++ ) {
			float v1 = 1f / (float)Math.sqrt( i );
			// accumulate and print the result below so the JIT cannot discard the loop
			sum1 += v1;
		}
		long after = System.nanoTime();
		System.out.println( "Math.sqrt: " + (double)( after - nano ) / (double)loops + " ns/iteration (checksum " + sum1 + ")" );

		float sum2 = 0f;
		nano = System.nanoTime();
		for ( int i = 0; i < loops; i++ ) {
			float v2 = invSqrt( i );
			// accumulate and print the result below so the JIT cannot discard the loop
			sum2 += v2;
		}
		after = System.nanoTime();
		System.out.println( "invsqrt: " + (double)( after - nano ) / (double)loops + " ns/iteration (checksum " + sum2 + ")" );
	}

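	// Fast approximate 1/sqrt(value): bit trick for the initial guess, then one Newton-Raphson step
	private static float invSqrt( float value ) {
		float xhalf = 0.5f * value;
		// reinterpret the float's IEEE 754 bits as an int
		int i = Float.floatToIntBits( value );
		// initial guess from the magic constant
		i = 0x5f3759df - ( i >> 1 );
		// alternative magic constant:
		// i = 0x5f375a86 - ( i >> 1 );
		value = Float.intBitsToFloat( i );
		// one Newton-Raphson step; repeat this line for more accuracy
		value = value * ( 1.5f - xhalf * value * value );
		return value;
	}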

}

Edit: Clarified benchmark

DP

Ok, it checks out for pure speed. But two hurdles remain:

  1. Correctness of computed values
  2. Portability to other platforms (and correctness there too)

Also, if it is decided that JOODE should use javax.vecmath throughout, we would rely on the vecmath implementation for vector length (Math.sqrt()):

https://vecmath.dev.java.net/source/browse/vecmath/src/javax/vecmath/Vector3f.java?rev=1.3&view=auto&content-type=text/vnd.viewcvs-markup

Because, as referenced here: http://www.java-gaming.org/forums/index.php?topic=15677.msg125477#msg125477 the biggest slow-down that JOODE currently experiences is in the implementation of the Real class, and its derivatives.

Well, we are using (a slight modification of) Kenji Hiranabe’s vecmath implementation in Xith3D. The difference from Sun’s vecmath is that it is as GC-cheap as possible, though not thread safe. And we can modify it if needed. If you used this lib too, there would be no problem making use of the above optimization.

Marvin

It is accurate to 4 decimal places. Repeating the last line of the algorithm (the Newton-Raphson step) produces more accurate results. If you look at the optimisation, one line has been commented out as I have replaced the initial guess with one that yields more accurate results.

You do realise this is Java, right?

This is a contribution, take it or leave it, it’s up to you.
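For anyone who wants to check the accuracy claim themselves, here is a minimal sketch that measures the worst relative error against 1 / Math.sqrt for one, two and three repetitions of the last line. The class name, the `iterations` parameter and the sampled input range are assumptions made for illustration; they are not part of the posted benchmark or of JOODE.

```java
public class InvSqrtAccuracy {

	// Same bit trick as in the benchmark above, with a hypothetical
	// "iterations" parameter controlling how often the last line is repeated
	static float invSqrt( float value, int iterations ) {
		float xhalf = 0.5f * value;
		int i = Float.floatToIntBits( value );
		i = 0x5f3759df - ( i >> 1 );
		float y = Float.intBitsToFloat( i );
		for ( int n = 0; n < iterations; n++ ) {
			y = y * ( 1.5f - xhalf * y * y ); // one Newton-Raphson step
		}
		return y;
	}

	// Largest relative error against 1 / Math.sqrt over a sample of inputs
	// (the range 0.01 .. 10000 is an arbitrary choice for this sketch)
	static double maxRelativeError( int iterations ) {
		double worst = 0.0;
		for ( float x = 0.01f; x < 10000f; x *= 1.001f ) {
			double exact = 1.0 / Math.sqrt( x );
			double err = Math.abs( invSqrt( x, iterations ) - exact ) / exact;
			worst = Math.max( worst, err );
		}
		return worst;
	}

	public static void main( String[] args ) {
		for ( int n = 1; n <= 3; n++ ) {
			System.out.println( n + " iteration(s): max relative error " + maxRelativeError( n ) );
		}
	}
}
```

Each extra repetition costs a few more multiplications, so this is also a quick way to see how much accuracy each step buys before comparing timings with Math.sqrt.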

DP

I agree both with Marvin and darkprophet.

I’d find it cool if you, biggeruniverse, would use the same version of vecmath (Kenji Hiranabe modified) as us (in Xith3D) so that we can add these optimisations (if you don’t want to change the regular sin()/cos()/sqrt() methods, we could have fastXXX() ones).

Well, what accuracy is JOODE guaranteeing to the user? Is accuracy v. speed configurable? At what point of accuracy is this slower than Math.sqrt?

JOODE is Java, that code is converted C code. It was written to be close to the hardware, which of course makes me suspicious of it. Has this Java code been shown to work on any non-x86 environment? (It should of course, but better to test these things before they are relied upon to be always correct)

I hope we take it, as it is faster, but first it has to be proven out.

Well, first things first, let me finish the conversion to pure vecmath. I’ll branch it for easy testing, and then it can be decided what to do about optimizations.

Non-x86 platform? So that implies PPC, right? Which means older Macs and the PS3.

If you read above, the accuracy vs. speed trade-off is configurable via the number of times the last line is repeated. As for the accuracy of JOODE, you’re depending on LCP in the first place, which isn’t highly accurate, but achieves visually convincing results and is stable. If you want accuracy, I’m afraid you’re going to have to change all of JOODE.

If you want to re-prove what Quake3 and I (and countless others) have already proved, namely that it is a worthwhile optimisation, then by all means go ahead :)

DP

myeah, fairly convincing argument. 4dp is pretty nice. Probably should go for it after a good test

@biggeruniverse
your PM inbox is full! I don’t know your sourceforge UNIX username

o noes! Well, it’s biggeruniverse. I know, I ought to be more creative…

OK, you’re now a developer, biggeruniverse.

Try SPARC, anything MIPS, or Cell.

*Is JOODE accuracy v. speed configurable

Did they prove it in Java?

If you look at the change log, you will notice that this code is already in place (with a different initial value). I put it there myself (well, with Amos’ help). However, I left it commented out, since there is currently no way to choose, nor was there a consensus on its use.

Sorry, the only machine I have access to is an AMD Athlon X2. I doubt anybody with a SPARC/MIPS machine will run JOODE though, given that SPARC and MIPS boxes aren’t exactly gaming machines :)

In any case, what basis do you have for the results being different other than “just in case”?

1: Java guarantees that the code runs the same on all platforms and, as swpalmer noted, it’s all IEEE 754…
2: In that paper you referenced in the comments, it clearly states it works on common hardware.
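To make the IEEE 754 point concrete: Float.floatToIntBits is documented to return the IEEE 754 single-format bit layout, so the integer that invSqrt manipulates is defined by the Java API rather than by the CPU. A small sketch follows; the class and method names are made up for illustration, and it says nothing about whether the surrounding float arithmetic is bit-identical on every JVM.

```java
public class FloatBitsDemo {

	// Hex form of the IEEE 754 single-precision bit pattern of a float
	static String bits( float f ) {
		return Integer.toHexString( Float.floatToIntBits( f ) );
	}

	// The raw initial guess of the trick, before any Newton-Raphson step
	static float initialGuess( float value ) {
		int i = Float.floatToIntBits( value );
		i = 0x5f3759df - ( i >> 1 );
		return Float.intBitsToFloat( i );
	}

	public static void main( String[] args ) {
		// 1.0f is sign 0, biased exponent 127, mantissa 0
		System.out.println( bits( 1.0f ) ); // 3f800000
		// 4.0f is sign 0, biased exponent 129, mantissa 0
		System.out.println( bits( 4.0f ) ); // 40800000
		// Shifting the pattern right halves the exponent, which is why the
		// guess for 4.0f already lands in the neighbourhood of 1/sqrt(4) = 0.5
		System.out.println( initialGuess( 4.0f ) );
	}
}
```

Running this on a non-x86 JVM and comparing the output would be a cheap first portability check.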

I think you are not giving me the real reason why you don’t want this included and, frankly, I don’t care anymore. Like I said before, it’s a contribution; the polite thing to do is to say thanks for the contribution, encourage more people to contribute, and do what the hell you please with the code that’s given. I personally doubt I will be contributing to JOODE again given the poor spirit of the project.

DP

Don’t take me as any indication of the larger project community. It is still a very young project, and has no general “spirit” yet. Tom has already stated he’s happy with the contrib, and the thread is drifting off-topic into Java portability when using “magic” algorithms that rely on specific standards and/or architectures.

I don’t like to simply say “Thanks for the contribution, move along.” I prefer a public discussion of the patch, what the submitter has done and why, as per the spirit and guidelines of the project. It leads to a better quality of patches. I’m not saying it has to be a hearing before a committee either, but it does no justice to the patch writer to simply take their patch and trash it.

I hope that you will contribute in the future, and I hope you will at least consider JOODE for volatile.

This algorithm is actually covered in a book of mine (3D Game Engine Architecture) and is part of the Wild Magic 3D Engine. It seems to be a quite common “hack” in 3D engines, so I believe it’s safe. It is described here and here. The documentation section on http://www.geometrictools.com has a lot of potentially useful stuff…

Why not let the user configure that?
Just a “Math Optimizations” Enable/Disable option.

biggeruniverse, see the beginning of this thread. It has been reported to work on a 64bit processor.

See DP’s benchmark. Or were you talking about non-x86 platform compatibility?

OK, no problem.
Anyway, changing from Sun’s vecmath to our modified vecmath is pretty straightforward.

That’s what I was getting at.

cool. I plan to commit the branch (mostly complete) this evening.

I don’t think the hostility in your message is warranted. I see no hidden motives for rejecting the code, only straightforward, honest inquiry as to whether it works in all Java environments, where the JOODE project as a whole is drawing the line in terms of the accuracy vs. speed trade-off, and how this fits in with those decisions.

I also don’t think dismissing SPARC/MIPS is in the “spirit” of Java. JOODE may be developed primarily for game physics, but there is no reason to limit it to gaming unless there is a significant gain.

I agree with the idea of an on/off switch for faster, less accurate math routines, but I wonder if adding such an option will impose a performance penalty of its own. I would hope that if the math code is called through an interface, HotSpot will inline the virtual calls, since there will likely be only one implementation in use anyway.

Yeah, just what I thought… but are calls via an interface inlined? (I have very limited knowledge of the JVM internals.)
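A rough sketch of what such a switch could look like, assuming a hypothetical InvSqrtProvider interface and a hypothetical system property named "joode.fastmath" (neither exists in JOODE as far as this thread shows). The implementation is picked once when the class loads, so every call site only ever sees one receiver class. HotSpot is generally able to inline call sites like that, but it would be worth measuring rather than assuming.

```java
public class MathSwitchDemo {

	// Hypothetical interface for illustration only
	interface InvSqrtProvider {
		float invSqrt( float value );
	}

	// Accurate variant backed by Math.sqrt
	static final InvSqrtProvider EXACT = new InvSqrtProvider() {
		public float invSqrt( float value ) {
			return 1f / (float) Math.sqrt( value );
		}
	};

	// Fast approximate variant using the bit trick from the benchmark
	static final InvSqrtProvider FAST = new InvSqrtProvider() {
		public float invSqrt( float value ) {
			float xhalf = 0.5f * value;
			int i = Float.floatToIntBits( value );
			i = 0x5f3759df - ( i >> 1 );
			float y = Float.intBitsToFloat( i );
			return y * ( 1.5f - xhalf * y * y );
		}
	};

	// Chosen once; run with -Djoode.fastmath=true to enable the fast path
	static final InvSqrtProvider ACTIVE =
		Boolean.getBoolean( "joode.fastmath" ) ? FAST : EXACT;

	// Example caller: scale factor that normalises the vector (x, y, z)
	static float normalizeFactor( float x, float y, float z ) {
		return ACTIVE.invSqrt( x * x + y * y + z * z );
	}

	public static void main( String[] args ) {
		System.out.println( "exact: " + EXACT.invSqrt( 4f ) );
		System.out.println( "fast:  " + FAST.invSqrt( 4f ) );
		System.out.println( "active normalize factor: " + normalizeFactor( 0f, 3f, 4f ) );
	}
}
```

Timing normalizeFactor with the property on and off, against a direct static call, would answer the question about the penalty of the switch itself.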