Java OpenGL Math Library (JOML)

KaiHH · August 16, 2015, 11:43pm

What are those missing optimizations, then? ???
And what do you mean by “conversion” or “set-up” time? ???
(dragging information out of your nose again )

Roquen · August 17, 2015, 12:01am

The id paper is doing two forward trig…one is enough. It’ll be in my derivation when I complete it (the info is in recent posts). Conversion and/or set-up time is the application’s problem…you don’t have to worry about it.

(EDIT: The split the id version is using is odd as well. It’s so small it’d be better to remove it and add a small number to prevent the NaN)

theagentd · August 17, 2015, 1:09am

Looks like I missed all the fun.

I’m on vacation, and the laptop I was borrowing had its charger block cable bit off by a rabbit. If I find a new one tomorrow I’ll see what I can do. What exactly are you interested in? Timings of different slerp implementations?

This is true, and most of the time my animations would hit the fast nlerp path of LibGDX’s implementation. However, there is one big exception: animation transitions. WSW supports smoothly interpolating between two arbitrary animation frames. We use that to ensure smooth transitions between idle and walk animations, and during those smooth constant velocity motion is preferable.

I think Roquen was describing the same thing, but it should be possible to implement a binary search nlerp that calculates the midpoint between the two quaternions using nlerp repeatedly until the angle is small enough to produce accurate results with nlerp. Even a 179 degree slerp should be solvable with 5 passes, resulting in a ~12 degree lerp at the end, each iteration only requiring 4 additions and a normalization. I will try that tomorrow.

BurntPizza · August 17, 2015, 1:13am

The angles being slerp’d, I believe. Experimental evidence for ‘most angles are small.’

theagentd · August 17, 2015, 1:20am

Ah, right. Sure, I can do that.

Roquen · August 17, 2015, 3:06am

Yeah the angles, but evidence that small angles are naturally occurring to be more precise. I’ll note that significantly increasing the data set of animation data isn’t very important since the data’s going to move into cache at each processing step.

Also as mention earlier you could just integrate instead (if fixed time step & constant av per data point)

q = dq * q

but attempt to extend to second order isn’t worthwhile. Updating q as well dq when moving to the next data point is a good idea (compounding errors on q)

“However, there is one big exception:” This is what I was calling pose blending much earlier, and yes you can divide and conquer that problem.

Another option is to use a hybrid method (q0=start, q1=end):

lerp(q0, q1, f(t))

or:

result = s0(t)q0 + s1(t)q1.

The angular error between slerp and lerp for all angle rapid approaches a constant shaped function (just varies in scale).

Roquen · August 17, 2015, 3:16am

I can get why people get annoyed when I’m terse…but in addition to the points I made above about the reader’s level of interest: There’s always too fucking much to be said.

The most important thing about all of this kind of nonsense boils down to how much you care about being able to match your desired angular velocity.

theagentd · August 18, 2015, 2:56am

Histogram raw data: http://pastebin.com/R05EY7kn
Still no laptop. SkyAphid was nice enough to generate it. I’d make a graph, but Apple.

Edit: To clarify:
=

EDIT2:

https://docs.google.com/spreadsheets/d/1yVAtw5afJfjWDRGMgDg3GTf3HWbGl0TXX1mQOzbZH-g/pubchart?oid=2084561777&format=image

Roquen · August 18, 2015, 9:16am

A slight cleaning pass on the raw data. Folded negative and positive (they represent the same bin) and reversed order to show from the smallest angle to largest.

http://www.java-gaming.org/?action=pastebin&id=1330

Forward cumulative as rates: http://www.java-gaming.org/?action=pastebin&id=133

the first three values are: 0.48974302, 0.77403533, 0.8223283. This means ~49% are in first bin, ~77.4 are in first two, ~82.2 in the first three, etc.

theagentd · August 18, 2015, 10:55am

Huuuhhhhh… is that how it works??? Could you explain why? Not using proofs and weird mathematical annotation, but intuition?

The reason why it confuses me is that the dot product returns the cosine of the angle, between the quaternions, right? In that case, if dot=0, then the quaternions are “perpendicular” to each other, there’s a 90 degree angle between them. Going further to dot<0 would just mean that the angle increases even more, with dot=-1 the orientations pointing away from each other? How could those two be equal?

Googled a bit. So if I negate all quaternion components I end up with the same rotation as before. Does that not mean that the nlerp of JOML is flawed as it most certainly will take the “long” way around if one of the arguments is negated like that?

theagentd · August 18, 2015, 11:19am

I did some accuracy tests of different slerp() implementations. All tests run at double precision.

q1 and q2 generated as


			q0.set(r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1).normalize();
			q1.set(r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1, r.nextDouble()*2-1).normalize();

Reference implementation: recursive nlerp with fixed 128 iterations. Should be numerically correct. Way more precision than doubles can handle.

JOML: latest implementation from GitHub.
LibGDX: LibGDX (possibly outdated implementation) converted to use doubles.
nlerp: fixed(?) nlerp from JOML (note the “t1=” line).
Recursive nlerp: while(abs(dot) > 0.5) replace q1 or q2 with nlerp(q1, q2, 0.5) depending on alpha and update alpha. Finally nlerp(q1, q2, alpha).

Error is calculating as toDegrees(acos(abs(referenceSlerpResult.dot(algorithmSlerpResult)))), e.g. how many degrees the slerp result differs from the reference slerp implementation’s result.

[tr][td]Implementation[/td][td]Average error (degrees)[/td][td]Maximum error (degrees)[/td][td]Note[/td][/tr]
[tr][td]JOML[/td][td]5.190123758213539E-7[/td][td]3.1945284701301985E-6[/td][td]Best accuracy[/td][/tr]
[tr][td]GDX[/td][td]0.3163254591936567[/td][td]12.920452042561234[/td][td]High dot threshold, doesn’t normalize when using nlerp, hence inflated error[/td][/tr]
[tr][td]nlerp[/td][td]1.1061119458909845[/td][td]4.074459794082495[/td][td]Surprisingly low error[/td][/tr]
[tr][td]Recursive nlerp[/td][td]0.2398828107909901[/td][td]1.1169005730338715[/td][td]Average lerps: 1.6144019[/td][/tr]

KaiHH · August 18, 2015, 11:47am

Yes, I guess so. It did not swap the sign of the dot product. I fixed the nlerp implementation by taking the implementation from the article http://fabiensanglard.net/doom3_documentation/37725-293747_293747.pdf mentioned by @Spasi.

Roquen · August 18, 2015, 11:50am

As implemented (not the math):
slerp( a,b) = slerp( a, -b) = r
slerp(-a,b) = slerp(-a,-b) = -r

the results r and -r represent the same rotation.

dot(a,b) = dot(-a,-b) = d
dot(a,-b) = dot(-a,b) = -d

if the dot product is negative you’re really asking for the largest angle about the implied between the two input, but implementation swap it the to smallest automagically.

Roquen · August 18, 2015, 12:45pm

Note the error is 3D is going to be 2*acos(abs(dot(ref,approx))).

Let me try to answer your question in a different way. Most slerps do this (implicitly):

slerp(a,b,t) { d=dot(a, b); if (d<0) {d=-d; b=-b;} XXXX }

We have input a & b and assuming b != a && b != -a, then there is a plane that contains a,b and the origin. We define ‘a’ to be our reference direction so we turn the plane so it points straight to the right (becomes the X axis). We now define the Y axis to be the direction orthogonal to X that contains b. This means that ‘b’ is by definition above X (is positive in Y). If ‘b’ were in the second quadrant (say 3pi/4), we’d get a negative dot product which means it’s the longer path…-b is closer to X (-pi/4). So the entire domain is on [0,pi/2]. Negating ‘b’ will give us the opposite direction for Y in the example and in that plane it’s angle (with respect to X) is pi/4. Flipping the plane about it’s X axis and looking at the opposite side.

Or another way from the example. Negate a, now our new ‘a’ is point straight to the left. We flip about the Y axis and it’s point to the right again and 'b is in the first quadrant.

In this example in 3D the angle is pi/2 or 90 degrees, so if the dot is negative we’re asking the math for a 270 degree rotation.

theagentd · August 18, 2015, 1:08pm

Benchmark results! Measured in operations per second.

JOML: 10 140 633 (max error: 0.000002 degrees)
GDX: 5 010 566 (max error: 12.9 degrees)
nlerp: 77 098 823 (max error: 4.1 degrees)
rlrp50: 45 055 037 (max error: 1.1 degrees, threshold: 0.50)
rlrp80: 31 719 745 (max error: 0.25 degrees, threshold: 0.80)
rlrp90: 28 103 211 (max error: 0.085 degrees, threshold: 0.90)
rlrp95: 22 463 962 (max error: 0.029 degrees, threshold: 0.95)
rlrp99: 16 707 044 (max error: 0.0026 degrees, threshold: 0.99)

princec · August 18, 2015, 1:17pm

Solution: provide all of them (except gdx), document accuracy vs speed, allow user to select according to requirements.

Cas

theagentd · August 18, 2015, 1:18pm

Low quality: nlerp
Medium/variable quality: recursive nlerp
High quality: slerp

princec · August 18, 2015, 1:24pm

btw I love Roquen’s detailing of stuff, and always stare intently at all of his cryptic messages… though I barely understand one symbol in 10. I stopped “doing” maths at around age 16 - it’s totally over my head. He is a treasure trove of computer science knowledge. We must pickle his brain before it is too late and extract what we can.

Cas

KaiHH · August 18, 2015, 1:48pm

Awesome intuitive explanation, Roquen!
But there is one thing that I still do not quite understand, namely the fact that the maximum rotation angle is only PI/2. My current intuition tells me that it should be PI.
Because:…
If a quaternion were to actually represent a rotation about let’s say 110 degrees. First of all, this is possible, right? I mean, I can define a quaternion (possibly from a rotation matrix) that does represent a rotation about the Z axis of 110 degrees.
Now, my second quaternion is the identity and thus does not represent any rotation. How would it be then that I can just have 110 - 90 = 20 degrees of rotation in slerp? I mean, I would still have to interpolate a rotation about the whole 110 degrees arc. Or don’t I?

Yes, that’s what would be reasonable. I have however the impression that people just don’t read JavaDocs.
So, I would name the method accordingly long and print on sysout and syserr for the first time a warning, whether the user really wanted to call this method.
But yes, people should first express the need for approximation functions, then we can add them.

Since you provided some number of benchmarks with different threshold values of the dot product to compromise speed vs. accuracy, would it be good for you to have an additional overload (or differently named) slerp that provides that threshold as parameter? I think that was initially proposed by @Roquen earlier.

EDIT: Okay, I just implemented the overload of slerp() taking the “threshold” parameter of the dot product between ‘this’ and ‘target’ below which the method performs non-spherical linear interpolation as per nlerp():

Quaterniond slerp(Quaterniond target, double alpha, double nlerpDotThreshold, Quaterniond dest)

The “old” existing method without that parameter uses the previous 1E-6 threshold, and makes it clear in the JavaDocs.
This now feels alot better, because it is made explicit that the method does this anyways and gives the user a lever to control accuracy vs. performance.

theagentd · August 18, 2015, 3:50pm

KaiHH:

theagentd:

Benchmark results! Measured in operations per second.

Since you provided some number of benchmarks with different threshold values of the dot product to compromise speed vs. accuracy, would it be good for you to have an additional overload (or differently named) slerp that provides that threshold as parameter? I think that was initially proposed by @Roquen earlier.

EDIT: Okay, I just implemented the overload of slerp() taking the “threshold” parameter of the dot product between ‘this’ and ‘target’ below which the method performs non-spherical linear interpolation as per nlerp():
Quaterniond slerp(Quaterniond target, double alpha, double nlerpDotThreshold, Quaterniond dest)
The “old” existing method without that parameter uses the previous 1E-6 threshold, and makes it clear in the JavaDocs.
This now feels alot better, because it is made explicit that the method does this anyways and gives the user a lever to control accuracy vs. performance.

You are mistaken. I did not test the JOML slerp method with different thresholds. I tested my own recursive nlerp function which nlerps the source quaternions until they are above a certain dot treshold, at which point a simple nlerp is done between them. Each iteration halves the angle between the quaternions, so it can do it pretty quickly, averaging 1.5 to 4 nlerps per call for sub-degree precision.

Modifying the threshold of JOML’s slerp is not a good idea, as the accuracy test of LibGDX’s version shows. Not normalizing when falling back to lerp has a massive precision impact.

You should keep the original slerp with a fixed threshold, as it is a good reference implementation with excellent accuracy. Reducing the threshold without normalizing the result has a big precision impact, and normalizing the result costs some extra cycles.
For nlerp(), it’s good to have an overloaded version that accepts a precomputed dot-product, as that value is often available when working with nlerp().
Add rnlerp(), recursive nlerp(), which takes in threshold value and refines the result until the final nlerp is done between two quaternions over the given threshold.

Source code:


    
    public Quaternion rlerp(Quaternion q, double alpha, double dotThreshold){
    	
    	//Both these are eliminated by escape analysis
    	Quaternion q1 = new Quaternion().set(this); //This is faster than using <this> as temporary variable
    	Quaternion q2 = new Quaternion().set(q);
    	
    	double dot = q1.dot(q2);
    	
    	while(dot < dotThreshold && dot > -dotThreshold){
    		if(alpha < 0.5){
    			q2.nlerp(q1, 0.5, dot);
    			alpha = alpha * 2;
    		}else{
    			q1.nlerp(q2, 0.5, dot);
    			alpha = alpha * 2 - 1;
    		}

        	dot = q1.dot(q2);
    	}
    	
    	return set(q1).nlerp(q2, alpha, dot);
    }

You could replace the temporary quaternions with local variables. Using this as a temporary variable is slower as they’re writes to RAM and can’t be moved to registers, and we can’t (and shouldn’t) modify the input quaternion. This one generates no garbage as escape analysis replaces them with stack variables.

EDIT:
This recursive nlerp function is faster than slerp() up to threshold 0.99992, at which point it has almost exactly the same precision as slerp():

slerp(): average = 2.057235254503539E-7, max = 2.4148365394514667E-6
rlerp(): average = 3.688761597702983E-7, max = 2.4148365394514667E-6

I vote for completely replacing slerp() with rlerp(), as it is just a more tweakable version of slerp().

EDIT 2:
In addition, those performance values are when using completely random quaternions. As you can see from the data from WSW, almost all interpolation is done between dot>0.99, in which case rlerp() is just nlerp(). With less random quaternions rlerp() is 2.5x faster than slerp() while providing the same max error.