Java OpenGL Math Library (JOML)

KaiHH · July 10, 2015, 5:02pm

Could you please provide an example of your test fixture or a frustum setup where the sphere and the AABB culling provide a wrong result? Thanks a lot for all that testing!
EDIT: Oh, yes you are totally right with the plane equations needing to be normalized when doing calculations with distances! I just reproduced it with a single plane equation and it totally did not work.
Yes, in that case it is not really useful when the plane normals would always be recomputed for each invocation. A cached normal in a Plane class would be way better then! Thanks for spotting!
EDIT2: I fixed the methods so that they all work in the same “measure.” They may not be the fastest methods then, but maybe still useful as some “convenience” methods.
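
To illustrate the normalization issue discussed above, here is a minimal sketch. `PlaneSketch` and its methods are hypothetical names made up for this example, not JOML's API. Evaluating the plane equation only yields a true signed distance when the normal (a, b, c) has unit length, so comparing the raw value against a sphere radius can cull a sphere that actually intersects the plane.

```java
/** Hypothetical helper (not JOML's API): the plane a*x + b*y + c*z + d = 0. */
final class PlaneSketch {
    final float a, b, c, d;

    PlaneSketch(float a, float b, float c, float d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    /** Returns an equivalent plane whose normal (a, b, c) has unit length. */
    PlaneSketch normalized() {
        float invLen = (float) (1.0 / Math.sqrt(a * a + b * b + c * c));
        return new PlaneSketch(a * invLen, b * invLen, c * invLen, d * invLen);
    }

    /** Value of the plane equation; a true signed distance only if normalized. */
    float evaluate(float x, float y, float z) {
        return a * x + b * y + c * z + d;
    }

    /** True if the sphere lies entirely on the negative side; needs a normalized plane. */
    boolean sphereFullyOutside(float x, float y, float z, float radius) {
        return evaluate(x, y, z) < -radius;
    }
}
```

With the unnormalized plane (0, 0, 2, 0), a sphere of radius 1.5 centered at (0, 0, -1) is wrongly reported as fully outside, while the normalized plane reports it as intersecting. Caching the normalized coefficients once, rather than calling `normalized()` per test, avoids paying for the square root on every invocation.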

theagentd · July 10, 2015, 5:42pm

If possible, keep the planes as field variables. Manually unrolling the loop did have an impact as you can see. Simply creating a FrustumCuller class that you can create from/update with a matrix (or more than one matrix) would be useful. That’s what I have. Another really important thing is distance-based culling as well. Here’s my FrustumCuller class for reference: http://www.java-gaming.org/?action=pastebin&id=1307. Scavenge what you want from it, but like I said, making the planes field variables instead of Plane objects does have a noticeable impact.

Note that the ability to pass in a max distance is extremely useful in many cases, for example when culling objects with a maximum render distance. Another vital feature for me is the ability to push planes a certain distance, which is useful when rendering shadow maps. If you perfectly fit a shadow map to the frustum, you may still want to render things outside the frustum using GL_DEPTH_CLAMP to make sure they still cast shadows without wasting precision in the shadow map.

KaiHH · July 10, 2015, 5:48pm

Okay, thanks!
I fixed the Matrix4.isSphereInsideFrustum() at least. From what I see, it now works for unnormalized plane equations, too, albeit of course a lot slower due to Math.sqrt().
Could you check it again for correctness?

Also, I cannot seem to find an issue with the AABB test. It’s literally implemented by the algorithm described in http://www.cescg.org/CESCG-2002/DSykoraJJelinek/ (section 2.4).
And it also does not make use of distance measures.
Note that this method now returns -1 (as declared by the JavaDocs) if the box is inside the frustum, and a value greater than or equal to 0, the index of the plane that culled the box, if it does not intersect the frustum.
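
For readers who want to see the shape of such a test, here is a minimal sketch of a common "positive vertex" AABB check with the return convention described above. The class name, the `float[][]` plane layout, and the inside-is-nonnegative convention are assumptions for this example; the thread does not show JOML's actual code. Only the sign of the plane equation is used, so the planes need not be normalized.

```java
/** Hypothetical sketch; not JOML's implementation. */
final class AabbFrustumSketch {
    /**
     * planes: each row is {a, b, c, d}; inside is where a*x + b*y + c*z + d >= 0.
     * Returns -1 if no plane rejects the box, else the index of the first plane that does.
     */
    static int cullingPlane(float[][] planes,
                            float minX, float minY, float minZ,
                            float maxX, float maxY, float maxZ) {
        for (int i = 0; i < planes.length; i++) {
            float a = planes[i][0], b = planes[i][1], c = planes[i][2], d = planes[i][3];
            // Pick the box corner that lies farthest along the plane normal.
            float px = a >= 0f ? maxX : minX;
            float py = b >= 0f ? maxY : minY;
            float pz = c >= 0f ? maxZ : minZ;
            // If even that corner is on the negative side, the whole box is outside.
            if (a * px + b * py + c * pz + d < 0f) {
                return i;
            }
        }
        return -1;
    }
}
```

Note that with this single-corner test, -1 means "not culled", which covers boxes that are fully inside as well as boxes that straddle a plane.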

KaiHH · July 10, 2015, 5:56pm

Haha! That could be the reason why your percentage measure of JOML’s AABB test is actually the opposite of your algorithm’s: 98.517% and 1.483% add up to 100%.
Could you invert the measure of JOML’s test?

pitbuller · July 10, 2015, 7:26pm

Planes don’t actually need to be normalized if you never ask actual distance from the plane.
http://iquilezles.org/www/articles/frustum/frustum.htm

KaiHH · July 10, 2015, 7:42pm

Cool, thanks for that link, pitbuller.
Another thing: I just built a simple Culler class storing the plane normals as four Vector4f instances and then doing the frustum culling methods on them. Turned out okay.
Then just for the heck I eliminated those four Vector4f instances and stored the 16 float values as instance fields. That alone consistently brought a speedup of roughly 17%…
Is it that the 4 additional GC marks and clazz pointers trashed the L1 cache…? Unbelievable.

Riven · July 10, 2015, 7:48pm

This is most likely not related to cache trashing. Get yourself a debug JDK build and dump the (x86) ASM. That 17% diff most likely resembles completely different native code, not just a few changes.

KaiHH · July 10, 2015, 7:52pm

Yeah, I am already using hsdis-amd64.dll to show the generated code and once cheered at first sight as I saw that HotSpot was emitting SSE code, but then this was just scalar code… so not really SIMD.
I sooo wish for a (possibly internal) API exposing SSE intrinsics as Java methods. As vector types maybe float[], which are internally converted to SSE vector types.

theagentd · July 10, 2015, 7:56pm

Sphere culling has been fixed in your latest release and produces identical culling to my own code. AABB culling was indeed an inverted culling error on my end. All three methods now cull the exact same amount as my code. Point culling is ~1.5% faster than my code, but sphere culling is almost half as fast as my code (~58% as fast as my code). Your AABB culling code is ~20% faster than mine though.

KaiHH · July 10, 2015, 7:59pm

Thanks for checking again!
Yes, that sphere culling code is now really, really bad, as with non-cached plane normals the plane equations have to be renormalized on every invocation.
You might give the new Culler class a try, which has an identical interface but caches the plane normals now on creation and updates it in its set(Matrix4f) method.
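
As a rough sketch of what "cache the planes on creation and update them in set()" can look like with plain float fields: the class below is hypothetical, assumes a row-major `float[16]` view-projection matrix and OpenGL clip-space conventions (-w <= x, y, z <= w), and is not the JOML Culler. Each plane is a sum or difference of the fourth matrix row and one of the other rows.

```java
/** Hypothetical sketch with planes stored as fields; not the JOML Culler. */
final class FieldCullerSketch {
    private float lx, ly, lz, lw; // left   (row3 + row0)
    private float rx, ry, rz, rw; // right  (row3 - row0)
    private float bx, by, bz, bw; // bottom (row3 + row1)
    private float tx, ty, tz, tw; // top    (row3 - row1)
    private float nx, ny, nz, nw; // near   (row3 + row2)
    private float fx, fy, fz, fw; // far    (row3 - row2)

    /** Recomputes the six planes from a row-major 4x4 matrix (assumed layout). */
    FieldCullerSketch set(float[] m) {
        lx = m[12] + m[0]; ly = m[13] + m[1]; lz = m[14] + m[2];  lw = m[15] + m[3];
        rx = m[12] - m[0]; ry = m[13] - m[1]; rz = m[14] - m[2];  rw = m[15] - m[3];
        bx = m[12] + m[4]; by = m[13] + m[5]; bz = m[14] + m[6];  bw = m[15] + m[7];
        tx = m[12] - m[4]; ty = m[13] - m[5]; tz = m[14] - m[6];  tw = m[15] - m[7];
        nx = m[12] + m[8]; ny = m[13] + m[9]; nz = m[14] + m[10]; nw = m[15] + m[11];
        fx = m[12] - m[8]; fy = m[13] - m[9]; fz = m[14] - m[10]; fw = m[15] - m[11];
        return this;
    }

    /** Point test; only the sign matters, so the planes are left unnormalized here. */
    boolean isPointInside(float x, float y, float z) {
        return lx * x + ly * y + lz * z + lw >= 0f
            && rx * x + ry * y + rz * z + rw >= 0f
            && bx * x + by * y + bz * z + bw >= 0f
            && tx * x + ty * y + tz * z + tw >= 0f
            && nx * x + ny * y + nz * z + nw >= 0f
            && fx * x + fy * y + fz * z + fw >= 0f;
    }
}
```

A sphere test built on these fields would additionally need each plane divided by the length of its normal inside `set()`, so that the square root is paid once per matrix update instead of once per test.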

theagentd · July 10, 2015, 8:09pm

An empty Culler() constructor would be nice, as they’re usually created ahead of time and reused for each frame.

Using Culler:

Point culling: ~1.5% faster
Sphere culling: ~1% faster
AABB culling: ~33% faster

Impressive AABB results!

theagentd · July 10, 2015, 8:18pm

A proper frustum culler really needs to have distance based culling though. The tests so far have been with distance culling disabled in my own culler. With it enabled my code blows it out of the water:

| Test | My culler | % visible | JOML | % visible |
|---|---|---|---|---|
| Point culling | 169 329k | 0.7891% | 67 132k | 1.2311% |
| Sphere culling | 178 325k | 1.0543% | 68 786k | 1.6708% |
| AABB culling | 43 084k | 0.9672% | 53 637k | 1.4816% |

With distance based culling, you get rid of a lot more stuff, in addition to improving performance a lot, at least for sphere and point rendering. AABB culling actually suffers a bit, but the improved accuracy, culling 1/3rd of the remaining AABBs is well worth it IMO.

EDIT: Part of this gain lies in the fact that it culls some things that are inside the frustum but, at least in my engine, are completely covered by fog (it’s based on distance from camera, not the z-buffer), and it also only works for perspective matrices.

KaiHH · July 10, 2015, 8:23pm

Could you please clarify more about which distance (from where to where) you mean?
But, I wanted to have a simple working solution in JOML, I’ll leave the most performant solution to you.
If you like, you can also contribute. I’d be glad to accept any Pull Requests.

theagentd · July 10, 2015, 8:33pm

If you look in the culler code I posted, I do a simple squared distance-from-the-camera test for points and similar tests for spheres and AABBs before testing it against any planes. For most game levels the world is significantly larger than the view frustum, so this culls a large number of objects.

For spheres for example:


		float dx = x - this.x, dy = y - this.y, dz = z - this.z;
		float distSqrd = dx*dx + dy*dy + dz*dz;
		
		float r = viewDistance + radius;
		if(distSqrd > r*r){
			return false; //Cannot be visible
		}
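
Wrapped into a self-contained form, the fragment above might look like the following sketch. The class and field names are invented for this example; in the original, `this.x`, `this.y`, `this.z` are presumably the camera position and `viewDistance` the culling range.

```java
/** Hypothetical wrapper around the distance pre-test; names are illustrative. */
final class DistanceCullSketch {
    private final float camX, camY, camZ;
    private final float viewDistance;

    DistanceCullSketch(float camX, float camY, float camZ, float viewDistance) {
        this.camX = camX; this.camY = camY; this.camZ = camZ;
        this.viewDistance = viewDistance;
    }

    /** Returns false if the sphere cannot be visible; true means "test the planes next". */
    boolean sphereWithinViewDistance(float x, float y, float z, float radius) {
        float dx = x - camX, dy = y - camY, dz = z - camZ;
        float distSqrd = dx * dx + dy * dy + dz * dz;

        // Compare squared values so no square root is needed.
        float r = viewDistance + radius;
        return distSqrd <= r * r;
    }
}
```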

You may also want to order the frustum tests in a way that culls as much as possible as early as possible. A distance-from-the-camera test pretty much culls the same things as the far plane (a little bit more as it curves inwards around the edges, culling things that are technically inside the frustum but covered in fog), but it doesn’t cull much behind the camera. The optimal order therefore becomes something like this:

1. Distance test. Only objects in a sphere around the camera pass.
2. Near plane. Cuts out the half of the sphere that’s behind the camera.
3. Left and right planes. Those planes generally cull more than the top and bottom planes as most games have horizontal maps.
4. Top and bottom planes. These usually don’t cull much.
5. OPTIONAL: The far plane. The initial sphere test culls everything beyond the far-plane anyway.

Also, an extremely useful function is being able to pass in a variable viewDistance when culling. In my culler, I actually cull against min(farPlane, maxDistance), where farPlane is the far plane distance of the camera and maxDistance is a parameter that you can pass into overloaded versions of the culling methods. For example, I have objects that are only rendered up to a certain distance. Checking if they’re inside the frustum and THEN checking if they’re within a certain range is rather stupid as it checks the distance twice. Having it as a feature in the frustum culler is definitely a good idea IMO.
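
The ordering and the `min(farPlane, maxDistance)` overload described above could be sketched roughly as follows for points. Everything here is a hypothetical illustration (class name, `float[][]` plane layout, inside-is-nonnegative convention), not the code from the pastebin link.

```java
/** Hypothetical sketch of an ordered point culler with an optional max distance. */
final class OrderedCullerSketch {
    private final float camX, camY, camZ;
    private final float farPlane;
    // Rows are {a, b, c, d}, inside where a*x + b*y + c*z + d >= 0,
    // supplied by the caller in the order: near, left, right, top, bottom.
    private final float[][] planes;

    OrderedCullerSketch(float camX, float camY, float camZ, float farPlane, float[][] orderedPlanes) {
        this.camX = camX; this.camY = camY; this.camZ = camZ;
        this.farPlane = farPlane;
        this.planes = orderedPlanes;
    }

    boolean isPointVisible(float x, float y, float z) {
        return isPointVisible(x, y, z, farPlane);
    }

    boolean isPointVisible(float x, float y, float z, float maxDistance) {
        // 1. Distance test against min(farPlane, maxDistance), done once.
        float limit = Math.min(farPlane, maxDistance);
        float dx = x - camX, dy = y - camY, dz = z - camZ;
        if (dx * dx + dy * dy + dz * dz > limit * limit) {
            return false;
        }
        // 2-4. Plane tests in the supplied order, with early exit.
        for (float[] p : planes) {
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0f) {
                return false;
            }
        }
        // The far plane is skipped: the distance test already covers it.
        return true;
    }
}
```

A loop over an array is used here for brevity; earlier in the thread, unrolled tests on plain float fields were reported to be measurably faster.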

KaiHH · July 10, 2015, 8:41pm

Nice ideas!
Yeah, it would be possible to compute the circumscribed sphere of the viewing frustum (we know the frustum’s eight corners easily) and then do a simple radius/distance check with points and spheres. I will add that.
But for AABBs what would you test the viewDistance with? Would it make sense to also compute the circumscribed sphere of the AABB, too, and then compare sphere vs. sphere?

theagentd · July 10, 2015, 8:50pm

For the sphere to cull as much as possible, you want the sphere to be centered at the camera’s position, e.g. the origin of the perspective matrix. That makes the sphere cull along the edge of the fog, getting rid of everything that would’ve been invisible anyway. For orthographic projections, then the case is a bit different.

For AABBs, you can either compute a bounding sphere or calculate the exact distance to it. http://stackoverflow.com/questions/5254838/calculating-distance-between-a-point-and-a-rectangular-box-nearest-point
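
The exact point-to-box distance mentioned above is short enough to sketch directly. The class and method names are made up for this example. Per axis, the distance is zero if the point lies within the box's extent on that axis, and otherwise the gap to the nearer face.

```java
/** Hypothetical helper: squared distance from a point to an axis-aligned box. */
final class AabbDistanceSketch {
    static float distanceSquared(float px, float py, float pz,
                                 float minX, float minY, float minZ,
                                 float maxX, float maxY, float maxZ) {
        // At most one of (min - p) and (p - max) is positive on each axis.
        float dx = Math.max(Math.max(minX - px, 0f), px - maxX);
        float dy = Math.max(Math.max(minY - py, 0f), py - maxY);
        float dz = Math.max(Math.max(minZ - pz, 0f), pz - maxZ);
        return dx * dx + dy * dy + dz * dz;
    }
}
```

With the camera position as the point, an AABB can then be rejected when this value exceeds `viewDistance * viewDistance`, again without a square root.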

KaiHH · July 11, 2015, 1:39pm

theagentd:

Further suggestions:

…

I see you made a multiply-and-add function (fma())! Awesome! It only takes in two vectors though, so one that takes in a float for the multiplier would be nice, fma(Vector3, float).

The arguments of fma() could be given better names. They’re currently (v1, v2), which says nothing about what they do. May I suggest “add” and “multiplier” for example?

Scalar version of Vector*.add() and sub() would be nice too. I have at least one place in my code that does that.

Many functions in quaternion also implicitly normalize the quaternion. The weirdest one is invert(), but many others do too. I believe the user should be in charge of normalizing quaternions and providing normalized inputs to functions that need it.

Questions:

A number of quaternion functions seem to internally use doubles. Is there a reasoning behind that?

Sorry, theagentd, I realized I did not respond to your questions and suggestions.
About the Vector*.fma() with scalar: It’s in now.
About the arguments of Vector*.fma(): They are both multiplicands. The result (as stated by the JavaDocs) is: this.fma(a, b) = this + a * b
About scalar versions of Vector*.add/sub: They were already provided when you proposed it.
About quaternion normalization: I added another Quaternion.unitInvert() - named after Ogre’s identical function - which assumes that the quaternion is already normalized and then just computes its conjugate. Apart from this function, I do not see any other function that implicitly normalizes the quaternion, apart from the various functions to build a quaternion from axis-angle or other different rotation representations, which require the quaternion to be normalized in order to represent only a rotation and not also a scaling.
About some of the quaternion functions using doubles: That’s just for additional precision, which is indeed improved when using double for intermediate calculations.
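
To make the fma() semantics stated above concrete (this + a * b, with both arguments acting as multiplicands), here is a tiny sketch on a made-up `Vec3Sketch` class. It is not JOML's Vector3f; the in-place update and the argument order of the scalar overload are assumptions for this example.

```java
/** Hypothetical vector type illustrating this.fma(a, b) = this + a * b. */
final class Vec3Sketch {
    float x, y, z;

    Vec3Sketch(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
    }

    /** Component-wise: this += a * b. Returns this for chaining. */
    Vec3Sketch fma(Vec3Sketch a, Vec3Sketch b) {
        x += a.x * b.x;
        y += a.y * b.y;
        z += a.z * b.z;
        return this;
    }

    /** Scalar multiplier variant: this += a * b. */
    Vec3Sketch fma(float a, Vec3Sketch b) {
        x += a * b.x;
        y += a * b.y;
        z += a * b.z;
        return this;
    }
}
```

For example, starting from (1, 2, 3), `fma((2, 2, 2), (3, 4, 5))` gives (7, 10, 13).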

Roquen · July 11, 2015, 4:14pm

I’m on vacation for the next few weeks. I took a quick skim and could give a very large number of feedback points. Some quick ones. Generally nobody really cares about 4x4 matrices…what they really want is 4x3. There’s tons of examples of reducible operations, sometimes at the cost of adding a rounding step…who cares. The quaternion function unitInvert should indicate its proper name: conjugate.
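
On the naming point: for a unit quaternion the inverse and the conjugate coincide, while for a general quaternion the inverse is the conjugate divided by the squared norm. A minimal sketch on a hypothetical `QuatSketch` class (not JOML's Quaternion):

```java
/** Hypothetical quaternion type; components are (x, y, z, w) with w the scalar part. */
final class QuatSketch {
    final float x, y, z, w;

    QuatSketch(float x, float y, float z, float w) {
        this.x = x; this.y = y; this.z = z; this.w = w;
    }

    /** Negates the vector part; equals the inverse only for unit quaternions. */
    QuatSketch conjugate() {
        return new QuatSketch(-x, -y, -z, w);
    }

    /** General inverse: conjugate divided by the squared norm. */
    QuatSketch invert() {
        float invNormSq = 1f / (x * x + y * y + z * z + w * w);
        return new QuatSketch(-x * invNormSq, -y * invNormSq, -z * invNormSq, w * invNormSq);
    }

    /** Hamilton product this * q. */
    QuatSketch mul(QuatSketch q) {
        return new QuatSketch(
            w * q.x + x * q.w + y * q.z - z * q.y,
            w * q.y - x * q.z + y * q.w + z * q.x,
            w * q.z + x * q.y - y * q.x + z * q.w,
            w * q.w - x * q.x - y * q.y - z * q.z);
    }
}
```

Neither method normalizes its input, which leaves that decision to the caller.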

The slerp implementation is a classic example of a function one always sees that should never, ever be called.

KaiHH · July 11, 2015, 4:54pm

Thanks for taking the time to assess and evaluate JOML!
Much appreciated!
I am always happy to hear about any constructive criticism and suggestions you have to improve JOML on the parts you see don’t fit your requirements. Again, anyone is free to use the Issues section on GitHub for change or feature requests.
As for the 4x4 matrices, please understand that JOML wants to support the full range of possible transformations, especially orthographic and perspective projections, which need that last matrix row.
There is however the Matrix4.mul4x3() method which assumes that its right operand has (0, 0, 0, 1) as its last row.
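
To show where the saving in such a method comes from, here is a sketch using row-major `float[16]` arrays. The helper class and the array layout are assumptions for this example, not JOML's implementation. When the right operand's last row is (0, 0, 0, 1), the k = 3 term of each dot product is either zero or just the left matrix's last-column entry, so 16 of the 64 multiplications disappear.

```java
/** Hypothetical helpers on row-major 4x4 matrices stored as float[16]. */
final class AffineMulSketch {
    /** Full 4x4 product l * r: 64 multiplications. */
    static float[] mul(float[] l, float[] r) {
        float[] out = new float[16];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += l[4 * i + k] * r[4 * k + j];
                }
                out[4 * i + j] = sum;
            }
        }
        return out;
    }

    /** Same product, assuming the last row of r is (0, 0, 0, 1): 48 multiplications. */
    static float[] mul4x3(float[] l, float[] r) {
        float[] out = new float[16];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                float sum = 0f;
                for (int k = 0; k < 3; k++) {
                    sum += l[4 * i + k] * r[4 * k + j];
                }
                // r[3][j] is 1 in the last column and 0 elsewhere.
                if (j == 3) {
                    sum += l[4 * i + 3];
                }
                out[4 * i + j] = sum;
            }
        }
        return out;
    }
}
```

The left operand can still be a full projection matrix, which is why this does not conflict with supporting perspective and orthographic projections.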
Oh, and happy vacations!

gouessej · July 11, 2015, 10:53pm

theagentd, where is your source code? You claim that your own frustum culling is faster but how can I check that?

Is there a benchmark that I can run on my machine?