Java OpenGL Math Library (JOML)

Riven · July 23, 2015, 9:16am

NIO allows for both absolute and relative I/O. Absolute puts on buffers do not increment the position.
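
To make the distinction concrete, here is a minimal sketch that uses only the standard `java.nio.FloatBuffer` API. The class and method names are made up for illustration.

```java
import java.nio.FloatBuffer;

class PutSemanticsDemo {
    /** Returns the buffer position after one relative put followed by one absolute put. */
    static int positionAfterPuts() {
        FloatBuffer fb = FloatBuffer.allocate(4);
        fb.put(1.0f);     // relative put: writes at position 0, then the position becomes 1
        fb.put(3, 9.0f);  // absolute put: writes at index 3, the position stays at 1
        return fb.position();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterPuts()); // prints 1
    }
}
```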

I’m with Spasi on this one: I never liked NIO buffers being stateful. These buffers were meant to allow access to off-heap memory, but they slapped a ‘design philosophy’ on top that is supposedly convenient. We should (IMHO) think of buffers like primitive arrays, but they designed it like a Stack/List hybrid. As a result it took Sun roughly a decade to get NIO performance in the same ballpark as primitive arrays, sometimes. The generated ASM is still a mess, rather inefficient (compared to primitive array access), but most of this overhead is hidden by memory latency.

Anyhoo, I’m getting off-topic, so let me end with a loaded question: who here would be in favor of stateful arrays? :-*

KaiHH · July 23, 2015, 10:58am

Personally, I am with both @ra4king and @Spasi here.
I can see the use case of putting multiple subsequent matrices in a single UBO (for example) without manually incrementing the buffer position (for convenience), and also putting a single matrix in a buffer without then having to reset the position (also for convenience).
In fact, putting a series of matrices in a single UBO for uploading to OpenGL was the first use case I thought of when designing the get(ByteBuffer) method in Matrix4f. I also had the same kind of talks with @Spasi before, when thinking about how to make the design play nicely with LWJGL and its users by not incrementing the buffer position, which would allow reusing a single ByteBuffer for many uploads without ever touching the buffer position.
Another important aspect that drew the design in a certain direction was, of course, compatibility/alignment with existing APIs such as NIO, which, as you said, does increment the buffer position (in its relative put/get operations).
This aligning aspect was another reason I first wanted to make the get() operation behave like the relative get() operation of NIO.
Whether NIO’s design is flawed or not, I don’t know and don’t allow myself to judge.
So in the end, it is a matter of which use case is more likely, and which one we want to support with the least amount of client code and the most comfort and convenience.
We could of course, like NIO, have both relative and absolute operations. But this would make the API more complicated by providing two different methods with two different semantics.
It is a hard decision to make.

Riven · July 23, 2015, 1:04pm

I’m not so sure this ‘complicates’ the API. Choice is good™. It also wouldn’t grow the codebase much, as the relative I/O would piggyback on the absolute I/O functionality.
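
A rough sketch of what "piggybacking" could look like, assuming a hypothetical 2x2 matrix class (this is not JOML's actual code): the relative variant just calls the absolute one and then advances the position.

```java
import java.nio.FloatBuffer;

// Hypothetical stand-in for a matrix class; names and layout are assumptions.
class RelativeOnAbsoluteSketch {
    float m00 = 1, m01 = 0, m10 = 0, m11 = 1;

    /** Absolute: writes 4 floats starting at index and leaves the buffer position untouched. */
    RelativeOnAbsoluteSketch get(int index, FloatBuffer dest) {
        dest.put(index, m00);
        dest.put(index + 1, m01);
        dest.put(index + 2, m10);
        dest.put(index + 3, m11);
        return this;
    }

    /** Relative: delegates to the absolute variant, then advances the position by 4. */
    RelativeOnAbsoluteSketch get(FloatBuffer dest) {
        int pos = dest.position();
        get(pos, dest);
        dest.position(pos + 4);
        return this;
    }
}
```

With this shape, the relative method adds only a few lines on top of the absolute one.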

(If only we had structs, and this whole point was moot…)

KaiHH · July 23, 2015, 1:46pm

Okay. So, if for the majority of people it is good to have a relative get(ByteBuffer) and an absolute get(int, ByteBuffer), then I’d be fine with that, since it would be more in line with what NIO itself does today.
So, people wanting to upload a single matrix to OpenGL would then do:


Matrix4f m = ...;
FloatBuffer fb = ...;
m.get(0, fb);

and people wanting to load a bunch of matrices would do:


Matrix4f[] ms = ...;
FloatBuffer fb = ...;
for (Matrix4f m : ms) {
  m.get(fb);
}
fb.flip();

I would be fine with that. The price to pay however would be different semantics between JOML and LWJGL.

princec · July 23, 2015, 2:50pm

The original LWJGL vecmath library was just a temporary bodge that sort of got used a bit more than expected…

Cas

KaiHH · July 23, 2015, 3:04pm

Hm… could you explain how that relates to the current debate over having relative vs. absolute NIO methods?
I am assuming that, since LWJGL’s vecmath library used relative put/get operations, you are wishing that LWJGL hadn’t used relative operations but absolute ones instead. So you would be in favour of the absolute methods?

Spasi · July 23, 2015, 3:35pm

If you’re looking for more trouble, you could also add “terminal” versions of the various methods that accept a Matrix4f dest. Using theagentd’s earlier example:

matrix.translationRotateScale(...).mul(...).get(directBuffer); // get is the terminal operation
matrix.translationRotateScale(...).mul(..., directBuffer); // now mul is the terminal operation

The result of mul is stored in the buffer directly, which eliminates a copy and should result in better performance.
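
As a rough illustration of the "terminal" idea, here is a sketch with a hypothetical 2x2 matrix class (not JOML's API). The product is written straight into the buffer instead of into an intermediate matrix. Leaving the buffer position unchanged is an assumption made for this sketch.

```java
import java.nio.FloatBuffer;

// Hypothetical 2x2 matrix (column-major: mCR = column C, row R); not JOML's API.
class TerminalMulSketch {
    final float m00, m01, m10, m11;

    TerminalMulSketch(float m00, float m01, float m10, float m11) {
        this.m00 = m00; this.m01 = m01; this.m10 = m10; this.m11 = m11;
    }

    /**
     * "Terminal" multiply: writes this * right directly into dest, starting at
     * dest's current position, without creating or updating a matrix object.
     * Absolute puts are used, so the position is left unchanged (an assumption).
     */
    void mul(TerminalMulSketch right, FloatBuffer dest) {
        int p = dest.position();
        dest.put(p,     m00 * right.m00 + m10 * right.m01);
        dest.put(p + 1, m01 * right.m00 + m11 * right.m01);
        dest.put(p + 2, m00 * right.m10 + m10 * right.m11);
        dest.put(p + 3, m01 * right.m10 + m11 * right.m11);
    }
}
```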

princec · July 23, 2015, 3:49pm

All I’m saying is that basing your API design decisions on what LWJGL programmers are used to is not really a sound foundation

Cas

KaiHH · July 23, 2015, 4:10pm

The hell? What makes you think I was looking for trouble either for me as implementor or for users using JOML? As I see it, the current debate is also not so much about performance as it is about convenience and meeting a user’s expectations.
Regarding the last point, it certainly is a good decision to base certain design decisions on those of widely and successfully used libraries to ease adoption for users.
And I was also not referring to LWJGL 2’s vecmath but to LWJGL’s usage of NIO buffers all over its API.

Spasi · July 23, 2015, 4:21pm

@KaiHH
Hey, I was trying to be funny.

I know how hard it is being bombarded with requests and suggestions when doing open-source work. I contributed to that with one more suggestion, which is more stuff for you to consider (=trouble). It had nothing to do with the current debate.

KaiHH · July 23, 2015, 4:46pm

okay… I was like ‘what is he proposing there?’ ;D
No, everything’s cool. 8)

Riven · July 23, 2015, 4:58pm

As a design consideration, it might be a good idea to keep all this I/O stuff out of the core classes, to keep them lean and mean. What are your thoughts on hijacking cylab’s idea and moving all NIO-related functions to their own dedicated class? (no, I’m not suggesting involving the Collections API :point:) At some point you might add a dedicated class for primitive array I/O without having to worry about bloating your core classes once again.

Just throwing it out there

KaiHH · July 23, 2015, 5:17pm

Oh nooo. Yet another one. I cannot stand it anymore. ;D

But yes. I would think about what it means for JOML to be lean and mean by estimating how “central” the aspect of putting a matrix in a NIO buffer actually is for JOML. This would raise the question of how likely JOML is going to be used without a Java/OpenGL binding, in such a way that having NIO buffer methods in the matrix classes would actually “pollute” those classes. That I honestly cannot tell. But I guesstimate: not very likely.
Then I would also contrast that with how likely it is that there will be additional things to convert a JOML matrix into, for which I also don’t currently see any use case, with my very limited use-case-estimating abilities.
So, I would currently like to leave the NIO buffer conversion methods in the matrix classes, because as I see it, that supports the anticipated use case of JOML being used with LWJGL/JOGL.

ra4king · July 23, 2015, 11:57pm

Oh I like Riven’s idea: a separate NIO Buffers class that deals with Buffer -> Vector/Matrix and Vector/Matrix -> Buffer. I support this idea, and we can then have both relative and absolute methods there for reading/writing.

After giving this much thought, I realized how annoying the code would be with a separate class. Calling a get(…) on the instance itself is much more elegant, at the cost of “bloat”… however…

… Kai is right. This library is meant for use with OpenGL in Java, and the two major libraries, LWJGL and JOGL, use NIO buffers, so the buffer methods should be integrated into the core math classes.

ra4king · July 24, 2015, 12:19am

@Spasi

I understand your points and see where you’re coming from, but I’m coming from the standpoint of how NIO was supposed to be used (ignoring Riven’s distaste for it ;D). Either way, providing both relative and absolute methods would be best in my opinion.

@Spasi

Ahhh I missed that part. Hmm, different semantics, I agree, but how would the optimization be defeated? this.m00 is still read 3 times using the method call, so that could be optimized just as well. Or did you refer to the optimization of fewer CPU registers being used, as all the values wouldn’t need to be stored in registers until they are all evaluated?

Spasi · July 24, 2015, 9:05am

@ra4king
NIO made so many new things possible when it was released (including LWJGL). But it also is just a plain bad API. I mean no disrespect to the people that wrote it, they did what they had to do at the time. I mostly blame the platform/industry that makes it so hard to fix/replace standard APIs.

There are many high-profile developers that are looking for a NIO replacement these days. Look for the sun.misc.Unsafe drama that’s been unfolding recently (it’s being hidden away in Java 9, without a complete replacement) and you’ll find several references to why NIO needs a redesign.

@ra4king
This. More info:

The optimized code (without the method call) needs a couple of registers to do the whole thing. But it’s not massively faster than the alias-sensitive call. One would think that the method call pushes the arguments to 16 registers or the stack. But it’s much better than that. The JIT does a fantastic job with reordering operations, such that it works correctly (when argument = dest) and also the registers required are as few as possible. IIRC it does the job in 5 registers, last time I checked. This is also the reason that using Unsafe (e.g. via LibStruct) does not result in better performance*: the JVM never reorders Unsafe accesses and the resulting code is suboptimal.

* pure CPU performance, ignoring the better cache utilization that LibStruct enables
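
The aliasing case mentioned above (argument = dest) can be made concrete with a small sketch. The 2x2 class and its field names are hypothetical, and the sketch says nothing about what the JIT actually emits. It only shows why every input has to be read before the first write when dest may be the same object as an operand.

```java
// Hypothetical 2x2 matrix (column-major: mCR = column C, row R); not JOML's API.
class AliasSafeMulSketch {
    float m00, m01, m10, m11;

    AliasSafeMulSketch(float m00, float m01, float m10, float m11) {
        this.m00 = m00; this.m01 = m01; this.m10 = m10; this.m11 = m11;
    }

    /** Computes this * right into dest; stays correct even if dest == this or dest == right. */
    AliasSafeMulSketch mul(AliasSafeMulSketch right, AliasSafeMulSketch dest) {
        // Compute every result element into a local first ...
        float n00 = m00 * right.m00 + m10 * right.m01;
        float n01 = m01 * right.m00 + m11 * right.m01;
        float n10 = m00 * right.m10 + m10 * right.m11;
        float n11 = m01 * right.m10 + m11 * right.m11;
        // ... and only then write, so no input is overwritten before it is read.
        dest.m00 = n00; dest.m01 = n01; dest.m10 = n10; dest.m11 = n11;
        return dest;
    }
}
```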

KaiHH · July 24, 2015, 11:36am

joml-2d
Since I observed that most of the games JGO people make are 2D, I thought: why not make a variant of JOML that is specifically geared and optimized for 2D? It’s in the making and called joml-2d.
I know 2D is just a special case of 3D with some direction being projected to be expressible as the linear combination of the other two base axes.
However 2D math really only needs 2 classes: Matrix3f and Vector2f to represent all affine 2D transformations: translation, rotation, scaling and shearing.
There is a performance gain, as well as better memory efficiency, from only having to upload 3x3 matrices as mat3 uniforms instead of using 4x4 matrices and projecting (e.g. Z).
Some of you might always decompose the different affine transformations into some “Vector2f translation”, “float rotationAngle” and “Vector2f scaling” parts. With joml-2d I am planning to do this like JOML by having a single representation of a transformation, namely the matrix, and then methods on that matrix to set/get the different transformation properties, as well as to “apply” new transformations to existing ones.
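
To illustrate the "single matrix representation" idea, here is a rough sketch of a 3x3 affine 2D transform. The class, its method names and the post-multiplication convention are assumptions for illustration, not the joml-2d API.

```java
// Hypothetical sketch; not the joml-2d API.
class Affine2DSketch {
    // Column-major 3x3 matrix: m[col * 3 + row]; starts as the identity.
    final float[] m = {1, 0, 0,  0, 1, 0,  0, 0, 1};

    /** Post-multiplies by a translation: this = this * T(tx, ty). */
    Affine2DSketch translate(float tx, float ty) {
        // Only the third column changes: col2 += tx * col0 + ty * col1
        for (int row = 0; row < 3; row++) {
            m[6 + row] += tx * m[row] + ty * m[3 + row];
        }
        return this;
    }

    /** Post-multiplies by a counter-clockwise rotation: this = this * R(angle). */
    Affine2DSketch rotate(float angle) {
        float c = (float) Math.cos(angle), s = (float) Math.sin(angle);
        for (int row = 0; row < 3; row++) {
            float a = m[row], b = m[3 + row];
            m[row] = c * a + s * b;        // new col0 = c * col0 + s * col1
            m[3 + row] = -s * a + c * b;   // new col1 = -s * col0 + c * col1
        }
        return this;
    }

    /** Transforms the point (x, y, 1) and returns {x', y'}. */
    float[] transform(float x, float y) {
        return new float[] {
            m[0] * x + m[3] * y + m[6],
            m[1] * x + m[4] * y + m[7]
        };
    }
}
```

For example, `new Affine2DSketch().translate(10, 0).rotate((float) (Math.PI / 2)).transform(1, 0)` first rotates the point (1, 0) by 90 degrees and then translates it, giving approximately (10, 1).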

ra4king · July 25, 2015, 9:25pm

@Spasi

I really don’t think NIO has that bad of an API design though… but yeah I’ve been following the Unsafe drama. I hope Oracle puts in a suitable replacement if they do remove it.

Ahhh this makes sense. 5 registers is insane though, I’m curious how it does it!

@KaiHH
Nice idea with the JOML-2D branch.

Now about the earlier discussion with absolute vs relative, should we go ahead with having get(buffer)/set(buffer) be relative while get(index, buffer)/set(index, buffer) be absolute? I’m working on those changes and will submit a pull request if that’s the final decision.

KaiHH · July 25, 2015, 10:38pm

Inlining, simple good old liveness analysis and instruction reordering. See here for liveness analysis: http://www.cs.colostate.edu/~mstrout/CS553/slides/lecture03.pdf.
Once the liveness of variables is known, the JIT can allocate registers. And it needs at most ‘n’ registers, where ‘n’ is the maximum number of variables that are alive at any given moment.
Liveness analysis also outputs “gaps”, which are instructions where a variable is actually not alive (i.e. not needed).
So before actually doing register allocation, other instructions can be reordered before the first or after the last usage of these identified live variables to allow those registers to become free for those instructions.
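
A much simplified sketch of the liveness idea, for straight-line code only (no branches, no reordering). The instruction encoding and all names are made up for illustration, and this is not how HotSpot implements it. Walking backwards through the instructions, a variable stops being live at its definition and becomes live at each use. The largest live set is the ‘n’ mentioned above.

```java
import java.util.HashSet;
import java.util.Set;

class LivenessSketch {
    /**
     * Each instruction is encoded as {def, use1, use2, ...}.
     * Walks the code backwards while maintaining the set of live variables
     * and returns the largest size that set reaches at any point.
     */
    static int maxLive(String[][] code, Set<String> liveOut) {
        Set<String> live = new HashSet<>(liveOut);
        int max = live.size();
        for (int i = code.length - 1; i >= 0; i--) {
            String[] ins = code[i];
            live.remove(ins[0]);              // the defined variable is not live before its definition
            for (int u = 1; u < ins.length; u++) {
                live.add(ins[u]);             // every operand is live just before the instruction
            }
            max = Math.max(max, live.size());
        }
        return max;
    }
}
```

For the three instructions `t1 = a * b`, `t2 = c * d`, `r = t1 + t2` with only `r` live at the end, the live sets (walking backwards) are {r}, {t1, t2}, {t1, c, d} and {a, b, c, d}, so this sketch reports 4.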

Thanks!

I’m not quite sure if there is a final decision, or if there can be any.

ra4king · July 26, 2015, 7:38am

Oh wow I need to remember to take some Data-Flow Analysis and Compiler classes! That was incredibly interesting, thanks for the link!

The last thing mentioned in the topic was you showing the example API and sounding decided. I’ll add the missing Byte/Float/Double-Buffer methods and wait for a decision.