NIO Buffer Pointer Semantics & Performance

First I’d like to say that overall JSR-231 looks good.

After spending an hour or so converting a nicely sized project, I must add that generally speaking I see no big issues with it; in fact, quite the opposite: some longstanding problems have been addressed, and it's certainly nice to see the API making its way into the core.

However, on the other side of this coin, I've begun to wonder (backed by some preliminary testing) whether or not the new NIO buffer semantics might adversely affect performance for bandwidth-intensive applications, particularly those using dynamic textures. The fact of the matter is, with the advent of "pointer significance" when passing NIO buffers to OpenGL, extra JNI overhead is inevitable. Easing VBO usage (which is stated as one of the goals of this change) is certainly welcome; however, I wonder whether, in the process of making VBO usage less verbose, one introduces more overhead in general through all the newly required flips/rewinds per frame, which in a tight loop may indeed be quite noticeable.
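To make concrete what I mean by the newly required book-keeping: the concern is purely about the buffer's position/limit state that GL now honours. A trivial, self-contained illustration (the class name is mine, nothing JOGL-specific here):

```java
import java.nio.ByteBuffer;

// Quick runnable illustration of the NIO state that now matters ("pointer
// significance"): with the new semantics GL reads from the buffer's current
// position, so the position has to be reset every time the buffer is refilled.
public class BufferStateDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        System.out.println("fresh:   pos=" + buf.position() + " lim=" + buf.limit());

        buf.put(new byte[16]);      // simulate filling one frame's worth of data
        System.out.println("filled:  pos=" + buf.position() + " lim=" + buf.limit());

        buf.rewind();               // position back to 0; this is the per-frame extra step
        System.out.println("rewound: pos=" + buf.position() + " lim=" + buf.limit());
    }
}
```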

I can't speak for everyone, and of course I'm not privy to all the rationale for instituting this new policy, nor to whether improvements in the security model played a role in the decision, but it would be nice to hear from the rest of you what kind of performance improvement or degradation occurred as a result of this change. Although it's quite early to speak of performance per se, IMO it's healthy to get a bearing, since the spec is after all in review.

I understand that LWJGL uses this approach, and apparently most people who use it are happy with it. Please let me know your thoughts on this matter; clearly I must be missing some advantages beyond easing VBO semantics (among other related methods) where slicing is perceived to be an issue.

Thanks for all the hard work, and thanks to all who contribute.

Specifically, one normally creates a large contiguous buffer in magic 3D RAM for plonking one's vertices in, and it remains hanging around for the duration of the scene. However, the contents of the buffer change fairly radically as you wander around the scene. In order of horribleness, you could:

  1. Allocate VBOs for each node of geometry in the scene, fill them up, render them, and dispose of them. Aagh!
  2. Allocate one VBO for the duration of the scene, then slice() it up into bits each frame and render the geometry with them, then dispose of the slices. Aaagh!
  3. Allocate one VBO for the duration of the scene, fill it up with geometry, remember the starts and lengths somewhere, and render just those bits by setting the position() and limit() (roughly as sketched below).

I hope it’s clear that 3. is a couple of orders of magnitude faster.
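In code, option 3 boils down to something like the following (a rough sketch using plain vertex arrays and made-up names; assume GL_VERTEX_ARRAY client state is already enabled, and the same idea carries over to VBO-backed data):

```java
import java.nio.FloatBuffer;
import javax.media.opengl.GL;

// Rough sketch of option 3: one big direct FloatBuffer allocated once,
// sub-ranges rendered by moving position()/limit(), no slice(), no garbage.
// Class and field names are illustrative only.
final class SceneChunk {
    final int startVertex;   // first vertex of this chunk in the shared buffer
    final int vertexCount;   // number of vertices in this chunk
    SceneChunk(int start, int count) { startVertex = start; vertexCount = count; }
}

final class SharedVertexBuffer {
    private static final int FLOATS_PER_VERTEX = 3;   // x, y, z
    private final FloatBuffer vertices;               // one big buffer, allocated once

    SharedVertexBuffer(FloatBuffer vertices) { this.vertices = vertices; }

    /** Render just one chunk by setting position()/limit() on the shared buffer. */
    void draw(GL gl, SceneChunk chunk) {
        vertices.position(chunk.startVertex * FLOATS_PER_VERTEX);
        vertices.limit((chunk.startVertex + chunk.vertexCount) * FLOATS_PER_VERTEX);
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertices);   // start taken from position()
        gl.glDrawArrays(GL.GL_TRIANGLES, 0, chunk.vertexCount);
        vertices.clear();   // restore position/limit for the next chunk (pure Java)
    }
}
```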

Cas :slight_smile:

Thanks for the clarification.

As far as VBOs are concerned, the general usage guidelines never raised much question for me.

I was mainly probing to see what kind of improvement or degradation occurred in programs that have one or more tight per-frame loops in which dynamic textures are downloaded. For example, if in a certain context I have to download three textures per frame (let's say the three planes of a YUV frame), then that's additional JNI calls one would have to make to rewind each buffer twice, once before and once after filling it up. One could make all sorts of arguments about packing textures, but the truth of the matter is, everyone will always have some argument to be made.
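For concreteness, this is roughly the per-frame pattern I had in mind (a sketch only; the sizes, texture ids and class names are made up, and full-resolution planes are assumed for simplicity):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import javax.media.opengl.GL;

// Rough sketch of the per-frame pattern in question: three luminance
// textures updated from the three planes of a YUV frame. The point is the
// per-plane rewind before each GL call now that buffer position is honoured.
final class YuvUploadSketch {
    private static final int W = 320, H = 240;

    private final int[] textureIds;                 // assumed already created elsewhere
    private final ByteBuffer[] planes = new ByteBuffer[3];

    YuvUploadSketch(int[] textureIds) {
        this.textureIds = textureIds;
        for (int i = 0; i < 3; i++) {
            planes[i] = ByteBuffer.allocateDirect(W * H).order(ByteOrder.nativeOrder());
        }
    }

    /** Called once per frame with freshly decoded Y, U and V plane data. */
    void upload(GL gl, byte[][] frame) {
        for (int i = 0; i < 3; i++) {
            planes[i].clear();                      // position = 0, limit = capacity
            planes[i].put(frame[i], 0, W * H);      // position advances while filling
            planes[i].rewind();                     // position must be 0 before the GL call
            gl.glBindTexture(GL.GL_TEXTURE_2D, textureIds[i]);
            gl.glTexSubImage2D(GL.GL_TEXTURE_2D, 0, 0, 0, W, H,
                               GL.GL_LUMINANCE, GL.GL_UNSIGNED_BYTE, planes[i]);
        }
    }
}
```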

Anyway, I must be the only one who sees an issue with the additional JNI overhead, so without further ado, it's all clear.

There is no JNI involved in setting or retrieving the position or limit of a Buffer. This is all Java code, and when these values are passed from JOGL's Java code to C via JNI they are fetched using pure Java code. The only additional overhead is in passing down an additional int argument through JNI (negligible) and adding this offset onto the base pointer (again, negligible). We did some performance testing with high-throughput demos like the VertexArrayRange demo and weren't able to see any speed hit from paying attention to buffer offsets.
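Conceptually, the Java side does something like the following before calling down to native code (this is not JOGL's actual implementation, just the shape of the logic: a plain position() read scaled to a byte offset, with only the resulting int crossing the JNI boundary):

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import java.nio.ShortBuffer;

// Not JOGL's actual source; just an illustration of the idea described above.
// The element position is read with a plain Java accessor, converted to a byte
// offset, and only that int is handed across the JNI boundary.
final class BufferOffsetSketch {
    static int byteOffset(Buffer buf) {
        int pos = buf.position();                    // pure Java accessor, no JNI
        if (buf instanceof ByteBuffer)   return pos;
        if (buf instanceof ShortBuffer)  return pos * 2;
        if (buf instanceof IntBuffer)    return pos * 4;
        if (buf instanceof FloatBuffer)  return pos * 4;
        if (buf instanceof DoubleBuffer) return pos * 8;
        throw new IllegalArgumentException("unsupported buffer type");
    }
}
```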

OK, well in that case, attribute it to lack of sleep (and/or some other misdeed on my part) that I noticed a minor difference in CPU usage between two test runs of a program, one based on JOGL 1.1.1 and one based on JSR-231 (beta1). I will continue the conversion to JSR-231, as all the other improvements outweigh this (apparently misconceived on my part) detriment, and if I do manage to confirm substantial performance degradation, I will post for further discussion.

P.S.:
Test machine: P3 @ 500 MHz, Radeon 9k
The JOGL 1.1.1 test run consumed about 2% CPU on average.
The JSR-231 beta1 test run consumed about 9% CPU on average.

In my efforts to squeeze as much CPU as possible out for other tasks, I may have overstepped common sense and forgotten that premature optimization is the root of all evil.

A lot has changed in the JOGL implementation since 1.1.1, and it's certainly possible that additional code in the implementation is the cause of the increased CPU consumption. If you get some profile data from your application pointing to new code in JOGL as the culprit, please provide a test case and file a bug.