Buffer discussion

I am starting this thread to continue a discussion on the use of Buffers with JOGL that arose as a side issue on this thread:

http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=jogl;action=display;num=1106324571

Ken Russell wrote:

[quote] Direct buffer usage, at least FloatBuffer and other types, should be very fast with HotSpot. ByteBuffer is currently more of a problem (fixes for this problem are coming), but if you cast your direct ByteBuffers to MappedByteBuffer you should see very high speed with those buffer types as well. I would think that you could store your data in a large FloatBuffer and still get efficient access for collision detection and other purposes. Take a look at the source code for the Grand Canyon demo (an update of that demo to use JOGL rather than GL4Java is presently underway); it performs collision detection against the ground, the data for which is contained in a direct ShortBuffer (16-bit ground heights).
[/quote]
Ken - I should probably start with a lengthy aside:

[aside]
These were the problems I saw when trying to move to Buffers. My alternative approach, which is to put my vertex, normal and texcoord data into an interleaved float array, is every bit as fast as using Buffers, for the following reasons:

  1. My V,N,TC data is generated programmatically into the float[]
  2. The float[] is used to create a static draw VBO which involves a single lock-access-unlock of the float array
  3. All subsequent rendering is done by binding and using the VBO, and hence is fast.
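In case it helps, here’s roughly what step 1 looks like (a minimal sketch; the class and method names are illustrative, not my actual code):

```java
// Sketch of step 1: interleave per-vertex position (3 floats), normal (3 floats)
// and texcoord (2 floats) into a single float[] laid out V,N,TC per vertex.
public class Interleave {
    public static float[] interleave(float[] v, float[] n, float[] tc) {
        int count = v.length / 3;           // number of vertices
        float[] out = new float[count * 8]; // 3 + 3 + 2 floats per vertex
        for (int i = 0; i < count; i++) {
            System.arraycopy(v,  i * 3, out, i * 8,     3); // position
            System.arraycopy(n,  i * 3, out, i * 8 + 3, 3); // normal
            System.arraycopy(tc, i * 2, out, i * 8 + 6, 2); // texcoord
        }
        return out;
    }
}
```

The resulting float[] is then handed off once to create the static draw VBO (step 2) and never touched again during rendering.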

So I don’t think there is a benefit to moving to Buffers. (There is one possible exception, in that I don’t properly understand the memory issues associated with what an OpenGL driver does with VBOs: whether there might be an extra copy of the data around for a VBO created from a Buffer vs. from a float[], and whether this even depends on whether it’s a GL_STATIC_DRAW_ARB VBO or not - comments on this are welcome.)
[/aside]

Having said that I’m interested in understanding if my issues with Buffers could have been resolved, so let’s continue :slight_smile:

Your comment about the performance of FloatBuffers is very interesting - my code is now much better structured to allow different data representations of the vertex data, so I might have another go at encoding this representation at some point. Thanks for the suggestion.

Next, you suggest ‘casting’ a ByteBuffer to MappedByteBuffer. A couple of questions:

  1. Did you mean ‘cast’ or did you just mean ‘use a MappedByteBuffer’ ? (I don’t see how casting is going to achieve anything other than maybe a ClassCastException :-))

  2. Are you suggesting that the file underlying the MappedByteBuffer is the source of the vertex data ? I’m struggling a bit with why introducing a file underlying my vertex data would improve performance, as it would seem like an extra overhead ?

[quote] I agree that the issue of direct buffer alignment on certain OSs is nasty. I added (Sun-internal) comments to bug 4820023. If memory consumption for small direct buffers were more reasonable, would it be easier for you to use them in your application?
[/quote]
Yes, certainly. However, as I said in my [aside] this isn’t currently causing me an issue as I adopted a different approach. It would be great to have the option of using Buffers though, and this was the drop-dead issue for me.

Rob

I’m still not sure about why using Buffers is a problem for you here.

What is wrong with allocating a nice big direct byte buffer and then mapping over the top of it with FloatBuffers at slightly different offsets?
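Something like this, I mean (a plain java.nio sketch; the names are made up):

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

// Sketch: one big direct ByteBuffer, with FloatBuffer views carved out
// at different byte offsets for each object's vertex data.
public class BigBuffer {
    /** A float view of [byteOffset, byteOffset + floatCount*4) of the pool. */
    public static FloatBuffer viewAt(ByteBuffer pool, int byteOffset, int floatCount) {
        ByteBuffer dup = pool.duplicate(); // independent position/limit, same memory
        dup.position(byteOffset);
        dup.limit(byteOffset + floatCount * 4);
        // slice() resets byte order, so re-apply the pool's order before viewing
        return dup.slice().order(pool.order()).asFloatBuffer();
    }
}
```

The views all share the pool’s direct memory, so there is only ever one allocation.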

One thing about OpenGL is that the data you render with and the data in your scenegraph are not necessarily the same thing. In particular, you have to understand that for things like VBOs especially you can’t do read operations easily or sensibly, as the data may live server-side or in limbo in AGP RAM. So what I’m saying is, what you’re trying to do is a misguided attempt to save space, speed things up, and make things less complicated: unfortunately you cannot have all three.

Therefore, all read-write-tweak-test should be done on float[] representations of your data, as you have it now. But when the time comes to draw you should always simply blat the floats out to buffers. The contents of the buffer can be assumed to be discarded every frame, most of the time - as the nature of many realtime scenes is that the vertex data actually changes every frame anyway. For some things like static geometry you might well be better off writing it once to the VBO but again you don’t need to worry about passing in little subsets of arrays - blat the lot out, once, and forget about it until the geometry is disposed (don’t rely on GC).

Cas :slight_smile:

Cas,

My VBO data is static, and I never read it back. I understand that if my vertex data were being manipulated dynamically each frame then a Buffer is a more appropriate mechanism - but I don’t need to do this in my app.

[quote] What is wrong with allocating a nice big direct byte buffer and then mapping over the top of it with FloatBuffers at slightly different offsets?
[/quote]
As I said before in the other thread, in my app, the ‘nice big buffer’ gets fragmented as the smaller sliced FloatBuffers are allocated and freed and the management of the fragments becomes a pain.

As I said before, I’m not trying to make a case against Buffers in general, and I understand how and when they can be used well. I was just highlighting that they are not a panacea and do have some drawbacks.

Rob

As I understand it, the only scenario in which specifying an array offset would be advantageous is dynamically generated data (and that’s questionable, in case HotSpot does cool optimizations). I can’t see how using buffers, for non-dynamic data (or even generated once), would be a performance problem in any app.

There’s also the problem with bounds checking. Any freely specified offset would need to be checked, whereas buffers guarantee a proper position (as in LWJGL).

OT: Rob, you mentioned GetPrimitiveArrayCritical in the previous thread, that avoids copying the array data. Sorry for my ignorance (I’ve never investigated how JOGL works), but isn’t that a little dangerous (cross-platform/VM-wise)?

Rob,

I think you’re approaching the problem the wrong way around here. You don’t allocate or deallocate these buffers on a frame by frame basis; they’re used for throwing data to OpenGL. You allocate them once and they stay until you stop GL rendering or they turn out to be too small in which case you dispose of them and resize them.

Every single frame, you write all the data for your scene into the buffer, and then tell OpenGL to start rendering. With a bit of cunning you can interleave drawing and buffer filling; sometimes using a rendering double-buffer is what you need. Either way, there are no little scraps left lying around. If you need to allocate a bunch of these smaller buffers for static geometry, then you are seriously better off writing a primitive allocator/deallocator and split()ting the buffer, and don’t worry about wasting a bit of RAM. You just aren’t going to get all three of convenience, speed, and smallness. I’ve been doing this for some time for my GL apps and it’s working out just perfectly.

Spasi,

AFAIK, GetPrimitiveArrayCritical is reasonably safe and falls back to copying if it can’t do what it wants to. However it does stall the GC.

Cas :slight_smile:

Cas,

Maybe I’m not capable of expressing myself clearly enough here and if so I apologise.

[quote] Every single frame, you write all the data for your scene into the buffer
[/quote]
That’s a cost I just can’t afford.

Here’s what I’m doing:

  1. An object needs to be added to my scene
  2. A float[] array is created containing the interleaved V,N,TC data.
  3. A static draw VBO is created from that interleaved data

The above only happens every time an object is added to the scene. When processing an individual frame I simply invoke the VBO which is extremely fast, as you know.

When an object is removed from the scene, the VBO is destroyed.

Let me give you some figures:

I typically have 1600 objects in my scene at any one time, consisting of around 200 triangles each. Each object has a unique texture, i.e. there are 1600 textures also. I’m getting 45 fps on my Athlon 1700+ / GeForce FX5200, or about 14.5 million triangles a second. That’s around 10MB of V,N,TC data which I just don’t need to copy around every frame.

The 1600 number is a limit that arises from the capabilities of my particular card - I’d like to have closer to 10000 concurrent objects in the scene, but that starts to push the available VRAM and the rendering capabilities of my card.

All my performance issues are related to how fast and how smoothly I can add and remove objects to and from my scene. My app may be considered unusual in that the full scene will never fit into RAM, necessitating this technique.

Rob

In this case you definitely need to write your own allocator/deallocator perhaps using a primitive binary subdivision mechanism. You need to allocate one very large VBO for each kind of vertex format you’re using and track allocations of chunks of it as you add objects to the scene.
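Something along these lines (a toy buddy allocator over byte offsets within one big VBO; class and method names are made up, and a real one would want error handling):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Primitive binary-subdivision (buddy) allocator over a fixed-size region.
// Sizes round up to powers of two; freed blocks merge with their buddy.
public class BuddyAllocator {
    private final int total; // region size, must be a power of two
    private final Map<Integer, TreeSet<Integer>> free =
            new HashMap<Integer, TreeSet<Integer>>();

    public BuddyAllocator(int totalPowerOfTwo) {
        total = totalPowerOfTwo;
        freeList(total).add(0); // start with one big free block
    }

    private TreeSet<Integer> freeList(int size) {
        TreeSet<Integer> s = free.get(size);
        if (s == null) { s = new TreeSet<Integer>(); free.put(size, s); }
        return s;
    }

    /** Returns the byte offset of an allocated block, or -1 if full. */
    public int alloc(int request) {
        int size = 1;
        while (size < request) size <<= 1;       // round up to power of two
        int s = size;
        while (s <= total && freeList(s).isEmpty()) s <<= 1; // find a big enough block
        if (s > total) return -1;
        int off = freeList(s).first();
        freeList(s).remove(off);
        while (s > size) {                       // split down, freeing upper halves
            s >>= 1;
            freeList(s).add(off + s);
        }
        return off;
    }

    public void free(int off, int request) {
        int size = 1;
        while (size < request) size <<= 1;
        while (size < total) {                   // coalesce with free buddies
            int buddy = off ^ size;
            if (!freeList(size).remove(Integer.valueOf(buddy))) break;
            off = Math.min(off, buddy);
            size <<= 1;
        }
        freeList(size).add(off);
    }
}
```

You’d keep one such region per vertex format, alloc() a chunk as each object arrives, and write its data into the VBO at the returned offset (with something like glBufferSubDataARB).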

On the other hand it’s quite possible that the GL driver is doing this already when you ask it for VBOs. So what’s wrong with simply allocating a VBO on demand, writing the data into it, and then disposing of it a few minutes later when the object is removed?

I’m not sure exactly how this relates to arrays and offsets from the other thread though.

Cas :slight_smile:

Cas,

[quote] On the other hand it’s quite possible that the GL driver is doing this already when you ask it for VBOs. So what’s wrong with simply allocating a VBO on demand, writing the data into it, and then disposing of it a few minutes later when the object is removed?
[/quote]
Nothing - this is exactly what I’ve been doing all along and it works beautifully.

The discussion in this thread arose because it was suggested that I should be using Buffers (i.e. java.nio.Buffer ) as the source when creating my VBOs (and textures) rather than the float[] and byte[] that I am using.

I’ve argued all along that this doesn’t help, since if I move the model of my scene to using Buffers instead of float[], I have to deal with the page-alignment issues that Buffers suffer from.

Rob

Ahaha! Now I see what you’re on about! And yes, float arrays are what you should be using for storage, seeing as you’re just throwing the data into VBOs and leaving it there.

Cas :slight_smile:

Spasi,

( Sorry - I tried to answer your post once already but my reply seems to have evaporated - probably I forgot to hit ‘save’ after ‘preview’ )

[quote]I can’t see how using buffers, for non-dynamic data (or even generated once), would be a performance problem in any app.
[/quote]
I agree. I’ve never suggested otherwise :slight_smile:

My issue is that if I move to using Buffers instead of float[] for holding the master copy of my vertex data in main RAM, then I’m exposed to the page-alignment problems that Buffers suffer from.

I think that Ken Russell’s comments on the performance of FloatBuffers convince me that I could consider moving to Buffers if it weren’t for this page-alignment issue. I would still however resent having to allocate (or recycle) sub-sliced Buffer objects simply to specify a sub-range.

[quote] There’s also the problem with bounds checking.
[/quote]
Agreed. However:

  1. JOGL already suffers from this issue, since it already exposes methods that take e.g. a float[] plus a length. It does not currently appear to ever check the length, by the way :slight_smile:

  2. I think the cost of checking the bounds would be very small if it were deemed that JOGL should do this (I don’t know if I have an opinion on whether it should or not).

  3. If you are using Buffers/subslices, an equivalent cost (again not necessarily high) is hidden in allocating or recycling the right-size subsliced Buffers.

Rob

Cas,

[quote] Ahaha! Now I see what you’re on about!
[/quote]
Bingo !

Thanks for bearing with me on this - I find it really helpful to talk this stuff through - I now have a better picture in my head about what’s going on, by the way, which probably explains any lack of clarity in my earlier posts.

Rob

[quote]My issue is that if I move to using Buffers instead of float[] for holding the master copy of my vertex data in main RAM, then I’m exposed to the page-alignment problems that Buffers suffer from.

I think that Ken Russell’s comments on the performance of FloatBuffers convince me that I could consider moving to Buffers if it weren’t for this page-alignment issue. I would still however resent having to allocate (or recycle) sub-sliced Buffer objects simply to specify a sub-range.
[/quote]
Yes, but my point was that with float arrays, sending data to a VBO every few frames should not be a performance problem, even if a copy to another cached Buffer or array is necessary (to avoid partitioning your master array). I didn’t recommend using FloatBuffers (especially if you use the master array for collisions, etc) and in fact I do use arrays in Marathon in some situations, that are copied as necessary when rendering.

[quote]3) If you are using Buffers/subslices an equivalent cost (again not necessarily high) is hidden in the cost of allocating or recycling the right size subsliced Buffers
[/quote]
Just a note, in LWJGL setting a different position() accomplishes this without the need for slices.
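For example (plain java.nio; consume() here is just a hypothetical stand-in for an upload call that honours position()/limit(), the way LWJGL’s methods do):

```java
import java.nio.FloatBuffer;

// Illustration: position()/limit() mark a sub-range of one buffer, so an API
// that honours them needs no per-call slice objects.
public class SubRange {
    /** Reads exactly the floats between position() and limit(). */
    public static float[] consume(FloatBuffer data) {
        float[] out = new float[data.remaining()]; // remaining = limit - position
        data.duplicate().get(out); // duplicate so the caller's position is untouched
        return out;
    }
}
```

So to hand a different sub-range to the API, you just move position() and limit() on the one buffer, rather than allocating a fresh slice each time.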

Spasi,

[quote] Yes, but my point was that with float arrays, sending data to a VBO every few frames should not be a performance problem
[/quote]
Well at the moment I have vertex data thus:

float[] (Master vertex data in RAM) ---------> VBO (On graphics card)

The data is only sent to the VBO once, when the VBO and its float[] are created as a new object is added to the scene. It is this creation process that is performance-critical; subsequent per-frame rendering is fast. Any intermediate Buffer simply lengthens this path with an extra copy operation, and I can’t afford that.

[quote] Just a note, in LWJGL setting a different position() accomplishes this without the need for slices.
[/quote]
I’d presumed something like this was the case from one of your (or Cas’s) earlier posts.

(Aside: I’d also assumed that the limit or capacity specified the end of the data. However, peeking at the LWJGL javadoc for glBufferDataARB now, it seems to take a size parameter as well, which potentially means you can try to specify data beyond the limit or capacity, and hence bounds would need checking there too? Getting off topic here …)

In any case the LWJGL version is certainly more flexible than the JOGL version, which, if I’m reading the native code right simply uses the start of the supplied Buffer (JNI GetDirectBufferAddress call).

Rob

No, all methods that take a buffer have had their size parameters removed; it is implicit in the buffer type and buffer limit now. That way you can’t get it wrong :slight_smile:

Cas :slight_smile:

I’m coming to this discussion a little late, but a few points:

  1. It is perfectly legal to pass pointers into the Java heap acquired by GetPrimitiveArrayCritical to C – assuming that the semantics of the C routine don’t imply that the pointer is stored anywhere. glVertexPointer and similar routines do store it, which is why in JOGL they are “NIO only” and only accept direct Buffers as arguments. glBufferData, for example, does not behave like this, which is why it’s legal to pass down Java primitive arrays to it.

  2. FYI, if you happened to have dynamically changing data, you’d want to use glMapBuffer, which might return a pointer into AGP RAM or RAM that lives on the graphics card itself. The VertexArrayRange demo in the jogl-demos workspace and the analogous VertexBufferObject demo show the speedups possible when using direct buffers to access data outside the Java heap. See also Sven Goethel’s and my slides from our JavaOne 2002 talk linked from the JOGL web page.

  3. You’re right, JOGL does no bounds checking of size or other outgoing parameters, and doesn’t understand the semantics of many OpenGL routines well enough to range-check indices into arrays or buffers. Plausibly we should add this. However, I would like to do it by teaching GlueGen how to generate the bounds checks rather than hand-writing the Java and native code, which is more work up front but much easier to maintain afterward.

  4. We didn’t think of making the position of direct buffers relevant when implementing JOGL’s autogenerated code, as is done in LWJGL. However, it does seem that a lot of people like this style. I’ll raise the issue with the JSR 231/239 expert group about making those APIs pay attention to the position of Buffers.

Cas,

Is the javadoc for Gl.glBufferDataARB on the website out of date then:

http://www.lwjgl.org/javadoc/ ?

I approve of the change by the way :slight_smile:

Rob

Ken,

[quote] It is perfectly legal to pass pointers into the Java heap acquired by GetPrimitiveArrayCritical to C…
[/quote]
Agreed - I wasn’t questioning JOGL’s use of this, BTW - I was simply pointing out that JOGL doesn’t internally copy the data from the byte array to an intermediate Buffer, as had been suggested.

Thanks for the other comments - glMapBuffer returning a ByteBuffer isn’t something I’d noticed/come across (having not yet played with ‘dynamic’ geometry).

Rob

The reason glBufferDataARB has a size argument is for the case the Buffer argument is null. If it isn’t null, the limit() is used instead and the size is ignored. So it can’t go wrong.
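In other words, something like this (a hypothetical sketch of the dispatch, not the actual LWJGL source):

```java
// Sketch of the glBufferDataARB size handling described above:
// with a null buffer the explicit size is used; otherwise limit() wins.
public class BufferDataDispatch {
    public static int effectiveSize(java.nio.ByteBuffer data, int size) {
        return (data == null) ? size : data.limit();
    }
}
```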

  • elias

And don’t forget the use of remaining()/limit() as the length/size argument too (as mentioned in the other thread). I think that’s just as important, to avoid redundancy when dealing with Buffer parameters (and you avoid checking that the length is legal too, since Buffer already does that).

  • elias

Doesn’t that really mean you’ve got two methods wrapped in one entry point there? Wouldn’t it make more sense to have:
glBufferDataARB(int size)
and
glBufferDataARB(ByteBuffer data) // with implicit size in limit(), throw an error if data == null

I know that’s deviating from being a straight OpenGL layer, but actually having a ‘size’ parameter that only works in certain circumstances seems confusing and wrong, IMHO.