FloatBuffers are slow...

I’m trying to make an optimized draw path: instead of issuing OpenGL draw calls immediately, I store each call in memory in an intelligent way, and at the end of the repaint the buffer draws itself. It takes all its data, puts it into 3 FloatBuffers per texture, and then draws it all to the screen.

But it’s pretty slow. If I draw 10,000 untextured quads to the screen using glRectf I get at least 60 fps (I have it limited at the moment) whereas throwing everything into a few FloatBuffers and then using glDrawArrays gets only around 20. Very slow for something that is supposed to be optimized.

After profiling it I was able to determine that most of the time is taken up creating and filling the FloatBuffers. I tried adding the data in array batches to speed it up, but that doesn’t necessarily help. It takes about 10-40ms to add 10,000 quads (that’s 40,000 vertex values, 40,000 texcoord values, and 80,000 color values). When every update is supposed to take 16ms (for 60fps), that’s far too high.

Does anyone have any pointers for how to optimize this process? I do it like this:

  • Call draw on a quad, creates an object that stores 3 arrays of Objects (Vertex2D, Vertex2D, Color4D) and adds it to the appropriate place in the buffer Collection.
  • Compile all the separate draw objects that were created into 3 FloatBuffers per texture.
      • Construct 3 FloatBuffers using BufferUtils.createFloatBuffer(size).
      • Turn the 3 arrays of Vertex2D etc objects into float[] arrays.
      • Copy that array to the FloatBuffer with FloatBuffer.put().
      • Repeat the above 2 for every object that has the same texture.
  • Call glDrawArrays on the buffers.

The possible optimizations I can see would perhaps be to do one single FloatBuffer.put() per texture by combining everything into one big array and then copying them all over at once. Similarly, if I could somehow know how many things will be drawn beforehand (which I can’t) I can reuse buffers. Or perhaps I can just make 3 big buffers that I keep reusing and use the last param of glDrawArrays to limit how much is used each time.
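A rough sketch of that last idea, in case it helps: one long-lived direct buffer that is refilled each frame with a single bulk put and only reallocated when a frame needs more room than it has. The class name `ReusableFloatBuffer` and its methods are made up for illustration, and it allocates through plain `java.nio` rather than `BufferUtils` so it runs without LWJGL. Treat it as one possible shape, not a tuned implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: a direct FloatBuffer kept across frames, grown only on demand. */
final class ReusableFloatBuffer {
    private FloatBuffer buffer;

    ReusableFloatBuffer(int initialFloats) {
        buffer = allocate(initialFloats);
    }

    private static FloatBuffer allocate(int floats) {
        // Direct, native-order storage, which is what OpenGL bindings expect.
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Copies the first count floats of data in one bulk put and returns the buffer ready for reading. */
    FloatBuffer upload(float[] data, int count) {
        if (count > buffer.capacity()) {
            // Grow geometrically so reallocation becomes rare after the first few frames.
            buffer = allocate(Math.max(count, buffer.capacity() * 2));
        }
        buffer.clear();             // position = 0, limit = capacity
        buffer.put(data, 0, count); // single bulk copy
        buffer.flip();              // position = 0, limit = count
        return buffer;
    }

    int capacity() {
        return buffer.capacity();
    }
}
```

With 2 floats per vertex, the count passed to glDrawArrays would be `remaining() / 2` of the returned buffer, so the unused tail of the buffer is simply never drawn.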

I’m just not sure the best way to do this. Advice? Thanks.

`FloatBuffer.put(value) is extremely slow
FloatBuffer.put(index, value) is a bit slow

float[] vec3 = new float[n];
existingFloatBuffer.clear().put(vec3).flip(); is quite fast, despite the additional copy
`

Also, do not create new (direct) FloatBuffers every frame, keep them around. Definitely use glDrawArrays().

You could use VBOs, but keep in mind that if you change your geometry every frame, you won’t see much (or any…) framerate difference.

Thank you, sir. I’ll give that a try. :slight_smile:

Hrm. 35 FPS. Still much slower than simply drawing 10,000 rects. Lamesauce. Sucks to spend all this time making something “optimized” and it ending up slower. :confused: Still takes around 5ms to fill all the buffers. There’s still some other stuff I can do to optimize this, but it honestly might not even make sense to bother.

I guess I could give VBOs a go, even though I’ve never used them before. My geometry will not ever change drastically, only positions will change.

Why not post some code, so we can point and laugh... er, give constructive criticism?

:stuck_out_tongue:

I will later after I’ve made more efficiency passes. So there’s less pointing and laughing.

I’m guessing I’m losing a lot of speed to GC now, because every quad constructs (and then throws away) 1 main object that contains 3 arrays, each containing 4 objects.

So that’s a grand total of throwing out 13 * 10,000 = 130,000 objects every frame, not including the arrays. Certainly not optimal. I wanted to make the whole thing as OO and easy to use as possible but that probably won’t work out in the end. So yeah, before I put this up for scrutiny I’m going to get rid of all the things like that.
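For what it’s worth, here is a minimal sketch of what getting rid of those per-quad objects could look like: each quad is written straight into preallocated float[] arrays, so nothing is allocated per quad or per frame. `FlatQuadBatch` and its method names are hypothetical, texcoords are left out for brevity (they would be a third array handled the same way), and the fixed `maxQuads` capacity is an assumption.

```java
/** Hypothetical batch: quads are stored as raw floats, so no objects are created per quad. */
final class FlatQuadBatch {
    private static final int VERTS_PER_QUAD = 4;

    private final float[] positions; // x, y per vertex
    private final float[] colors;    // r, g, b, a per vertex
    private final int maxQuads;
    private int quadCount;

    FlatQuadBatch(int maxQuads) {
        this.maxQuads = maxQuads;
        positions = new float[maxQuads * VERTS_PER_QUAD * 2];
        colors = new float[maxQuads * VERTS_PER_QUAD * 4];
    }

    /** Starts a new frame; the arrays are overwritten, not reallocated. */
    void begin() {
        quadCount = 0;
    }

    void addQuad(float x, float y, float w, float h,
                 float r, float g, float b, float a) {
        if (quadCount == maxQuads) {
            throw new IllegalStateException("batch full");
        }
        // Four corners in counter-clockwise order.
        int p = quadCount * VERTS_PER_QUAD * 2;
        positions[p++] = x;     positions[p++] = y;
        positions[p++] = x + w; positions[p++] = y;
        positions[p++] = x + w; positions[p++] = y + h;
        positions[p++] = x;     positions[p]   = y + h;

        // Same color repeated for each of the four vertices.
        int c = quadCount * VERTS_PER_QUAD * 4;
        for (int v = 0; v < VERTS_PER_QUAD; v++) {
            colors[c++] = r;
            colors[c++] = g;
            colors[c++] = b;
            colors[c++] = a;
        }
        quadCount++;
    }

    int vertexCount() { return quadCount * VERTS_PER_QUAD; }
    float[] positions() { return positions; }
    float[] colors() { return colors; }
}
```

At the end of the frame, `positions()` and `colors()` can each be copied into a FloatBuffer with one bulk put, using `vertexCount()` to work out how many floats are valid.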

I’m managing about 8k sprites @ 60fps (not trying any harder) with VBOs… VBOs are definitely the way forward, performance-wise. The drivers are all heavily optimised for VBOs nowadays. At least, on decent cards.

Cas :slight_smile:

BTW use pools for things like sprites… they get allocated/deallocated so often it’s silly getting the VM to try and figure it out. It’s still OOP.
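A bare-bones version of such a pool might look like the following. This is only a sketch: `Pool` and `Sprite` are illustrative names, not anyone’s actual classes, and resetting an object’s fields after `obtain()` is left to the caller.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical object pool: released instances are handed out again instead of being garbage collected. */
final class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    Pool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns a recycled instance if one is available, otherwise creates a new one. */
    T obtain() {
        T item = free.pollFirst();
        return item != null ? item : factory.get();
    }

    /** Hands an instance back; the caller must not use it afterwards. */
    void release(T item) {
        free.addFirst(item);
    }

    int freeCount() {
        return free.size();
    }
}

/** Placeholder sprite type, used only for this example. */
final class Sprite {
    float x, y, w, h;
}
```

Usage would be along the lines of `Pool<Sprite> pool = new Pool<>(Sprite::new);`, then `obtain()` when a sprite appears and `release()` when it goes away.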

Cas :slight_smile:

I got it up to 55. I think the main thing with it is that I’m testing against absolute best case scenario (un-textured quads) so of course doing all of the work to get them ready to go in one single array call is a lot of copying in memory. However, I’d imagine when I have this in an actual game where textures are being used, swapped, etc. this will make a big difference.

I doubt I’ll worry about going to VBOs at this point but perhaps I will eventually.

I tried to put off VBOs for years and years but eventually I had no choice but to basically remove all my immediate mode rendering, or the performance of RotT plummets. YMMV, but don’t expect great things from any other technique any more. Though I understand that Minecraft manages ok with display lists (but then, that’s not doing sprites)

Cas :slight_smile:

Yeah I can’t really use display lists because I have very little static data in this game, pretty similar to Revenge of the Titans. I’m hoping I won’t need VBOs because this game is only going to need to run at 800x600, but we’ll see.

Well, what the hell, might as well learn VBOs while I’m here, they were pretty much made for this sort of application. I didn’t realize they were so similar to vertex arrays, which I am already very familiar with. Looks relatively simple, or as simple as learning new OpenGL stuff ever is…