VBO performance

Hermasetas · March 10, 2014, 8:51am

Hi all!

I am currently working on a 2D game engine and it is coming together nicely!

But I have some questions regarding VBOs:

Some tutorials talk a lot about interleaving VBOs. I fully understand how this works but I am wondering how necessary it is.
It is mentioned as something that is really important but I don’t really see where the big performance boost is.
I think a render function with separate VBOs for position, texcoord, color etc is more readable than code using interleaving.
So would I be fine continuing to use non-interleaved VBOs, or is there some really good reason to use interleaving?
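For reference, here is a minimal sketch of what interleaving means at the data level, using only java.nio. The layout (2 position floats, 2 texcoord floats, 4 color floats) and the class name `VertexLayoutSketch` are assumptions made up for illustration, not anything from a specific tutorial. The stride and offset constants are the byte values an interleaved buffer would need when setting up attribute pointers.

```java
import java.nio.FloatBuffer;

/** Sketch: packs separate attribute arrays into one interleaved buffer. */
final class VertexLayoutSketch {
    // Assumed layout for illustration only.
    static final int POSITION_FLOATS = 2;
    static final int TEXCOORD_FLOATS = 2;
    static final int COLOR_FLOATS = 4;
    static final int FLOATS_PER_VERTEX = POSITION_FLOATS + TEXCOORD_FLOATS + COLOR_FLOATS;

    // Byte stride and offsets of each attribute inside one interleaved vertex.
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;
    static final int POSITION_OFFSET_BYTES = 0;
    static final int TEXCOORD_OFFSET_BYTES = POSITION_FLOATS * Float.BYTES;
    static final int COLOR_OFFSET_BYTES = (POSITION_FLOATS + TEXCOORD_FLOATS) * Float.BYTES;

    /** Produces P T C, P T C, ... from three separate arrays. */
    static FloatBuffer interleave(float[] positions, float[] texcoords, float[] colors) {
        int vertexCount = positions.length / POSITION_FLOATS;
        // Heap buffer for the sketch; LWJGL needs a direct buffer instead.
        FloatBuffer out = FloatBuffer.allocate(vertexCount * FLOATS_PER_VERTEX);
        for (int v = 0; v < vertexCount; v++) {
            out.put(positions, v * POSITION_FLOATS, POSITION_FLOATS);
            out.put(texcoords, v * TEXCOORD_FLOATS, TEXCOORD_FLOATS);
            out.put(colors, v * COLOR_FLOATS, COLOR_FLOATS);
        }
        out.flip(); // switch from writing to reading
        return out;
    }
}
```

With separate VBOs each attribute would simply use a stride of 0 and an offset of 0 in its own buffer, which is why the non-interleaved render code tends to read more simply.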

Since I’m using spritebatching I am sometimes going to have the following happen:
Draw until the batch is full -> flush the batch by drawing -> rinse and repeat
Can I run into problems with this because I am reusing the same VBO for all rendering?
Like if I put new data into the VBO before the previous rendering is done.
Should I then have a pool of VBOs that I take turns using?

trollwarrior1 · March 10, 2014, 9:07am

A VBO is better than immediate mode rendering because a VBO lets you store data. If you send data to a VBO, render it, and flush it, in theory it should be slower than immediate rendering.

To use VBO properly, upload once, draw many times.

Hermasetas · March 10, 2014, 9:27am

Maybe I was unclear about how I do my batching.

I have a buffer where I store all the vertex-data, and when I have stored up say 100 draw calls I send it all to the gpu in one go.
The thing is I don’t know what happens when I need to draw more than 100 things.
Can I just collect the next 100 draw calls and then send it to the gpu without worrying about the gpu still working on the last batch?

trollwarrior1 · March 10, 2014, 9:49am

What are you talking about? You seem to be just starting out in LWJGL, so listen to me…

Sending data into a VBO every frame gives the same performance as immediate rendering (glVertex2f), or even worse.

VBO means vertex BUFFER object. Buffer means it stores data so you don’t have to re-upload it every time, thus minimizing rendering time, since you skip the uploading part.
Using a VBO is like using immediate rendering mode, but you’re storing the data in the GPU’s memory instead of RAM.

Hermasetas · March 10, 2014, 11:06am

If you don’t understand my question then why are you answering? You are being very arrogant I think.

I am rendering via a spritebatcher. None of my vertex data is reusable.

Since I am using glDrawElements I need to use VBOs, right?

Opiop · March 10, 2014, 2:19pm

Then you shouldn’t be using a buffer object. That’s not what they are made for.

Maybe before you are so quick to judge someone you should realize you are the person who knows less here. Trollwarrior was correct, a buffer object is not optimal here. A vertex array is.

Hermasetas · March 10, 2014, 2:26pm

I am not saying that he is wrong I was just annoyed that he didn’t answer what I was asking.
I didn’t mean to be rude I just felt very talked down to and it kind of went against the positive vibe I normally get from this forum.

But then why are the GL_DYNAMIC_DRAW and GL_STREAM_DRAW flags there?

How do I use arrays with glDrawElements?

trollwarrior1 · March 10, 2014, 2:28pm

I understand your question and I gave you my answer.

Immediate mode:
Send Vertex Data -> OpenGL renders it

VBO:
Create a buffer -> Send data to buffer
Point opengl to buffer -> OpenGL renders it

Your method?:
Create a buffer
Send data to buffer -> Point opengl to buffer -> opengl renders it
Send data to buffer -> Point opengl to buffer -> opengl renders it
Send data to buffer -> Point opengl to buffer -> opengl renders it
Send data to buffer -> Point opengl to buffer -> opengl renders it
Send data to buffer -> Point opengl to buffer -> opengl renders it

What I mean is that using a VBO is completely useless if you don’t reuse the data…

If you have very small VBOs, then interleaved VBOs are probably the way to go, since sending more data in each go is more effective than sending small amounts of data many times. Too much data is probably slow too, since it slows down access.
No, you can’t run into problems. OpenGL will make sure everything is rendered in order for you. I mean it won’t execute your draws out of order:
it will always do render1, render2, render3.

EDIT------------------------
Just realized what you were asking completely. I believe that opengl does the rendering part once you do Display.update, I might be wrong though here…

Varkas · March 10, 2014, 2:29pm

Isn’t immediate mode performant enough for a 2D game engine?

Hermasetas · March 10, 2014, 2:34pm

I was thinking that arrays would be better for what I am doing, but I have just read a tutorial on OpenGL 3.2 and the guy there was using VBOs and glDrawElements.

How would you guys go about doing spritebatching?

Just using glDrawArrays?

trollwarrior1 · March 10, 2014, 2:41pm

The first question should be:

Do you have performance issues with immediate rendering mode?

No: Then why are you working towards making a spritebatch? It will only make things harder to use and add bugs that take time to resolve, for example forgetting to flush the batch before changing uniforms.
Yes: You’re probably doing something wrong elsewhere, because you cannot really have performance issues in a 2D game unless it has super duper shading, lighting, particles and all the other fancy stuff…

Bottom line: Never try to optimize code for performance, unless you know what you’re doing, but that’s not the case most of the time.

Hermasetas · March 10, 2014, 2:46pm

Uuhm, I’m not sure that we understand each other completely. Reading through my posts, it might be my fault.

When I say “buffer” I mean a floatBuffer from java nio.

So my method is:

Create floatBuffer
Fill floatBuffer with data for 100 quads (200 triangles)
Send floatBuffer to VBO
Render 200 triangles from VBO with glDrawElements
Repeat for the next x quads needed to be rendered

So I would guess that this is better than immediate mode.
But I would also think that skipping the step with sending the data to the VBO would be good too. I just don’t know how.
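One part of this method does not change between batches: the index data for glDrawElements. Every quad uses the same two-triangle pattern, so the indices can be built once and reused even though the vertex data changes every flush. Below is a minimal sketch of generating them with java.nio; the class name `QuadIndexSketch` and the winding (0,1,2) (2,3,0) are assumptions for illustration.

```java
import java.nio.ShortBuffer;

/** Sketch: index data for a batch of quads drawn as two triangles each. */
final class QuadIndexSketch {
    /** Builds 6 indices per quad: (0,1,2) and (2,3,0), offset by 4 per quad. */
    static ShortBuffer buildQuadIndices(int maxQuads) {
        // 16-bit unsigned indices can address 65536 vertices, i.e. 16384 quads.
        if (maxQuads < 0 || maxQuads > 16384) {
            throw new IllegalArgumentException("maxQuads out of range: " + maxQuads);
        }
        // Heap buffer for the sketch; LWJGL needs a direct buffer instead.
        ShortBuffer indices = ShortBuffer.allocate(maxQuads * 6);
        for (int q = 0; q < maxQuads; q++) {
            int base = q * 4; // first vertex of this quad
            indices.put((short) base).put((short) (base + 1)).put((short) (base + 2));
            indices.put((short) (base + 2)).put((short) (base + 3)).put((short) base);
        }
        indices.flip(); // switch from writing to reading
        return indices;
    }
}
```

For the 100-quad batch described above this gives 600 indices, which only need to be uploaded once at startup.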

Varkas · March 10, 2014, 2:47pm

I’ve got 2 isometric game projects with LWJGL on my back now, and I must say both run fine with immediate mode, reaching 60FPS with plenty of spare time for activities on the CPU.

Unless you do the research for the research sake, in my experience you don’t need more than immediate mode for a 2D (or 2.5D) game.

Hermasetas · March 10, 2014, 2:48pm

I’m doing it mostly for fun

I guess I could make a game with decent performance with java2D but I like learning new stuff.

Opiop · March 10, 2014, 2:49pm

No. We have already said that filling a buffer object and sending it to the GPU every frame is far less efficient than immediate mode. Like I said, use vertex arrays or stop worrying about deprecated functions and use immediate mode.

theagentd · March 10, 2014, 2:50pm

Just want to clear up a few things here…

[quote]What I mean is using VBO is completely useless if you don’t reuse the data…
[/quote]
It’s much faster to use VBOs even when the data is only used once. Immediate mode is slow since OpenGL has a hard time making any assumptions about the type of data being submitted (you can mix glColor3ub() with glColor4f()) and generally has to accumulate data in an internal buffer to be sent off to the GPU, plus you get the huge overhead of calling so many OpenGL functions. By using a VBO you can upload all your data with a single call to glBufferData(), or even map the buffer so you can write to it directly, which is even more efficient. Like Hermasetas wrote, this is exactly what GL_STREAM_DRAW is useful for. There’s a reason why immediate mode rendering has been deprecated since OpenGL 3.0 and is not available in the 3.2+ core profile.

[quote]If you have very small VBOs, then interleaved VBOs are probably the way to go, since sending more data in each go is more effective than sending small amounts of data many times. Too much data is probably slow too, since it slows down access.
[/quote]
There’s no basis for this. I have run code that uploads 30 MB of data into a single VBO without any performance drop. Data uploads generally run in parallel with rendering, so they don’t usually have an impact on performance as long as the GPU has other work to do while the transfer is happening.

[quote]2) No, you can’t run into problems. OpenGL will make sure everything is rendered in order for you. I mean it won’t execute your draws out of order:
it will always do render1, render2, render3.
[/quote]
This is true. If you for example enable blending and render 1000 quads with different colors on top of each other, the ordering of the draw calls will ALWAYS be preserved*.

*Actually, your GPU probably shades many (possibly all) quads in parallel on multiple stream processors, but the ROPs still guarantee the order of the actual color blending.

Troubleshoots · March 10, 2014, 3:20pm

The whole idea of sprite batching is to group vertex data together in batches. The basic design model that you should follow is like so:

    [b]set up matrices --> collect data  --> upload data to buffer object --> render[/b],    repeat

Let me give you an example. If you’ve used LibGDX before, you’ll have noticed that the sprite batcher requires you to call the begin and end methods, and between those calls you add the sprites to the batch. The begin method simply sets up the matrices, enables blending if it is used and, if you are using shaders, prepares the shaders and their variables. When you add data to the batch, it stores the data (probably in an array, I’m not sure tbh), and when you call the end method it flushes the batch (uploads the data to the buffer object and renders it all), then disables blending.
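The begin/add/end flow described above can be sketched roughly as follows. This is not LibGDX’s actual implementation; `SpriteBatchSketch` and its `flushTarget` callback are hypothetical names, and the callback stands in for whatever uploads the data to the buffer object and issues the draw call. GL state setup (matrices, blending, shaders) is left out so the sketch runs with the standard library alone.

```java
import java.nio.FloatBuffer;
import java.util.function.Consumer;

/** Sketch: collects quads on the CPU and flushes them in batches. */
final class SpriteBatchSketch {
    private static final int FLOATS_PER_QUAD = 4 * 4; // 4 vertices x (x, y, u, v)

    private final FloatBuffer vertices;
    private final Consumer<FloatBuffer> flushTarget; // hypothetical upload + draw step
    private boolean drawing;
    private int flushCount;

    SpriteBatchSketch(int maxQuads, Consumer<FloatBuffer> flushTarget) {
        // Heap buffer for the sketch; LWJGL needs a direct buffer instead.
        this.vertices = FloatBuffer.allocate(maxQuads * FLOATS_PER_QUAD);
        this.flushTarget = flushTarget;
    }

    void begin() {
        if (drawing) throw new IllegalStateException("end() must be called before begin()");
        drawing = true; // a real batch would set up matrices and blending here
    }

    void draw(float x, float y, float w, float h) {
        if (!drawing) throw new IllegalStateException("begin() must be called before draw()");
        if (vertices.remaining() < FLOATS_PER_QUAD) flush(); // batch is full
        vertices.put(x).put(y).put(0f).put(0f);
        vertices.put(x + w).put(y).put(1f).put(0f);
        vertices.put(x + w).put(y + h).put(1f).put(1f);
        vertices.put(x).put(y + h).put(0f).put(1f);
    }

    void end() {
        if (!drawing) throw new IllegalStateException("begin() must be called before end()");
        flush();
        drawing = false;
    }

    private void flush() {
        if (vertices.position() == 0) return; // nothing collected
        vertices.flip();             // expose only the written floats
        flushTarget.accept(vertices); // must consume the data before returning
        vertices.clear();            // reuse the same CPU-side buffer
        flushCount++;
    }

    int flushCount() { return flushCount; }
}
```

With a capacity of 100 quads, drawing 250 sprites between begin and end results in three flushes: two full batches and one partial batch from end.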

Since indexed drawing is effectively useless, you’ll want to use interleaved vertex buffer objects.

Oh and may I recommend that you move on to OpenGL 2.0 and learn how to use shaders after you’re done with buffer objects. It’s a big step learning the maths, and despite what others have said about sticking to immediate mode, they’re still useful for 2D.

kappa · March 10, 2014, 3:26pm

Not sure why people are still advocating the use of immediate mode. It’s an ancient way to use OpenGL and should be avoided by all modern applications; it has been deprecated and is not even part of OpenGL ES.

VBOs are perfectly usable for dynamically changing data and are currently the preferred and fastest way to send such data to the GPU. If you can get away with using a more cutting-edge version of OpenGL, then you’ll probably want to look at using geometry shaders too.

As for whether to interleave or not, this depends on your data set and how much you send to the GPU every frame. If interleaving allows you to reduce this data (because of sharing vertex data), then go for interleaving; otherwise non-interleaved should be fine.
If you are using something like glBufferData() to transfer data to the VBO, you shouldn’t run into any problems with overwriting data that is in use, as the driver will handle the syncing automatically. If, however, you want maximum transfer speed, you’ll have to look into unsynchronized VBOs using glMapBufferRange(), in which case you’ll have to manage the syncing manually, using multiple VBOs in a round-robin way.
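The round-robin bookkeeping for the unsynchronized case might look roughly like the sketch below. `BufferRingSketch` is a hypothetical name, the integer ids stand in for VBO handles, and the `inFlight` flags stand in for per-buffer GPU fences (in real code, something like glFenceSync and glClientWaitSync); only the rotation logic is shown.

```java
/** Sketch: hands out buffer ids in round-robin order, skipping none. */
final class BufferRingSketch {
    private final int[] bufferIds;    // hypothetical VBO handles
    private final boolean[] inFlight; // stand-in for a GPU fence per buffer
    private int next;

    BufferRingSketch(int... bufferIds) {
        if (bufferIds.length == 0) throw new IllegalArgumentException("need at least one buffer");
        this.bufferIds = bufferIds.clone();
        this.inFlight = new boolean[bufferIds.length];
    }

    /** Returns the next buffer id, or -1 if the GPU may still be reading it. */
    int acquire() {
        if (inFlight[next]) return -1; // real code would wait on the fence instead
        int slot = next;
        inFlight[slot] = true;
        next = (next + 1) % bufferIds.length;
        return bufferIds[slot];
    }

    /** Marks a buffer as free again, e.g. once its fence has signalled. */
    void release(int bufferId) {
        for (int i = 0; i < bufferIds.length; i++) {
            if (bufferIds[i] == bufferId) {
                inFlight[i] = false;
                return;
            }
        }
        throw new IllegalArgumentException("unknown buffer id: " + bufferId);
    }
}
```

The more buffers in the ring, the less likely a flush has to wait for the GPU to finish with the oldest one.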

Opiop · March 10, 2014, 3:56pm

Kappa, because if it works then why change? Yes, it’s not “good” to use, but 2D games aren’t going to need enormous amounts of processing power. I’m completely OK with using deprecated code if I need to. That’s not to say you shouldn’t use modern techniques, though.

theagentd · March 10, 2014, 3:59pm

Macs do not support OpenGL 3+ in compatibility mode, so if you want to use OGL3 functions on Macs you’ll need to get rid of all deprecated functions from your code.