VBOs

thedanisaur · July 20, 2014, 8:05pm

Hey all, new question. It’s my understanding that I should combine multiple meshes into a VBO so that I can draw more efficiently. How would I go about doing this, considering the following?

My meshes are instanced across the world
Not all the meshes show up on screen at a single time

Where I’m really stuck is with instancing: if I want to draw the mesh in the VBO, how do I move its position efficiently? Should I move the mesh to the proper location and then add it to the VBO? Wouldn’t that defeat the purpose of instancing? And what about having to call glDraw* for every mesh anyway, since even if it’s in the VBO I may only want to draw one of the meshes, or draw it multiple times in multiple locations?

Hermasetas · July 20, 2014, 8:40pm

I’m no expert but I would say that you only put one mesh in each vbo, and then have multiple vbos.
So if you have a model of a bike you put that in one vbo and the model of a car in another vbo

The point with vbos is that you create them once and then reuse them a lot.

I hope this answered your question

thedanisaur · July 20, 2014, 9:10pm

That’s what I thought too, but I can’t seem to get any performance gains. In fact, since moving away from fixed-function immediate mode I’ve gotten worse performance.

tkausl · July 20, 2014, 9:22pm

What’s “fixed function”? There are Intermediate-Draw, Display-Lists, VertexBufferArrays and VertexBufferObjects
VertexBufferObjects are the “newest” and Recommended type to draw but the FASTEST is the (old old) DisplayList.
Slowest is Intermediate, you should never use this.

thedanisaur · July 20, 2014, 9:36pm

Isn’t it immediate mode? Anyway that’s what I’ve been using with glBegin() glEnd(). Around 3 years ago I used display lists a lot as well, but they only gave me ~+5fps, which is good, but I just stopped using them.

The problem is I still get around +15FPS if I use immediate mode instead of VBOs. Granted I’m just drawing cubes, but I’m drawing 5625 of them, so only 45,000 verts. I feel like I should be getting better FPS.

Immediate mode - 26 FPS
VBOs - 12 FPS

Granted I am using shaders w/ the VBOs, but they don’t do anything special, just a spot for the diffuse texture.

The FPS still seems pitifully low.

basil · July 20, 2014, 9:43pm

simplified … instance drawing is about having just

one draw call (like glDrawArraysInstanced or glDrawElementsInstanced), you tell these guys how many instances you like to draw and …
in the vertex shader you get the built-in variable gl_InstanceID, which is the index of the instance currently being drawn.

the tricky part is now what to do with the instance-id inside the vertex shader. some ideas are, simple offsetting, feeding into a noise function, or indexing into another buffer (like a UBO) which describes the exact location/rotation/scale of the instance (matrix). lots of possibilities here.
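As a rough sketch of the "index into another buffer" idea, the CPU side could pack one matrix per instance into a flat array that is later uploaded and indexed with gl_InstanceID. The class name, the translation-only matrices and the column-major layout are assumptions made for illustration; nothing here calls OpenGL.

```java
// Sketch: pack one 4x4 translation matrix per instance into a flat array.
// Assumptions: column-major layout, 16 floats per instance, translation only.
final class InstanceTransforms {
    static final int FLOATS_PER_MATRIX = 16;

    /** positions holds x, y, z per instance; returns one matrix per instance. */
    static float[] pack(float[] positions) {
        int count = positions.length / 3;
        float[] out = new float[count * FLOATS_PER_MATRIX];
        for (int i = 0; i < count; i++) {
            int base = i * FLOATS_PER_MATRIX;
            // Identity on the diagonal.
            out[base] = 1f;
            out[base + 5] = 1f;
            out[base + 10] = 1f;
            out[base + 15] = 1f;
            // Column-major: the translation sits in the fourth column.
            out[base + 12] = positions[i * 3];
            out[base + 13] = positions[i * 3 + 1];
            out[base + 14] = positions[i * 3 + 2];
        }
        return out;
    }

    public static void main(String[] args) {
        // Two instances: one at the origin, one at (5, 0, -10).
        float[] matrices = pack(new float[] {0f, 0f, 0f, 5f, 0f, -10f});
        System.out.println(matrices.length); // 32
    }
}
```

In the shader, instance i would then read the matrix that starts at float offset i * 16.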

check out http://www.opengl.org/wiki/Vertex_Rendering#Instancing.

you can go deeper into that topic with things like instanced vertex attributes (http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt) or offset rendering (http://www.opengl.org/registry/specs/ARB/base_instance.txt), but that’s probably overkill for a start

thedanisaur · July 20, 2014, 9:51pm

OK I think something is still going over my head…

So I put one mesh in a VBO and then call glDrawElementsInstanced(). Then in my vertex shader I move/rotate/scale to my liking? I do this by keeping an array of transform matrices and IDs around?

HeroesGraveDev · July 20, 2014, 10:03pm

Since this thread seems to be filled with a lot of misinformation, I’ll do my best to give a recommendation, although I can’t guarantee that what I say is 100% correct either.

The purpose of all buffer objects is to reduce the number of OpenGL calls.

VBOs do this by A) buffering all the data into one draw call; and B) redrawing the same buffer multiple times.

The optimal use of VBOs is one that takes advantage of both those aspects. The fewer VBOs you have, the fewer draw calls you need. Therefore, wherever possible, you should try to fit as much data as possible into one VBO.
However, some data may change more frequently than others. If you have some data that never changes, some data that changes every X frames, and some that changes every frame, you should be putting each group into a different VBO. (Another thing to note is that for data that changes irregularly, you should try to group it so that it all changes at the same time, to avoid unnecessary rebuffering.)
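One way to picture several meshes sharing a VBO while still being drawable one at a time: pack them back to back and remember where each one starts. The sketch below is plain Java under stated assumptions (position-only vertices, non-indexed geometry, a made-up class name). The recorded first/count pairs are the kind of values a call such as glDrawArrays(mode, first, count) takes.

```java
// Sketch: concatenate meshes into one array and record each mesh's vertex range.
final class PackedMeshes {
    static final int FLOATS_PER_VERTEX = 3; // assumption: position only

    final float[] data; // all meshes back to back, ready for a single upload
    final int[] first;  // first[i] = index of mesh i's first vertex
    final int[] count;  // count[i] = number of vertices in mesh i

    PackedMeshes(float[][] meshes) {
        int total = 0;
        for (float[] m : meshes) {
            total += m.length;
        }
        data = new float[total];
        first = new int[meshes.length];
        count = new int[meshes.length];
        int cursor = 0; // position in floats
        for (int i = 0; i < meshes.length; i++) {
            System.arraycopy(meshes[i], 0, data, cursor, meshes[i].length);
            first[i] = cursor / FLOATS_PER_VERTEX;
            count[i] = meshes[i].length / FLOATS_PER_VERTEX;
            cursor += meshes[i].length;
        }
    }

    public static void main(String[] args) {
        float[] triangle = new float[9]; // 3 vertices
        float[] quad = new float[18];    // 6 vertices (two triangles)
        PackedMeshes packed = new PackedMeshes(new float[][] {triangle, quad});
        // Drawing only the quad would use first = 3, count = 6.
        System.out.println(packed.first[1] + " " + packed.count[1]); // 3 6
    }
}
```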

Furthermore, you can hint to the GPU how often the data will change. There’s no guarantee that this will increase performance, but there’s no loss and a potentially decent gain.
For data that changes every frame, use GL_STREAM_DRAW.
For data that never changes, use GL_STATIC_DRAW.
For data that is somewhere in the middle, use GL_DYNAMIC_DRAW.
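The three rules above could be written down as a small helper. This is only a sketch: the class, the method and the NEVER sentinel are hypothetical, and the constants simply repeat the standard OpenGL enum values so the snippet runs without a GL binding.

```java
// Sketch: map "how often does this data change" to a buffer usage hint.
final class UsageHint {
    // Standard OpenGL enum values, repeated here to keep the sketch standalone.
    static final int GL_STREAM_DRAW = 0x88E0;
    static final int GL_STATIC_DRAW = 0x88E4;
    static final int GL_DYNAMIC_DRAW = 0x88E8;

    /** Hypothetical sentinel meaning "uploaded once, never changed". */
    static final int NEVER = Integer.MAX_VALUE;

    static int forUpdateInterval(int framesBetweenUpdates) {
        if (framesBetweenUpdates == NEVER) {
            return GL_STATIC_DRAW;  // never changes
        }
        if (framesBetweenUpdates <= 1) {
            return GL_STREAM_DRAW;  // changes every frame
        }
        return GL_DYNAMIC_DRAW;     // somewhere in the middle
    }
}
```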

Finally, there is a ‘magic number’ that gives you the optimal maximum size of a VBO. ‘GL12.GL_MAX_ELEMENTS_VERTICES’. The closer you can get to this number without going over, the more efficient your VBOs will be.

basil · July 20, 2014, 10:05pm

yes, that’s one way to do it.

for a start you can just add the instance-id to the vertex position. like

gl_Position = MVP * (gl_Vertex + vec4(0.0, 0.0, float(gl_InstanceID) * 10.0, 0.0));

that should place each instance 10 units further along the z-axis than the previous one. (haven’t tested it actually ;))
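To check what that shader line does to the numbers, the addition inside the parentheses can be mirrored on the CPU. This is a sketch only: it leaves out the MVP multiplication, and the class and method names are made up.

```java
// Sketch: the per-instance offset from the shader line, applied in plain Java.
final class InstanceOffset {
    /** vertex is x, y, z, w; returns vertex + (0, 0, instanceId * 10, 0). */
    static float[] apply(float[] vertex, int instanceId) {
        return new float[] {
            vertex[0],
            vertex[1],
            vertex[2] + instanceId * 10.0f,
            vertex[3] // w is left untouched because the offset's w is 0
        };
    }

    public static void main(String[] args) {
        float[] moved = apply(new float[] {1f, 2f, 3f, 1f}, 3);
        System.out.println(moved[2]); // 33.0
    }
}
```

Instance 0 stays where the mesh was modelled, and every further instance ends up 10 units further along z.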

thedanisaur · July 20, 2014, 10:47pm

@HeroesGraveDev: [quote]However, some data may change more frequently than others. If you have some data that never changes, some data that changes every X frames, and some that changes every frame, you should be putting each group into a different VBO. (Another thing to note is that for data that changes irregularly, you should try to group it so that it all changes at the same time, to avoid unnecessary rebuffering.)
[/quote]
This is precisely where I’m confused. If I’m moving around the world the stuff that gets drawn changes, but none of the geometry does. Does this mean that I have to rebuild my buffer every time geometry leaves/enters the field of view? (assuming all the data on screen can fit into one buffer)

@basil_: Cool!

tkausl · July 20, 2014, 11:35pm

No, your mesh is the same mesh as before; only your MVP (ModelViewProjection) matrix changes. That’s what basil_ did there in the vertex shader: multiply every vertex by the MVP matrix so you get the new position without ever changing your VBO.

thedanisaur · July 20, 2014, 11:51pm

OK, yeah that’s kinda what I got out of this.

@HeroesGraveDev: could you still clarify?

Also, I did a quick check and I can put 33000 verts into a VBO on my machine, so if I want to run optimally what would I do then? The obvious/naive choice is to group multiple meshes into a VBO, but since the world changes I’d have to rebuild the VBO every X frames (when some of the meshes are no longer on screen). Is that a good idea? Won’t rebuilding a VBO of that size slow things down pretty heavily?

HeroesGraveDev · July 21, 2014, 12:00am

@OP: You have a chunk system right? If so, have 1 VBO per chunk.

Also, 33000 doesn’t sound right for GL_MAX_ELEMENTS_VERTICES. I can’t check right now, but are you sure you called glGetInteger(GL_MAX_ELEMENTS_VERTICES)? I think you may have just read the enum value.

thedanisaur · July 21, 2014, 12:43am

No, no chunk system yet. Also, yeah I forgot to call it with glGetIntegerv() so here’s what it actually returns: 1048576.

Edit: Actually, wait. I think you’re assuming that I’m making a block world? Or do many engines use chunks?

I’m only rendering cubes because I don’t have a way to load my meshes, which is what I’m working on today ;D.

Edit 2: I do have a system where I have terrain “patches” that the terrain and terrain objects are stored in. The world is made up of multiple patches.

SHC · July 21, 2014, 1:28pm

I’ll just keep the static terrain in a VBO and use face culling to prevent drawing of things that are not on the screen. I’ll use a Batcher to batch draw movable objects (meshes).

gouessej · July 24, 2014, 11:08am

The claim that display lists are the fastest is rarely true, even on hardware made in 2004, when compared to static VBOs, and tons of drivers have broken display list implementations. You forgot to mention compiled vertex arrays. Display lists are sometimes faster on tiny sets of data (fewer than 10 primitives). In my humble opinion, writing readable and efficient code is easier with VBOs and vertex arrays.

PandaMoniumHUN · July 24, 2014, 12:32pm

There is no way you’re getting worse performance using VBOs than using immediate mode, regardless of how many vertices you are putting into a VBO.

I mean, just think about it. Let’s say you put 100 vertices into each of 5 VBOs. That means 5 binding calls (if you’re using VAOs; without them it should be around 10-15) and 5 drawing commands, so about 10 commands in total, versus 2+500 in immediate mode (the 2 being the [icode]glBegin(…)[/icode] and [icode]glEnd()[/icode] calls) every frame.
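The arithmetic in that comparison, written out as a sketch. It reads the post as 5 VBOs of 100 vertices each (500 vertices in total), counts one glVertex-style call per vertex in immediate mode, and assumes VAOs so that each VBO costs one bind and one draw. The class and method names are made up.

```java
// Sketch: per-frame call counts for immediate mode versus VBOs.
final class CallCount {
    /** glBegin, one vertex call per vertex, glEnd. */
    static int immediateMode(int vertices) {
        return 2 + vertices;
    }

    /** One bind and one draw per VBO, assuming VAOs hold the attribute setup. */
    static int withVbos(int vbos) {
        return 2 * vbos;
    }

    public static void main(String[] args) {
        System.out.println(immediateMode(500)); // 502
        System.out.println(withVbos(5));        // 10
    }
}
```

Immediate mode usually issues more than one call per vertex (normals, texture coordinates), so 502 is a lower bound under these assumptions.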

I think you’re using VBOs wrong, maybe recreating them every frame instead of initializing them once and using them later.
The way you’re supposed to use VBOs is:
Initialization (you only have to do this once):

Create and bind VBO
Fill up with data
Unbind (this is not a must, but it’s good practice to prevent errors)

Rendering (you do this every frame):

Bind the already existing VBO
Assign vertex pointers
Draw command

You can make this process even easier by using VAOs to store buffers and vertex attribute pointers, this way you don’t have to set them on every frame, just bind the VAO and call the draw command.
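The "initialize once, draw every frame" split above can be sketched without a real GL binding. BufferApi below is a hypothetical stand-in for the few calls involved (it is not LWJGL's API), the vertex pointer setup is left out for brevity, and CountingApi only records how often each call happens so the pattern can be checked.

```java
// Hypothetical stand-in for the handful of buffer calls; not a real binding.
interface BufferApi {
    int genBuffer();
    void bindBuffer(int id);
    void bufferData(float[] data);
    void drawArrays(int first, int count);
}

/** Counts calls so the upload-once, draw-many pattern can be verified. */
final class CountingApi implements BufferApi {
    int uploads, binds, draws, nextId = 1;
    public int genBuffer() { return nextId++; }
    public void bindBuffer(int id) { binds++; }
    public void bufferData(float[] data) { uploads++; }
    public void drawArrays(int first, int count) { draws++; }
}

final class StaticMesh {
    private final BufferApi api;
    private final int id;
    private final int vertexCount;

    /** Initialization: runs once, when the mesh is loaded. */
    StaticMesh(BufferApi api, float[] positions) {
        this.api = api;
        this.id = api.genBuffer();
        this.vertexCount = positions.length / 3; // assumption: x, y, z per vertex
        api.bindBuffer(id);
        api.bufferData(positions);
        api.bindBuffer(0); // unbind
    }

    /** Rendering: runs every frame and never re-uploads the data. */
    void draw() {
        api.bindBuffer(id);
        api.drawArrays(0, vertexCount);
    }
}
```

If a counter like uploads grows with the frame count in a real renderer, the buffer is being refilled every frame, which is the mistake described above.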

Edit: Oh and about the VBO sizes: Nvidia recommends that the VBO’s size should be somewhere between 1MB and 4MB. However, my approach usually depends on the kind of game I’m making. If it’s a 3D game with complex models featuring thousands of vertices I usually assign 1 VBO per mesh, but if it’s a 2D game I usually use a single VBO for the entire scene and refill data in that VBO every frame with the visible object’s vertices.
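To get a feel for what 1MB to 4MB means in vertices, here is a small sketch. It assumes 1MB means 2^20 bytes and a vertex of 8 floats (position, normal, texture coordinate); both are illustrative choices, not something stated in the post.

```java
// Sketch: how many vertices fit into a given VBO size budget.
final class VboBudget {
    static int verticesThatFit(int budgetBytes, int floatsPerVertex) {
        return budgetBytes / (floatsPerVertex * Float.BYTES);
    }

    public static void main(String[] args) {
        System.out.println(verticesThatFit(1 << 20, 8)); // 32768
        System.out.println(verticesThatFit(4 << 20, 8)); // 131072
    }
}
```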

thedanisaur · July 26, 2014, 5:26am

@PandaMoniumHUN: I would really like to say you are right, but I think you helped me set up my VBOs in another thread, so you would’ve seen my code. Anyway, I only initialize them once. I have everything set up exactly the same except for VBO vs immediate rendering. (Both using shaders)

I know it should be faster, but it isn’t. I’ll run some more tests and post back, but I don’t expect a difference.

gouessej · July 26, 2014, 8:01am

@thedanisaur Which graphics card do you use? Which driver? 26 FPS to draw 45000 vertices is even worse than what I obtain with my former “old” graphics card (ATI Radeon 9250 Pro, OpenGL 1.3). In my humble opinion, there is something else that slows down your application.

Do you use static VBOs (GL_STATIC_DRAW) for things that don’t change as HeroesGraveDev suggested?

thedanisaur · July 31, 2014, 1:08am

@gouessej: The “graphics card” I use is actually on an Intel i5 chip; I think it’s the Intel HD Graphics 4025. So I don’t expect things to be blazing fast, but 45,000 tris when it’s rated at about 300,000,000 is sad.

Edit: Anyway it does look like my FPS has stabilized a bit, I can now render 546,496 tris at ~40ish FPS and 1.2 million with a solid 30FPS.

I guess part of my problem was having 5625 VBOs with only 8 vertices in each. That, combined with the integrated GPU AND having the objects still floating around in RAM (not as VBOs), caused the slowdown.
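One way to get from 5625 tiny VBOs to a single large one is to bake every cube's position into its vertices and shift the indices accordingly. The sketch below is plain Java with made-up names and is not the code the poster ended up with. It assumes 8 corner vertices per cube (x, y, z each) and an index list shared by all cubes.

```java
// Sketch: merge many copies of one cube into a single vertex and index array.
final class CubeBatcher {
    static final int VERTS_PER_CUBE = 8;

    /** cubeCorners: 24 floats (8 x, y, z triples); centers: x, y, z per cube. */
    static float[] bakeVertices(float[] cubeCorners, float[] centers) {
        int cubes = centers.length / 3;
        float[] out = new float[cubes * cubeCorners.length];
        for (int c = 0; c < cubes; c++) {
            for (int v = 0; v < cubeCorners.length; v++) {
                // v % 3 selects the x, y or z component of the cube's center.
                out[c * cubeCorners.length + v] = cubeCorners[v] + centers[c * 3 + v % 3];
            }
        }
        return out;
    }

    /** Repeats the cube's indices, shifted by 8 vertices for every cube. */
    static int[] bakeIndices(int[] cubeIndices, int cubes) {
        int[] out = new int[cubes * cubeIndices.length];
        for (int c = 0; c < cubes; c++) {
            for (int i = 0; i < cubeIndices.length; i++) {
                out[c * cubeIndices.length + i] = cubeIndices[i] + c * VERTS_PER_CUBE;
            }
        }
        return out;
    }
}
```

With 5625 cubes this gives 45,000 vertices in one buffer and one indexed draw call instead of 5625. Because the positions are baked in, moving a cube means rewriting its vertices, so this suits static geometry; instancing is the alternative when the cubes move.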