Issues with batching meshes

I’ve been working on a new game, and I decided to batch the cubes that I’m rendering. I batched them, no problem, reduced the number of draw calls to 1, which quadrupled my FPS :stuck_out_tongue:

The problem is that now I can’t change the translation, rotation, scale or colour of the cubes without running the batch function all over again inside the render loop, which causes lag. I can’t remove a cube from the batch either, because everything lives in one VAO and I’d have to run the batch function again just to remove that cube.

If Minecraft batches the blocks to optimise the rendering process, how is it that the environment in Minecraft is destructible? And how is it that individual blocks can be outlined, even though all the blocks are part of a single mesh (the chunk)?

I’d love to know how ^this^ is done.

Have a look at instancing: http://www.opengl-tutorial.org/intermediate-tutorials/billboards-particles/particles-instancing/
It’s used for particles, trees, grass and stuff like that; ThinMatrix also has a nice tutorial on instancing on YouTube.

You basically upload your mesh once, and after that you only upload buffers with the per-instance position/rotation/etc. data.
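
Roughly, it looks like this (a minimal LWJGL sketch, not a drop-in implementation: the attribute index, the assumption that a cube VAO with its vertex/index buffers already exists, and the method names are all mine). The per-instance buffer is a regular VBO whose attribute advances once per instance via glVertexAttribDivisor, and a single glDrawElementsInstanced call then draws every cube:

```java
import java.nio.FloatBuffer;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL20.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL31.*;
import static org.lwjgl.opengl.GL33.*;

public class InstancedCubes {

    // Assumes a GL context is current and cubeVao already contains the cube's
    // vertex/index buffers bound to its usual attributes.
    public static int addInstanceBuffer(int cubeVao, FloatBuffer perInstanceData) {
        glBindVertexArray(cubeVao);

        int instanceVbo = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
        glBufferData(GL_ARRAY_BUFFER, perInstanceData, GL_DYNAMIC_DRAW); // re-uploaded when cubes move

        // Attribute 3: one vec3 position offset per *instance*, not per vertex.
        glEnableVertexAttribArray(3);
        glVertexAttribPointer(3, 3, GL_FLOAT, false, 3 * Float.BYTES, 0);
        glVertexAttribDivisor(3, 1); // advance this attribute once per instance

        glBindVertexArray(0);
        return instanceVbo;
    }

    // Draw 'instanceCount' cubes with a single instanced call
    // (assumes the cube's index buffer uses 32-bit indices).
    public static void drawCubes(int cubeVao, int indexCount, int instanceCount) {
        glBindVertexArray(cubeVao);
        glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, instanceCount);
        glBindVertexArray(0);
    }
}
```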

Yeah, I thought about instanced rendering. Is there another way, though? Instanced rendering can only be used if the meshes I’m rendering are all the same, which is inflexible, because I may want to render different types of meshes.

No one said that you may only use one single instanced draw call for your entire game world and everything in it.
If you have some number of mesh types, such as block, grass and stone, you just create one batch for each of those types, which you can then render using three instanced draw calls.
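
Something along these lines, sketched with hypothetical InstanceBatch/BatchRenderer names and assuming each batch’s VAO was set up the way described above:

```java
import java.util.List;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL31.*;

// One batch per mesh type (block, grass, stone, ...): each holds its own VAO
// with the shared mesh plus a per-instance data buffer.
class InstanceBatch {
    final int vao;
    final int indexCount;
    int instanceCount;

    InstanceBatch(int vao, int indexCount, int instanceCount) {
        this.vao = vao;
        this.indexCount = indexCount;
        this.instanceCount = instanceCount;
    }
}

class BatchRenderer {
    // One instanced draw call per mesh type; three types -> three draw calls.
    static void render(List<InstanceBatch> batches) {
        for (InstanceBatch batch : batches) {
            glBindVertexArray(batch.vao);
            glDrawElementsInstanced(GL_TRIANGLES, batch.indexCount, GL_UNSIGNED_INT, 0, batch.instanceCount);
        }
        glBindVertexArray(0);
    }
}
```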

Ahh, I don’t really feel like doing instanced rendering because I’m so proud of the batching, but I guess I’ve got no choice :stuck_out_tongue:

Batching has two purposes:

  1. Improving performance.
    Batching is faster than doing one draw call per cube. Draw calls are expensive, so packing your vertex data together and drawing it all in one draw call will be faster.

  2. As an investment.
    In the case of cube worlds, batching also works as an investment. Looping through the volume of an entire cube world is slow as hell; a 1000x1000x1000 world is already 1 billion blocks to check. Doing that every frame would be a huge waste, because in the end only a tiny mesh is generated for the world, as 99.99% of it is either solid ground or air. Hence, we precompute the mesh once and store it, giving us the flexibility of a cube world with the performance of a mesh.

The problem occurs when you want to change a single cube in the world. It’s too slow and expensive to try to patch the existing mesh for the cube that was added/removed, so the only real choice is to regenerate the mesh from scratch. This causes a massive spike even when just a single cube is changed. The solution is to split the world up into chunks, so that you only need to regenerate the chunks affected by the changed cube instead of the entire world. In theory this introduces a trade-off, as we now have more than 1 draw call, but even 10 or 100 draw calls isn’t really that significant. The main point is avoiding having to regenerate the mesh every frame, and chunking still achieves that.
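
A bare-bones sketch of that chunking idea (plain Java, no GL; the chunk size, key packing and the rebuildMesh hook are all hypothetical choices): editing a block only flags its own chunk, plus any touching neighbour when the block sits on a chunk border, and a per-frame pass then re-meshes only the flagged chunks.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal chunking sketch: the world is split into 16x16x16 chunks, and editing
// a block only flags the chunk that contains it (plus border neighbours) for
// a mesh rebuild, instead of re-batching the whole world.
class ChunkedWorld {
    static final int CHUNK_SIZE = 16;

    static class Chunk {
        final byte[] blocks = new byte[CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE];
        boolean dirty = true; // needs its mesh (re)built
    }

    final Map<Long, Chunk> chunks = new HashMap<>();

    // Pack the chunk coordinates into one map key (21 bits per axis).
    private static long key(int cx, int cy, int cz) {
        return ((long) cx & 0x1FFFFF) | (((long) cy & 0x1FFFFF) << 21) | (((long) cz & 0x1FFFFF) << 42);
    }

    Chunk chunkAt(int x, int y, int z) {
        return chunks.computeIfAbsent(
                key(Math.floorDiv(x, CHUNK_SIZE), Math.floorDiv(y, CHUNK_SIZE), Math.floorDiv(z, CHUNK_SIZE)),
                k -> new Chunk());
    }

    void setBlock(int x, int y, int z, byte id) {
        Chunk c = chunkAt(x, y, z);
        int lx = Math.floorMod(x, CHUNK_SIZE), ly = Math.floorMod(y, CHUNK_SIZE), lz = Math.floorMod(z, CHUNK_SIZE);
        c.blocks[(lx * CHUNK_SIZE + ly) * CHUNK_SIZE + lz] = id;
        c.dirty = true;
        // Blocks on a chunk border also affect the neighbouring chunk's faces.
        if (lx == 0) chunkAt(x - 1, y, z).dirty = true;
        if (lx == CHUNK_SIZE - 1) chunkAt(x + 1, y, z).dirty = true;
        if (ly == 0) chunkAt(x, y - 1, z).dirty = true;
        if (ly == CHUNK_SIZE - 1) chunkAt(x, y + 1, z).dirty = true;
        if (lz == 0) chunkAt(x, y, z - 1).dirty = true;
        if (lz == CHUNK_SIZE - 1) chunkAt(x, y, z + 1).dirty = true;
    }

    // Called once per frame: only dirty chunks pay the meshing cost.
    void rebuildDirtyMeshes() {
        for (Chunk c : chunks.values()) {
            if (c.dirty) {
                // rebuildMesh(c); // hypothetical: run the batch/meshing function for this chunk only
                c.dirty = false;
            }
        }
    }
}
```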

You get further problems if you want to update the terrain every frame. The answer to this is: don’t. If you want to animate the terrain textures, you can either update the texture to animate all cubes identically, or do the animation in the shader based on an ID (this way all cubes can be animated individually).

Instancing is NOT a good choice for a cube world. You want to draw only the faces that lie between a solid block and an air/transparent block; in 99% of all cases, you won’t even be drawing an entire block. Hence, only being able to control visibility at the granularity of an entire cube, with all of its faces visible, is not good enough. For the simple case of a flat floor, this will draw 6x as many triangles as a face-based mesh. In addition, instancing is not as effective for small meshes. A cube is only 24 vertices, and you should try to have batches of at least 100 vertices or you’ll get reduced GPU performance from drawing all the tiny meshes. Basically, on the CPU side instancing is much faster as it’s just one draw call, but on the GPU side you’re still drawing a crapload of small meshes, which GPUs aren’t good at drawing.
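
For contrast, here’s what the face-based approach boils down to (a rough sketch with made-up names and a hard-coded 16³ chunk): a face is emitted only where a solid block borders air, so buried faces and solid interiors never reach the GPU at all.

```java
import java.util.ArrayList;
import java.util.List;

// Face-culling mesher sketch: only faces between a solid block and air are
// emitted, so a flat floor becomes a sheet of top faces instead of full cubes.
class FaceMesher {
    static final int SIZE = 16;
    // +x, -x, +y, -y, +z, -z
    static final int[][] DIRS = { {1,0,0}, {-1,0,0}, {0,1,0}, {0,-1,0}, {0,0,1}, {0,0,-1} };

    static boolean solid(byte[][][] blocks, int x, int y, int z) {
        if (x < 0 || y < 0 || z < 0 || x >= SIZE || y >= SIZE || z >= SIZE) return false; // outside counts as air
        return blocks[x][y][z] != 0;
    }

    // Returns one entry per visible face: {x, y, z, faceIndex}. A real mesher
    // would append the four vertices of that quad to a vertex buffer instead.
    static List<int[]> visibleFaces(byte[][][] blocks) {
        List<int[]> faces = new ArrayList<>();
        for (int x = 0; x < SIZE; x++) {
            for (int y = 0; y < SIZE; y++) {
                for (int z = 0; z < SIZE; z++) {
                    if (!solid(blocks, x, y, z)) continue;
                    for (int f = 0; f < 6; f++) {
                        int[] d = DIRS[f];
                        if (!solid(blocks, x + d[0], y + d[1], z + d[2])) {
                            faces.add(new int[] { x, y, z, f }); // face is exposed to air
                        }
                    }
                }
            }
        }
        return faces;
    }
}
```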

TL;DR: Use chunking, and if you need to animate the texture of a cube either update the texture or use a shader to animate it.

theagentd:

This was the answer that I wanted, thank you :smiley:

I was kinda thinking along the same lines, because whenever I see people using instanced rendering (on forums, tutorials, etc.) they always use it to render billboards (grass, particles, distant trees, etc.), but not actual cubes and big 3D meshes. That’s why I was reluctant to use instanced rendering for the cubes.

Also, my world isn’t the typical voxel game terrain. It’s just a whole bunch of (randomly positioned) blocks bound within a 200x200x200 space, so there aren’t that many cubes. I was just a little scared of regenerating the mesh whenever a cube is modified. The problem is that if one of the cubes is moving constantly, I can’t regenerate the mesh every frame, so I’ll have to exclude that block from the mesh :frowning:

Whatever, I can’t be too greedy about performance. At the very most, I’ll be sending in about 5-10 draw calls, which isn’t too bad.

Interesting stuff here, but I had a question I feel is relevant.

Do you know of a tutorial that shows how to batch draw with the programmable pipeline? Or can you give an idea of how to pack the data and send it to the shader program? I’ve seen a few with the old fixed pipeline but am not sure how to accomplish batching with shaders.

I’m assuming, for instance, that “attribute vec3 vertices” can only accept one buffer of vertices passed in via glVertexAttribPointer(), but how can many different objects be passed in with only one draw call?

There are major advantages to placing all your geometry (at least the geometry that uses the same shader) into a single VBO, just packed sequentially into the buffer. This way you can bind a single VAO and use index buffer offsets and the BaseVertex versions of the draw calls to read the right mesh out of the VBO. This still requires one draw call per mesh, but the cost of a draw call is proportional to how much state you change between calls, as the OpenGL driver has to do extensive validation, state setup, look up premade resources for certain state combinations, etc.
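
A sketch of that packing step, assuming LWJGL and a position-only vertex format on attribute 0 (the class and field names here are just placeholders): every mesh’s vertices and indices go back-to-back into one VBO and one index buffer under a single VAO, and we remember each mesh’s base vertex and index byte offset for later.

```java
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;
import java.util.List;
import org.lwjgl.BufferUtils;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL20.*;
import static org.lwjgl.opengl.GL30.*;

// Pack all meshes back-to-back into one VBO + one index buffer under a single
// VAO, recording where each mesh starts so it can be drawn individually later.
class PackedMeshes {
    static class MeshLocation {
        int baseVertex;       // offset added to every index when drawing this mesh
        long indexByteOffset; // where this mesh's indices start in the index buffer
        int indexCount;
    }

    static class Mesh {
        float[] positions; // 3 floats per vertex
        short[] indices;   // mesh-local 16-bit indices
    }

    // Builds the shared VAO and fills in a MeshLocation for each input mesh.
    static int pack(List<Mesh> meshes, List<MeshLocation> outLocations) {
        int totalVerts = 0, totalIndices = 0;
        for (Mesh m : meshes) { totalVerts += m.positions.length / 3; totalIndices += m.indices.length; }

        FloatBuffer vertexData = BufferUtils.createFloatBuffer(totalVerts * 3);
        ShortBuffer indexData = BufferUtils.createShortBuffer(totalIndices);

        int vertexCursor = 0, indexCursor = 0;
        for (Mesh m : meshes) {
            MeshLocation loc = new MeshLocation();
            loc.baseVertex = vertexCursor;
            loc.indexByteOffset = (long) indexCursor * Short.BYTES;
            loc.indexCount = m.indices.length;
            outLocations.add(loc);

            vertexData.put(m.positions);
            indexData.put(m.indices); // indices stay mesh-local; base vertex fixes them up at draw time
            vertexCursor += m.positions.length / 3;
            indexCursor += m.indices.length;
        }
        vertexData.flip();
        indexData.flip();

        int vao = glGenVertexArrays();
        glBindVertexArray(vao);

        int vbo = glGenBuffers();
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, vertexData, GL_STATIC_DRAW);
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_FLOAT, false, 3 * Float.BYTES, 0);

        int ibo = glGenBuffers();
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexData, GL_STATIC_DRAW);

        glBindVertexArray(0);
        return vao;
    }
}
```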

So if I’m understanding this correctly, it’s like any kind of data streaming where information representing different objects is grouped together.

EDIT: I think I get it. Going to code and check it out, thanks for responding

Literally just dump all your meshes sequentially into a single VBO. Keep track of the vertex and index offset of each mesh in the buffer, and use those to draw the right one with the *BaseVertex() draw calls. The base vertex allows you to add an offset to every single index that is read. This is nice because it saves you from having to use 32-bit indices when you have more than 65536 vertices in total: you can pack 100 meshes with 60000 vertices each and still use 16-bit indices.
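
Continuing the hypothetical packing sketch from above, drawing each mesh then looks roughly like this: bind the shared VAO once and issue one glDrawElementsBaseVertex per mesh, with GL_UNSIGNED_SHORT indices and the recorded base vertex shifting them into the right region of the shared VBO.

```java
import java.util.List;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL30.*;
import static org.lwjgl.opengl.GL32.*;

// Draw each packed mesh individually: one draw call per mesh, but no VAO/VBO
// rebinding in between, and 16-bit indices stay usable because the base vertex
// shifts them to the right region of the shared VBO.
class PackedMeshRenderer {
    static void render(int sharedVao, List<PackedMeshes.MeshLocation> locations) {
        glBindVertexArray(sharedVao);
        for (PackedMeshes.MeshLocation loc : locations) {
            glDrawElementsBaseVertex(GL_TRIANGLES, loc.indexCount, GL_UNSIGNED_SHORT,
                    loc.indexByteOffset, loc.baseVertex);
        }
        glBindVertexArray(0);
    }
}
```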