OpenGL - Draw multiple objects issue

Hi everyone,

It has been a while since my last post in the community. I’m still working on something that I’m calling an engine :stuck_out_tongue:

Today I want to get suggestions about rendering a lot of objects… To test the “power” of my code I ran it with 5k colored/textured cubes. At full-HD resolution on a good computer I get at most 35~40 FPS with my current solution. So I want to know the best practices for drawing a lot of elements. I’ll link the video.

This question came up when I tried to add a lot of “grass” to a terrain; each grass patch has 3 faces, textures, normals, a transform matrix, etc… and my performance immediately drops.

Thank you all guys… this forum is the best place to get inspiration :slight_smile:

pehHt9Fqnlg

For rendering grass over some terrain let’s start getting to a good solution by writing down some basic assumptions/constraints:

  1. the whole meadow is composed of tens of thousands of grass “patches”
  2. each grass patch is made of the same geometry/model, just at different positions and with some rotation
  3. once each patch is put on the ground at a given position it stays there and does not move
  4. each patch should probably be able to “wave” in the wind (that is, its upper vertices can be displaced)

As usual, you have the following constraints to achieve good performance with OpenGL:
a) you want to minimize the number of draw calls (that is, the number of glDraw*() calls and, god forbid you are still doing it, the number of glBegin/glEnd pairs)
b) you want to minimize the number of actual vertices being drawn, including omitting vertices not visible to the user
c) you want to minimize dynamic updates of buffer data in each render loop cycle

Given assumptions 1, 2 and 3, the optimal solution would probably be to fill a whole VBO with all pre-transformed world-space grass patches and then simply issue a single (indexed) draw call. This would let you optimize constraints a) and c) to the fullest, but would leave b) suboptimal, since you would also render grass patches not visible to the camera.
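A minimal CPU-side sketch of that pre-transform step (all names here are mine, not from an actual engine): each patch’s model-space vertices are rotated about the Y axis and translated to the patch’s world position once, at load time, producing one flat array that would then be uploaded into the single VBO.

```java
// Sketch: bake all grass patches, pre-transformed, into one vertex array.
// Rotation convention (about the Y axis): x' = c*x + s*z, z' = -s*x + c*z.
public class GrassBaker {
    /**
     * @param model model-space vertices of one patch as (x, y, z) triples
     * @param posX  world-space X position of each patch
     * @param posZ  world-space Z position of each patch
     * @param angle rotation of each patch around the Y axis, in radians
     * @return one big array, ready to be uploaded into a single VBO
     */
    public static float[] bake(float[] model, float[] posX, float[] posZ, float[] angle) {
        int patches = posX.length;
        float[] out = new float[patches * model.length];
        int o = 0;
        for (int p = 0; p < patches; p++) {
            float c = (float) Math.cos(angle[p]);
            float s = (float) Math.sin(angle[p]);
            for (int v = 0; v < model.length; v += 3) {
                float x = model[v], y = model[v + 1], z = model[v + 2];
                out[o++] = c * x + s * z + posX[p]; // rotate about Y, then translate
                out[o++] = y;                       // height is unaffected
                out[o++] = -s * x + c * z + posZ[p];
            }
        }
        return out;
    }
}
```

Since this runs once at load time, the per-frame cost is exactly one draw call and zero buffer updates.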

To optimize for b) we could start clustering grass patches. That is, instead of storing all grass patches in a single buffer, we spatially partition them into multiple buffers. During rendering we can then do simple frustum culling to decide whether a cluster is visible and issue one draw call per visible cluster.
Here you can also bring in a host of spatial acceleration data structures to do the culling most efficiently.
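A tiny sketch of the per-cluster visibility test (the plane representation and all names are my assumptions): give each cluster a bounding sphere and skip its draw call when the sphere lies completely behind any of the six frustum planes.

```java
// Sketch: sphere-vs-frustum test for one cluster of grass patches.
// Each plane is {a, b, c, d} with its unit-length normal (a, b, c)
// pointing towards the inside of the frustum.
public class ClusterCuller {
    public static boolean isVisible(float[][] planes, float cx, float cy, float cz, float radius) {
        for (float[] p : planes) {
            float dist = p[0] * cx + p[1] * cy + p[2] * cz + p[3];
            if (dist < -radius) {
                return false; // sphere entirely outside this plane
            }
        }
        return true; // sphere intersects or is inside all planes
    }
}
```

In the render loop you would then only issue the glDraw*() call for the clusters where isVisible() returns true.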

You might also consider OpenGL hardware instancing (google for “OpenGL instancing” and read the first 5-6 search results). You’d achieve a significantly smaller overall memory footprint when using it, because you’d only have to store the following information:

  • the geometry information of a single grass patch (with instance divisor 0)
  • the world-space position for each grass patch (with instance divisor 1)
  • the 2x2 rotation matrix for each grass patch (with instance divisor 1)

Why do we need a 2x2 rotation matrix now? Because each grass patch should be rotated randomly to reduce the appearance of uniformity among the grass patches on the terrain (much like what is presented in this GPU Gems article).

In the non-instanced variant described above, we do all positioning and rotation by pre-transforming all vertices into world-space and then submitting them to the buffer object.

When using instancing we cannot do that anymore, because we are back in model-space. So we need a way to represent the rotation. We could get away with a single rotation angle, but that would require evaluating sin/cos in the shader. This would likely hurt performance more than storing 3 additional float values for the 2x2 matrix, in which the sin/cos calculation has already been done once on the CPU.
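As a sketch (names mine), the CPU side would store these four floats per instance, and the shader side would then be a plain 2x2 matrix-times-vector multiply; here both halves are written as plain Java so the math is visible, using a Y-axis rotation convention:

```java
// Sketch: per-instance 2x2 rotation, precomputed once on the CPU.
// Convention: x' = c*x + s*z, z' = -s*x + c*z.
public class InstanceRotation {
    /** The 4 floats to store per instance: {m00, m01, m10, m11}. */
    public static float[] matrix2x2(float angle) {
        float c = (float) Math.cos(angle);
        float s = (float) Math.sin(angle);
        return new float[]{c, -s, s, c};
    }
    /** What the vertex shader would do with the instanced attribute. */
    public static float[] rotateXZ(float[] m, float x, float z) {
        return new float[]{m[0] * x + m[2] * z, m[1] * x + m[3] * z};
    }
}
```

The matrix would be fed in as an instanced vertex attribute (divisor 1), so the shader never touches sin/cos at all.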

I’d start by implementing the “put all grass patches pre-transformed into a single VBO and render them with a single draw call” method. I implemented both approaches some eight years ago or so, and in both cases blending/overdraw ended up being the main bottleneck when rendering grass patches with alpha blending or alpha discard.
If you are more into instancing, first read this interesting article, in which the “Extensions - ARB_instanced_arrays” section describes the method I suggested here.

This is an example of instanced rendering featuring 100,000 animated grass patches (each with three faces) at full-HD resolution @ 70 FPS on a 2011 notebook graphics card:

So, rendering hundreds of thousands of little objects is doable! :slight_smile:

EDIT: Or using just four different grass and flower textures based on simplex noise:

KaiHH,

really, thanks for this awesome answer. I’ll read the articles and then try to code it up this week.

I’ll update this thread if I have any doubts. =)

Thanks!!

It works!!! :slight_smile:

Thanks KaiHH

[quote=“KaiHH,post:2,topic:58251”]
If you take a 16-bit integer (giving you up to 0.005 degrees of accuracy), you can then look up the sin/cos values in a uniform buffer or (as a fallback) a texture. This reduces 128 bits (4 floats) to 16 bits, at the cost of 2 lookups on the GPU. I expect that would yield a slight performance increase. You can decide to use only 10 bits, reducing the size of the uniform buffer. For grass, even 256 distinct rotation angles (an 8-bit integer) would be enough.
[/quote]

Err, sin() and cos() are pretty cheap in shaders. Vertex shaders are also often limited by the “export” cost of writing their attribute outputs to temporary memory, which makes ALU operations effectively “free” up to a certain point. Even if cos/sin were that expensive, you could just precompute them on the CPU, upload the two values directly, and construct the rotation matrix from them in the shader.
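For what it’s worth, if someone did want to try the quantized-angle idea from the quote, a CPU-side sketch of the 8-bit variant (all names are mine) could look like this: 256 precomputed (cos, sin) pairs are built once and would be uploaded to a uniform buffer, so each instance only carries a single byte index instead of a full matrix or angle.

```java
// Sketch: 256-entry sin/cos lookup table for 8-bit quantized rotations.
public class AngleTable {
    public static final float[] COS = new float[256];
    public static final float[] SIN = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            double a = 2.0 * Math.PI * i / 256.0;
            COS[i] = (float) Math.cos(a);
            SIN[i] = (float) Math.sin(a);
        }
    }
    /** Quantize an angle in radians to the nearest of the 256 table entries. */
    public static int quantize(double angle) {
        int i = (int) Math.round(angle / (2.0 * Math.PI) * 256.0);
        return ((i % 256) + 256) % 256; // wrap into [0, 255]
    }
}
```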