Instance specific data

I want to draw 1000 to 2000 circles, each with its own 2D location and radius. I am going to use instancing to draw a mesh approximating a circle (16 vertices or something), but I obviously need instance specific data for position and radius. I’ve found some info on UBOs, Uniform Buffer Objects, that seem to be perfect for instance specific data (and skinning, among other things). I’d then use the gl_InstanceID in my shader to fetch the data, something similar to this:

gl_Position = vec4(position[gl_InstanceID] + circleVertex * radius[gl_InstanceID], 0.0, 1.0);

and then normalize the coordinates to screen coordinates.
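To check the arithmetic before it goes into a shader, here is a small CPU-side sketch in Java of the same per-vertex formula, followed by the normalization step. The class and method names are made up for illustration. It also assumes that positions and radii are in pixels and that the target range is the usual [-1, 1] clip range, which the post does not state.

```java
/** CPU reference for the per-vertex math. All names here are hypothetical. */
class CircleTransform {

    /**
     * Places one unit-circle vertex for one instance, in pixels.
     * Mirrors: position[instance] + circleVertex * radius[instance].
     */
    static float[] toPixels(float cx, float cy, float radius, float unitX, float unitY) {
        return new float[] { cx + unitX * radius, cy + unitY * radius };
    }

    /**
     * Maps pixel coordinates to normalized device coordinates in [-1, 1].
     * Assumes (0, 0) is one corner of the screen and (screenW, screenH) the opposite one.
     */
    static float[] toNdc(float px, float py, float screenW, float screenH) {
        return new float[] { 2f * px / screenW - 1f, 2f * py / screenH - 1f };
    }
}
```

In the shader the same two steps would run per vertex, with the screen size passed in as a uniform.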

This article provided good info on defining the uniform buffer stuff in the shader, but almost no info at all on creating and setting up the actual buffer with OpenGL commands in my program, nor any information on actually accessing the data in the vertex shader, so my code above is pseudocode until proven otherwise. :P I can’t find any tutorials or even a working example, so I’m pretty much shooting blindly with my experimenting at the moment. I will obviously post a working example when (if?) I get it working.

I know you guys usually stick to your glBegins and vertex arrays, but please help me out here! (joke) xDDDDDD

The LWJGL test package contains a sample that uses UBO. See org.lwjgl.test.opengl.shaders.ShaderUNI.

I think you’ll find that instancing will be quite slow for the use case you’re interested in. It’s more useful when you have complex geometry (1000s of triangles) and a moderate amount of instances (a few 100s max). For circles/sprites, it’d be better to generate the geometry for all instances offline, then animate using transform feedback or OpenCL. Give it a go with instancing though, I might be wrong.

I abandoned UBOs, as they require you to specify an array size in your SHADERS, which looks terrible and sucks. I instead went with ARB_instanced_arrays (see the last method, the code is so clear that it’s scary). I ended up with a normal VBO holding the circle vertices, drawn as pizza slices to minimize the number of fragments that reach the fragment shader. A separate VBO holds the instance specific data, which is advanced once per instance using glVertexAttribDivisor().
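For anyone wanting to reproduce the pizza-slice mesh, here is a rough sketch of how the shared circle VBO contents could be generated. It is not the actual code from the game. The class and method names are hypothetical, and it assumes a unit circle centered on the origin that gets scaled by the per-instance radius later.

```java
/** Builds unit-circle geometry for GL_TRIANGLE_FAN. Names are hypothetical. */
class CircleFan {

    /**
     * Returns (slices + 2) xy pairs: the center, then slices + 1 rim vertices,
     * where the last rim vertex repeats the first one to close the fan.
     */
    static float[] unitCircleFan(int slices) {
        float[] xy = new float[(slices + 2) * 2];
        xy[0] = 0f; // fan center x
        xy[1] = 0f; // fan center y
        for (int i = 0; i <= slices; i++) {
            // i % slices makes the closing vertex bit-identical to the first rim vertex.
            double angle = 2.0 * Math.PI * (i % slices) / slices;
            xy[2 + 2 * i] = (float) Math.cos(angle);
            xy[3 + 2 * i] = (float) Math.sin(angle);
        }
        return xy;
    }
}
```

With 12 slices this gives 14 vertices (28 floats), which would be uploaded once into the static VBO.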

My drawing is basically:

  1. Bind and map instance specific data buffer.
  2. Put instance specific data in buffer.
  3. Bind texture(s), VAO and shader
  4. glDrawArraysInstanced(GL_TRIANGLE_FAN, 0, SUBDIVISIONS + 2, num_circles);
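Steps 1 and 2 boil down to packing three floats per circle. A minimal sketch of that packing is below. It assumes an interleaved x, y, radius layout, which the post does not spell out, and the class name and constants are hypothetical. The GL side is only described in comments, since it needs a live context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Packs per-instance circle data. The layout and names are assumptions. */
class InstanceData {
    static final int FLOATS_PER_INSTANCE = 3;                // x, y, radius
    static final int STRIDE_BYTES = FLOATS_PER_INSTANCE * 4; // 12 bytes per instance
    static final int POSITION_OFFSET_BYTES = 0;              // vec2 position
    static final int RADIUS_OFFSET_BYTES = 2 * 4;            // float radius after x, y

    /** Returns a direct buffer ready for reading, holding x, y, radius per circle. */
    static FloatBuffer pack(float[] x, float[] y, float[] radius) {
        int n = x.length;
        FloatBuffer buf = ByteBuffer.allocateDirect(n * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int i = 0; i < n; i++) {
            buf.put(x[i]).put(y[i]).put(radius[i]);
        }
        buf.flip(); // switch from writing to reading
        return buf;
        // On the GL side, the two instance attributes would be set up with
        // glVertexAttribPointer using STRIDE_BYTES and the offsets above, and
        // glVertexAttribDivisor(attribute, 1) so they advance once per instance
        // instead of once per vertex.
    }
}
```

For 1000 circles this is 3000 floats (12,000 bytes) per frame. When writing into a mapped buffer instead, the same put loop would target the mapped buffer's float view.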

Speed at 1000 circles with radius=10, each made of 12 slices ( = 14 vertices as you can see above): 920 FPS
With my fragment shader, which does 2D raytracing with about 20 samples for each pixel, this drops to 720 FPS. This is still more than a 10x speedup compared to my old algorithm, but that algorithm was CPU-bound (too many draw calls). This one is GPU-bound and barely uses any CPU at all (just loading 3000 floats into a buffer), which is insanely important for my game. All in all, it’s ridiculously fast compared to my earlier algorithm.

The goal was to reduce the CPU load of my earlier algorithm, which is what instancing is all about in the first place. It’s not about getting higher vertex performance or fill rate (although it can still improve them), it’s about reducing the number of draw calls from a varying number N=num_instances to a constant.