Double VBO no performance increase or decrease?

I have been attempting to get a double VBO system going. To my understanding it is nothing really special: just create two VBOs and then fill them up, right? That is what I have done, but I see no performance gain or loss according to my DT times.

It does not matter how many buffers I use (keeping one VBO or having 2 at the ready) my DT times are roughly the same.
With both single and double VBOs it sits around 0.0024 seconds.

Surely, I would think that my DT time would be better with a double VBO assuming I did it right or worse if it was done wrong. But them being the same? It has to be wrong, right?

Here is the code I have; let me know if more info is needed. I am rendering 100000 32x32 quads.



	public void render()
	{
		//Clear the current buffer
		vertexBuffer[buffer].clear();

		//Set up a new random for the positions
		int x = 0; int y = 0;
		Random random = new Random();
		
		//X and Ys for the Quad; The Z is set to 1.0f; Remember 100000 quads
		for(int i = 0; i < MAX_QUADS; i++)
		{
			x = random.nextInt(800);
			y = random.nextInt(600);
			draw(x, y, 32.0f, 32.0f);
		}
		
		//Flip the current buffer
		vertexBuffer[buffer].flip();
		
		//Use the shader program
		glUseProgram(shaderProgramHandle);
		
		//Bind the current buffer and send it off to queue up in the GPU
		glBindBuffer(GL_ARRAY_BUFFER, vertexBufferHandle);
		glBufferData(GL_ARRAY_BUFFER, vertexBuffer[buffer], GL_STREAM_DRAW);
		
		//Upload the MVP matrix to the shader uniform
		glUniformMatrix4(matrixBufferHandle, false, mvpBuffer);
		
		//Bind the VAO (and its attributes), draw the indexed quads, then unbind
		glBindVertexArray(vertexArrayHandle);
		GL11.glDrawElements(GL_TRIANGLES, indexDrawCount, GL11.GL_UNSIGNED_INT, 0);
		glBindVertexArray(0);
		
		buffer = 1 - buffer;
	}

	//Put in here just in case
	private void createBuffers()
	{	
		//Create the Vertex Buffers
		vertexBuffer = new FloatBuffer[2];
		vertexBuffer[0] = BufferUtils.createFloatBuffer(MAX_VERTEX_COUNT);
		vertexBuffer[1] = BufferUtils.createFloatBuffer(MAX_VERTEX_COUNT);
	
		//Index Buffer used
		indexBuffer = BufferUtils.createIntBuffer(MAX_INDEX_COUNT);
	}


Other things to mention: I’m using one VAO, one vertex buffer handle, and an index buffer handle, like so:


	//Init the VAO for the buffers, the Vertex Buffers will be Identical
	private void initializeVAO()
	{
		//Bind the VAO
		glBindVertexArray(vertexArrayHandle);
		
		//Bind the Vertex Buffer and indicate where the Vertex are
		glEnableVertexAttribArray(0);
		glBindBuffer(GL_ARRAY_BUFFER, vertexBufferHandle);
		glVertexAttribPointer(0, 3, GL_FLOAT, false, 0, 0);
		
		//Bind the Index Buffer
		glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBufferHandle);
		glBindVertexArray(0);
	}


Double buffering, unless I’m mistaken, is for framebuffers, and it makes sure that the whole window is drawn before displaying it on screen.

LWJGL uses it by default.

Generally the bottleneck with OpenGL is the number of glXXX commands you’re issuing, so if you can cut some of them out, you save time. (A few commands are more expensive than others, but that’s the general idea.)

You might see an improvement with multiple buffers if doing many small batches.

Or if one set of data remains constant while another changes a lot.

I think we are talking about two different things. I’m talking about the rendering technique where you write data to one VBO while the GPU uses the other VBO.

Really? You think that would matter? I would figure that attempting to render 100000 quads every frame where the data changes would surely cause a performance hit. I’ll have to try it.

Also, I was wondering whether I’m confusing the terms Vertex Buffer and Vertex Buffer Handle. When I use the glGenBuffers() command, in my mind this creates the Vertex Buffer Handle, whereas the Vertex Buffer (as I think of it) is the FloatBuffer I created. So maybe I’m really supposed to be swapping the Vertex Buffer Handles? Then I would just have one FloatBuffer that is used for writing.

Then, when I bind the Vertex Buffer Handle and pass in the FloatBuffer, this takes the data and ‘dumps’ it on the GPU, and the Vertex Buffer Handle is what keeps track of it there. That way I can just reuse the same FloatBuffer for my data.

Or is that just completely off the wall?

That technique works fine and has the advantage of being easy to implement and follow.

Cas :slight_smile:
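
For anyone following along, the handle-swapping idea from the posts above could be sketched roughly like this. This is only an illustration, not code from the thread: the `VboRing` class is a made-up helper, the handle values are plain ints standing in for whatever glGenBuffers() would return, and the GL calls appear only as comments.

```java
/** Hypothetical helper: cycles through a fixed set of VBO handles, one per frame. */
final class VboRing {
    private final int[] handles; // in real code these would come from glGenBuffers()
    private int current = 0;

    VboRing(int... handles) {
        if (handles.length == 0) {
            throw new IllegalArgumentException("need at least one handle");
        }
        this.handles = handles.clone();
    }

    /** Handle to upload into and draw from during this frame. */
    int current() {
        return handles[current];
    }

    /** Move on to the next handle; call once at the end of each frame. */
    void advance() {
        current = (current + 1) % handles.length;
    }
}

// Intended use inside render(), with a single FloatBuffer for staging:
//   int vbo = ring.current();
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, vertexBuffer, GL_STREAM_DRAW);
//   ... draw ...
//   ring.advance();
```

One thing to watch: a VAO remembers which buffer was bound when glVertexAttribPointer() was called, so with several handles you would need either one VAO per handle or to re-specify the attribute pointer after binding.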

Why bother? It’s a lot of extra work for very little gain. Pretty much a premature optimization, unless you’ve actually benchmarked and found that buffer syncs are causing a problem.

Anyways, you should use mapped buffers for this.
http://www.java-gaming.org/index.php?topic=28209.0

Double buffering when using glBufferData() doesn’t really do anything. I believe what you’re trying to avoid is overwriting vertex data before the GPU is done reading the previous data. However, on all drivers I’ve tested glBufferData() does this for you. It’s up to the driver how this is done, but most drivers seem to either copy the data you submit with glBufferData() to a temporary internal buffer in RAM and then upload it once the GPU is done with the previous data, or doing pretty much the same thing but in VRAM. The GPU could be multiple frames behind the CPU, so there might even be multiple buffers involved, but the driver will always honor the OpenGL specification (e.g. not overwrite data while it’s still needed).

Using multiple VBOs is only useful if you’re updating the same buffer, as opposed to reallocating (AKA orphaning) your VBO. If you were to allocate your VBO once with glBufferData() and then update the same buffer with glBufferSubData() each frame, your double buffering may improve performance, but here too glBufferSubData() may internally buffer the data in RAM, similar to how glBufferData() works. The only really useful case for keeping multiple buffers is when mapping buffers with glMapBuffer(). When you map the buffer, OpenGL has to block until the previous data is no longer in use; alternating between several buffers prevents the CPU from stalling while waiting for the GPU to catch up. You can even completely disable OpenGL’s built-in synchronization of buffers with glMapBufferRange() to achieve very high performance, but this requires you to either synchronize manually or use enough buffers to guarantee that you never overwrite a buffer that the GPU isn’t done with yet. Take a look at Riven’s excellent implementation here to see how that can be done: http://www.java-gaming.org/topics/opengl-lightning-fast-managed-vbo-mapping/28209/view.html
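
To make the "use enough buffers" condition a bit more concrete, here is a minimal bookkeeping sketch. It makes no OpenGL calls and is not Riven's implementation: `UnsyncedRing` and its frame counters are hypothetical, and in a real renderer the "last frame the GPU finished" value would have to come from something like fence sync objects (glFenceSync() / glClientWaitSync()).

```java
import java.util.Arrays;

/** Hypothetical bookkeeping for N buffers that are mapped without driver synchronization. */
final class UnsyncedRing {
    private final long[] lastUsedFrame; // frame in which each slot was last submitted, -1 = never
    private int next = 0;

    UnsyncedRing(int slots) {
        lastUsedFrame = new long[slots];
        Arrays.fill(lastUsedFrame, -1L);
    }

    /**
     * Returns the slot to write during currentFrame, or -1 if the GPU has not yet
     * finished the frame that last read from that slot (the caller would have to wait).
     * Pass -1 for lastFrameGpuFinished while the GPU has not finished any frame.
     */
    int tryAcquire(long currentFrame, long lastFrameGpuFinished) {
        int slot = next;
        if (lastUsedFrame[slot] > lastFrameGpuFinished) {
            return -1; // still in flight: writing now could corrupt data the GPU needs
        }
        lastUsedFrame[slot] = currentFrame;
        next = (next + 1) % lastUsedFrame.length;
        return slot;
    }
}
```

With two slots and a GPU that lags two frames behind, the third frame would have to wait; with three slots it would not, which is the trade-off described above.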

There are other aspects to consider as well. The 200,000 calls to Random.nextInt() may be the bottleneck, in which case you won’t see performance changes from the GPU side. Similarly, you might be fill-rate limited, so you won’t see performance changes until you reduce screen resolution or overdraw, or simplify your fragment shader.
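
A quick way to check the CPU side is to time the random number generation on its own, without any OpenGL involved. The sketch below is only a rough measurement: `FillTimer` is a made-up name, the 800 and 600 bounds are taken from the render loop posted earlier, and a single run includes JIT warm-up, so it is worth calling it a few times before trusting the number.

```java
import java.util.Random;

final class FillTimer {
    /** Generates two random coordinates per quad and returns the elapsed time in nanoseconds. */
    static long timeRandomCoords(int quads, long seed) {
        Random random = new Random(seed);
        long checksum = 0; // consumed below so the loop cannot be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < quads; i++) {
            checksum += random.nextInt(800);
            checksum += random.nextInt(600);
        }
        long elapsed = System.nanoTime() - start;
        if (checksum < 0) {
            throw new IllegalStateException("unreachable: all values are non-negative");
        }
        return elapsed;
    }
}
```

If the result for 100000 quads is a large fraction of the reported 0.0024 s (2,400,000 ns) frame time, the loop itself is a likely suspect.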

The double buffered VBO technique worked for me about 5 years ago on an integrated chipset :slight_smile: But theagentd is basically right, the driver is already doing this for you.

Cas :slight_smile:

Seems to be useless on the desktop these days, but it can work miracles on mobile GPUs.

I see, I did not know this. Either way, I was going to work my way up to glMapBufferRange; I guess I will just have to jump straight there. I’ll have to take a look at that article again and try to implement it.

I don’t know if this matters, but since you mention it you could be right. I don’t think it would affect it that much, but I’m not sure.
I am doing my tests in an 800 x 600 window. My fragment shader is as simple as possible, I think; it just returns the color red. Could you explain overdraw? I’m unfamiliar with this term in the OpenGL context.
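
(For reference: overdraw means the same pixel gets shaded more than once in a frame because several primitives cover it. A rough back-of-the-envelope estimate for the numbers in this thread is sketched below. `OverdrawEstimate` is a made-up helper, and the figure is an upper bound because it ignores the parts of quads that fall outside the window.)

```java
final class OverdrawEstimate {
    /** Average number of times each pixel is shaded, ignoring quads clipped at the window edge. */
    static double averageOverdraw(long quads, int quadW, int quadH, int screenW, int screenH) {
        long fragments = quads * quadW * quadH;   // total fragments produced by all quads
        long pixels = (long) screenW * screenH;   // pixels available in the window
        return (double) fragments / pixels;
    }
}
```

For 100000 quads of 32x32 in an 800 x 600 window that is 102,400,000 fragments over 480,000 pixels, so each pixel is written roughly 213 times per frame on average.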

Also, was I confusing my terms for VBOs? Basically, should I consider this my VBO:



//VBO or just some storage space for Vertex Data
FloatBuffer vertexBuffer = BufferUtils.createFloatBuffer(MAX_VERTEX_COUNT);


or is my VBO really this



//My VBO or just a pointer to the VBO and its data
IntBuffer buffers = BufferUtils.createIntBuffer(1);
glGenBuffers(buffers);
		
int vertexBufferHandle = buffers.get(0);
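
(A small illustration of the first half of that distinction, using only java.nio: the FloatBuffer is ordinary CPU-side memory that gets filled and then handed to an upload call, while the int from glGenBuffers() is just a name for storage the driver owns. `StagingBufferDemo` is a hypothetical class, and `createStaging` only approximates what BufferUtils.createFloatBuffer() provides.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class StagingBufferDemo {
    /** Direct, native-order CPU memory, approximately what BufferUtils.createFloatBuffer(n) returns. */
    static FloatBuffer createStaging(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Writes one vertex and returns how many floats an upload call would read after flip(). */
    static int fillOneVertex(FloatBuffer staging, float x, float y, float z) {
        staging.clear();              // position = 0, limit = capacity; contents are not erased
        staging.put(x).put(y).put(z); // write the vertex into CPU memory
        staging.flip();               // limit = 3, position = 0: ready to be read
        return staging.remaining();
    }
}
```

Nothing here touches the GPU; the data only leaves this buffer when it is passed to something like glBufferData() with a handle bound.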


With many small batches and a single buffer, you submit a buffer to be rendered, then very soon afterwards you submit the same buffer again. This can stall, since the first batch needs to be flushed before the second batch can overwrite it. If you have a number of buffers and cycle through them, you can avoid this problem. This exact thing happened with a game of mine on an HTC G1, the first Android phone. Maybe the improvement will be less pronounced with more modern phones. Likely it doesn’t matter at all on the desktop.