Questions about VBOs / IBOs

noodleBowl · January 4, 2014, 6:42pm

I was wondering about this and I’m hoping you guys can help me out.
I hope I can explain this, so please bear with me.

I come from the world of DirectX 11, where you can set up dynamic vertex and index buffers.
These have ‘lock flags’, which basically tell the GPU how the data is going to be treated. When you set up dynamic vertex and index buffers you use a DISCARD and a NOOVERWRITE flag. The DISCARD flag says trash everything in the buffer, we don’t care about it. The NOOVERWRITE flag, on the other hand, is a promise that you won’t touch anything that is already in use by the GPU.

Using both of these you can create a circular buffer system, since the DISCARD flag does this special thing where it gives the GPU a new place in memory to use. Something like the below is a way to get a circular buffer:



//Code done when needing to draw, places the draw data directly into the vertex buffer
if(vertexBufferPosition + vertexdPerQuad > maxVertexAllowed)
{
        //Says that we are done manipulating draw data
	batchContext->Unmap(vertexBuffer, 0);

        //Go ahead and send everything off to the GPU and draw all that needs to be drawn
	batchContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
	batchContext->DrawIndexed(indexCount, 0, 0);
	indexCount = 0;

        //Reset our stuff and since we were out of space, give us another spot in mem to use (Setting the mapFlag to DISCARD)
	vertexBufferPosition = 0;
	mappingFlag = D3D11_MAP_WRITE_DISCARD;

        //Relock the buffer so we can add more GPU data, also since we are using a fresh buffer, set the lockflag to NOOVERWRITE
	batchContext->Map(vertexBuffer, 0, mappingFlag, 0, &mapVertexResouce);
	mappingFlag = D3D11_MAP_WRITE_NO_OVERWRITE;

}

//Draw data placed directly into the vertex buffer
addDrawData(x,y,width,height);
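
The cursor logic in the snippet above can be sketched on its own, without any graphics API. This is only a model under stated assumptions: `StreamCursor`, `MapMode`, `reserve` and `offset` are made-up names, positions are counted in vertices, and the very first reservation is treated as a DISCARD.

```java
// Hypothetical model of the DISCARD / NO_OVERWRITE pattern; no graphics API is called.
final class StreamCursor {
    enum MapMode { DISCARD, NO_OVERWRITE }

    private final int capacity; // total vertices the buffer can hold
    private int position;       // next free vertex slot
    private int offset;         // start of the most recent reservation
    private boolean started;    // false until the first reservation (assumption: first map discards)

    StreamCursor(int capacity) {
        this.capacity = capacity;
    }

    /** Reserves room for count vertices and reports which flag the map should use. */
    MapMode reserve(int count) {
        if (count > capacity) {
            throw new IllegalArgumentException("batch is larger than the whole buffer");
        }
        MapMode mode = MapMode.NO_OVERWRITE;
        if (!started || position + count > capacity) {
            // Out of room: wrap to the start and ask for fresh memory.
            position = 0;
            mode = MapMode.DISCARD;
            started = true;
        }
        offset = position;
        position += count;
        return mode;
    }

    /** Vertex offset at which the most recent reservation starts. */
    int offset() {
        return offset;
    }
}
```

With a capacity of 8 vertices, three reservations of 4 give DISCARD at offset 0, NO_OVERWRITE at offset 4, then DISCARD at offset 0 again.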

So my question: can this be done in openGL? Are there such things as lock flags and mapping?
I understand how to place data into a vertex buffer based on some tutorials, but that was for static buffers

How should I be placing data into a buffer when it needs to change every frame or so? For things such as a SpriteBatcher or particle emitter? I should also mention I’m trying to target openGL 3.3 and up.

trollwarrior1 · January 4, 2014, 7:51pm

DirectX is made for game development if I’m correct.
OpenGL has nothing to do with games. It only renders primitives that you specify.

You can use immediate mode, vertex arrays, display lists, vertex buffer objects, frame buffer objects and some more.

If you want to make something like “flags” in OpenGL, you would need to code it yourself. After you send data to the GPU, you can edit it with glMapBuffer, which takes the target and an access flag and gives you a ByteBuffer. You can then edit that to your liking.

My answer is probably not what you wanted, because I don’t really understand what is going on in DirectX.

Troubleshoots · January 4, 2014, 8:04pm

OpenGL is very different to DirectX. I’d recommend reading this book.

Buffer data is added to a buffer object using the command [icode]glBufferData(int target, Buffer buffer, int usage)[/icode]. The usage is what you’re wanting to change. If you want to update the buffer data every frame, you’d want to change it to GL_STREAM_DRAW. Then you can simply overwrite the data using either [icode]glBufferData[/icode] or [icode]glBufferSubData[/icode]. This page on this wiki is quite good.
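
A minimal sketch of the CPU side of such an upload, assuming LWJGL-style NIO buffers. `StagingExample` and `stage` are made-up names, and the GL call appears only in a comment so the snippet runs without a context. The point is the `flip()`: the upload reads from the buffer's position to its limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: prepares vertex data for a streaming upload.
final class StagingExample {
    /** Packs the floats into a direct, native-order buffer that is flipped and ready to upload. */
    static FloatBuffer stage(float[] vertices) {
        FloatBuffer buf = ByteBuffer.allocateDirect(vertices.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(vertices);
        buf.flip(); // position = 0, limit = number of floats written
        // With a context current, the upload would then look something like:
        // glBufferData(GL_ARRAY_BUFFER, buf, GL_STREAM_DRAW);
        return buf;
    }
}
```

Forgetting the `flip()` leaves the position at the end of the data, so nothing would remain to be uploaded.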

noodleBowl · January 4, 2014, 8:54pm

Before things go in the wrong direction, it kinda seems like I’m implying that I have no knowledge of how openGL works

I know about VBOs, VAOs, IBOs, and etc. I know DirectX and openGL are very different.

My question, unfortunately, is all over the place. I really want to know from a performance standpoint for 2D work. What is the best way or what way should I be sending data to the GPU?

Like Troubleshoots pointed out, I would want to use the usage flag GL_STREAM_DRAW, because I plan for the data to change at least every frame. BUT I also want to know everyone’s thoughts on the type of buffer system to use.

When I mentioned DirectX, the example I gave was a circular buffer. Essentially we were always able to stream data to the GPU and avoid blocking calls, since we could get a new location in memory to use. I’m not sure whether this is possible in openGL, for example through ‘special’ glMapBuffer() params that say: if we are out of space in the buffer, just give us a new one. If it is possible, awesome; how should it be done?

If not, then let’s talk about a double buffer system, where we load data into one, send it off to the GPU, fill the secondary buffer with data, and rinse and repeat.

I guess what I’m truly asking at the end of the day is if I cannot make a circular buffer (With ‘special’ params or etc), how should I be doing a double buffer system? Not in the sense of what needs to be done at high level, but what are things I need to worry about?

How should my openGL calls look? When am I going to be blocked by the GPU? Things of that nature

PandaMoniumHUN · January 5, 2014, 9:35pm

Since DirectX and OpenGL are so different, and because you have so many questions, I highly suggest reading an up-to-date book about OpenGL.
Sadly, they’re pretty hard to get, but from what I hear the 6th edition of the OpenGL SuperBible is quite good. The OpenGL 4.0 Shading Language Cookbook is also a great piece; however, they’re quite expensive. :’(
Could you explain what you mean by “blocking calls”?
I don’t know how exactly [icode]glMapBuffer()[/icode] works so I can’t help you with that one.
OpenGL contexts are automatically double buffered and you’re always drawing to the back buffer (by default). After you’re done with your frame you can swap the buffers (LWJGL automatically does this for you with [icode]Display.update()[/icode]) so the user will see the rendered frame in the front buffer, and you can work on the back buffer, which still holds the previous frame (that’s why you have to clear the color/depth/stencil buffers, so that old frame is cleared from the current back buffer).

Riven · January 5, 2014, 11:10pm

glMapBuffer has the biggest overhead of all.

glMapBufferRange with UNSYNCHRONIZED flag has the least overhead of all VBO operations that move data around.

as it’s unsynchronized (the driver doesn’t lock anything at all) you need a queue of at least 4-5 VBOs and take/release one every frame, or you’ll get corrupted vRAM.
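
The "queue of VBOs" idea can be sketched as a simple round-robin over buffer handles. This is a model only: `BufferRing` and `acquire` are made-up names and the handles are plain ints standing in for ids from glGenBuffers. Whether 4 or 5 buffers is enough depends on how far the GPU lags behind; a sync object (glFenceSync) would be the strict way to know a buffer is free, and is not modelled here.

```java
// Hypothetical round-robin pool of VBO handles, one taken per frame.
final class BufferRing {
    private final int[] handles; // stand-ins for VBO ids
    private int next;            // index of the buffer to hand out next

    BufferRing(int[] handles) {
        if (handles.length == 0) {
            throw new IllegalArgumentException("need at least one buffer");
        }
        this.handles = handles.clone();
    }

    /** Returns the buffer to write into this frame and advances the ring. */
    int acquire() {
        int handle = handles[next];
        next = (next + 1) % handles.length;
        return handle;
    }
}
```

With four handles, the fifth frame gets the first buffer back, by which time the GPU has hopefully finished reading it.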

noodleBowl · January 6, 2014, 2:21am

I have the openGL SuperBible 5th edition, but like you mention it’s not up to date.
Also for me books are hit and miss. They are great for learning how to do something, but I feel like they don’t get down on the nitty gritty stuff like I’m looking for imo

By blocking calls I mean the mutex lock (on the hardware level) that is done by the GPU when it is using a buffer’s data. I understand that the GPU can stall and shut out the CPU until it has finished the commands it is running with the buffer.

We have a misunderstanding here (totally my fault): when I say double buffer I’m talking about using 2 Vertex Buffer Objects, in the sense that is described in this article from Apple https://developer.apple.com/library/mac/documentation/graphicsimaging/conceptual/opengl-macprogguide/opengl_designstrategies/opengl_designstrategies.html#//apple_ref/doc/uid/TP40001987-CH2-SW8

The idea behind it is pretty simple in all honesty
You have 2 Vertex Buffers and you alternate between them each frame. So beginning with frame 1 you have 2 blank buffers. You fill one and send it to the GPU for drawing. Then on the next frame you fill the next Vertex Buffer, while the GPU is still working on the first one. By doing this you are supposed to be able to prevent the GPU from locking down on the Vertex Buffers. I have read that this works great on both ATI and Nvidia cards

At both Riven and Panda, since you guys kinda go together on this one

I have read that you can do a technique called ‘orphaning’ using either the glMapBuffer call or the glMapBufferRange call

The idea is that, before using glMapBuffer, you call glBufferData with a null pointer for the Vertex Buffer’s data. What this is supposed to do is tell the driver: hey, give me a spot in mem that I can use, because I need it now. You then follow up with glMapBuffer() and place all your data in the vertex buffer, send it off to the GPU, etc. This allows you to have multiple Vertex Buffers in flight, which can then be processed by the GPU. I heard you can also do this with the glMapBufferRange call using a couple of flags, but I have also heard that glMapBufferRange can be a real performance killer if you do it wrong.
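
Orphaning is usually described as re-specifying the buffer's store with glBufferData, same size, no data. A toy model of the effect, with plain Java arrays standing in for driver memory: `OrphanModel` and its methods are made-up names, and real drivers are free to behave differently, so treat this as an illustration of the idea rather than of any implementation.

```java
// Hypothetical model: a buffer object whose storage can be swapped for a fresh block.
final class OrphanModel {
    private float[] storage;

    OrphanModel(int floats) {
        storage = new float[floats];
    }

    /** What a pending draw keeps hold of: a reference to the storage as it is now. */
    float[] snapshotForDraw() {
        return storage;
    }

    /** Models re-specifying the store with no data: same size, fresh memory. */
    void orphan() {
        storage = new float[storage.length];
    }

    /** Writes into whatever storage the buffer currently owns. */
    void write(int index, float value) {
        storage[index] = value;
    }
}
```

After `orphan()`, new writes land in the fresh block while the earlier snapshot still holds the old values, which is why the CPU does not have to wait for the pending draw.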

That is why I’m kinda leaning towards the Double VBO approach, but I’m not sure how I should be setting it up, I guess. My ideas on implementation are shaky. Does anyone have some experience with this?

I was thinking some thing like



public void endRender()
{
	/* Other required items above, core VBO stuff below*/

	//Enable the Vertex Buffer, bind it for the Shader
	glEnableVertexAttribArray(0);
	glBindBuffer(GL_ARRAY_BUFFER, currentVertexBuffer);
	glVertexAttribPointer(0, 3, GL_FLOAT, false, 0, 0);

	//Enable the UV Buffer, bind it for the Shader
	glEnableVertexAttribArray(1);
	glBindBuffer(GL_ARRAY_BUFFER, currentUvBuffer);
	glVertexAttribPointer(1, 2, GL_FLOAT, false, 0, 0);

	//Bind the STATIC Index Buffer (Since it is always the same slot wise)
	//Draw the number of elements needed
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
	GL11.glDrawElements(GL_TRIANGLES, numberOfIndex, GL11.GL_UNSIGNED_INT, 0 );

	//Disable all the attributes for safety
	glDisableVertexAttribArray(0);
	glDisableVertexAttribArray(1);
	glDisableVertexAttribArray(2);

	//Swap the buffers
	bufferToUse = !bufferToUse;
	currentVertexBuffer = vertexBuffers.get(bufferToUse);
	currentUvBuffer = uvBuffers.get(bufferToUse);
}

public void addDrawData(/*Draw data params */)
{

	if(/* The vertex buffer limit is going to be over  or a texture swap is needed*/)
	{
		//Send stuff to the GPU
		endRender();
		
		//Reset / clear the buffer. Prepare it for writing
		currentVertexBuffer.clear();
		currentUvBuffer.clear();
	}
	
	// Add the data for drawing into the buffer
	currentVertexBuffer.put(/*Some Vertex data for vert 1*/);
	currentVertexBuffer.put(/*Some Vertex data for vert 2*/);
	currentVertexBuffer.put(/*Some Vertex data for vert 3*/);
	currentVertexBuffer.put(/*Some Vertex data for vert 4*/);
	
	currentUvBuffer.put(/*Some UV data for vert 1*/);
	currentUvBuffer.put(/*Some UV data for vert 2*/);
	currentUvBuffer.put(/*Some UV data for vert 3*/);
	currentUvBuffer.put(/*Some UV data for vert 4*/);
}

Riven · January 6, 2014, 6:10am

Everybody suggesting orphaning hasn’t tried unsynchronized glMapBufferRange…

That, and who cares that glMapBufferRange is slow if you use it wrongly? A supercar chokes on diesel too. I’ve never heard that as an argument against it

Riven · January 6, 2014, 7:02am

noodleBowl · January 7, 2014, 2:47am

You might be correct

Okay, so while I’m still a little uneasy about glMapBufferRange, I figured I’d give it a try. Now I’m not sure if I did this right but…
This is what I got, feel free to tell me I’m going about this all wrong.


//My main draw method
public void Render()
{
	batcher.beginRender(); //Sanity checks that we are calling the right methods (don't call endRender before beginRender, etc.)

//Queue the VBO with draw data!
	for(int i = 0; i < 6000; i++)
        	batcher.draw(randX(), randY(), 32.0f, 32.0f);

//End the render; draw whatever we got in the VBO
	batcher.endRender();
}


/* Everything below is part of a Batcher class; it has a Vertex Buffer and a UV Buffer set to GL_STATIC_DRAW, and an index buffer that is static and filled
from the start based on the max size of the vertex buffer. Let's assume it's 10000 verts */
//My Draw method, get everything to the screen
public void endRender()
{
	//Bind the texture
	glActiveTexture(GL_TEXTURE0);
	glBindTexture(GL_TEXTURE_2D, currentTexture);
	glUniform1i(texture, 0);

	//Enable the vertex buffer
	glEnableVertexAttribArray(0);
	glBindBuffer(GL_ARRAY_BUFFER, vertexBufferHandle);
	glVertexAttribPointer(0, 3, GL_FLOAT, false, 0, 0);

	//Enable the uv buffer
	glEnableVertexAttribArray(1);
	glBindBuffer(GL_ARRAY_BUFFER, uvBufferHandle);
	glVertexAttribPointer(1, 2, GL_FLOAT, false, 0, 0);

	//Draw everything needed based on the index buffer
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBufferHandle);
	GL11.glDrawElements(GL_TRIANGLES, indexCount, GL11.GL_UNSIGNED_INT, 0 );

	//Safety disable
	glDisableVertexAttribArray(0);
	glDisableVertexAttribArray(1);
}


//My put some data into the VBOs method; places data into the Vertex and UV vbo; Checks if orphans need making
public void addDrawData(float x, float y, float width, float height)
{
	//Check if we will be over the limit
	if(vertexBufferPosition + 4 > vertexBufferMax)
	{
		//End the render; draw stuff
		endRender();

		//Orphan the buffer
		glBufferData(GL_ARRAY_BUFFER, 0, GL_STREAM_DRAW);
		vertexBufferPosition = 0;
		indexCount = 0;
	}

	//Map the vertex buffer
	vertexBuffer = glMapBufferRange(GL_ARRAY_BUFFER, vertexBufferPosition, 4, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();

	vertexBuffer.put(x);
	vertexBuffer.put(y);
	vertexBuffer.put(x + width);
	vertexBuffer.put(y + height);

	glUnmapBuffer(GL_ARRAY_BUFFER);

	//Map the uv buffer
	uvBuffer = glMapBufferRange(GL_ARRAY_BUFFER, uvBufferPosition, 8, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();

	uvBuffer.put(0);
	uvBuffer.put(0);

	uvBuffer.put(0);
	uvBuffer.put(1);

	uvBuffer.put(0);
	uvBuffer.put(1);

	uvBuffer.put(1);
	uvBuffer.put(1);

	glUnmapBuffer(GL_ARRAY_BUFFER);

	//Add 4; because 4 vertex per quad
	vertexBufferPosition += 4;

	//Add 8; because 2 per vertex and there are 4 vertex per quad
	uvBufferPosition += 8;

	//Add 6; because of 6 indices per quad
	indexCount += 6;
}

Also, is my double buffer idea correct? I have never set one up before so I’m not sure if I got the implementation down.

Riven · January 7, 2014, 8:47pm

Don’t map/unmap your buffer every time you add a quad. Map it once, and fill it with as much as fits in it.
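
A sketch of "map once, fill until full", with a direct NIO buffer standing in for the mapped memory so that it runs without a GL context. `MapOnceBatcher` and its members are made-up names; the places where the real map, unmap and draw calls would go are marked in comments.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical batcher that maps once per batch instead of once per quad.
final class MapOnceBatcher {
    static final int FLOATS_PER_QUAD = 4 * 3; // 4 vertices with x, y, z each

    private final int maxQuads;
    private FloatBuffer mapped; // stand-in for the view a map call would return
    private int quads;          // quads written since the last map
    int mapCalls;               // how often the buffer was "mapped"
    int drawCalls;              // how often a batch was "drawn"

    MapOnceBatcher(int maxQuads) {
        this.maxQuads = maxQuads;
    }

    void begin() {
        map();
    }

    void addQuad(float x, float y, float w, float h) {
        if (quads == maxQuads) { // buffer full: unmap and draw, then map again
            flush();
            map();
        }
        mapped.put(x).put(y).put(1f);
        mapped.put(x + w).put(y).put(1f);
        mapped.put(x + w).put(y + h).put(1f);
        mapped.put(x).put(y + h).put(1f);
        quads++;
    }

    void end() {
        flush();
    }

    private void map() {
        // Real code would bind the VBO and call glMapBufferRange here.
        mapped = ByteBuffer.allocateDirect(maxQuads * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        quads = 0;
        mapCalls++;
    }

    private void flush() {
        // Real code would call glUnmapBuffer and then issue the draw here.
        if (quads > 0) {
            drawCalls++;
        }
        mapped = null;
    }
}
```

With room for 2500 quads (10000 vertices), pushing 6000 quads costs three map calls and three draws instead of 6000 map/unmap pairs.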

noodleBowl · January 8, 2014, 3:04am

Yea, I woke up this morning, thought about it and was like ‘ooo you messed up’ haha

So like you said, I think I should really be doing this. The one thing I’m not too sure about is the unmap part, since there is no specific link to the buffer I want to unmap.
Maybe I’m missing something or maybe I have to interleave them for this to work?



//My main draw method
public void Render()
{
   batcher.beginRender(); //Sanity checks that we are calling the right methods (don't call endRender before beginRender, etc.)

//Queue the VBO with draw data!
   for(int i = 0; i < 6000; i++)
           batcher.draw(randX(), randY(), 32.0f, 32.0f);

//End the render; draw whatever we got in the VBO
   batcher.endRender();
}


/* Everything below is part of a Batcher class; it has a Vertex Buffer and a UV Buffer set to GL_STATIC_DRAW, and an index buffer that is static and filled
from the start based on the max size of the vertex buffer. Let's assume it's 10000 verts */

//My begin render method
public void beginRender()
{

   /*Sanity Checks */

   //Map the vertex buffer
   vertexBuffer = glMapBufferRange(GL_ARRAY_BUFFER, vertexBufferPosition, 12, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();

   //Map the uv buffer
   uvBuffer = glMapBufferRange(GL_ARRAY_BUFFER, uvBufferPosition, 8, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();

}

//My Draw method, get everything to the screen
public void endRender()
{
   //Bind the texture
   glActiveTexture(GL_TEXTURE0);
   glBindTexture(GL_TEXTURE_2D, currentTexture);
   glUniform1i(texture, 0);

   //Enable the vertex buffer
   glEnableVertexAttribArray(0);
   glBindBuffer(GL_ARRAY_BUFFER, vertexBufferHandle);
   glVertexAttribPointer(0, 3, GL_FLOAT, false, 0, 0);

   //Enable the uv buffer
   glEnableVertexAttribArray(1);
   glBindBuffer(GL_ARRAY_BUFFER, uvBufferHandle);
   glVertexAttribPointer(1, 2, GL_FLOAT, false, 0, 0);

   //Draw everything needed based on the index buffer
   glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBufferHandle);
   GL11.glDrawElements(GL_TRIANGLES, indexCount, GL11.GL_UNSIGNED_INT, 0 );

   //Safety disable
   glDisableVertexAttribArray(0);
   glDisableVertexAttribArray(1);
}


//My put some data into the VBOs method; places data into the Vertex and UV vbo; Checks if orphans need making
public void addDrawData(float x, float y, float width, float height)
{
   //Check if we will be over the limit
   if(vertexBufferPosition + 12 > vertexBufferMax)
   {
       //Unmap the buffers; called twice: once for the vertex buffer, once for the uv buffer; May need to interleave?
      glUnmapBuffer(GL_ARRAY_BUFFER);
      glUnmapBuffer(GL_ARRAY_BUFFER);

      //End the render; draw stuff
      endRender();

      //Orphan the buffer
      glBufferData(GL_ARRAY_BUFFER, 0, GL_STREAM_DRAW);
      vertexBufferPosition = 0;
      indexCount = 0;

      //Re map the buffers
      vertexBuffer = glMapBufferRange(GL_ARRAY_BUFFER, vertexBufferPosition, 12, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();

      uvBuffer = glMapBufferRange(GL_ARRAY_BUFFER, uvBufferPosition, 8, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT, null).asFloatBuffer();
   }

   //Put the 4 verts (x, y, and z coords [ Z coords defaults to 1.0f ]) for a quad
   //Vert1
   vertexBuffer.put(x); vertexBuffer.put(y); vertexBuffer.put(1.0f);
   
   //Vert 2
   vertexBuffer.put(x+width); vertexBuffer.put(y);vertexBuffer.put(1.0f);
  
   //Vert 3
   vertexBuffer.put(x + width); vertexBuffer.put(y + height); vertexBuffer.put(1.0f);

   //Vert 4
   vertexBuffer.put(x); vertexBuffer.put(y+height);vertexBuffer.put(1.0f);

   //Tex Coord for Vert 1
   uvBuffer.put(0); uvBuffer.put(0);

   //Tex Coord for Vert 2
   uvBuffer.put(0); uvBuffer.put(1);

   //Tex Coord for Vert 3
   uvBuffer.put(0); uvBuffer.put(1);

   //Tex Coord for Vert 4
   uvBuffer.put(1); uvBuffer.put(1);

   //Add 12; because 4 vertex (With X Y Z coords) per quad 
   vertexBufferPosition += 12;

   //Add 8; because 2 per vertex and there are 4 vertex per quad
   uvBufferPosition += 8;

   //Add 6; because of 6 indices per quad
   indexCount += 6;
}

Riven · January 8, 2014, 6:25am

I was under the impression that you had lots of experience under your belt, given that you came from C/DirectX.

There are quite a few conceptual errors in your latest code dump (like pushing more texcoords than vertices, mapping without binding specific VBOs, combining unsynced mapping and orphaning, never writing indices in your index buffer, interleaving vertex attributes in overlapping mapped buffers, while writing into each sequentially, using offsets when using distinct VBOs for your attributes, mapping teeny tiny ranges, not even enough to hold one quad). I’d advise you to go all the way back to Vertex Arrays (not VBOs, not mapping them, and not even close to unsynced mapped VBOs) and get it working from there (starting with 1 triangle or quad) – then work your way up again, if the need arises.

noodleBowl · January 9, 2014, 12:20am

Riven:

I was under the impression that you had lots of experience under your belt, given that you came from C/DirectX.

There are quite a few conceptual errors in your latest code dump (like pushing more texcoords than vertices, mapping without binding specific VBOs, combining unsynced mapping and orphaning, never writing indices in your index buffer, interleaving vertex attributes in overlapping mapped buffers, while writing into each sequentially, using offsets when using distinct VBOs for your attributes, mapping teeny tiny ranges, not even enough to hold one quad). I’d advise you to go all the way back to Vertex Arrays (not VBOs, not mapping them, and not even close to unsynced mapped VBOs) and get it working from there (starting with 1 triangle or quad) – then work your way up again, if the need arises.

Coming from C++ / DirectX gives me no real footing when dealing with openGL. It’s a different API, and things are handled differently. Yes, the concepts are the same, but implementation is another thing.

Pushing more tex coords
This is my fault, when I think of Vertex I think of the Vertex Struct I can do in C++/DirectX. Since I can’t do this in java/openGL, then you are right I am missing 8 more puts (The remaining X/Y coords and the Z coord)

Should be fixed in the above code

Mapping without binding specific VBOs
Maybe I am not thinking of this right, but I would do this in the constructor / init of the Batcher. Please correct me if I’m wrong, but I can/only need to do this (Bind the specific buffers) once right? Or does this need to be done multiple times? Also am I not allowed to bind multiple buffers at once? Again maybe interleaving is the solution to this
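
For what it's worth, a target such as GL_ARRAY_BUFFER has a single binding point, and map/unmap act on whatever buffer is bound to it at that moment. A toy model of that state, with made-up names (`ArrayBufferBindingModel` and its methods) and exceptions standing in for GL errors:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one binding point: only one buffer is bound at a time.
final class ArrayBufferBindingModel {
    private int bound; // 0 means nothing is bound
    private final Set<Integer> mapped = new HashSet<>();

    /** Binding replaces whatever was bound before. */
    void bind(int buffer) {
        bound = buffer;
    }

    /** Maps the currently bound buffer. */
    void map() {
        if (bound == 0) {
            throw new IllegalStateException("no buffer bound");
        }
        if (!mapped.add(Integer.valueOf(bound))) {
            throw new IllegalStateException("buffer " + bound + " is already mapped");
        }
    }

    /** Unmaps the currently bound buffer, not the most recently mapped one. */
    void unmap() {
        if (!mapped.remove(Integer.valueOf(bound))) {
            throw new IllegalStateException("buffer " + bound + " is not mapped");
        }
    }

    boolean isMapped(int buffer) {
        return mapped.contains(Integer.valueOf(buffer));
    }
}
```

In this model, binding buffer 1, mapping, binding buffer 2, mapping and then calling `unmap()` twice only releases buffer 2; the second call fails and buffer 1 stays mapped until it is bound again and unmapped.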

Combining unsynced map and orphaning
I’m not too sure about this one, as this topic is about implementing orphaning or a double buffer.
I appreciate the time you’re taking to explain this, unfortunately I guess things are not clicking

Not filling the index buffer
The code is just meant to show the three methods (endRender, beginRender, addDrawData) and how they work together. My idea was to have the index buffer filled once the Batcher class was created. There is no need in my mind to have this refilled with the same data value each time, since there is a max size for each buffer
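
A sketch of filling that static index buffer once, assuming each quad is 4 consecutive vertices drawn as two triangles (6 indices). `QuadIndices` and `build` are made-up names, and the winding chosen here (0,1,2 then 2,3,0) is just one consistent option.

```java
// Hypothetical helper: generates the index pattern for a run of quads.
final class QuadIndices {
    /** Builds indices for quadCount quads, each made of 4 consecutive vertices. */
    static int[] build(int quadCount) {
        int[] indices = new int[quadCount * 6];
        for (int q = 0; q < quadCount; q++) {
            int v = q * 4; // first vertex of this quad
            int i = q * 6; // first index slot of this quad
            indices[i] = v;
            indices[i + 1] = v + 1;
            indices[i + 2] = v + 2;
            indices[i + 3] = v + 2;
            indices[i + 4] = v + 3;
            indices[i + 5] = v;
        }
        return indices;
    }
}
```

Since the pattern never changes, the result only has to be uploaded once, for the largest number of quads the vertex buffer can hold.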

interleaving vertex attributes in overlapping mapped buffers, while writing into each sequentially
I don’t think I do this? I mentioned that maybe I need to do this because of the mapping call and use of UV coords

using offets when using distinct VBOs for your attributes
I don’t think I’m interleaving? No need for an offset until I do so, right?
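
For comparison, this is roughly what an interleaved layout would look like, as a sketch with made-up names (`InterleavedQuad`, `put`). With one VBO per attribute, stride 0 and offset 0 in glVertexAttribPointer are fine; with position and UV in the same VBO, both attributes share a stride and the UV gets a byte offset. The layout x, y, z, u, v is an assumption for illustration.

```java
import java.nio.FloatBuffer;

// Hypothetical interleaved layout: x, y, z, u, v per vertex.
final class InterleavedQuad {
    static final int FLOATS_PER_VERTEX = 3 + 2;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES; // 20
    static final int POSITION_OFFSET_BYTES = 0;
    static final int UV_OFFSET_BYTES = 3 * Float.BYTES;              // 12

    /** Writes one quad: 4 vertices, each with a position followed by its tex coord. */
    static void put(FloatBuffer out, float x, float y, float w, float h) {
        out.put(x).put(y).put(1f).put(0f).put(0f);
        out.put(x + w).put(y).put(1f).put(1f).put(0f);
        out.put(x + w).put(y + h).put(1f).put(1f).put(1f);
        out.put(x).put(y + h).put(1f).put(0f).put(1f);
    }
}
```

The attribute setup would then pass `STRIDE_BYTES` as the stride for both attributes, with `POSITION_OFFSET_BYTES` and `UV_OFFSET_BYTES` as the offsets, and each vertex gets exactly one tex coord pair.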

mapping teeny tiny ranges / not even enough to hold one quad
I’m confused about this one, could you explain? My comments say that the vertex buffer would hold 10000 verts and the UV buffer would hold the required amount. Maybe this was in relation to the first issue
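
One likely reading of the "teeny tiny ranges" remark: the offset and length passed to glMapBufferRange are in bytes, not floats or vertices. A small piece of arithmetic, with made-up helper names, shows what the lengths 12 and 8 in the code actually cover:

```java
// Hypothetical helpers for sizing a mapped range in bytes.
final class MapRangeMath {
    static final int VERTS_PER_QUAD = 4;

    /** Bytes needed for the given number of quads at floatsPerVertex floats per vertex. */
    static long bytesForQuads(int quads, int floatsPerVertex) {
        return (long) quads * VERTS_PER_QUAD * floatsPerVertex * Float.BYTES;
    }

    /** How many whole floats fit into a mapped range of lengthBytes. */
    static long floatsInRange(long lengthBytes) {
        return lengthBytes / Float.BYTES;
    }
}
```

A length of 12 bytes holds 3 floats (one xyz position), and 8 bytes holds 2 floats (one UV pair), while a single quad needs 48 bytes of positions and 32 bytes of UVs.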

Now with that being said, I do see one thing that is wrong, and that is the glMapBufferRange length param usage (pertaining to the latest code). What if I don’t know how much I’m going to map at a given time? What if I want to map until I say so (e.g. because I have run out of room or I have to swap a texture)? How is an undefined length done?
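
One common way around an unknown length, sketched here as bookkeeping only: map everything from the current cursor to the end of the buffer, write as much as turns out to be needed, and note how much was used at unmap time. `OpenEndedMap` and its methods are made-up names. In real code the used amount could also be reported to the driver via GL_MAP_FLUSH_EXPLICIT_BIT and glFlushMappedBufferRange, which this sketch does not call.

```java
// Hypothetical bookkeeping for "map the rest of the buffer, use what you need".
final class OpenEndedMap {
    private final long capacityBytes;
    private long cursorBytes;  // where the next map starts
    private long writtenBytes; // bytes written since the last map

    OpenEndedMap(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Offset to pass to a range map. */
    long mapOffset() {
        return cursorBytes;
    }

    /** Length to pass to a range map: everything from the cursor to the end. */
    long mapLength() {
        return capacityBytes - cursorBytes;
    }

    /** Records a write into the mapped span; returns false if it does not fit. */
    boolean write(long bytes) {
        if (writtenBytes + bytes > mapLength()) {
            return false;
        }
        writtenBytes += bytes;
        return true;
    }

    /** Call at unmap time: returns the bytes actually used and advances the cursor. */
    long finish() {
        long used = writtenBytes;
        cursorBytes += used;
        writtenBytes = 0;
        return used;
    }

    /** Starts over at the beginning, e.g. after switching to a fresh buffer. */
    void reset() {
        cursorBytes = 0;
        writtenBytes = 0;
    }
}
```

When `write` returns false, that is the point to unmap, draw, and either move to the next buffer or start over at offset 0.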

That kinda kills the idea of mapping a range though, so would I then just use glMapBuffer? What’s the difference in terms of performance?