Immediate Mode Versus Vertex Array Performance Confusion

Vladiedoo · February 26, 2013, 5:46am

Hi, I’m trying to advance my OpenGL knowledge past immediate mode. From Riven’s (thanks Riven) tutorial I was able to get a vertex array rendering properly but I was not able to see an increase in performance.
Here is how I tested it; if this is the wrong approach to benchmarking, please let me know.

To test the performance of immediate mode and vertex arrays I drew a quad repeatedly; in this test it was 88573 quads. When I had one big vertex array hold all 88573 quads it got close to the performance of immediate mode but not quite there.
Pastebin: http://pastebin.java-gaming.org/3dddd044049 (It’s messy)

Here is sample console output:
(I should have labeled these.) The first column is individual vertex arrays, the second column is immediate mode, and the third column is one big vertex array.

67900774	13660101	15001607	n: 88573
371488407	14153266	19133484	n: 88573
116280710	13882860	14012185	n: 88573
68273896	13738066	15345957	n: 88573
71455953	15008723	12876111	n: 88573

Q1: Could someone please explain why this occurred?
Q2: Does it matter how I render (immediate, vertex array, VBO) when I decide to fully use shaders?

Sorry, these questions must have been asked a lot, thank you for your time.

EDIT: This was done in 3D if that makes a significant difference.

StumpyStrust · February 26, 2013, 6:58am

You will probably get a guy that goes by something agent that will inevitably clear all things up but I will take a whack at it.

What you are trying to do is basically render as many sprites (textured quads) as possible. When doing this with VBO and VA, it is better to batch them in groups of 1500-3000 sprites than in one giant VBO/VA that holds all 88573. Also, it is better to use triangle strips to form the quads than actual quads; it’s a little faster. It is generally not a good idea to recreate your buffers every render, which can slow things down a lot.
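To make the batching idea concrete, here is a minimal sketch. It is not the sprite batcher from any of the tutorials mentioned in this thread. The class `SpriteBatch`, its `drawCall` callback and the 2000-quad batch size are made up for illustration; in a real renderer the callback would hand the filled buffer to OpenGL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.function.Consumer;

/** Hypothetical batcher: collects quads and issues one draw callback per full batch. */
final class SpriteBatch {
    private static final int FLOATS_PER_QUAD = 4 * 3; // 4 corners, x/y/z each

    private final FloatBuffer buffer;
    private final Consumer<FloatBuffer> drawCall; // stand-in for the real GL draw
    private int flushCount;

    SpriteBatch(int quadsPerBatch, Consumer<FloatBuffer> drawCall) {
        // Allocated once and reused for every batch.
        this.buffer = ByteBuffer.allocateDirect(quadsPerBatch * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        this.drawCall = drawCall;
    }

    /** Adds a square of the given size; flushes first if the batch is already full. */
    void add(float x, float y, float z, float size) {
        if (buffer.remaining() < FLOATS_PER_QUAD) {
            flush();
        }
        buffer.put(x).put(y).put(z);
        buffer.put(x + size).put(y).put(z);
        buffer.put(x + size).put(y + size).put(z);
        buffer.put(x).put(y + size).put(z);
    }

    /** Sends whatever has been collected so far, then resets the buffer for reuse. */
    void flush() {
        if (buffer.position() == 0) {
            return; // nothing to draw
        }
        buffer.flip();
        drawCall.accept(buffer);
        buffer.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }
}
```

With a batch size of 2000, the 88573 quads from the test above would go out in 45 draw callbacks instead of 88573 separate ones. Remember to call `flush()` once more at the end of the frame for the last, partially filled batch.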

I think something must be off here, as VA should vastly outperform fixed function (immediate mode). I made a sprite batcher tutorial on here which uses a very simple VA that you could try, as I know that works. Also, davedes has some great tutorials on modern OpenGL along with a sprite batcher tutorial, which is the main thing you need for 2D games. Once you get a VBO sprite batcher going, look into Riven’s post about a blazingly fast method for sprite batching.

As for when you use shaders, you can do it with all of them, but it is better not to use deprecated methods, i.e. fixed-function/vertex array.

I hope this helped.

princec · February 26, 2013, 9:38am

Microbenchmarking will never give you useful results really.

Cas

theagentd · February 26, 2013, 11:36am

Your benchmark is flawed in so many ways that the numbers don’t mean anything.

Microbenchmarking, like Cas said. Plus, OpenGL doesn’t execute commands right when you call them. It queues them up and then periodically (or when the queue’s full) sends them off to the GPU. That means that the command that causes the queue to become full is the one that’ll take time, which will most likely not be the one that actually takes time to execute.
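One way to work around the queueing, as a rough sketch: time whole frames, force a sync point before stopping the clock, and throw away a few warm-up frames. `FrameTimer` and its parameters are hypothetical names. The `sync` hook is an assumption about how you would plug in something like `glFinish`; the code below knows nothing about OpenGL itself.

```java
import java.util.Arrays;

/** Hypothetical helper that times whole frames instead of individual calls. */
final class FrameTimer {
    /**
     * Runs frame + sync repeatedly and returns the median time of one
     * iteration in nanoseconds. Warm-up iterations are run but not measured.
     */
    static long medianFrameNanos(Runnable frame, Runnable sync,
                                 int warmupFrames, int timedFrames) {
        if (timedFrames < 1) {
            throw new IllegalArgumentException("timedFrames must be at least 1");
        }
        for (int i = 0; i < warmupFrames; i++) {
            frame.run();
            sync.run();
        }
        long[] samples = new long[timedFrames];
        for (int i = 0; i < timedFrames; i++) {
            long start = System.nanoTime();
            frame.run();
            // Assumption: sync blocks until queued work is done (e.g. a glFinish call),
            // so the cost lands on the frame that actually caused it.
            sync.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[timedFrames / 2]; // median is less sensitive to one-off spikes
    }
}
```

Even then the result only tells you something about this exact scene on this exact machine, so treat it as a sanity check rather than a verdict.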
You’re not using vertex arrays in a very good way, and you’d probably benefit a lot from using VBOs instead. First of all, the whole point of vertex arrays is the ability to batch up geometry data so you can render them with fewer OpenGL commands. The way you’re using them now, you’re barely gaining anything since you’re not using fewer calls, hence your first vertex array method is slower than immediate mode. Your batched version is almost as fast as immediate mode but it’s not very efficient. You should at least reuse the FloatBuffers between frames, not create new ones every time. If the data doesn’t change between frames, just fill the FloatBuffers once and reuse the same data too. As mentioned, if you want even better performance, take a look at VBOs.
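Reusing a FloatBuffer between frames could look roughly like this. It is a sketch using only `java.nio`; `QuadVertexData` and its methods are names invented for the example, and the actual OpenGL call that consumes the buffer is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical holder for quad vertices that is allocated once and reused every frame. */
final class QuadVertexData {
    static final int FLOATS_PER_QUAD = 4 * 3; // 4 corners, x/y/z each

    private final FloatBuffer vertices;

    QuadVertexData(int maxQuads) {
        // Allocate once, outside the render loop.
        vertices = ByteBuffer.allocateDirect(maxQuads * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Starts a refill: resets position and limit but keeps the same memory. */
    void begin() {
        vertices.clear();
    }

    /** Appends one axis-aligned quad at depth z. */
    void addQuad(float x, float y, float z, float w, float h) {
        vertices.put(x).put(y).put(z);
        vertices.put(x + w).put(y).put(z);
        vertices.put(x + w).put(y + h).put(z);
        vertices.put(x).put(y + h).put(z);
    }

    /** Finishes the refill: flips the buffer so it can be read from the start. */
    FloatBuffer end() {
        vertices.flip();
        return vertices;
    }

    /** Number of quads written; only meaningful after end(). */
    int quadCount() {
        return vertices.limit() / FLOATS_PER_QUAD;
    }
}
```

If the geometry is static, call `begin()`, `addQuad(...)` and `end()` once at start-up and pass the same buffer to the draw code every frame. If it changes, repeat the three steps each frame, but still without allocating anything new.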
This is kind of hypothetical, but how big are those quads? If each quad is 100x100 pixels on the screen, one quad would cover 10 000 pixels. 88 573 quads would therefore cover 885 730 000 pixels each frame, or 53 143 800 000 pixels per second at 60 FPS, which obviously might be a bottleneck. How you send the coordinates doesn’t affect how the pixels are filled, so if you hit a bottleneck there you won’t see much of an improvement no matter what you do.
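The fill-rate arithmetic above is easy to redo for other quad sizes. A small sketch follows; `FillRateEstimate` is a made-up name, and the 100x100 quad size and 60 FPS are the same hypothetical numbers as in the paragraph, not measurements. It ignores clipping and depth rejection, so it is an upper bound.

```java
/** Hypothetical back-of-the-envelope estimate of how many pixels get filled. */
final class FillRateEstimate {
    /** Pixels touched per frame if every quad is fully visible. */
    static long pixelsPerFrame(long quads, long widthPx, long heightPx) {
        return quads * widthPx * heightPx;
    }

    /** Pixels touched per second at the given frame rate. */
    static long pixelsPerSecond(long pixelsPerFrame, long fps) {
        return pixelsPerFrame * fps;
    }

    public static void main(String[] args) {
        long perFrame = pixelsPerFrame(88_573, 100, 100);
        long perSecond = pixelsPerSecond(perFrame, 60);
        System.out.println(perFrame);  // 885730000
        System.out.println(perSecond); // 53143800000
    }
}
```

Shrinking the quads to 10x10 drops the per-frame figure by a factor of 100, which is a quick way to check whether fill rate is what you are actually measuring.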