… which is why Nvidia and AMD spend a lot of money supporting game development. Ever seen “The Way It’s Meant To Be Played” and Nvidia’s logo when starting a game? That’s because Nvidia helped them out, and obviously added some optimizations specific to their GeForce cards. Game makers obviously want equal performance from both Nvidia and AMD cards, but those companies are competing and each wants to look faster than the other.
While writing my bachelor’s thesis about CUDA I also learned a lot about how GPUs work.
For anyone who is interested in a more in-depth view of how the GPU works, I really recommend taking a look at the CUDA documentation from NVIDIA, which describes the architecture in a very accessible way.
I have continued my experiments, this time trying to see the relative effects of retained mode versus immediate mode. I built a vertex buffer of 20k vertexes and drew it using either retained or immediate mode. I did this in a cycle, increasing the number of times I drew the vertexes each cycle to the square of the iteration (1, 4, 9, 16, …). To exclude pixel fill rate as an issue I set the camera very far from the scene (thus reducing the number of pixels being drawn).
Anyway, here are my results, if you’ve ever wondered whether there is much difference between these two modes. (Each loop terminates when the frame rate goes below 40; a simplified sketch of the benchmark loop is below the results.)
Draws per frame : Retained FPS  Immediate FPS   (20k vertexes per draw; ‘-’ = loop already terminated)
1 : 274 244
4 : 246 193
9 : 227 166
16 : 179 128
25 : 138 93
36 : 110 65
49 : 93 49
64 : 78 39
81 : 64 -
100 : 55 -
121 : 47 -
144 : 40 -
Interestingly, I saw a much smaller difference when my vertex buffer was only 10k vertexes.
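Roughly, each mode’s loop was structured like this (a simplified sketch, not my actual test code; measureAverageFps() is a stand-in for the frame-timing and rendering code, which draws the buffer the given number of times per frame):

    // For one drawing mode: draw the 20k-vertex buffer i*i times per frame
    // and stop once the average frame rate drops below 40 FPS.
    int iteration = 1;
    while (true) {
        int drawsPerFrame = iteration * iteration;   // 1, 4, 9, 16, ...
        int fps = measureAverageFps(drawsPerFrame);  // stand-in: render some frames, return average FPS
        System.out.println(drawsPerFrame + " : " + fps);
        if (fps < 40) {
            break;                                   // this mode's loop terminates here
        }
        iteration++;
    }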
The overhead of immediate mode can be offset by having a good CPU. A less balanced computer (or one better balanced for gaming?) might have a better GPU and a weaker CPU, so it might suffer a lot more from immediate mode (assuming you are comparing glBegin()/glEnd() to glDrawArrays()). Filling a vertex buffer can also be spread across multiple cores, something that is impossible with immediate mode. All in all, there’s no reason to use more CPU power than you need, since your game can most likely use any spare cycles for AI, physics, more entities, etc.
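For anyone new to this, the difference between the two call styles looks roughly like this (a minimal LWJGL 2-style sketch; the class and method names are just for illustration):

    import java.nio.FloatBuffer;
    import org.lwjgl.opengl.GL11;

    class DrawStyles {
        // Immediate mode: one JNI call per vertex, rebuilt on the CPU every single frame.
        static void drawImmediate(float[] verts) {
            GL11.glBegin(GL11.GL_TRIANGLES);
            for (int i = 0; i < verts.length; i += 3) {
                GL11.glVertex3f(verts[i], verts[i + 1], verts[i + 2]);
            }
            GL11.glEnd();
        }

        // Vertex arrays: hand the driver the whole buffer and issue a single draw call.
        static void drawVertexArray(FloatBuffer verts, int vertexCount) {
            GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
            GL11.glVertexPointer(3, 0, verts); // 3 floats per vertex, tightly packed
            GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, vertexCount);
            GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
        }
    }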
Heh, I had an AMD Athlon dual core paired with a GTX 295 for a few months… ;D
Sorry, I’ve got my terminology a bit mixed up - I was using glDrawArrays for both modes. The only difference was whether the buffer was a VBO located on the graphics card or a FloatBuffer located in the PC’s RAM. I’m sure that if I did true immediate mode, with lots of calls to glBegin/glEnd, there would have been an even larger difference. It would be way too much bother to change my current test code to compare VBOs to glBegin/glEnd code, although I will update the thread when I have a new test showing PC RAM buffers vs VBOs vs display lists vs glBegin/glEnd. I’ve heard display lists are even faster than VBOs, so it would be interesting to test this out, and if I’m building display lists then I can easily test raw immediate mode too.
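Concretely, the “Retained” column was a VBO path along these lines (again just a rough LWJGL 2-style sketch, not my actual test code), while the “Immediate” column was the plain vertex-array draw from the reply above, passing the heap FloatBuffer straight to glVertexPointer:

    import java.nio.FloatBuffer;
    import org.lwjgl.opengl.GL11;
    import org.lwjgl.opengl.GL15;

    class VboPath {
        // Upload once: after this the vertex data lives in driver/GPU-side memory.
        static int createVbo(FloatBuffer verts) {
            int vbo = GL15.glGenBuffers();
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
            GL15.glBufferData(GL15.GL_ARRAY_BUFFER, verts, GL15.GL_STATIC_DRAW);
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, 0);
            return vbo;
        }

        // Per draw: bind the VBO and point at offset 0; the vertex data is not re-sent from the PC's RAM.
        static void drawVbo(int vbo, int vertexCount) {
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
            GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
            GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0L); // byte offset into the bound VBO
            GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, vertexCount);
            GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, 0);
        }
    }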
My PC is definitely CPU-limited - a Pentium Dual Core at 1.8 GHz matched with a GT 430. The PC might be a bit slow, but I got it for 15 euros! I had to buy the graphics card separately, and it cost three times as much as the PC.
I’m writing these tests because it is not immediately apparent whether this or that change in code results in a performance improvement. As a noob it is really easy to write something that looks OK but actually degrades performance.
Ah, okay. There’s a pretty good reason why immediate mode and the gl*Pointer functions that take a client-side Buffer object were removed in OpenGL 3.2 (or was it 3.1?). Even if you use VBOs and update them every frame, it should be faster if you use glMapBuffer(). You should try that, it’s pretty simple. It should pretty much be the fastest way of doing it. Theoretically your program shouldn’t be bottlenecked by the memory transfer, since it can happen in parallel with rendering, though such an optimal case where the copy is 100% hidden is of course rare…
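Something like this, assuming LWJGL 2’s GL15 bindings (the method name and the STREAM_DRAW usage hint are just illustrative):

    import java.nio.ByteBuffer;
    import org.lwjgl.opengl.GL15;

    class MappedUpdate {
        // Write the new vertex data straight into driver-owned memory instead of
        // filling a separate FloatBuffer and letting glBufferData copy it again.
        static void updateVbo(int vbo, float[] newVerts) {
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
            // Re-specify (orphan) the store so the driver needn't wait for in-flight draws.
            GL15.glBufferData(GL15.GL_ARRAY_BUFFER, newVerts.length * 4L, GL15.GL_STREAM_DRAW);
            ByteBuffer mapped = GL15.glMapBuffer(GL15.GL_ARRAY_BUFFER, GL15.GL_WRITE_ONLY, null);
            mapped.asFloatBuffer().put(newVerts);
            GL15.glUnmapBuffer(GL15.GL_ARRAY_BUFFER);
            GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, 0);
        }
    }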
Display lists were deprecated too! VBOs should be just as fast for rendering, but I do agree that display lists have some use since you can also store state-change commands in them - though that doesn’t mean those state changes are free or should be issued more often than they would be without display lists. Try it out; the driver is usually able to optimize the data a lot for display lists. One thing to note is that texture binds are NOT stored in the display list, though this might be a driver bug (not likely though) since the OpenGL specs say they should be stored.
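If you do test it, building and calling a display list is only a few calls - a minimal sketch with LWJGL 2’s GL11 bindings, where recordGeometry() is a stand-in for whatever drawing you want the driver to capture:

    import org.lwjgl.opengl.GL11;

    class DisplayListPath {
        // Compile once: the driver records (and can heavily optimize) the calls made here.
        static int buildList() {
            int list = GL11.glGenLists(1);
            GL11.glNewList(list, GL11.GL_COMPILE);
            recordGeometry(); // stand-in for the glBegin/glEnd or vertex-array drawing to capture
            GL11.glEndList();
            return list;
        }

        // Per frame: replay the whole recorded command stream with one call.
        static void draw(int list) {
            GL11.glCallList(list);
        }

        static void recordGeometry() {
            // e.g. glBegin()/glVertex3f()/glEnd() calls go here
        }
    }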
Technically there is no such thing as “retained mode” anymore. That was a feature of IRIS GL; DirectX stopped maintaining it after its first public release (and dropped it entirely in DX10), and OpenGL never had it. Obviously, using vertex arrays blurs the distinction somewhat, so most people know what you mean, but if you use the term on a DX forum you’ll get a lot of sideways looks from people wondering why you’re using such an ancient, deprecated API.