Immediate mode rendering is dead

VeaR · January 18, 2010, 6:09pm

“Immediate mode rendering is dead”

It’s been dead for a long time. Immediate mode in OpenGL meant issuing calls between glBegin() and glEnd() to specify each triangle’s vertices one by one. It was the original rendering mode of OpenGL (back in 1993, if I remember).

You should have given the title:

“Vertex array rendering mode is dead, along with display list rendering”

princec · January 18, 2010, 6:18pm

Indeed. Although at least up until now, vertex array rendering was fast enough for my purposes, but now it seems I’ve hit the absolute limits, and the drivers are definitely pushing everybody to use VBOs or face drastically shit performance.

This is kinda significant because a) all my legacy code has a lot of immediate mode in it (I don’t mean a lot of rendering, just a lot of places) and b) all the legacy tutorial code on the internets is immediate mode, and it’s all completely the wrong thing to be teaching people now.

Anyway, back to major hackery to convert all my immediate mode stuff into VBO rendering… without breaking any of my legacy code… whilst still being integrated into the sprite engine layering system… argh

Cas

DzzD · January 18, 2010, 7:03pm

hehe, and any chance of an immediate-mode compiled display list benchmark?

princec · January 18, 2010, 7:26pm

Display lists used to just crash / not work most of the time I tried them on crappy Intel cards anyway. So I never used them.

All this stuff is basically written in stone for OpenGL3.1 anyway, so we might as well get used to it and start doing it right from now on anyway ;D

Cas

DzzD · January 18, 2010, 7:31pm

yup yup, was just a request for my personal knowledge

xinaesthetic · January 19, 2010, 2:56am

I must admit, I recommended that someone use display lists on here just a few days ago, without warning them that they are actually already deprecated (which I knew full well)… they were using the JOGL TextRenderer and I wasn’t really sure how that would interact. Well, I felt guilty so I’ve at least now let them know.

In my own code, I have VBOs for the major stuff, with some immediate mode, the odd display list and one or two vertex arrays… I don’t notice any particular difference to performance whether I have my various trivial immediate mode bits rendering or only VBO stuff - I am largely CPU bound, though. In fact, I’ve just been experimenting rendering particles in immediate mode vs VA vs VBO and am seeing absolutely negligible differences… if anything, immediate mode maybe slightly ahead. WTF?

princec · January 19, 2010, 9:49am

Maybe you have awesome drivers

I’m on an Nvidia 6510Go, with no video RAM - it’s got nice drivers but performance wise it’s barely any better than Intel rubbish.

Cas

xinaesthetic · January 19, 2010, 11:30am

Well, I’ve got the 190.89 drivers for an 8600M GT under 32bit Vista… certainly a bit better hardware wise

I’m still surprised that I fairly consistently see an increase of a few fps changing my particle system to immediate mode instead of VBO… never the other way around, it seems. Maybe I’m confused somewhere down the line.

jezek2 · January 19, 2010, 11:54am

I observe the same thing (at least on a 6600GT). It seems that rendering small dynamic geometry is faster using immediate mode than using a VBO. But I haven’t tried, e.g., filling the VBO at the start of the frame, rendering other stuff, and then using that VBO for rendering, which would be more GPU/pipeline friendly.

princec · January 19, 2010, 11:58am

I can’t fathom any way a particle system could be faster using immediate mode rendering. Especially from Java. Unless you draw so few particles it amounts to a microbenchmark and that renders the results a bit suspect.

Cas

xinaesthetic · January 19, 2010, 1:10pm

I can’t fathom it either. It proved quite a distraction from other things I should be doing; it was nearly 3am by the time I made my post last night, by which time I was starting to doubt the validity of my judgement… I haven’t been totally rigorous but I can’t see a major flaw in my methodology, and now jezek2 seems to be finding the same.

I was running 10,000 particles in some of my tests, 30,000 in others. Either is enough to make particles easily the most expensive thing in the app; framerates around 30fps (much faster with particles off).

Interesting point about the pipeline… I’ve been planning to make some changes to the way my rendering works that would make implementing something along the lines of jezek2’s suggestion very easy.

At some point, I might do my particle animation on the GPU instead… that really should be faster. Also, I’ve certainly seen big gains going away from immediate mode in other parts of the program.

jezek2 · January 19, 2010, 1:21pm

For the record, I’m using it for quite a small number of vertices (much smaller than the amount xinaesthetic mentioned). Also, it’s not primarily for particles, though I currently render the few particles I have using immediate mode too.

Riven · January 19, 2010, 1:27pm

Vertex arrays: 17ms!!! (glMapBuffer was 30ms)


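               // Reset position/limit so the whole backing buffer is visible.
               javaSideBuffer.clear();

               // View the bytes as floats in the platform's native byte order.
               FloatBuffer fb = javaSideBuffer.order(ByteOrder.nativeOrder()).asFloatBuffer();

               // First half is used for vertex positions, second half for colors.
               FloatBuffer vBuffer = (FloatBuffer) fb.slice().limit(fb.capacity() >> 1);
               FloatBuffer cBuffer = (FloatBuffer) fb.slice().position(fb.capacity() >> 1);

               // Re-specify the client-side arrays and draw, 256 times over.
               for (int i = 0; i < 256; i++)
               {
                  glVertexPointer(3, 0, vBuffer);
                  glColorPointer(3, 0, cBuffer);
                  // Two triangles per quad, three vertices per triangle.
                  glDrawArrays(GL_TRIANGLES, 0, quadCount * 3 * 2);
               }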
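
For anyone puzzling over the slice()/limit()/position() lines, here is a minimal standalone sketch of the same buffer-splitting trick using only java.nio, with no OpenGL involved. The class name BufferSplitDemo, the method splitInHalf and the 12-float buffer size are made up for illustration, and javaSideBuffer in the snippet above is assumed to be a direct ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class BufferSplitDemo {
    // Returns two float views over one byte buffer:
    // [0] covers the first half, [1] is positioned at the start of the second half.
    static FloatBuffer[] splitInHalf(ByteBuffer bytes) {
        bytes.clear();
        FloatBuffer fb = bytes.order(ByteOrder.nativeOrder()).asFloatBuffer();
        int half = fb.capacity() >> 1;
        FloatBuffer first = (FloatBuffer) fb.slice().limit(half);
        FloatBuffer second = (FloatBuffer) fb.slice().position(half);
        return new FloatBuffer[] { first, second };
    }

    public static void main(String[] args) {
        // Hypothetical size: room for 12 floats, so 6 per half.
        ByteBuffer bytes = ByteBuffer.allocateDirect(12 * Float.BYTES);
        FloatBuffer[] halves = splitInHalf(bytes);

        // Absolute put into the first view, relative put into the second view.
        halves[0].put(0, 1.0f);
        halves[1].put(2.0f);

        // Both writes land in the same backing memory.
        FloatBuffer all = bytes.order(ByteOrder.nativeOrder()).asFloatBuffer();
        System.out.println("float 0 = " + all.get(0));
        System.out.println("float 6 = " + all.get(6));
    }
}
```

Both views share the one allocation, so filling them writes straight into the buffer that gets handed to GL. The sketch assumes a binding that reads from each buffer’s current position (LWJGL behaves this way), which would explain why the color view is positioned at the halfway mark instead of being limited like the vertex view.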

Demonpants · January 19, 2010, 3:51pm

Riven:

Vertex arrays: 17ms!!! (glMapBuffer was 30ms)



It is recommended on OpenGL’s site that you use glDrawArrays instead of immediate mode, so I’m not at all surprised.

princec · January 19, 2010, 4:39pm

Err… you should be surprised, because using plain old fashioned vertex arrays, he’s got 2x the performance of VBOs. Which is exactly the opposite of what just happened to me in my sprite engine. Admittedly Riven’s code is pretty much a microbenchmark and my sprite engine is a real-world application doing real things, so possibly my results are more relevant, though I need to test on the Mac, ATI cards, Intel cards, and various PC configurations before I draw any firm conclusions.

Cas

DzzD · January 19, 2010, 5:02pm

Riven:

Vertex arrays: 17ms!!! (glMapBuffer was 30ms)



hehe, not really surprised. I would also be pretty confident with a compiled list (indeed, without redundant glBegin)

Spasi · January 19, 2010, 5:10pm

Riven, could you try to submit 10 times (or more) the vertex data per iteration and post another comparison?

Btw, we can’t really compare display lists here, this is rendering of dynamic geometry submitted to the GPU each frame (each iteration in Riven’s code represents a rendered frame).

DzzD · January 19, 2010, 5:13pm

yes, that’s why I suppose that for static geometry a display list would have been faster (way faster, as there is only one JNI call for thousands of draws), so everything has its own use depending on the application. NB: mixing different display lists (and recompiling them on the fly), and reordering them too, can also work pretty well for dynamic scene rendering

but the comparison can still be made, in a certain manner, by drawing the same thing to the screen
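
To put rough numbers on the “one JNI call” point, here is a back-of-the-envelope sketch that counts the GL calls needed to submit one batch under each approach. It assumes one position and one color per vertex, ignores state setup such as glEnableClientState, and says nothing about what the driver does behind each call. The class and method names are hypothetical.

```java
class SubmissionCallCount {
    // Immediate mode: glBegin + glEnd, plus one glColor3f and one glVertex3f per vertex.
    static long immediateMode(long vertices) {
        return 2 + 2 * vertices;
    }

    // Vertex arrays: glVertexPointer + glColorPointer + glDrawArrays, whatever the batch size.
    static long vertexArrays(long vertices) {
        return 3;
    }

    // Replaying an already compiled display list: a single glCallList.
    static long displayListReplay(long vertices) {
        return 1;
    }

    public static void main(String[] args) {
        long vertices = 2048L * 3; // a 2048-triangle batch, three vertices per triangle
        System.out.println("immediate mode: " + immediateMode(vertices) + " calls");
        System.out.println("vertex arrays:  " + vertexArrays(vertices) + " calls");
        System.out.println("display list:   " + displayListReplay(vertices) + " calls");
    }
}
```

From Java every one of those calls crosses JNI, so the count matters more than it would in C. It is only one cost among several, though, which is presumably why the timings in this thread do not line up neatly with it.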

Riven · January 19, 2010, 5:26pm

(16*16*2 = 512 tris) * 256 iterations
VA (normal): 6ms
VBO (mapped): 14ms <=
VBO (subdata): 14ms <=

(32*32*2 = 2K tris) * 256 iterations
VA (normal): 17ms
VBO (mapped): 30ms
VBO (subdata): 51ms <=

(64*64*2 = 8K tris) * 256 iterations
VA (normal): 65ms
VBO (mapped): 182ms <=
VBO (subdata): 122ms

(128*128*2 = 32K tris) * 256 iterations
VA (normal): 415ms
VBO (mapped): 1225ms <=
VBO (subdata): 762ms

Which one is slowest (marked <=) depends on the data size :persecutioncomplex:
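
To make the figures above easier to compare, here is a small sketch that turns them into slowdown factors relative to plain vertex arrays. The timings are copied from the post; the class name VboSlowdown and the helper slowdown are made up for illustration.

```java
import java.util.Locale;

class VboSlowdown {
    // How many times slower a run was than the plain vertex array run.
    static double slowdown(double vaMs, double otherMs) {
        return otherMs / vaMs;
    }

    public static void main(String[] args) {
        String[] sizes = { "512 tris", "2K tris", "8K tris", "32K tris" };
        // Timings in ms from the post above: { VA, VBO mapped, VBO subdata }.
        double[][] ms = {
            { 6, 14, 14 },
            { 17, 30, 51 },
            { 65, 182, 122 },
            { 415, 1225, 762 },
        };
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "%-8s  mapped: %.2fx  subdata: %.2fx",
                sizes[i],
                slowdown(ms[i][0], ms[i][1]),
                slowdown(ms[i][0], ms[i][2])));
        }
    }
}
```

On these numbers both VBO paths come out somewhere between roughly 1.8x and 3x slower than plain vertex arrays at every size, with mapping ahead at 2K tris and subdata ahead at 8K and 32K tris.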

Demonpants · January 19, 2010, 5:36pm

Yeah but you’re never ever supposed to use immediate mode anymore. So if you’re using a combination of VBOs and immediate mode then it makes sense to be slower than glDrawArrays or glDrawElements.