display lists and texture size: unexpectedly slow

EdwinOlson · May 25, 2005, 11:44pm

Howdy all!

I’m animating a 900-quad sphere with a texture map over it (it’s “Earth”), and I’m trying to get the best performance possible.

I assumed that passing in the vertices/normals/texcoords each render pass would kill me with the JNI overhead, and that using a display list would improve performance. I was half right.

Display lists are faster for textures smaller than 512x512, and dramatically slower for larger textures. Here’s some data I collected on my x86_64 linux system with a GF 6600:

```
No sphere: 390 fps

                                   list   no-list
GLU sphere (untextured):            370       165
My sphere (texture=128x128):        320       189
My sphere (texture=256x256):        260       180
My sphere (texture=512x512):        113       190
My sphere (texture=1024x1024):       40       184
My sphere (texture=2048x1024):     17.8       180
```

Note how performance drops precipitously as the texture size increases, but the no-list implementation performance is basically constant. I don’t understand this: why does this happen?

For comparison, I’ve included results for my scene without the sphere (consisting of a couple of GLU-generated cylinders), and results substituting an untextured GLU sphere of similar density for my hand-tessellated and textured sphere.

I see something similar on an x86 linux system with a GF4MX420.

kbr · May 26, 2005, 1:10am

You should probably go to developer.nvidia.com and read the white papers on performance optimization and the tools they offer. We should probably add a ProfileGL to JOGL which times each individual OpenGL call. I can’t explain the huge difference in performance between display lists and immediate mode in your application, but if you only have that small number of triangles and one texture (even that large), something pathological is going on. I would recommend using vertex arrays or vertex buffer objects regardless as display lists can be hard to manage and vertex arrays support dynamic geometry. You should probably also generate mipmaps for your texture if you aren’t already. Look at the Grand Canyon demo’s source code and data files; it routinely renders 4+ million tris/sec with a fairly large memory-mapped set of textures. However note that it generates mipmaps ahead of time and memory maps in the right level of detail for each tile at run time.

tom · May 26, 2005, 10:35am

Just a shot in the dark: Make sure you don’t upload the texture in the display list, only bind it.

EdwinOlson · May 26, 2005, 11:47am

Do you ever get the urge to put a bucket over your head?

I was re-uploading the texture inside the display list. In my defense, I have code which lazily uploads textures when a bind is requested. But when I created the display list, it was the first time I used that texture, so the texture was allocated and uploaded during the display list. So it was just an eensy bit more subtle than: beginDisplayList() uploadTexture() …
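The failure mode described above can be reproduced without a GPU. What follows is only a toy model, not JOGL code: `FakeGl`, `LazyTexture` and `DisplayListSketch` are hypothetical names invented for illustration, and the fake context models a single rule, namely that commands issued while a list is being compiled are recorded and then replayed every time the list is called.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a GL context (hypothetical, NOT the JOGL API). */
class FakeGl {
    private final List<Runnable> recorded = new ArrayList<>();
    private boolean compiling = false;
    int uploads = 0; // number of times texel data was transferred
    int binds = 0;   // number of texture binds executed

    void newList()  { recorded.clear(); compiling = true; }
    void endList()  { compiling = false; }
    void callList() { for (Runnable cmd : recorded) cmd.run(); }

    void bindTexture() { issue(() -> binds++); }
    void texImage()    { issue(() -> uploads++); }
    void drawSphere()  { issue(() -> { }); }

    // While compiling, commands are recorded instead of executed.
    private void issue(Runnable cmd) {
        if (compiling) recorded.add(cmd); else cmd.run();
    }
}

/** Hypothetical lazy texture: uploads its data the first time it is bound. */
class LazyTexture {
    private boolean uploaded = false;

    void bind(FakeGl gl) {
        gl.bindTexture();
        if (!uploaded) {
            gl.texImage(); // lands in the display list if one is being compiled
            uploaded = true;
        }
    }
}

class DisplayListSketch {
    /** Returns how many uploads happen while replaying the list for the given number of frames. */
    static int uploadsOverFrames(boolean touchTextureFirst, int frames) {
        FakeGl gl = new FakeGl();
        LazyTexture earth = new LazyTexture();
        if (touchTextureFirst) {
            earth.bind(gl); // forces the upload before the list is compiled
        }
        gl.newList();
        earth.bind(gl);
        gl.drawSphere();
        gl.endList();

        int before = gl.uploads;
        for (int i = 0; i < frames; i++) {
            gl.callList();
        }
        return gl.uploads - before;
    }

    public static void main(String[] args) {
        System.out.println("lazy upload inside list: " + uploadsOverFrames(false, 100));
        System.out.println("upload before list:      " + uploadsOverFrames(true, 100));
    }
}
```

In this model the first case replays the upload once per frame, while binding the texture once before compiling the list leaves only the bind inside it.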

I’m at 350 fps with my large texture now, thanks!

OpenGL newbie now looking for a bucket…

-Ed
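As a rough sanity check on that diagnosis, the "list" column of the table can be turned into an implied upload rate. This is only a back-of-the-envelope sketch: it assumes 4 bytes per texel (the thread does not state the texture format) and treats the whole frame time as upload time. The class name `UploadBandwidth` is made up for illustration.

```java
import java.util.Locale;

class UploadBandwidth {
    // Assumption: RGBA8 texels, 4 bytes each. The actual format is not given in the thread.
    static final int BYTES_PER_TEXEL = 4;

    /** Implied transfer rate in MiB/s if the full texture is re-sent every frame. */
    static double mibPerSecond(int width, int height, double fps) {
        double bytesPerFrame = (double) width * height * BYTES_PER_TEXEL;
        return bytesPerFrame * fps / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Sizes and frame rates are the "list" column from the table earlier in the thread.
        int[][] sizes = { {512, 512}, {1024, 1024}, {2048, 1024} };
        double[] fps = { 113, 40, 17.8 };
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%dx%d: %.1f MiB/s",
                    sizes[i][0], sizes[i][1],
                    mibPerSecond(sizes[i][0], sizes[i][1], fps[i])));
        }
    }
}
```

Under those assumptions the three largest textures come out at roughly 113, 160 and 142 MiB/s. The frame rate falls almost in proportion to texture size while the implied transfer rate stays in the same range, which is what a per-frame re-upload would look like.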

Markus_Persson · May 26, 2005, 11:54am

!

I think I’ve got the exact same bug in the static model loader code in Wurm!

Mind if I borrow that bucket?

Mustan9 · May 26, 2005, 11:57am

[quote]I would recommend using vertex arrays or vertex buffer objects regardless as display lists can be hard to manage and vertex arrays support dynamic geometry.
[/quote]
Why this recommendation?

Display lists are faster than buffers since the data is stored on the video card, and not re-sent on every frame.

A buffer is faster than sending each triangle via several OpenGL calls, but cards which don’t support display lists most likely aren’t fast enough that buffers would make much of a difference.

Here is a question, which is off topic of textures:

What is the performance hit of compiling a display list? Would it make sense to compile objects when they enter the view frustum, and free the compiled list when the objects leave the frustum, only to have to re-compile them again later?

kbr · May 26, 2005, 2:27pm

Vertex arrays are already pretty fast and extensions like ARB_vertex_buffer_object with the appropriate hints should be exactly the same speed as display lists with appropriate driver support. The advantage, in my opinion, is that you get to treat all of the data in your application uniformly, and can easily support dynamic meshes.

robnugent · May 26, 2005, 2:31pm

I’ve got a similar app that does terrain rendering for parts of the British Isles.

All my vertex data is stored in VBOs down on the card (I’m using GL_STATIC_DRAW_ARB when setting up each VBO) along with the texture data.

Very little flows across the Java<->JNI<->AGP<->Video Card boundary to render each frame, just a request to bind each texture in turn and render each relevant VBO, and not any of the vertex or texture data itself.

I can fit about 1.2 million vertices worth of vertex data plus 100 1024x1024 compressed textures on a 128MB Nvidia FX5200.

It renders at around 23 frames/sec (~20 million tris/sec) at 1024x768x32bit, which is pretty much entirely limited by the card.

Rob
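A quick budget check shows why compression matters for fitting that much data on a 128MB card. The numbers below are assumptions, not figures from the post: 0.5 bytes per texel (a DXT1-style format), 4 bytes per texel uncompressed, 32 bytes per vertex, and an optional full mipmap chain costing about one third extra. `VramBudget` is a hypothetical helper written for this estimate.

```java
class VramBudget {
    static final double MIB = 1024.0 * 1024.0;

    /** Size of one texture in MiB; a full mipmap chain adds roughly one third. */
    static double textureMiB(int width, int height, double bytesPerTexel, boolean mipmaps) {
        double base = (double) width * height * bytesPerTexel;
        return (mipmaps ? base * 4.0 / 3.0 : base) / MIB;
    }

    /** Size of the vertex data in MiB for a given per-vertex footprint. */
    static double vertexMiB(long vertexCount, int bytesPerVertex) {
        return vertexCount * bytesPerVertex / MIB;
    }

    public static void main(String[] args) {
        int textures = 100;
        // Assumed formats: 0.5 bytes/texel compressed, 4 bytes/texel uncompressed.
        double compressed = textures * textureMiB(1024, 1024, 0.5, true);
        double uncompressed = textures * textureMiB(1024, 1024, 4.0, false);
        // Assumed layout: 32 bytes per vertex (position, normal, texcoord as floats).
        double vertices = vertexMiB(1_200_000L, 32);

        System.out.println("compressed textures + mipmaps (MiB): " + compressed);
        System.out.println("uncompressed textures (MiB):         " + uncompressed);
        System.out.println("vertex data (MiB):                   " + vertices);
        System.out.println("fits in 128 MiB when compressed:     " + (compressed + vertices < 128.0));
    }
}
```

With these assumed sizes the compressed textures plus vertex data come to a little over 100 MiB, whereas the same textures uncompressed would need about 400 MiB on their own.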

tom · May 26, 2005, 3:29pm

Compiling a display list takes ages. You never want to do this in the game loop as it WILL cause stutter.