How important is it to minimize material and lighting changes?

After reading the Red Book, I got the impression that changing materials and lights is an expensive operation, and that it would be worthwhile to optimize my scene graph to ensure that as few calls to glLight(), glMaterial() and glTexParameter() are made as possible. Intuitively, however, considering the large number of different lights and materials that one finds in a typical scene, these calls can’t be that expensive.

Does it make sense for me to implement a sorting algorithm on my final rendering queue to place objects with like materials and lights adjacent to each other (and so minimize the need to flick lights on and off), or are these calls speedy enough that this will give me no real improvement in performance?

Mark McKay

For complex scenes, clever sorting of your rendering actions is absolutely necessary and can result in immense performance gains.

I recommend that you sort first by ‘effect’, which is the entire set of states including materials, shaders, and most other states.

Then sort by texture. With multitexture this involves a little cleverness. A texture state with textures A & B on texture units 1 & 2 should sort at the same level as textures A & B on texture units 2 & 1, since no additional texture transfers are needed.

Then sort by vertexbuffer or array.

Then sort by lighting state.

However, once you have your sorting chain in place it is pretty easy to re-order your sorting to find the best performance for your scene. A scene with a few vertexbuffers and many textures might benefit from sorting by vertexbuffer first.
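To make the sorting chain concrete, here is a minimal sketch of such a comparator in Java. The RenderCommand fields, the integer state IDs, and the StateSorter name are all hypothetical stand-ins for whatever handles your engine uses; the texture IDs are sorted before comparison so that textures A & B on units 1 & 2 compare equal to A & B on units 2 & 1, as described above.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    /** Hypothetical per-object draw record; integer IDs stand in for real state handles. */
    class RenderCommand {
        int effectId;        // shader + material + most other states, as one opaque key
        int[] textureIds;    // one entry per texture unit
        int vertexBufferId;
        int lightingId;

        /** Unit-order-independent texture key. */
        int[] sortedTextures() {
            int[] copy = textureIds.clone();
            Arrays.sort(copy);
            return copy;
        }
    }

    class StateSorter {
        static final Comparator<RenderCommand> BY_STATE = new Comparator<RenderCommand>() {
            public int compare(RenderCommand a, RenderCommand b) {
                if (a.effectId != b.effectId)                                   // 1. effect
                    return a.effectId - b.effectId;
                int t = compareArrays(a.sortedTextures(), b.sortedTextures()); // 2. texture
                if (t != 0)
                    return t;
                if (a.vertexBufferId != b.vertexBufferId)                       // 3. vertex buffer
                    return a.vertexBufferId - b.vertexBufferId;
                return a.lightingId - b.lightingId;                             // 4. lighting
            }
        };

        static int compareArrays(int[] x, int[] y) {
            if (x.length != y.length) return x.length - y.length;
            for (int i = 0; i < x.length; i++)
                if (x[i] != y[i]) return x[i] - y[i];
            return 0;
        }

        static void sort(List<RenderCommand> queue) {
            Collections.sort(queue, BY_STATE);
        }
    }

Swapping the order of the comparisons inside compare() is all it takes to try a different sorting chain, e.g. vertexbuffer before texture.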

Thanks for the advice. Why do you suppose ‘effect’ is the most important thing to sort by? I’d have thought that since specular highlights, fog colors and such require so little data compared to things like texture maps, they’d be the least significant thing to sort by.

Also, just to be clear: by ‘sort by’, you mean ‘effect’ is the primary sorting criterion, texture is the secondary, etc., and not ‘put all your things to be rendered in a queue and first sort the queue by effect, then sort it by like textures, etc.’? The second case would have the reverse sorting priority of the first.

Thanks,

Mark McKay

The relative cost of the different state changes depends on the graphics hardware.
The general rule is the one given by anarchotron:

  • texture (because it may have to upload memory blocks),
  • vertex buffer (because it may have to upload memory blocks),
  • lights, modelview transform, … (because they cause a “validation”),

To make things a bit more complex, you also have to handle front-to-back ordering to get the benefit of the early pixel rejection on recent hardware (post GeForce 3).

I have made a few different attempts in my engine and finally ended up with a two-pass sorting algorithm.

If you keep your render command list from frame to frame, you will have very little reordering to perform and the re-sort will execute very quickly (an almost-sorted list can be re-sorted in close to O(n), e.g. with an insertion sort; a plain quicksort gains nothing from nearly sorted input).
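One way to exploit that frame-to-frame coherence is an insertion sort, which really does run in close to O(n) on an almost-sorted list. A small generic sketch:

    import java.util.Comparator;
    import java.util.List;

    class FrameCoherentSorter {
        /**
         * Insertion sort: almost linear when the list is already nearly in order,
         * which is the common case if the render command list is kept across frames.
         * (Assumes a random-access list such as ArrayList.)
         */
        static <T> void resort(List<T> list, Comparator<T> cmp) {
            for (int i = 1; i < list.size(); i++) {
                T key = list.get(i);
                int j = i - 1;
                while (j >= 0 && cmp.compare(list.get(j), key) > 0) {
                    list.set(j + 1, list.get(j));
                    j--;
                }
                list.set(j + 1, key);
            }
        }
    }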

Finally, I will say that sorting commands gives a performance boost and therefore should be planned for in the engine, but there are other areas that give performance boosts as well and, I think, should be tackled first:

  • using vertex arrays, then vertex buffers, gives a huge performance boost (see the VBO sketch after this list),

  • memory management and interaction with the garbage collector allow better performance and give a steady frame rate, which has a huge impact on the quality of animation,

  • culling (bounds-based, BSP, …) has a huge impact on universes where you only see one part at a time…

Therefore, sorting is important, but it may not be the first thing to tackle.
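On the first point, here is a hedged sketch of a vertex buffer object upload and draw. It assumes a JOGL 2 style binding (the package names, the Buffers helper, and the GL2 interface come from JOGL; other bindings such as LWJGL differ in the details):

    import java.nio.FloatBuffer;

    import com.jogamp.common.nio.Buffers;
    import com.jogamp.opengl.GL;
    import com.jogamp.opengl.GL2;

    class VboSketch {
        /** Upload the vertex data once; per-frame draws then only bind and issue the call. */
        static int createVbo(GL2 gl, float[] vertices) {
            int[] id = new int[1];
            gl.glGenBuffers(1, id, 0);
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, id[0]);
            FloatBuffer data = Buffers.newDirectFloatBuffer(vertices);
            gl.glBufferData(GL.GL_ARRAY_BUFFER,
                            (long) vertices.length * Buffers.SIZEOF_FLOAT,
                            data, GL.GL_STATIC_DRAW);
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
            return id[0];
        }

        static void draw(GL2 gl, int vboId, int vertexCount) {
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId);
            gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
            gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0L);  // offset into the bound VBO
            gl.glDrawArrays(GL.GL_TRIANGLES, 0, vertexCount);
            gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
        }
    }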

               Vincent

If I use a BSP tree to partition my scene, won’t this prevent me from sorting by texture, vertex buffer, etc.? Wouldn’t a BSP tree essentially turn my scene into an unrelated collection of polygons which are then rendered front to back in a reverse painter’s algorithm? It seems to me that this would prevent any other sort of sorting.

A few other questions:

How do you manage garbage collection to improve your performance? I thought Java hid that from you (aside from the System.gc() call).

If I do implement a BSP tree, should I create one huge buffer with all the vertices of the scene? Or should I keep each object with its own vertices?

In a BSP tree, am I right in thinking that one should first render front to back with opaque geometry, and then back to front with transparent geometry? Also, is it the Z-buffer check that makes this faster than just rendering all the geometry without a BSP tree? Otherwise, I don’t see how drawing shapes in this order is faster than the naive way.

Would inserting objects into an octree using axis-aligned bounding boxes be an acceptable alternative to a BSP?

Thanks for the advice.

Mark McKay

Why do you use a BSP tree? For culling? If so, you can use it to mark what is visible and render that with sorting after the BSP is traversed.

I’m not currently using a BSP tree, but have heard that they are quite useful in rendering 3D scenes. I’ve never been able to find a comprehensive description of the BSP algorithm, so I’ve pieced it together as best I can from various web pages. I’d like to use it primarily for culling, but might wish to use it for collision detection too.

The tree building algorithm described here sounds pretty straightforward:

ftp://ftp.sgi.com/other/bspfaq/faq/bspfaq.html#6.txt

However, it’s unclear how use of this tree allows a performance boost. While I can partition the scene and render polygons in a reverse painter’s algorithm, how does that let me know what’s not visible? If I render a triangle onscreen which completely occludes another triangle farther away, how can I determine this without reading the entire Z-buffer?

Also, it seems to me that the BSP algorithm essentially shreds your entire scene into its component polygons. At any node in the BSP tree, polygons are sorted into either in front of the partition plane, on the partition plane, or behind the partition plane. So the only sorting that can be done is in the very few cases where polygons are coincident on the same partition plane. This would exclude the possibility of rendering one piece of geometry at a time with similar lights and textures, since all the polygons that make up the geometry could potentially be in different partitioned volumes.

A BSP tree is useful for rendering with the painter’s algorithm (not reverse) IF you are doing software rendering or other rendering where there is no depth buffer. In that case every polygon must be rendered back to front.

With a depth buffer (as mentioned, depth-buffer HW that supports quick depth rejection), you want to render your scene front to back so you can fill the depth buffer as quickly as possible and let the HW reject as much as possible.

In the modern case I would imagine a BSP tree is mainly useful for culling and collision detection. You certainly don’t want to sort your entire scene per polygon. The sorting overhead will be high, eating into any gains you get by grouping state, texture, and VBO changes.

Maybe you should consider some kind of portalized space instead of a BSP tree.

The usual sequence in a rendering engine is the following:

  1. Application phase
    The application modifies the scenegraph according to user input and application-specific behavior.

  2. Culling phase
    The scenegraph is traversed, and nodes that are (likely) visible are inserted into a “display list” (not an OpenGL display list).
    The display list is then sorted.

  3. Drawing phase
    The display list is rendered.
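As a rough Java skeleton of that sequence (reusing the hypothetical RenderCommand and StateSorter sketched earlier in this thread; CullableScene is another stand-in):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /** Hypothetical scene abstraction: fills 'out' with the (likely) visible commands. */
    interface CullableScene {
        void collectVisible(List<RenderCommand> out);
    }

    class FrameLoop {
        private final List<RenderCommand> displayList = new ArrayList<RenderCommand>();

        void renderFrame(CullableScene scene) {
            // 1. Application phase: the scenegraph was already updated before this call.

            // 2. Culling phase: gather visible nodes into the display list, then sort it.
            displayList.clear();
            scene.collectVisible(displayList);
            Collections.sort(displayList, StateSorter.BY_STATE);

            // 3. Drawing phase: walk the sorted list, issuing a state change only
            //    when a sort key differs from the previous command's key.
            int lastEffect = -1;
            for (RenderCommand cmd : displayList) {
                if (cmd.effectId != lastEffect) {
                    // bind the shader/material set for cmd.effectId here
                    lastEffect = cmd.effectId;
                }
                // bind textures and vertex buffer the same way, then issue the draw call
            }
        }
    }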

A BSP, octree, quadtree, portal engine, or whatever system is used in the culling phase to avoid having to traverse the complete scenegraph and therefore save time. The usual way to do this is to have a separate data structure, let’s say a scene manager, that holds the spatial coherence of your scene. This scene manager will be able to quickly mark which nodes are inside or outside its space subdivisions. This can be rather fast since the scenegraph is a hierarchical system where you can state that if a group’s bounding volume is entirely in a space subdivision then all its children are too.
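That early-accept for fully contained groups might look like this in Java; the node, bounding-sphere, and frustum types here are hypothetical:

    import java.util.ArrayList;
    import java.util.List;

    class BoundingSphere { float x, y, z, radius; }

    enum Containment { OUTSIDE, INTERSECTS, INSIDE }

    /** Hypothetical frustum (or space subdivision) with a three-way sphere test. */
    interface Frustum {
        Containment test(BoundingSphere s);
    }

    class SceneNode {
        BoundingSphere bounds = new BoundingSphere();
        List<SceneNode> children = new ArrayList<SceneNode>();
        boolean isLeaf() { return children.isEmpty(); }
    }

    class Culler {
        /** Hierarchical cull: a fully contained group accepts its whole subtree untested. */
        static void cull(SceneNode node, Frustum frustum, List<SceneNode> visible) {
            Containment c = frustum.test(node.bounds);
            if (c == Containment.OUTSIDE)
                return;                                  // reject the whole subtree
            if (c == Containment.INSIDE || node.isLeaf()) {
                collect(node, visible);                  // accept the whole subtree
                return;
            }
            for (SceneNode child : node.children)        // partial overlap: recurse
                cull(child, frustum, visible);
        }

        static void collect(SceneNode node, List<SceneNode> visible) {
            if (node.isLeaf()) { visible.add(node); return; }
            for (SceneNode child : node.children)
                collect(child, visible);
        }
    }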

The state/depth sorting happens after this scenegraph traversal. Therefore the two optimizations are not linked.

It is important to note that these two optimizations serve different purposes:

  • spatial culling mainly optimizes the CPU work (and possibly the GPU vertex processing phase),
  • state/depth sorting mainly optimizes the GPU work (and puts a little more stress on the CPU).

To finish with that, I would add that all space subdivision techniques are designed for big scenes where only a small part of the scene is visible at one time. If that is not your case, simple bounds-based culling is more than enough, and before diving into one of the space subdivision techniques (which can be quite heavy work), I think you should profile your engine.

About your question on the garbage collector, the answer is yes and no… I mean ‘yes’, Java manages memory for you, but ‘no’, the default memory management provided does not suit the needs of realtime applications. The main problem is that realtime applications need a predictable execution time. The garbage collector breaks this: when it executes it slows down your system, and its execution cannot be scheduled. The consequence is that your animation looks choppy.

There are different ways to handle this and you will find quite a lot of messages on the subject in this forum. I did not find any perfect solution, but just using the concurrent garbage collector and lowering the amount of garbage generated was enough for my needs.
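The most common way to lower the garbage rate is to reuse temporary objects instead of allocating them every frame. A small sketch with a hypothetical mutable vector class (on the Sun JVMs of that era the concurrent collector could be requested with -XX:+UseConcMarkSweepGC):

    /** Hypothetical mutable vector, reused to avoid per-frame allocation. */
    class Vec3 {
        float x, y, z;
        Vec3 set(float x, float y, float z) { this.x = x; this.y = y; this.z = z; return this; }
        Vec3 addLocal(Vec3 o) { x += o.x; y += o.y; z += o.z; return this; }
    }

    class Mover {
        // Allocated once; writing 'new Vec3(...)' here every frame would feed the collector.
        private final Vec3 tmp = new Vec3();

        void integrate(Vec3 position, Vec3 velocity, float dt) {
            tmp.set(velocity.x * dt, velocity.y * dt, velocity.z * dt);
            position.addLocal(tmp);
        }
    }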

       Vincent

The most important thing to do nowadays is to batch your draw calls.
Then, on nVidia boards (I don’t know about ATI), sort by VBO; VBO switching is currently under-optimized.
Then, textures.
Materials, lighting, and other state-related changes would only cause minor stalls, so don’t bother with these.
An interesting approach is to cull on the fly, because it slightly improves CPU/GPU parallelism.

And if you want to avoid sorting back to front for alpha blending, you can use the basic trick of rendering all your transparent geometry, unsorted, after your opaque geometry (which is in any case a nice STATE sort), while disabling Z-writes so that you can see behind transparent meshes even if they were rendered first. It causes artifacts if you have multiple layers of transparent geometry, except if all your layers have the same color.
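In a JOGL-style binding the trick looks roughly like this (the GL type and constants come from JOGL; drawTransparentGeometry is a hypothetical callback):

    import com.jogamp.opengl.GL;

    class TransparentPass {
        /** Draw opaque geometry first with depth writes on, then call this. */
        static void drawTransparent(GL gl, Runnable drawTransparentGeometry) {
            gl.glEnable(GL.GL_BLEND);
            gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
            gl.glDepthMask(false);   // keep depth-TESTING against the opaque pass,
                                     // but stop writing, so transparent fragments
                                     // never occlude each other
            drawTransparentGeometry.run();
            gl.glDepthMask(true);
            gl.glDisable(GL.GL_BLEND);
        }
    }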

SeskaPeel.