Display Lists

eMKa · August 11, 2006, 2:59pm

From what I’ve found after investigating the code, display lists are used nowhere; vertex arrays are used all over the place instead. But for static geometry, display lists are much faster (2x faster for a 15k geometry on my GeForce MX 440).

I wondered what would be the best design approach to permit use of Display lists in Xith3D.

I considered creating a StaticShape3D object which has its own atom peer in the renderers and which systematically uses display lists… Or maybe just a flag in the regular Shape3D…

What do you think ?

Riven · August 11, 2006, 3:27pm

VBOs should have the same performance as (if not better than) DisplayLists.

Are you sure your VAs are VBOs ?

These days DisplayLists are mainly useful to batch state-change related commands.

Qudus · August 11, 2006, 6:36pm

[quote="<MagicSpark.org [ BlueSky ]>,post:1,topic:28121"]
I considered creating a StaticShape3D object which has its own atom peer in the renderers and which systematically uses display lists… Or maybe just a flag in the regular Shape3D…

What do you think ?
[/quote]
I would prefer the flag version, which should offer a constructor to set the flag (besides the getter/setter).

@Riven: What are VAs?

eMKa · August 11, 2006, 7:28pm

@Qudus : VA = Vertex Array, VBO = Vertex Buffer Object.

Ahm not so sure, how can I know ?

Niwak · August 12, 2006, 6:33am

From my benchmarks, display lists offer the same performance as VBOs. But this is only true if you set up an “optimal” VBO, which in my case was a single VBO with packed data and indexed geometry. Additionally, NVidia drivers perform basic bounds checking when rendering display lists.

Adding display list support seems like a good idea to me, since a display list leaves the choice of the vertex format to the video driver, so you will always get the optimal one.

If the main reason you are thinking of implementing display lists is the thread about the benchmarks of the Quake3 viewer, I think this is not the best way to go. I’ve done a bit of profiling on this benchmark to learn the good and bad design ideas behind the three 3D engines. Xith3D is the slowest because the cost of rendering a node is rather high, for these reasons:

1. the vertex data format is the worst that you can choose (separate data arrays, no compact vertex format),
2. there is very little state caching in Xith3D (I have posted a GLInterceptor log showing this) and hence lots of redundant OpenGL calls,
3. Xith3D recomputes nearly everything for each rendering; there is nearly no frame coherence caching (only compiled render atoms),
4. Xith3D always uses UNSIGNED_INT for the geometry indices.
Vincent
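To illustrate the state-caching point above, here is a minimal sketch (not Xith3D code) of a wrapper that drops redundant state changes before they reach OpenGL. `GlStateSink` and `CachingStateSink` are hypothetical names, and the sink interface only stands in for a real GL binding; the first call for a capability is always forwarded because the initial driver state is assumed unknown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the real OpenGL binding (glEnable/glDisable). */
interface GlStateSink {
    void setEnabled(int capability, boolean enabled);
}

/** Forwards a state change only when it differs from the last value sent. */
class CachingStateSink implements GlStateSink {
    private final GlStateSink delegate;
    private final Map<Integer, Boolean> lastSent = new HashMap<>();
    private int forwardedCalls;

    CachingStateSink(GlStateSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setEnabled(int capability, boolean enabled) {
        Boolean previous = lastSent.put(capability, enabled);
        if (previous != null && previous == enabled) {
            return; // redundant: this value was already sent
        }
        forwardedCalls++;
        delegate.setEnabled(capability, enabled);
    }

    /** Number of calls that actually reached the delegate. */
    int forwardedCalls() {
        return forwardedCalls;
    }

    public static void main(String[] args) {
        final int GL_LIGHTING = 0x0B50; // OpenGL enum values
        final int GL_BLEND = 0x0BE2;
        CachingStateSink cache = new CachingStateSink(
                (cap, on) -> System.out.println((on ? "enable " : "disable ") + cap));
        cache.setEnabled(GL_LIGHTING, true);
        cache.setEnabled(GL_LIGHTING, true);  // skipped as redundant
        cache.setEnabled(GL_BLEND, true);
        cache.setEnabled(GL_LIGHTING, false);
        System.out.println("forwarded: " + cache.forwardedCalls());
    }
}
```

A real renderer would need one such cache per kind of state (bound textures, material colours, blend functions, and so on), and would have to invalidate it whenever something else touches the GL context.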

eMKa · August 12, 2006, 9:44am

Niwak, your posts are always very interesting: I feel like you took a deep look at different engines, and it’s very instructive.
The main reason I am thinking of implementing display lists wasn’t the threads about the Quake3 viewer. It’s because I have a big terrain with lots of polygons which never changes during gameplay, and whether I display it or not obviously has a big impact on the FPS… That’s why I thought display lists would have much better performance.
I’ll try to understand your four points here:

1. which vertex format do you think would be more efficient ? How would you do a “compact vertex format” ? All in a single float[] ?
2. how to implement that efficiently ? Or, in other words, how should we sort the calls ?
3. what exactly gets recomputed each frame ?
4. what’s the problem with that ?

Thank you for taking the time to shed some light on this.
BTW, are you working on a new engine ? How is it going ? And how do you handle these issues ?

Niwak · August 22, 2006, 5:27pm

[quote="<MagicSpark.org [ BlueSky ]>,post:6,topic:28121"]

which vertex format do you think would be more efficient ? How would you do a “compact vertex format” ? All in a single float[] ?
[/quote]
There is a lot of information about performance issues on the NVidia and ATI developer web sites. For the vertex format, it is advised to use a ‘compact’ vertex format, that is to say that all your array pointers point to the same memory block in VRAM. That means that you need to pack all your data into a single VBO and use array pointers with a non-zero stride and offset. It is also advised to align your data, padding it if necessary. I never tried this last bit. There are lots of ways to implement it, and there is no such thing as a universal geometry class among the different 3D engines.
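As a rough sketch of what “packing into a single block with a stride and offsets” can look like on the Java side (the class name `InterleavedVertices` and the position/normal/texcoord layout are assumptions for illustration, not any engine’s actual format):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Packs separate attribute arrays into one interleaved NIO buffer. */
class InterleavedVertices {
    // Assumed layout per vertex: 3 position floats, 3 normal floats, 2 texcoord floats
    static final int FLOATS_PER_VERTEX = 3 + 3 + 2;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * 4; // 32 bytes
    static final int POSITION_OFFSET = 0;                  // byte offsets inside one vertex
    static final int NORMAL_OFFSET = 3 * 4;
    static final int TEXCOORD_OFFSET = 6 * 4;

    static FloatBuffer pack(float[] positions, float[] normals, float[] texCoords) {
        int vertexCount = positions.length / 3;
        if (normals.length != vertexCount * 3 || texCoords.length != vertexCount * 2) {
            throw new IllegalArgumentException("attribute arrays disagree on vertex count");
        }
        FloatBuffer packed = ByteBuffer.allocateDirect(vertexCount * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int v = 0; v < vertexCount; v++) {
            packed.put(positions, v * 3, 3); // x, y, z
            packed.put(normals, v * 3, 3);   // nx, ny, nz
            packed.put(texCoords, v * 2, 2); // u, v
        }
        packed.flip(); // make the buffer ready for reading/uploading
        return packed;
    }

    public static void main(String[] args) {
        FloatBuffer packed = pack(
                new float[] {0, 0, 0,  1, 0, 0},
                new float[] {0, 0, 1,  0, 0, 1},
                new float[] {0, 0,  1, 0});
        System.out.println(packed.remaining() + " floats, stride " + STRIDE_BYTES + " bytes");
    }
}
```

The packed buffer would be uploaded once into a single VBO; the stride and the three byte offsets are then what you would hand to `glVertexPointer`, `glNormalPointer` and `glTexCoordPointer` while that VBO is bound. This particular layout happens to give a 32-byte vertex, so no extra padding is needed for alignment.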

[quote="<MagicSpark.org [ BlueSky ]>,post:6,topic:28121"]

how to implement that efficiently ? Or, in other words, how should we sort the calls ?
[/quote]
I don’t understand what you mean by ‘that’. If it is the vertex format, I did not find a perfect answer in any engine. For my own engine, I have chosen to let the programmer create tuple arrays which reference memory blocks that hold the data in NIO buffers. This means that data management is left to the engine user. This may not sound satisfying, but it really fits well with the design of my engine (see below).

[quote="<MagicSpark.org [ BlueSky ]>,post:6,topic:28121"]

what exactly gets recomputed each frame ?
[/quote]
The scenegraph is traversed for each frame. Each node is tested for culling, and each node goes through all the renderNode / getAtom methods of the View class. The cache system caches render atoms with some sort of lazy updating (which is rather buggy). In the Quake3 benchmark, Xith3D spends most of its time traversing the scenegraph. With a system that only works on modifications, you don’t spend a single millisecond traversing nodes that did not change. From the (limited) understanding of Java3D that I have, this is one of the main differences: Java3D creates a clone of the node which is updated when its user node sends change events.
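A minimal sketch of the “only work on modifications” idea, assuming a hypothetical `ChangeDrivenCache` that nodes notify when they change (this is neither Xith3D’s nor Java3D’s actual mechanism, just the general dirty-set technique):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Rebuilds render data only for nodes that reported a change. */
class ChangeDrivenCache {
    /** Hypothetical scene node that reports its modifications to the cache. */
    static final class Node {
        final String name;
        private final ChangeDrivenCache cache;

        Node(String name, ChangeDrivenCache cache) {
            this.name = name;
            this.cache = cache;
            cache.markDirty(this); // a new node needs its render data built once
        }

        void modified() {
            cache.markDirty(this);
        }
    }

    private final Set<Node> dirty = new LinkedHashSet<>();

    void markDirty(Node node) {
        dirty.add(node); // a set, so repeated changes in one frame cost one rebuild
    }

    /** Called once per frame; returns the names of the nodes that were rebuilt. */
    List<String> updateFrame() {
        List<String> rebuilt = new ArrayList<>();
        for (Node node : dirty) {
            rebuilt.add(node.name); // a real engine would rebuild the render atom here
        }
        dirty.clear();
        return rebuilt;
    }

    public static void main(String[] args) {
        ChangeDrivenCache cache = new ChangeDrivenCache();
        Node terrain = new Node("terrain", cache);
        Node player = new Node("player", cache);
        System.out.println(cache.updateFrame()); // both nodes, built once
        player.modified();
        System.out.println(cache.updateFrame()); // only the player
        System.out.println(cache.updateFrame()); // nothing changed, nothing visited
    }
}
```

With this shape, a static terrain costs nothing per frame after its first build, whereas a full traversal visits it every frame regardless.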

[quote="<MagicSpark.org [ BlueSky ]>,post:6,topic:28121"]
BTW, are you working on a new engine ? How is it going ? And how do you handle these issues ?
[/quote]
I am working on my own engine and it is going rather well. The development is not as fast as I would like, but my first child was born a few months ago, I have switched jobs and I’m moving into a new flat, so it’s a bit more difficult to keep up with my hobby projects…

Regarding the design of my engine: I tried something fairly different from the engines I have used (Crystal Space, then OpenSceneGraph, then Java3D, then Xith3D).
The first point is that I discovered that there is no perfect design for an engine: depending on the project I had, I preferred a very high-level engine with medium performance, or one tailored to a top-down view, or another one that handles shadows with a technique that was OK for me…
Therefore, the main idea of my engine is just to be a scenegraph framework: the scenegraph is composed of nodes implementing the INode interface, that’s all. The INode interface is very minimalist: a node has an optional name (get/set), a parent INode (get/set), extension points, and can have listeners for changes on these fields. That’s all.
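For readers who want something concrete, such a minimalist node contract might look roughly like the sketch below. Only the name `INode` comes from the description above; the method names, `NodeListener` and `BasicNode` are guesses for illustration, and extension points are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal node contract: a name, a parent, and change listeners. */
interface INode {
    String getName();
    void setName(String name);
    INode getParent();
    void setParent(INode parent);
    void addListener(NodeListener listener);
}

/** Hypothetical callback: receives the node and the name of the changed field. */
interface NodeListener {
    void fieldChanged(INode node, String field);
}

/** Straightforward implementation that notifies listeners on every set. */
class BasicNode implements INode {
    private String name; // optional, may stay null
    private INode parent;
    private final List<NodeListener> listeners = new ArrayList<>();

    @Override public String getName() { return name; }
    @Override public void setName(String name) { this.name = name; fire("name"); }
    @Override public INode getParent() { return parent; }
    @Override public void setParent(INode parent) { this.parent = parent; fire("parent"); }
    @Override public void addListener(NodeListener listener) { listeners.add(listener); }

    private void fire(String field) {
        for (NodeListener listener : listeners) {
            listener.fieldChanged(this, field);
        }
    }

    public static void main(String[] args) {
        BasicNode root = new BasicNode();
        BasicNode child = new BasicNode();
        child.addListener((node, field) -> System.out.println("changed: " + field));
        child.setName("terrain");
        child.setParent(root);
    }
}
```

Note that the interface says nothing about children, transforms or geometry; in this design those concerns would live in more specific node types and in whatever consumes the change events.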

The second idea is to introduce ‘interpreters’ which, as you can guess, interpret the scenegraph. Examples of these interpreters are the bounds interpreter (computes and updates bounds), the environment interpreter (maintains, for each node, a list of the nodes implementing the IEnvironmentNode interface that influence it, and updates the list of influenced nodes that each IEnvironmentNode maintains; examples of this are light and fog), the graph interpreter (provides a way to traverse the scenegraph), the transform interpreter (maintains a matrix stack of the transforms), the scene interpreter (defines what composes a universe)… These interpreters are completely isolated from the scenegraph. Most interpreters are built using plugins, which makes it very easy to extend the scenegraph: create a node, create a plugin for the interpreters you need, and you’re done!
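One possible reading of “interpreters built using plugins” is a registry keyed by node type, sketched below. Everything here is hypothetical (`BoundsInterpreterSketch`, `Plugin`, the `Sphere` and `Box` node types, and the reduction of “bounds” to a single radius); it only shows how an interpreter can stay ignorant of concrete node classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bounds interpreter that delegates to one plugin per node type. */
class BoundsInterpreterSketch {
    /** Plugin contract: computes a bounding radius for one node type. */
    interface Plugin<T> {
        double radiusOf(T node);
    }

    private final Map<Class<?>, Plugin<?>> plugins = new HashMap<>();

    <T> void register(Class<T> nodeType, Plugin<T> plugin) {
        plugins.put(nodeType, plugin);
    }

    @SuppressWarnings("unchecked")
    <T> double radiusOf(T node) {
        // Safe cast: register() only stores a Plugin<T> under Class<T>
        Plugin<T> plugin = (Plugin<T>) plugins.get(node.getClass());
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin for " + node.getClass());
        }
        return plugin.radiusOf(node);
    }

    // Example node types; the interpreter itself knows nothing about them
    static final class Sphere {
        final double r;
        Sphere(double r) { this.r = r; }
    }

    static final class Box {
        final double halfX, halfY, halfZ;
        Box(double halfX, double halfY, double halfZ) {
            this.halfX = halfX; this.halfY = halfY; this.halfZ = halfZ;
        }
    }

    public static void main(String[] args) {
        BoundsInterpreterSketch bounds = new BoundsInterpreterSketch();
        bounds.register(Sphere.class, s -> s.r);
        bounds.register(Box.class,
                b -> Math.sqrt(b.halfX * b.halfX + b.halfY * b.halfY + b.halfZ * b.halfZ));
        System.out.println(bounds.radiusOf(new Sphere(2.0)));
        System.out.println(bounds.radiusOf(new Box(3, 4, 0)));
    }
}
```

Adding a new node type then means writing the node class and registering one plugin per interpreter that should understand it, without touching the interpreters themselves.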

Renderers are built upon this system. The core video renderer defines its own interpreter which keeps the render frame up-to-date. Only change events are processed.
Culling is implemented using a culling system object which is in charge of sending the enter/exit render frame events.

So in short the general idea is:

- either use the core nodes and be satisfied with the medium-range performance you will get,
- or define nodes adapted to your application that will allow you to reach better performance.

So far it works very well. I have converted a few of my own (small) projects to test the engine and it works well. The points that I customized for nearly every project were the culling system (it really depends a lot on your project) and the appearance system.

Regarding the state of the engine (you asked in another thread whether I would release it, open source or not): this engine is not meant to become commercial; I have neither the time nor the competence for this. I will eventually release it as an open source project, but I don’t think I have matured it enough for this. There is a really big difference between something that seems like a good design and a working design with a few full applications that prove it.

Ouch, that was a long answer. Hope it is what you expected.

Bye

Vincent

eMKa · August 29, 2006, 1:31pm

Well, pretty much, but it’s a bit despairing as well…

I don’t have much knowledge about how memory is allocated either into RAM or VRAM and I didn’t know it impacted that much on performance… If I recall correctly, in OpenGL you specify anyway separately vertex data, color data, UV data, and so on, am I wrong ?
Which bugs have you spotted in the atom cache system of Xith3D ?
In your opinion, is it worth working on Xith3D to fix/optimize it or to work on a new engine ?

darkprophet · August 29, 2006, 1:44pm

Not meaning to rain on your parade, but:

Taken from a recent post from a senior nvidia guy.

Display lists are, and will be for the foreseeable future, the fastest way to render static geometry. I do agree with you that a properly created VBO is nearly as fast as a DisplayList, but that takes a lot of fiddling to get the right format for a particular card. Not to mention the overhead of a JNI call, and then the overhead of a method call in C/C++…

DP

Niwak · August 29, 2006, 5:07pm

[quote="<MagicSpark.org [ BlueSky ]>,post:8,topic:28121"]

I don’t have much knowledge about how memory is allocated either into RAM or VRAM and I didn’t know it impacted that much on performance… If I recall correctly, in OpenGL you specify anyway separately vertex data, color data, UV data, and so on, am I wrong ?
[/quote]
You specify data separately but you can source them from the same VBO and you can interleave the data in order to have better memory access coherency.

[quote="<MagicSpark.org [ BlueSky ]>,post:8,topic:28121"]

Which bugs have you spotted in the atom cache system of Xith3D ?
[/quote]
Honestly, I don’t remember precisely the case where it failed. I still remember that in some situations modifying the appearance properties did not cause the shape atom to be updated.

[quote="<MagicSpark.org [ BlueSky ]>,post:8,topic:28121"]

In your opinion, is it worth working on Xith3D to fix/optimize it or to work on a new engine ?
[/quote]
It depends on your needs. I preferred starting on a clean engine, with my own design and no compatibility issues.

I do agree with you. In my post I stated that implementing display lists in Xith seems like a good idea to me. In my first post I was just saying that I don’t think display lists will be the solution to the fact that Xith3D has poor results in the Quake3 benchmark posted in these forums some time ago.

eMKa · August 30, 2006, 8:35am

Oh yeah this has been fixed a while ago.

That was not my goal.