Performance & Benchmarks

Just wanted to weigh in regarding the performance of Xith3D. I tweaked a few things to increase polygon density in a scene and took a snapshot. This scene is rendering 162K polygons a frame at 28 FPS at 1200x1000. Even this is not technically “real world”, because “heavy polygon” models render extremely fast. In other words, even your friend’s commercial engine would be hard pressed to render 162 objects of 1K polygons each, especially if they had different textures. In this scene I upped the tessellation level of objects (which of course is foolish to do normally), so the ground, avatar and shield are rendered with far more polygons than necessary.

Keep in mind that Xith3D has not been as optimized for performance as it will be in the future.

One other thing to keep in mind is that if you are starting a project now, it will probably be at least a year until you actually start caring about performance. In other words, for a new game you have a tremendous amount of work to do that is unrelated to the rendering engine itself. It is doubtful the rendering engine would be your bottleneck for many months to come. By that time Xith3D will be mature and ready to go head to head with those commercial engines out there.

Darn, looks like our website is down! Well, you can check the screenshot when it is back up.

[quote]Just wanted to weigh in regarding the performance of Xith3D. I tweaked a few things to increase polygon density in a scene and took a snapshot. This scene is rendering 162K polygons a frame at 28 FPS at 1200x1000. Even this is not technically “real world”, because “heavy polygon” models render extremely fast. In other words, even your friend’s commercial engine would be hard pressed to render 162 objects of 1K polygons each, especially if they had different textures.
[/quote]
Thanks for weighing in. It reads very well and gives me good confidence for my small project. :slight_smile:
Btw my friend intends to create about 100 objects with a total of 50K polygons. Around 50% of these objects will have alpha transparency too (so I’ll use Xith’s internal sorting for such transparent objects a lot).

[quote]Keep in mind that Xith3D has not been as optimized for performance as it will be in the future.
[/quote]
Yes, I learned this already. Yuri (and others) explained it nicely some days ago.

[quote]One other thing to keep in mind is that if you are starting a project now, it will probably be at least a year until you actually start caring about performance. In other words, for a new game you have a tremendous amount of work to do that is unrelated to the rendering engine itself. It is doubtful the rendering engine would be your bottleneck for many months to come.
[/quote]
Absolutely.
My game will start as a hobby project with an aimed schedule of half a year. Well, we’ll see. “It’s done when it’s finished” comes to my mind.

[quote]By that time Xith3D will be mature and ready to go head to head with those commercial engines out there.
[/quote]
Sounds pretty good. It’s like David versus Goliath. And we all know David won. :wink:

PS: I’ll have a look at the snapshot when the server’s back up.

Ok the screenshot is up.

flipin’ eck guvnor, that looks a bit good!

Kev

/me always wanted to work on Magicosm …
Sigh

Hey Dave!

Looks awesome.

Up on Mac yet? :slight_smile: :slight_smile: :slight_smile:

That is most impressive. I think the GUI fits in so well with the game, very slick.

Will.

162k polygons on screen at once at 28 fps (1152 x 864 Pixel) with Xith3d and Jogl in a real world example…
Well, that is impressive.

[quote]162k polygons on screen at once at 28 fps (1152 x 864 Pixel) with Xith3d and Jogl in a real world example…
Well, that is impressive.
[/quote]
Um, well, yes and no. It may be impressive, but it has nothing to do with Xith or JOGL.

Counting polygons rendered is, in and of itself, a pretty much meaningless measure in today’s world, as all rendering is done completely on the graphics card on any reasonably modern system. Even transform and lighting happen on the card. All of that is going to be the same regardless of HOW the data gets to the card.

The real issues are in the manipulation of those polygons before they are rendered, the complexity and number of the state changes, how well they are culled, and how well they are ordered to minimize those state changes.

Forget about screen fill rate, that’s early-90s nonsense. By the time the polygons reach screen space it’s out of the software’s hands. As such, more meaningful measures are things like the total number of polygons in the world model being rendered and the percentage culled.

I understand your point that the engine’s preparation of what it hands to the graphics card is what matters most (see ID’s smart engines). But even when the engine has culled away all the polygons that don’t need to be drawn, there are still many ways to feed the data to OpenGL, with all those extensions. If the engine doesn’t use a good one, it’s still useless.

It’s the first time I’ve seen 162K polygons at 28 fps at a high screen resolution with Java - out of a real world example where the whole universe has many, many more objects and polygons.
Well, since Xith is the rendering engine, it all has to do with Xith, doesn’t it? And since Xith uses OpenGL, it has to do with Jogl too. We could try a Mesa-based Xith. :wink:

Xith does impress me because it’s a nice high level 3D engine and it’s fast (according to Yuri and David it will become even faster in the future): David’s real world examples prove it.

However, I won’t quote a “SimpleCubeTest” again, because that really is meaningless.

The single largest state change is a texture switch. This generally involves a hardware pipeline flush (you get that with most state changes) and a texture cache flush (nasty).
Screen fill rate can be an issue, but only if you are covering the whole screen 10 times with alpha blended polys. Most cards won’t like doing that at all.
Triangle throughput is ONLY a factor if you are having to transform in software, and then you will hit a bus-bandwidth limitation of around 1 million vertices sent to the card per second. Aggressive stripping (using degenerate tris to connect strips so a single object is one strip) is well advised, preferably using something like the NVidia stripper, which has optimisations based on the size of the TnL vertex cache. Note, however, that this vastly inflates the reported tri throughput, as 25% of the tris are now zero size (and should be rejected by the hardware).
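To make the degenerate-strip trick concrete, here is a minimal sketch (plain Java with hypothetical index arrays - nothing Xith-specific) of how two strips are commonly joined by repeating indices, so the join becomes a couple of zero-area triangles:

[code]
// Join two triangle strips into one by inserting degenerate (zero-area) tris.
// 'a' and 'b' are hypothetical vertex-index arrays; both assumed non-empty.
public static int[] concatStrips(int[] a, int[] b) {
    int[] out = new int[a.length + b.length + 2];
    System.arraycopy(a, 0, out, 0, a.length);
    // Repeat the last index of strip A and the first index of strip B.
    // The hardware rejects the resulting zero-area triangles, but the two
    // strips can now be drawn as a single strip in a single call.
    out[a.length]     = a[a.length - 1];
    out[a.length + 1] = b[0];
    System.arraycopy(b, 0, out, a.length + 2, b.length);
    // (Winding-order parity is ignored here; a real stripper may need to
    //  duplicate one extra index when strip A has an odd triangle count.)
    return out;
}
[/code]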

For best performance:
a) Strip your models
b) Batch by texture (a decent 3D engine should do this for you; see the sketch after this list)
c) Avoid sorted transparencies if you can - they have to be drawn in depth order, so they cannot be batched by texture, and the fill rate is halved (on a good day)
d) Cull whole objects to the view frustum in software, but don’t bother culling tris in software - the hardware can do this much faster than you.
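Point (b) can be as simple as grouping the visible shapes by texture before drawing. A minimal sketch (Texture, Shape, bindTexture() and drawGeometry() are hypothetical placeholders, not Xith3D API):

[code]
import java.util.*;

// Group visible shapes by texture so each texture is bound only once per frame.
static void drawBatchedByTexture(List<Shape> visibleShapes) {
    Map<Texture, List<Shape>> batches = new HashMap<Texture, List<Shape>>();
    for (Shape s : visibleShapes) {
        List<Shape> bucket = batches.get(s.getTexture());
        if (bucket == null) {
            bucket = new ArrayList<Shape>();
            batches.put(s.getTexture(), bucket);
        }
        bucket.add(s);
    }
    for (Map.Entry<Texture, List<Shape>> e : batches.entrySet()) {
        bindTexture(e.getKey());          // one texture state change per batch
        for (Shape s : e.getValue()) {
            drawGeometry(s);              // no texture switches inside the batch
        }
    }
}
[/code]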

As a side note for (c), drawing transparent objects (particularly particles) as additive is great, as you don’t need to sort by depth. Addition is a commutative operation; alpha-blending is not. This lets you draw your large particle systems all in one go without worrying about the sorting/texture switching.
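For what it’s worth, the additive setup typically looks like this in raw JOGL (a sketch assuming a GL object inside the render callback; the constants are standard OpenGL):

[code]
// Additive blending for particles: the result is order independent,
// so no depth sorting is needed and batching by texture stays possible.
gl.glEnable(GL.GL_BLEND);
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE);   // src + dest (commutative)
gl.glDepthMask(false);                        // test against the z-buffer, don't write it
// ... draw all particle batches here, in any order ...
gl.glDepthMask(true);
// Conventional alpha blending, by contrast, needs back-to-front ordering:
// gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
[/code]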

  • Dom

I guess you meant to say “don’t bother culling tris in software - unless that object is really big”. If not, then show me how a 2k*2k plane can be rendered faster without culling unnecessary tris, compared to getting rid of unwanted tris procedurally.

Simple answer:
Split the model up into manageable sections.
For hardware TnL, you want batches > 200 tris.
However, 2k x 2k = 4 million, which is a little large to throw at the hardware. You should break it into something like 40x40 blocks (1600 tris, a nice batch size), cull those to the frustum/distance, and send them as single objects.
I presume you mean a terrain - in which case, after splitting into square areas, you can then have lower detail versions (20x20, etc.) and merged versions to deal with level of detail (4 ‘40x40’ blocks become a single ‘20x20’). If you don’t deal with this, then you have the problem of ‘what if someone stands on a mountain right in the corner of my terrain and tries to look across the whole distance’.
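A rough sketch of that block approach (TerrainBlock and Frustum are hypothetical helper types, not Xith3D classes; edge handling and LOD are omitted):

[code]
import java.util.*;

// Split a heightfield into fixed-size blocks once, then cull whole blocks per frame.
static List<TerrainBlock> buildBlocks(float[][] heights, int blockSize) {
    List<TerrainBlock> blocks = new ArrayList<TerrainBlock>();
    for (int z = 0; z + blockSize < heights.length; z += blockSize) {
        for (int x = 0; x + blockSize < heights[0].length; x += blockSize) {
            // each block becomes one pre-built static strip of ~blockSize*blockSize*2 tris
            blocks.add(new TerrainBlock(x, z, blockSize, heights));
        }
    }
    return blocks;
}

static void drawVisibleBlocks(List<TerrainBlock> blocks, Frustum frustum, float maxDistance) {
    for (TerrainBlock b : blocks) {
        if (b.distanceTo(frustum.getEyePosition()) > maxDistance) continue;  // distance cull
        if (!frustum.intersects(b.getBoundingSphere())) continue;            // frustum cull
        b.draw();   // send the whole block as a single object, no per-tri work
    }
}
[/code]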

You can get away with a small subset of your scene being dynamically generated (ROAM, progressive meshes, morph targets, etc.), but any tris sent this way get hit by the bus bandwidth. Whole models do not have this problem.
For example:

hardware reading a model:

AGP RAM -> Hardware

V. fast, limit is AGP bus (4x you would hope)

Software dealing with it:

RAM -> CPU -> RAM (AGP if lucky) -> Hardware

Note 2 traversals of the CPU bus - one up, one down. This will start using significant CPU resources that could have been avoided. For a start, you won’t be able to send 2 million vertices from memory to CPU and back within a single 20ms frame.
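To put a rough, back-of-the-envelope number on that (assuming, say, 32 bytes per vertex): 2 million vertices is about 64 MB, and the up-and-back traversal doubles it to roughly 128 MB per frame, i.e. well over 6 GB/s at 50 fps. That is far more than an AGP-era bus (on the order of 1 GB/s at 4x) or a CPU of the time can spare, which is why whole static models are best left where the card can read them directly.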

  • Dom

That really is a simple answer - I mean a simple solution - but it wouldn’t really work with terrain, unless one wants to go with a precomputed tile system. In any case, I was just saying that your point d) was too generic. In my system I am culling by tris and performance is not bad. But I agree with the rest of what you said.

Sometimes sorting opaque geometry front to back is more important than texture sorting. If you have enough texture memory on the GPU, the fill rate benefit of front-to-back sorting can be bigger than the penalty of texture thrashing.
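A minimal sketch of that front-to-back ordering (using the vecmath Point3f that Xith3D builds on; Shape, camera and getCenter() are hypothetical placeholders):

[code]
// Sort opaque shapes nearest-first so most hidden fragments fail the z-test
// before any texturing work is done.
final Point3f eye = camera.getPosition();
Collections.sort(opaqueShapes, new Comparator<Shape>() {
    public int compare(Shape a, Shape b) {
        float da = a.getCenter().distanceSquared(eye);   // squared distance is enough for ordering
        float db = b.getCenter().distanceSquared(eye);
        return Float.compare(da, db);                    // nearest first
    }
});
// ... then draw opaqueShapes in this order (texture batching permitting) ...
[/code]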

That is true, particularly if the distant objects use multi-texturing. The latest cards also have z-buffer optimisations that can speed up the z-test a lot. Older cards (early GeForces, ATI 7500 or earlier) won’t benefit much, as the z-test is concurrent with the texture fetch, and keeping multiple pixel pipelines in sync prevents the early-out. However, if you are in a situation with large overdraw like this, you would be wise to look at techniques such as portals and basic occlusion comparisons to cull objects behind large opaque objects.

The simplest occlusion method is to have several visibility spheres on an object - large ones encompassing the whole object for frustum checking, and smaller ‘inner’ spheres representing the opaque regions. If an object’s ‘outer’ sphere, when rendered, lies within a closer object’s ‘inner’ sphere, then it is occluded, so you can cull it. You can use any shape, and there are articles on this (check Graphics Gems & Gamasutra). The most useful application for this type of system is rendering city scenes. In that case, buildings want an inner cuboid for occlusion, and cars/pedestrians check their cull-spheres against it. Very fast, and it saves you even sending the model data to the hardware. Spend a little effort early to save the card a lot of effort later.
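Here is one way to express that ‘outer sphere inside a closer inner sphere’ test, done with angles as seen from the eye point (a conservative sketch using vecmath; the sphere parameters are assumed to come from whatever bounds your objects carry):

[code]
import javax.vecmath.Point3f;
import javax.vecmath.Vector3f;

// True if the occludee's outer sphere is entirely hidden behind the
// occluder's opaque inner sphere, as seen from 'eye'. Conservative test.
static boolean isOccluded(Point3f eye,
                          Point3f occludeeCenter, float outerRadius,
                          Point3f occluderCenter, float innerRadius) {
    Vector3f toOccludee = new Vector3f();
    toOccludee.sub(occludeeCenter, eye);
    Vector3f toOccluder = new Vector3f();
    toOccluder.sub(occluderCenter, eye);

    float dOccludee = toOccludee.length();
    float dOccluder = toOccluder.length();
    // the eye must be outside both spheres, and the occluder strictly in front
    if (dOccludee <= outerRadius || dOccluder <= innerRadius) return false;
    if (dOccluder + innerRadius >= dOccludee - outerRadius)   return false;

    // angular radius of each sphere's silhouette cone, and the angle between centres
    double occludeeAngle = Math.asin(outerRadius / dOccludee);
    double occluderAngle = Math.asin(innerRadius / dOccluder);
    double separation    = toOccludee.angle(toOccluder);

    // occluded if the occludee's cone fits entirely inside the occluder's cone
    return separation + occludeeAngle <= occluderAngle;
}
[/code]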

It all depends on your situation and what you are rendering, so unfortunately there is no universal method that is guaranteed for all cases, but I would say that texture batching is the most generally useful optimisation for PC cards.

Oh - and if you can, pack multiple small (non-tiling) textures onto larger packed pages, as this saves you a large amount of batching. You can cut down from 200+ individual textures to 20-30 pages and save massively on batching overhead and state changes. (Figures taken from memory of a published game I worked on.)
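Packing into pages also means remapping each model’s texture coordinates into its sub-rectangle of the page, which is about as simple as it sounds. A sketch, with the offsets/scales given in 0..1 page space (only valid for non-tiling textures):

[code]
// Remap a flat {u0,v0, u1,v1, ...} array into the page region
// [offsetU .. offsetU+scaleU] x [offsetV .. offsetV+scaleV].
static void remapIntoAtlas(float[] texCoords,
                           float offsetU, float offsetV,
                           float scaleU, float scaleV) {
    for (int i = 0; i < texCoords.length; i += 2) {
        texCoords[i]     = offsetU + texCoords[i]     * scaleU;   // u
        texCoords[i + 1] = offsetV + texCoords[i + 1] * scaleV;   // v
    }
}
[/code]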

  • Dom

[quote]If you have enough texture memory on GPU, fill rate benefit of front to back sorting can be bigger than penalty of texture trashing.
[/quote]
That’s a pretty big ‘if’ though; most games use plenty of textures, but whether they suffer from enough overdraw that front-to-back rendering would help is debatable.

Of course, Xith has stencil shadows built in, doesn’t it? So before doing your proper texturing and lighting passes you need to create a perfect z-buffer anyway, which means you get to sort by material and still get the benefit of early z-fail :slight_smile:
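In raw JOGL terms, that two-pass idea looks roughly like this (a sketch assuming a GL object in the render callback; the two renderScene* calls are hypothetical stand-ins for your own draw code):

[code]
// Pass 1: depth only - the same z-buffer the stencil shadows need anyway.
gl.glColorMask(false, false, false, false);   // no colour writes
gl.glDepthFunc(GL.GL_LESS);
renderSceneGeometryOnly(gl);                  // positions only, any order

// Pass 2: full texturing/lighting, now free to sort by material/texture.
gl.glColorMask(true, true, true, true);
gl.glDepthMask(false);                        // depth is already correct
gl.glDepthFunc(GL.GL_EQUAL);                  // only visible fragments pass
renderSceneWithMaterials(gl);
gl.glDepthMask(true);
gl.glDepthFunc(GL.GL_LESS);
[/code]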

While we’re on the topic of optimisation, has anyone thought of adding a HOM (hierarchical occlusion map) to Xith (perhaps adapted from http://www.jpct.net/jpct.htm ) to get accurate occlusion queries without clogging up the graphics card?

[quote]It’s the first time I’ve seen 162K polygons at 28 fps at a high screen resolution with Java - out of a real world example where the whole universe has many, many more objects and polygons.
Well, since Xith is the rendering engine, it all has to do with Xith, doesn’t it? And since Xith uses OpenGL, it has to do with Jogl too.
[/quote]
Fallacious reasoning. If the scene never changes, then all the work that is done by Xith OR JOGL happens before the first frame. Frame counting is thus meaningless.

I’m not surprised this is the first time you’ve seen this kind of rate, as video cards keep getting better. In the purest case it has nothing to do with Java, Xith or JOGL.

In reality, in a real app like Magicosm there ARE changes going on, so frame rate does have some meaning, but the poly count is still pretty much irrelevant. The number of state changes and the amount of texture info that has to be moved across the bus are both likely to be significant. The amount of work culled out is also likely to be significant. But the “total polys in the scene” just isn’t a terribly important measure in and of itself, and tells you nothing beyond the ability of your graphics card.

[quote]Fallacious reasoning. If the scene never changes, then all the work that is done by Xith OR JOGL happens before the first frame. Frame counting is thus meaningless.
[/quote]
I agree it depends on the movement of the scene. :slight_smile:

[quote]I’m not surprised this is the first time you’ve seen this kind of rate, as video cards keep getting better. In the purest case it has nothing to do with Java, Xith or JOGL.
[/quote]
Well, I meant I hadn’t seen such a thing with a Java program before. In Jediknights-III I like high FPS rates.

I’ve noticed that the FPS of a test scene (200K polys) halves when I switch the polygon mode from filled mode to line mode.
It makes no difference whether I do it manually or with Xith’s nice Renderoption.setOption(Option.ENABLE_WIREFRAME_MODE, true);

Since I don’t plan to go for wireframe in the end, I don’t mind too much. :wink:

However, maybe some HW/OpenGL expert would like to explain why wireframe is slower than filled mode, please? When, some++ years ago, I implemented a polygon raster fill routine on the good old Amstrad CPC (8 bit), it was the other way round.