Performance & Benchmarks

Although I'm only slowly learning how to use Xith3d (I'm a 3d beginner, and it's the first 3d engine I've had the pleasure of using), it's pretty thrilling and looks very nice to me.

I've read the interesting notes about performance and the "road map", and as I said, this all sounds very good.

On a PC with an AMD XP 2500+ and a GF4 TI4200 I've run java com.xith3d.test.CubeTest (cool idea, btw, to include such basic tests directly in the main Xith3d library jar).
It says (it’s a 640x480 window I think) :
done frame speed test at 80 fps
there are 3492 triangles in scene
rendering 279360 triangles/sec
Num frames = 2000
Delta ms = 25
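
The printed figures are self-consistent if the "Delta" line is read as elapsed seconds rather than milliseconds (2000 frames at 80 fps is 25 seconds) - that reading is my assumption, not something the test documents. A hypothetical helper class (not part of Xith3d) reproducing the arithmetic:

```java
// Hypothetical helper (NOT part of Xith3d) reproducing the arithmetic
// behind the CubeTest output: 2000 frames over 25 seconds gives the frame
// rate, and frame rate times triangle count gives triangles/sec.
public class CubeTestMath {

    // Frames per second from a frame count and elapsed time in seconds.
    public static double fps(int numFrames, double elapsedSeconds) {
        return numFrames / elapsedSeconds;
    }

    // Triangles pushed per second at a given frame rate.
    public static double trianglesPerSecond(int trianglesInScene, double fps) {
        return trianglesInScene * fps;
    }

    public static void main(String[] args) {
        double fps = fps(2000, 25);                        // 80.0, as printed
        System.out.println(trianglesPerSecond(3492, fps)); // 279360.0, as printed
    }
}
```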

So currently I get roughly 280k polygons per second on a high-end machine (painted via ~300 cubes, each with 6 faces of 2 triangles). Addendum: as I learned, this isn't comparable to real-world examples.

I'd like to program a 3d action game. My friend, a graphic artist, plans to use up to 50k polygons on screen. Taking the current Xith3d CubeTest as a very (!) rough indication, would this mean I'd see about 5.6 frames per second (fps)?
(Provided my game used a scene graph similar to the CubeTest's; and of course it's still a very rough comparison, because a game needs more than a CubeTest does. Also, "polygon" isn't a good measure; but let's say my intended 50k polygons are comparable in size and texturing to those in the Xith CubeTest.)
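
The 5.6 fps figure above is just a division, which can be made explicit with a hypothetical back-of-envelope helper (the very rough nature of the estimate is exactly what the rest of the thread criticizes):

```java
// Hypothetical back-of-envelope helper for the estimate above: measured
// triangle throughput divided by triangles per frame gives a (very rough)
// frame-rate ceiling. This ignores culling, state sorting, fill rate, etc.
public class FpsEstimate {

    public static double estimateFps(double trianglesPerSecond, double trianglesPerFrame) {
        return trianglesPerSecond / trianglesPerFrame;
    }

    public static void main(String[] args) {
        // ~280k tri/s from the CubeTest, 50k polys on screen:
        System.out.println(estimateFps(280_000, 50_000)); // prints 5.6
    }
}
```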

I learned that Xith3d will be speed-optimized over time (and that even if you went the "direct" OpenGL way via Jogl, Xith3d would be roughly the same or even better for complex scenes).

My friend is aiming for about 50 fps for a 50k polygon scene on the above-mentioned PC. Is his expectation a realistic goal which I could fulfil with Java (Xith3d), or do I have a problem? I don't mean a problem with Xith3d or Jogl. Rather: is it possible at all to render 50k polygons at 50+ fps with an OpenGL-based Java 3d engine, or is that too much data?

Magicosm often runs at 50 fps with 50k polygons. There are of course other factors than just sheer triangle count. Sometimes I have a 35k scene rendering at 42 fps on a GeForce 4600 TI.

*sigh* And that was your first mistake.

Rule #1 of microbenchmarks: know what you are measuring and how it does and doesn't compare to the real world of applications.

In the Cube demo (or any pure poly-pushing test like this) you are paying the overhead of the scene-graph with NONE of the benefits.

(1) All your world polys are onscreen at once in the cube demo; the same, I would assume, will NOT be true of your game. Xith (or Java3D) will provide optimized view-frustum culling, so you AREN'T wasting main CPU or card fill-rate on those non-visible polys in a real app.
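
The idea can be sketched with a couple of hypothetical classes (this is NOT the Xith3d or Java3D API, just the shape of the test a scene graph performs for you): each object carries a bounding sphere, which is tested against the frustum planes before anything is sent to the card.

```java
// A minimal sketch of view-frustum culling. Hypothetical classes, not the
// Xith3d API: each object has a bounding sphere, tested against the planes.
public class FrustumCulling {

    // A plane nx*x + ny*y + nz*z + d = 0, normal pointing into the frustum.
    public static class Plane {
        public final double nx, ny, nz, d;

        public Plane(double nx, double ny, double nz, double d) {
            this.nx = nx; this.ny = ny; this.nz = nz; this.d = d;
        }

        public double signedDistance(double x, double y, double z) {
            return nx * x + ny * y + nz * z + d;
        }
    }

    // An object is culled only if its bounding sphere lies entirely behind
    // any single frustum plane; otherwise it is handed on for rendering.
    public static boolean isVisible(Plane[] frustum,
                                    double cx, double cy, double cz, double radius) {
        for (Plane p : frustum) {
            if (p.signedDistance(cx, cy, cz) < -radius) {
                return false; // completely behind this plane: skip the object
            }
        }
        return true;
    }
}
```

A real frustum has six planes; the test is the same for each of them.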

(2) Xith (and Java3D) do state sorting. This is because THE most expensive real-world operation on your video card is switching the draw state. This is unused in your trivial cube but is VITAL to good performance in the real world.
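
What state sorting buys you can be shown with a tiny sketch (hypothetical types, not Xith3d's internals): sorting draw calls by texture means the expensive texture switch happens once per texture instead of once per object.

```java
import java.util.Comparator;
import java.util.List;

// A minimal sketch of state sorting. Hypothetical types, not Xith3d's
// internals: grouping draw calls by texture minimizes expensive binds.
public class StateSorting {

    public record DrawCall(int textureId, String mesh) {}

    // Count how many texture binds a given draw order would cost.
    public static int textureSwitches(List<DrawCall> calls) {
        int switches = 0;
        int bound = -1; // nothing bound yet
        for (DrawCall c : calls) {
            if (c.textureId() != bound) {
                switches++;            // stands in for an expensive bind
                bound = c.textureId();
            }
        }
        return switches;
    }

    // Reorder so calls sharing a texture sit next to each other.
    public static List<DrawCall> sortByTexture(List<DrawCall> calls) {
        return calls.stream()
                    .sorted(Comparator.comparingInt(DrawCall::textureId))
                    .toList();
    }
}
```

With four calls alternating between two textures, the naive order costs four binds and the sorted order costs two; on real scenes with hundreds of objects the gap is far larger.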

Microbenchmarks measure a system's ability to run microbenchmarks. They do NOT generally reflect how real apps operate, and thus how they perform.

Polys per second is practically an outdated term, in the same way that CPU cycles per second is fast becoming irrelevant.

You could push the same number of polys through the Unreal engine with static lightmapping or the Doom3 engine with dynamic shadow volumes and DOT3 lighting; the Doom3 engine's performance would almost certainly suffer, but you could achieve better effects with fewer polys.

Perhaps more to the point: with a good graphics card the actual throughput will be dependent on the card, not the language. API overhead in calling draw commands should be absolutely minimal (unless you're using something silly like immediate mode). The choice of Jogl vs. Xith is more tricky - generality comes at the expense of (runtime) speed, but is this a suitable tradeoff for getting fast development and a clean design you don't have to hack around when requirements change?

Btw, your comment about Alien Flux further highlights the irrelevance of polys-per-second. I'd estimate that it pushes at most 1k polys a frame, and probably under 100 if you removed all the particle effects.

[quote]Magicosm often runs at 50 fps with 50k polygons. There are of course other factors than just sheer triangle count. Sometimes I have a 35k scene rendering at 42 fps on a GeForce 4600 TI.
[/quote]
Thanks for these real-world examples, David. That's what I've been looking for.

What processor does the GF4 machine you mentioned have, please?

We 3d newbies give you 3d experts a hard “job”. :slight_smile: Thanks to all of you for replying so many times in the forum and talking about Java and 3d details.

In this thread I've "just" been interested in some real-world figures on how many polys I could allow my friend (the mentioned artist) to use for our small game. Since I didn't have figures, I took a simple CubeTest. This doesn't say much, I see.

[quote]All your world-polys are onscreen at once in the cube, the same I would assume will NOT be true of your game. Xith (or Java3D) will provide optimized view-frustum culling so you AREN'T wasting main CPU or card fill-rate on those non-visible polys in a real app.
[/quote]
In my intended game you'll usually see the whole scene, with up to 50k polygons, on screen at once. Of course frustum culling will make it faster when the player zooms in on certain areas, but those will be special cases. That's only for this game; I know many (or most) other games will profit very much from frustum culling.
Btw, "my" artist told me that at his workplace he's been creating models and scenes with up to 100k polygons on screen at once for their current game project. Their (commercial) engine renders this at 60+ fps (on a comparable GF4 PC), he said. Please don't take this as a comparison for anything.

[quote]Xith (and Java3D) do state sorting. This is because THE most expensive real-world operation on your video card is switching the draw state. This is unused in your trivial cube but is VITAL to good performance in the real world.
[/quote]
I see. I'll have to read more about draw states.

[quote]Microbenchmarks measure a system's ability to run microbenchmarks. They do NOT generally reflect how real apps operate and thus how they perform.
[/quote]
Yes. Again, I have to add that I wasn't interested in microbenchmarks, but in "just any" figure available. Since I didn't find any figures, I incorrectly used the CubeTest (because it was there, I think).

[quote]The choice of Jogl vs. Xith is more tricky - generality comes at the expense of (runtime) speed, but is this a suitable tradeoff for getting fast development and a clean design you don’t have to hack around when requirements change?
[/quote]
Absolutely. I'm already convinced that it's a good idea to use a nice higher-level engine like Xith3d (compared to the low-level OpenGL way, for example).

[quote]Btw, your comment about Alien Flux further highlights the irrelevance of polys-per-second. I'd estimate that it pushes at most 1k polys a frame, and probably under 100 if you removed all the particle effects.
[/quote]
I see. Let’s forget about that comment then. I’ll edit it.

Just wanted to weigh in regarding performance of Xith3d. I tweaked a few things to increase polygon density in a scene and took a snapshot. This scene is rendering 162K polygons a frame at 28 FPS at 1200x1000. Even this is not technically "real world", because "heavy polygon" models will render extremely fast. In other words, even your friend's commercial engine would be hard pressed to render 162 objects of 1k polygons each, especially if they had different textures. In this scene I upped the tessellation level of objects (which of course is foolish to do normally), so the ground, avatar and shield are rendering with a lot more polygons than necessary.

Keep in mind that Xith3D has not been as optimized for performance as it will be in the future.

One other thing to keep in mind is that if you are starting a project now, it will probably be at least a year until you actually start caring about performance. In other words, for a new game you have a tremendous amount of work to do which is unrelated to the rendering engine itself. It is doubtful the rendering engine would be your bottleneck for many months to come. By that time Xith3D will be mature and ready to go head to head with the commercial engines out there.

Darn, looks like our website is down! Well, you can check the screenshot when it is back up.

[quote]Just wanted to weigh in regarding performance of Xith3d. I tweaked a few things to increase polygon density in a scene and took a snapshot. This scene is rendering 162K polygons a frame at 28 FPS at 1200x1000. Even this is not technically "real world", because "heavy polygon" models will render extremely fast. In other words, even your friend's commercial engine would be hard pressed to render 162 objects of 1k polygons each, especially if they had different textures.
[/quote]
Thanks for your weigh-in. It reads very well, and it gives me good confidence for my small project. :slight_smile:
Btw, my friend intends to create about 100 objects with a total of 50k polygons. Around 50% of these objects will have alpha transparency too (so I'll use Xith's internal sorting for such transparent objects a lot).

[quote]Keep in mind that Xith3D has not been as optimized for performance as it will be in the future.
[/quote]
Yes, I learned this already. Yuri (and others) explained it nicely some days ago.

[quote]One other thing to keep in mind is that if you are starting a project now, it will probably be at least a year until you actually start caring about performance. In other words for a new game you have a tremendous amount of work to do which is unrelated to the rendering engine itself. It is doubtful the rendering engine would be your bottleneck for many months to come.
[/quote]
Absolutely.
My game will start as a hobby project with an aimed-for schedule of half a year. Well, we'll see. "It's done when it's finished" comes to mind.

[quote]By that time Xith3D will be mature and ready to go head to head with those commercial engines out there.
[/quote]
Sounds pretty good. It’s like David versus Goliath. And we all know David won. :wink:

PS: I'll have a look at the snapshot when the server's ready.

Ok the screenshot is up.

flipin’ eck guvnor, that looks a bit good!

Kev

/me always wanted to work on Magicosm …
Sigh

Hey Dave!

Looks awesome.

Up on Mac yet? :slight_smile: :slight_smile: :slight_smile:

That is most impressive. I think the GUI fits in so well with the game - very slick.

Will.

162k polygons on screen at once at 28 fps (1152 x 864 Pixel) with Xith3d and Jogl in a real world example…
Well, that is impressive.

[quote]162k polygons on screen at once at 28 fps (1152 x 864 Pixel) with Xith3d and Jogl in a real world example…
Well, that is impressive.
[/quote]
Um, well, yes and no. It may be impressive, but it has nothing to do with Xith or JOGL.

Counting polygons rendered is, in and of itself, a pretty much meaningless measure in today's world, as all rendering is done completely on the graphics card on any reasonably modern system. Even transform and lighting occurs on the card. All that is going to be the same regardless of HOW the data gets to the card.

The real issues are in the manipulation of those polygons before they are rendered: the complexity and number of the state changes, how well they are culled, and how well they are ordered to minimize those state changes.

Forget about screen fill rate, that's early-90s nonsense. By the time the polygons get to screen space it's out of the software's hands. As such, more meaningful measures are things like the total number of polygons in the world model being rendered and the percentage culled.

I understand your point that the engine's preparation of what it hands to the graphics card is most important (see id's smart engines). Even when the engine has culled away all polygons which don't need to be drawn, there are still so many ways to feed the data to OpenGL with all those extensions. If the engine doesn't use a good one, it's still useless.

It's the first time I've seen 162k polygons at 28 fps at a high screen resolution with Java - from a real-world example where the whole universe has many, many more objects and polygons.
Well, since Xith is the rendering engine, it all has to do with Xith, doesn't it? And since Xith uses OpenGL, it has to do with Jogl, too. We could try to use a Mesa-based Xith. :wink:

Xith does impress me because it's a nice high-level 3d engine and it's fast (according to Yuri and David it will become even faster in the future): David's real-world examples prove it.

However, I won't quote a "SimpleCubeTest" again, because that is indeed meaningless.

The single largest state change is a texture switch. This generally involves a hardware pipeline flush (you get that with most state changes) & a texture cache flush (nasty).
Screen fill rate can be an issue, but only if you are covering the whole screen 10 times with alpha-blended polys. Most cards won't like doing that at all.
Triangle throughput is ONLY a factor if you are having to transform in software, and then you will hit a bus-bandwidth limitation of around 1 million vertices sent to the card per second. Aggressive stripping (using degenerate tris to connect strips so a single object is one strip) is well advised, preferably using something like the NVidia stripper, which has optimisations based on the size of the TnL vertex cache. However, these vastly inflate the tri throughputs, as 25% of the tris are now zero-size (& should be rejected by the hardware).
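
The stitching trick can be sketched in a few lines (a hypothetical helper, not the NVidia stripper itself): repeating the last index of one strip and the first index of the next produces zero-area triangles that the hardware rejects, so several strips become one batch.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of joining triangle strips with degenerate triangles
// (hypothetical helper, not the NVidia stripper): the repeated indices
// create zero-area tris, letting many strips be drawn as a single strip.
public class StripStitcher {

    public static List<Integer> stitch(List<List<Integer>> strips) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> strip : strips) {
            if (!out.isEmpty()) {
                out.add(out.get(out.size() - 1)); // degenerate: repeat previous last index
                out.add(strip.get(0));            // degenerate: repeat next first index
            }
            out.addAll(strip);
        }
        return out;
    }
}
```

Real stitchers also insert an extra degenerate index when a strip has odd length so the winding order stays consistent; that detail is omitted here for brevity.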

For best performance:
a) Strip your models
b) Batch by texture (a decent 3d engine should do this for you)
c) avoid sorted transparencies if you can - they have to be drawn in depth order, and so cannot be batched by texture, and the fill rate is halved (on a good day)
d) cull whole objects to the view frustum in software, but don't bother culling tris in software - the hardware can do this much faster than you.

As a side note for (c): drawing transparent objects (particularly particles) as additive is great, as you don't need to sort by depth. Addition is a commutative operation; alpha blending is not. This lets you draw your large particle systems all in one go without worrying about the sorting/texture switching.
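
The commutativity point can be checked with plain arithmetic (no GL calls, just the per-channel blend equations): additive blending (dst + src) gives the same result in any draw order, while alpha "over" (src*a + dst*(1-a)) does not.

```java
// A tiny numeric illustration of the point above, in plain arithmetic:
// additive blending is order-independent, alpha "over" is order-dependent.
public class BlendOrder {

    // Additive blend of a single colour channel.
    public static double additive(double dst, double src) {
        return dst + src;
    }

    // Classic alpha-over blend of a single colour channel.
    public static double over(double dst, double src, double alpha) {
        return src * alpha + dst * (1.0 - alpha);
    }
}
```

Drawing two particles with channel values 0.2 and 0.5 additively gives the same framebuffer value either way round; blending them with alpha "over" gives different results depending on which is drawn first.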

  • Dom

I guess you meant to say "don't bother culling tris in software - unless that object is really big". If not, then show me how a 2k*2k plane can be rendered faster without culling unnecessary tris, compared to getting rid of unwanted tris procedurally.

Simple answer:
Split the model up into manageable sections.
For hardware TnL, you want batches > 200 tris.
However, 2k x 2k = 4 million, which is a little large to throw at the hardware. You should break it into something like 40x40 blocks (1600 tris, a nice batch size) and cull those to the frustum/distance and send them as single objects.
I presume you mean a terrain - in which case, after splitting into square areas, you can then have lower-detail versions (20x20, etc.) and converged versions to deal with level of detail (4 '40x40' blocks become a single '20x20'). If you don't deal with this, then you have the problem of "what if someone stood on a mountain right in the corner of my terrain and tried to look across the whole distance".
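
The blocking scheme can be sketched with a hypothetical helper (the 40x40/20x20 numbers come from the post above; the distance step is my own placeholder): the terrain is covered by fixed-size blocks, each of which can be culled and have its resolution halved per level of detail.

```java
// A minimal sketch of the terrain-blocking scheme described above
// (hypothetical helper; block sizes from the post, lodStep is assumed).
public class TerrainBlocks {

    // How many blocks cover a gridSize x gridSize terrain with
    // blockSize x blockSize pieces (rounding up at the edges).
    public static int blockCount(int gridSize, int blockSize) {
        int perSide = (gridSize + blockSize - 1) / blockSize;
        return perSide * perSide;
    }

    // Halve a block's resolution for every lodStep of distance, so four
    // far-away 40x40 blocks can converge into a single 20x20.
    public static int lodResolution(int baseResolution, double distance, double lodStep) {
        int res = baseResolution;
        for (double d = lodStep; d <= distance && res > 1; d += lodStep) {
            res /= 2;
        }
        return res;
    }
}
```

For a 2000-unit terrain split into 40-unit blocks this gives 2500 blocks, each a comfortable batch for the card, with resolution dropping from 40 to 20 to 10 as distance grows.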

You can get away with a small subset of your scene being dynamically generated (ROAM, progressive meshes, morph targets, etc.), but any tris sent this way get hit by the bus bandwidth. Whole models do not have this problem.
For example:

hardware reading a model:

AGP RAM -> Hardware

V. fast; the limit is the AGP bus (4x, you would hope)

Software dealing with it:

RAM -> CPU -> RAM (AGP if lucky) -> Hardware

Note the 2 traversals of the CPU bus - one up, one down. This will start using significant CPU resources that could have been avoided. For a start, you won't be able to send 2 million vertices from memory to CPU and back within a single 20 ms frame.
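
A back-of-envelope check of that claim (the 32 bytes per vertex is my assumption for position + normal + texcoord, not a figure from the post):

```java
// Back-of-envelope bandwidth check. 32 bytes/vertex is an assumed layout
// (position + normal + texcoord), not a figure from the post.
public class BusBandwidth {

    // Bytes/sec needed to move `vertices` vertices of `bytesPerVertex`
    // across the bus `traversals` times within one frame.
    public static double requiredBytesPerSecond(long vertices, int bytesPerVertex,
                                                int traversals, double frameSeconds) {
        return vertices * (double) bytesPerVertex * traversals / frameSeconds;
    }

    public static void main(String[] args) {
        // 2 million vertices, up and down the CPU bus, in a 20 ms frame:
        double bps = requiredBytesPerSecond(2_000_000, 32, 2, 0.020);
        System.out.printf("%.1f GB/s%n", bps / 1e9); // ~6.4 GB/s, far beyond AGP 4x (~1 GB/s)
    }
}
```

Even with these generous assumptions the required bandwidth is several times what an AGP 4x bus of that era could deliver, which is the point being made.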

  • Dom