How fast is fast?

I’ve been working on a JOGL engine that will hopefully be quite extensible. However, I’m lazy, so I’m only implementing enough extra functionality to get some other projects working. I did some benchmarks and was wondering whether anyone with experience could tell me if they’re actually any good:

My computer: Intel Core 2 Duo 2.2 GHz MacBook Pro, 2 GB RAM, GeForce 8600M GT

My engine rendering 10,000 identical cubes (all based on the same VBOs): 20 fps
Xith3D rendering 10,000 cubes with the VBO optimization flag set (all using the same GeometryArray): 6 fps

I’m by no means a Xith3D expert, so I could just be screwing it up.

If 20 fps isn’t very good, I may just switch to Xith3D because of its huge feature set, but hopefully the world’s opinion will show otherwise :slight_smile:

Oh, my tests were without any frustum culling or occlusion culling; I haven’t gotten around to those yet. Xith3D was running with 0 lights, mine with 3.

Since your benchmarks didn’t help answer your initial question, they were no good :P.

OK, now seriously: if the benchmark you wrote is representative of the workload you’re attempting, then 20 fps may or may not be enough, and likewise 6 fps may be more than adequate. For example, scientific visualization may be perfectly usable at 6 fps, a real-time strategy game might be fine at 20 fps, and a first-person shooter will need to aim for 60 fps or more. And all of that assumes your benchmark is even representative of what you’re actually going to be doing.

When I asked whether they were good, I meant: how do those rates compare to other people’s experiences or programs? The test was just meant to be a load test, not necessarily representative of my final goal. However, since you asked, I do plan on making a game.

Also, I’ve continued running some variations of the tests, and there may be some weird bugs slowing things down abnormally. Until I’m able to pinpoint them, the above results are probably inaccurate, so never mind, everyone.

Okay, I fixed those issues (hopefully) and did some further tests. The bugs only showed up with low object counts where each model had a high polygon count, so they didn’t affect the earlier benchmarks, which used a high object count with a low number of faces per object.

Anyway, new benchmarks:
MacBook Pro 2.2 GHz, GeForce 8600M GT:
10,000 cubes at 21 fps
500,000 triangles total, 10 objects, at 50 fps
2.5 million triangles total, 10 objects, at 32 fps

AMD 4200, ATI X1300:
10,000 cubes at 35 fps
500,000 triangles total, 10 objects, at 75 fps
2.5 million triangles total, 10 objects, at 35-40 fps

I am much happier with these results, so now I will repeat my question to the world:
How do these frame rates (taking into account the machines that they’re on) compare to other people’s experiences?

I’m slightly puzzled that the ATI X1300 should be faster than an NVIDIA 8600. Also, I take it the CPU is an Intel Core 2 Duo @ 2.2 GHz; that should be about on par with (or even better than) the AMD 4200 X2.

I suppose the results on the MacBook were obtained under Mac OS X? Do you have Boot Camp, i.e. could you re-run the benchmark on Windows XP? Just curious…

500,000 × 75 fps = 37.5 million polygons per second

There are a substantial number of optimisations you can do. I have a 7900GT and I achieved 99 million tris/second. I know MatthiasM achieved 101M tris/sec on the same card…

He and I are slightly crazy about optimisations tho, so 37.5M tris/sec might be enough for what you are doing…

DP :slight_smile:

What’s important is the performance in a real game scenario. That’s why the Nintendo GameCube quoted around 15M fully lit and textured in-game polygons per second with effects, while the PS2 claimed 75M polygons per second (flat shaded).

I’m talking about fully textured, per-pixel-lit polygons… not flat-shaded, untextured ones :slight_smile:

The GameCube was a nice machine; too bad it didn’t get much support. Hopefully the Wii is changing that tho…

DP

I would very much like to optimize it further, since faster graphics means I can do more in other areas (i.e. physics, AI, bedroom, etc.) if I don’t max out the graphics load. However, I was having trouble finding OpenGL optimization information online, so what kind of tips could you give me?

First of all, you have to keep in mind that the mobile 8600 is not the same as the desktop 8600; it is definitely in the same price and performance class as the X1300. Now, with that in mind, I’m just hypothesizing, but it’s entirely possible that the 8600 is triangle-setup limited. The roles might reverse if more complex vertex shading were used, where the 8600 could make use of its unified shaders. Or it’s entirely possible some other factor is holding it back, with the Mac drivers being the primary suspect.

Keep state changes to a minimum, and use cache-friendly geometry techniques like VBOs. Other than that, there’s not a whole lot you can do.
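
For what it’s worth, the basic pattern is to upload the geometry once and then reuse the same buffer for every object, touching as little GL state as possible per draw. Not anyone’s actual engine code, just a rough sketch against the JOGL 1.x fixed-function GL interface (the class and method names here are made up):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import javax.media.opengl.GL;

// Hypothetical helper: upload one shared vertex VBO and reuse it for every object.
public final class SharedCubeVbo {
    private int vboId;
    private int vertexCount;

    public void upload(GL gl, float[] vertices) {
        vertexCount = vertices.length / 3;
        FloatBuffer data = ByteBuffer.allocateDirect(vertices.length * 4)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        data.put(vertices).flip();

        int[] ids = new int[1];
        gl.glGenBuffers(1, ids, 0);
        vboId = ids[0];
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId);
        gl.glBufferData(GL.GL_ARRAY_BUFFER, vertices.length * 4, data, GL.GL_STATIC_DRAW);
    }

    // Bind once per frame, then draw many instances with only a matrix change per object.
    public void drawAll(GL gl, float[][] positions) {
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId);
        gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0L); // offset into the bound VBO

        for (float[] p : positions) {
            gl.glPushMatrix();
            gl.glTranslatef(p[0], p[1], p[2]);
            gl.glDrawArrays(GL.GL_TRIANGLES, 0, vertexCount);
            gl.glPopMatrix();
        }

        gl.glDisableClientState(GL.GL_VERTEX_ARRAY);
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);
    }
}
```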

OMG, the bedroom ;D It’s quoted, you can’t edit it now :wink:

Is that under normal game conditions, i.e. not a scene arranged to give higher performance (since graphics cards can draw certain scenes much better than others), and is that also with anti-aliasing? Well, either way, that’s good!!!

It was 3 lights, no textures, and full-scene anti-aliasing (via multisampling, not the accumulation buffer). I made a mistake when I wrote up the benchmarks for the desktop; I was trying to recall them off the top of my head from an earlier test. All are correct except the 500,000 triangles / 10 objects result at 75 fps, which was really 95 fps.
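
By multisampling I mean requesting sample buffers through GLCapabilities rather than doing accum-buffer passes; roughly something like this (a sketch against the JOGL 1.x API, exact package names depend on the JOGL version):

```java
import javax.media.opengl.GLCanvas;
import javax.media.opengl.GLCapabilities;

public final class MsaaCanvasFactory {
    // Request a multisampled framebuffer (FSAA) instead of accum-buffer anti-aliasing.
    public static GLCanvas createCanvas() {
        GLCapabilities caps = new GLCapabilities(); // JOGL 1.x style; JOGL 2 needs a GLProfile
        caps.setSampleBuffers(true);  // ask for multisample sample buffers
        caps.setNumSamples(4);        // 4x MSAA; the driver may clamp to what it supports
        return new GLCanvas(caps);
    }
}
```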

I was using VBOs and triangles, and state changes were kept to a bare minimum. My code keeps track of the current buffer pointers, and if a duplicate is resubmitted it gets ignored, which means I only did a single bind-buffer operation for each target (vertices, indices, normals, etc.) per frame and could just keep reusing the pointers, because for this scene all the objects were based on the same VBOs (yes, it’s not quite a real game situation). It was all using the fixed-function pipeline.
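
The bind tracking is nothing fancy; conceptually it’s just something like this (a simplified sketch, not my actual code, and the class/method names are made up):

```java
import javax.media.opengl.GL;

// Skip redundant glBindBuffer calls by remembering which buffer is
// currently bound to each target.
public final class BindCache {
    private int boundArrayBuffer = 0;
    private int boundElementBuffer = 0;

    public void bindArrayBuffer(GL gl, int vboId) {
        if (vboId != boundArrayBuffer) {            // only touch GL state on a real change
            gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId);
            boundArrayBuffer = vboId;
        }
    }

    public void bindElementBuffer(GL gl, int iboId) {
        if (iboId != boundElementBuffer) {
            gl.glBindBuffer(GL.GL_ELEMENT_ARRAY_BUFFER, iboId);
            boundElementBuffer = iboId;
        }
    }
}
```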

I still intend to do frustum culling and octree/BSP stuff; so far it’s pretty much a straight run through a list of the meshes.
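
The frustum culling I have in mind is the standard sphere-vs-plane test, something along these lines (just a sketch of the usual approach, not code from my engine, assuming six normalized plane equations with normals pointing into the frustum):

```java
// Minimal sphere-vs-frustum culling sketch.
public final class Frustum {
    private final float[][] planes = new float[6][4]; // (a, b, c, d) per plane

    public void setPlane(int i, float a, float b, float c, float d) {
        float len = (float) Math.sqrt(a * a + b * b + c * c); // normalize so distances are in world units
        planes[i][0] = a / len;
        planes[i][1] = b / len;
        planes[i][2] = c / len;
        planes[i][3] = d / len;
    }

    // True if a bounding sphere is at least partially inside the frustum.
    public boolean intersectsSphere(float x, float y, float z, float radius) {
        for (float[] p : planes) {
            float dist = p[0] * x + p[1] * y + p[2] * z + p[3];
            if (dist < -radius) {
                return false; // completely behind this plane, so cull the mesh
            }
        }
        return true;
    }
}
```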

Once again, this is pretty hard to answer, because without a standard benchmark across all our machines the results aren’t comparable. Also, our experiences come from programs that do things quite differently, e.g. you’re using Xith3D drawing lots of triangles, somebody else might be using Aviatrix3D, and in my case I wrote my own multi-threaded game engine but work mainly with small textures.

All I can suggest is that you get the framerate you’re hoping for, on the target machine you’re developing for :slight_smile:

Re: the 8600 thing - somebody pointed out that it might be a mobile 8600. If nothing else, I can say that the 8600’s OpenGL drivers are still not as good as they could be.

One PC developer (of 3D titles) said that there are countless optimisations for the various video cards; some cards do certain things better one way, just like some processors do certain things better. Premature optimisation is the root of all evil, so while you don’t want to start off at a tenth of the speed it should be, you shouldn’t begin by obsessing over an extra 10% until you find a bottleneck where that 10% actually matters.

Good point. I think I’ll just implement the most basic and important optimization of all (don’t do what you don’t have to) and then start working on the rest of my game/engine.

However I would like to know how darkprophet was able to get his triangle/sec rates.

Thanks for all the replies

He used a substantially faster GPU, and even then he was still well below the theoretical limits; the documentation I’m looking at says 900M vertices/sec for the 7900GT.

I ran another benchmark on a computer with an X700 and it fared very similarly to the X1300, albeit a little worse. I checked online and the X700 is rated at 635 million triangles/sec, while I’m getting about 60 million+ (in my best tests), so I wonder how they determine that rating.