Optimisation 2: Public test

After much wrangling, bug fixing and repeated profiling from the earlier thread I’ve made what I thought would be some major speed-ups - some accurate view & light culling, as well as switching everything to use indexed vertex arrays.

Unfortunately I’ve not actually seen any difference in speed :o After the latest bout I found that it’s not my raw memory copying, or the sheer volume of calculations (not now, at any rate), but the actual rendering - the calls to glDrawElements are just too damn slow.
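
For reference, the indexed path boils down to something like this - a simplified sketch rather than the actual app code, and the JOGL package/import layout is an assumption since it varies between builds:

```java
import net.java.games.jogl.GL; // package name assumed; depends on your Jogl build
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

// Simplified sketch of an indexed mesh drawn with client-side vertex arrays.
public class IndexedMesh {
    private final FloatBuffer vertices; // x, y, z per vertex
    private final IntBuffer indices;    // three indices per triangle
    private final int indexCount;

    public IndexedMesh(float[] verts, int[] inds) {
        vertices = ByteBuffer.allocateDirect(verts.length * 4)
                             .order(ByteOrder.nativeOrder()).asFloatBuffer();
        vertices.put(verts).rewind();
        indices = ByteBuffer.allocateDirect(inds.length * 4)
                            .order(ByteOrder.nativeOrder()).asIntBuffer();
        indices.put(inds).rewind();
        indexCount = inds.length;
    }

    public void draw(GL gl) {
        gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertices);
        // One call per mesh instead of per-vertex glVertex calls.
        gl.glDrawElements(GL.GL_TRIANGLES, indexCount, GL.GL_UNSIGNED_INT, indices);
        gl.glDisableClientState(GL.GL_VERTEX_ARRAY);
    }
}
```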

On a whim I reduced the viewport size, and lo! the fps shot right back up to ~75fps. So on my GeForce2 GTS I seem to be fill-limited at only seven lights :frowning: I’d like others to give the app a quick whirl so I can see what kind of fps is common - my GeForce used to be pretty good, but by today’s standards it’s probably lagging somewhat on raw fill rate.

App is here: http://studenti.lboro.ac.uk/~cojc5/Vecript/VecriptDist02-09-2003.zip

The readme includes instructions (simple!) for loading and viewing the test scene. I’ve not included Jogl since that would add ~2Mb that people might already have. Tested on the latest nightly build of Jogl without a hitch, so you can set up your Jogl jar & dll however you like on your system.

All comments welcome :slight_smile: And if anyone’s got any suggestions for general UI / usability, I’m all ears…

First - I do not agree with the statement ‘only 7 lights’. 7 is really a LOT of lights if you apply all of them to a single object. Have you tried measuring fps with just 1 or 2 lights? Please check if it makes a difference. If it does, you can probably turn off the few least influential lights for each object.
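
Something along these lines per object would do it (rough sketch only - Light here is a made-up stand-in class and the falloff formula is just a guess at “influence”):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Stand-in light type, not a class from the actual app.
class Light {
    float x, y, intensity;
}

class LightCuller {
    /** Returns the n lights that contribute most at the object's position. */
    static List<Light> strongestLights(List<Light> all, final float ox, final float oy, int n) {
        List<Light> sorted = new ArrayList<Light>(all);
        Collections.sort(sorted, new Comparator<Light>() {
            public int compare(Light a, Light b) {
                return Float.compare(influence(b, ox, oy), influence(a, ox, oy));
            }
        });
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    static float influence(Light l, float x, float y) {
        float dx = l.x - x, dy = l.y - y;
        return l.intensity / (1.0f + dx * dx + dy * dy); // crude 1/d^2 falloff
    }
}
```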

On the other hand, AFAIK, pixel fill rate has nothing to do with the number of lights. Lights are resolved per vertex (so they limit the number of triangles per screen per second), not per pixel.

Hi
I just tried it, and got a nice big stack trace for ya :slight_smile: Here I got what looked like roughly 66fps. GeForce4 Ti 4200, Athlon XP 2100.

HTH

Endolf

Just a heads up that unless you’re doing lighting in a shader, any number of lights more than 3 is TOO MANY lights. NVidia has a performance paper which talks about this, and it clearly shows that performance drops off rapidly in proportion to the number of lights you have. In most games, there are very few ‘true’ lights.

I took this from the NVidia summer camp notes of '99.

So you’re probably eating a 1/3rd of your performance just in lights.

Thanks to everyone that gave it a try :slight_smile: I guess I should explain about the lights - regular GL lighting isn’t being used (well, one ambient light is, but that doesn’t count); it’s all done by building the light intensity up in the alpha buffer, then modulating this by the geometry.

Since this requires (per light!):

- Clear the alpha buffer
- Load the alpha buffer with the light intensity
- Mask off the shadow hulls and fins
- Modulate with the scene geometry

Obviously this means that more lights need vastly more activity going on in the framebuffer :frowning: And I’ve already reduced the geometry rendered per pass by limiting it to the objects within the light bounds.
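
In GL terms each light pass ends up looking roughly like this - just a sketch, the draw* methods stand in for the real rendering code and the blend mode in the final step is illustrative rather than the app’s exact setup:

```java
import net.java.games.jogl.GL; // package name assumed; depends on your Jogl build

// Outline of one per-light pass; the abstract methods are placeholders.
abstract class LightPass {
    abstract void drawLightFalloff(GL gl);       // light intensity written into alpha
    abstract void drawShadowHullsAndFins(GL gl); // zero alpha where shadowed
    abstract void drawSceneGeometry(GL gl);      // only the geometry inside the light bounds

    void render(GL gl) {
        gl.glColorMask(false, false, false, true);  // touch destination alpha only
        gl.glClearColor(0f, 0f, 0f, 0f);
        gl.glClear(GL.GL_COLOR_BUFFER_BIT);         // 1. clear the alpha buffer
        drawLightFalloff(gl);                       // 2. load alpha with light intensity
        drawShadowHullsAndFins(gl);                 // 3. mask off shadow hulls and fins
        gl.glColorMask(true, true, true, false);    // back to RGB writes
        gl.glEnable(GL.GL_BLEND);
        gl.glBlendFunc(GL.GL_DST_ALPHA, GL.GL_ONE); // 4. modulate geometry by alpha, accumulate
        drawSceneGeometry(gl);
        gl.glDisable(GL.GL_BLEND);
    }
}
```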

I think that using the scissor test could gain me some speed by limiting the screen update to each light’s bounds - how much fill rate this actually saves is another matter though.
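
Something like this wrapped around each light’s passes is what I have in mind - sketch only, and working out the light’s screen-space rectangle is the fiddly part:

```java
import net.java.games.jogl.GL; // package name assumed; depends on your Jogl build

// Restrict all the per-light clears and blends to one screen-space rectangle.
class LightScissor {
    static void begin(GL gl, int x, int y, int width, int height) {
        gl.glEnable(GL.GL_SCISSOR_TEST);
        gl.glScissor(x, y, width, height); // clears and blends now only touch this rect
    }

    static void end(GL gl) {
        gl.glDisable(GL.GL_SCISSOR_TEST);
    }
}
```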

Note that the test level is a pretty bad case - I’m profiling with the entire level in view, and most of the lights overlap by quite a bit.

Endolf: Thanks for the trace - it looks like the texture files aren’t being found (no big deal really, I broke texturing while optimising). The textures loaded are controlled by a text file in ./Data/Textures/TestSet.txt - I stripped out a lot of the entries to reduce the redundant textures in the zip. You can either check the txt file and fix the entries - every line should be something like:
Detail Detail.png
Perhaps I missed the file extensions in the second column?

Or you can just delete the entire contents of the file; textures aren’t needed anyway. Did the app continue after these exceptions (it should have!) or did it die a death?

[quote]After much wrangling, bug solving and repeated profiling from the earlier thread I’ve made what I thought would be some major speed ups […]
Unfortunately I’ve not actually seen any difference in speed :o
[/quote]
That’s why you should never guess, and always always use a profiler before optimising. It saves a lot of time, and keeps the code simpler where it can be.

Well if you look at some of the profiler outputs from the other thread, you’ll see some substantial decreases in the time spent in several key methods. I think I shifted my bottleneck from geometry to fill rate and didn’t notice when it switched :o

Although if you know of a profiler that can give useful information about what’s happening down the graphics pipeline, I’m more than willing to (ab)use it :slight_smile:

Hi
It carried on running fine, it was just those errors

HTH

Endolf

Well if you’ve got a Mac you can just run Apple’s OpenGL Profiler and it will tell you what you want to know. Intel used to make a rendering pipeline profiler some years back, but I don’t know if it’s still supported.

You would think people would be tripping over each other to sell this sort of thing considering the large number of game studios there are in the world.

nVidia apparently have some sort of graphics pipeline profiling tool, but it’s only available to registered developers :frowning:

From research and experimentation I think I’m not actually fill limited, but memory bandwidth limited. Each light requires a lot of blending into the frame buffer (and lots of ‘hidden’ geometry like shadow volumes), so lots of read-modify-write operations. Unfortunately nVidia’s suggestion to combat this is to drop from 32-bit colour to 16-bit colour, which isn’t possible since that would mean I lose the framebuffer alpha, which is such a vital part.

However, using scissor testing to restrict the updated region had good results - my test level jumped from ~18fps to ~38fps. ;D With 16 medium and small sized lights I can also hit a consistent ~60fps, so maybe careful level design will be sufficient.

Following on from the scissor optimisation, the only other change I can think of would be to use a stencil test to further limit the screen updates. However, the difference between the scissor and stencil test areas is likely to be pretty small, and I was planning on using the stencil test for parallax layers anyway.
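
If I did go that way it would be something along these lines - hypothetical sketch, with drawMask() standing in for drawing the light’s footprint:

```java
import net.java.games.jogl.GL; // package name assumed; depends on your Jogl build

// Tag the light's footprint in the stencil buffer, then only update those pixels.
abstract class LightStencilMask {
    abstract void drawMask(GL gl); // the light's exact screen-space shape

    void apply(GL gl) {
        gl.glEnable(GL.GL_STENCIL_TEST);
        gl.glClear(GL.GL_STENCIL_BUFFER_BIT);       // needs clearing between lights
        gl.glStencilFunc(GL.GL_ALWAYS, 1, 0xFF);    // tag covered pixels with 1
        gl.glStencilOp(GL.GL_KEEP, GL.GL_KEEP, GL.GL_REPLACE);
        gl.glColorMask(false, false, false, false); // stencil writes only, no colour
        drawMask(gl);
        gl.glColorMask(true, true, true, true);
        gl.glStencilFunc(GL.GL_EQUAL, 1, 0xFF);     // later passes only touch stencil == 1
        gl.glStencilOp(GL.GL_KEEP, GL.GL_KEEP, GL.GL_KEEP);
    }
}
```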

FYI: I got around 21fps using W2K - GF4 Ti4600 - Dual Athlon MP1800+ (5% processor)

Peter

Here’s a nice presentation from the nVidia site that may help, explaining some common performance bottlenecks and possible solutions.

http://developer.nvidia.com/docs/io/4000/GDC2003_PipelinePerformance.pdf