I’ve done some preliminary state management, such as enabling the textures in reverse order and coalescing calls, among other minor things. Good suggestion; I will look into it more thoroughly for sure.
The gamma correction op that runs on r200 seems kind of bloated, but it’s not. There was an “option” of packing the RGB ramps into a single texture (saving some texture binds and state calls), but doing the dependent texture fetch that way on a Radeon 9k kept running me out of instructions in the second pass. The Radeon 9k doesn’t support ARB fragment programs, only a semi-limited ATI proprietary shader (limited instruction-count-wise, 8 per pass), and the first pass is already maxed out for YUV & EQ.
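For anyone following along, here is a rough CPU-side sketch of what a per-channel gamma ramp amounts to. It is not the code from the renderer: the helper names `build_gamma_ramp` and `apply_ramp`, the 256-entry size and the `1 / gamma` exponent are all assumptions made for illustration. On the GPU the lookup would be the dependent fetch, with the colour from the first pass used as the texture coordinate into the ramp.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: builds one 256-entry ramp for a single channel.
// Entry i holds 255 * (i / 255)^(1 / gamma), rounded to nearest.
std::array<std::uint8_t, 256> build_gamma_ramp(double gamma)
{
    assert(gamma > 0.0);  // a zero or negative gamma has no meaningful ramp
    std::array<std::uint8_t, 256> ramp{};
    for (int i = 0; i < 256; ++i) {
        const double v = std::pow(i / 255.0, 1.0 / gamma);
        ramp[i] = static_cast<std::uint8_t>(v * 255.0 + 0.5);
    }
    return ramp;
}

// CPU stand-in for the dependent fetch: the channel value computed earlier
// is used as the index into the ramp.
std::uint8_t apply_ramp(const std::array<std::uint8_t, 256>& ramp,
                        std::uint8_t channel)
{
    return ramp[channel];
}
```

Three such ramps (one each for R, G and B) would be uploaded as separate textures in the 6-texture layout. With a `POW` instruction available the table is unnecessary, since the exponent can be evaluated per fragment.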
So that’s why I opted for using 6 textures: the top 3 each contain one of the R, G, B ramps, while the lower 3 contain the YUV planes. I’ve also considered packing the YUV into an interleaved format, but that would induce a hit on the CPU since the data comes from the source in planar format, so that’s the kind of thing I’ve been mucking with. (The CPU will be doing quite a bit of DCT, so that’s a no go.) I’m also trying to minimize the number of contenders for the branch prediction table; I hear even on modern CPUs it’s a limited resource, so with all those other things looping everywhere, cache coherency can be a problem.
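To make the CPU cost of interleaving concrete, here is a minimal sketch of the repacking loop that the planar layout avoids. The 4:2:0 subsampling, the 3-bytes-per-pixel output and the function name `pack_yuv420_interleaved` are assumptions for illustration; the actual source format isn’t specified beyond being planar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical repack: planar Y, U, V (4:2:0 assumed) -> interleaved YUV,
// 3 bytes per pixel, chroma replicated to every pixel of its 2x2 block.
std::vector<std::uint8_t> pack_yuv420_interleaved(
    const std::vector<std::uint8_t>& y_plane,
    const std::vector<std::uint8_t>& u_plane,
    const std::vector<std::uint8_t>& v_plane,
    std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t chroma_width = width / 2;
    assert(y_plane.size() == width * height);
    assert(u_plane.size() == chroma_width * (height / 2));
    assert(v_plane.size() == u_plane.size());

    std::vector<std::uint8_t> packed(width * height * 3);
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            // Every output pixel costs three scattered reads and three writes.
            const std::size_t luma = row * width + col;
            const std::size_t chroma = (row / 2) * chroma_width + (col / 2);
            std::uint8_t* px = &packed[luma * 3];
            px[0] = y_plane[luma];
            px[1] = u_plane[chroma];
            px[2] = v_plane[chroma];
        }
    }
    return packed;
}
```

That loop touches every pixel of every frame on a CPU that is already busy with DCT work, whereas uploading the three planes as three separate textures needs no repacking at all.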
I have a branch (in source control) which does most of this stuff using ARB programs, where you have the POW instruction, so gamma correction does not require a dependent fetch into separate ramps … and it’s so much nicer, I must say. (It also has other optimizations which I couldn’t pull off on such ancient HW, but supporting that HW is a requirement.)
I figured if r200 has 6 tex units, why not use them. So for those 6 units, I must enable each one and set up pixel alignment & unpack modes, … then set up the VBO, send some geometry and restore states.
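As an illustration of the kind of state coalescing this per-unit setup invites, here is a small sketch of a shadow-state cache that skips redundant calls and walks the units from highest to lowest. It is not the renderer’s code: `TextureStateCache`, `setup_frame` and the `driver_calls` counter are hypothetical, and the GL calls are only named in comments so the sketch runs without a context.

```cpp
#include <array>
#include <cassert>

constexpr int kNumUnits = 6;  // r200 exposes six texture units

// Hypothetical shadow of the per-unit texture state. Each method only
// "calls the driver" (here: bumps a counter) when the state really changes.
struct TextureStateCache {
    std::array<unsigned, kNumUnits> bound{};  // texture name per unit, 0 = none
    std::array<bool, kNumUnits> enabled{};
    int active_unit = 0;
    int driver_calls = 0;

    void set_active(int unit) {
        assert(unit >= 0 && unit < kNumUnits);
        if (unit == active_unit) return;
        active_unit = unit;
        ++driver_calls;  // would be glActiveTextureARB(GL_TEXTURE0_ARB + unit)
    }
    void enable(int unit) {
        assert(unit >= 0 && unit < kNumUnits);
        if (enabled[unit]) return;
        set_active(unit);
        enabled[unit] = true;
        ++driver_calls;  // would be glEnable(target)
    }
    void bind(int unit, unsigned texture) {
        assert(unit >= 0 && unit < kNumUnits);
        if (bound[unit] == texture) return;
        set_active(unit);
        bound[unit] = texture;
        ++driver_calls;  // would be glBindTexture(target, texture)
    }
};

// Walk the units backwards so a full setup finishes on unit 0, which makes
// the final "restore active unit" step free in that case.
void setup_frame(TextureStateCache& cache,
                 const std::array<unsigned, kNumUnits>& textures)
{
    for (int unit = kNumUnits - 1; unit >= 0; --unit) {
        cache.enable(unit);
        cache.bind(unit, textures[unit]);
    }
    cache.set_active(0);
}
```

With this layout a first full setup costs 18 counted calls (three per unit), a repeat with identical textures costs none, and changing a single plane’s texture costs three: switch unit, bind, switch back to unit 0.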
All in all, I want to minimize the amount of time that gets spent in GL overall, but some of it is inevitable (feature creep made me add features I wouldn’t have otherwise opted for myself), so I’m spinning my wheels looking for ways. But I’ll be doing some thorough profiling to iron out the remaining hotspots; so far only a few preliminary profiler passes have been done.
There was a good quote about time or lack thereof, can’t remember now. So much to do, so little time.