In our app, we want to implement some post-render effects that use both the pixel data and the z-depth. We generally run at high resolution, with screen dimensions around 1500 x 1000 (i.e. big and not a power of two). To do this, I first implemented a Frame Buffer Object, rendered our scenes into that, and copied the result onto a visible quad. This resulted in a MASSIVE slowdown on my 256MB ATI card. On a 512MB nVidia card, however, it ran fine.
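For reference, this is roughly the shape of the FBO path, using the EXT_framebuffer_object entry points (GLEW assumed). The handle names and exact texture dimensions below are placeholders rather than our real code:

    #include <GL/glew.h>

    const int FBO_W = 1500, FBO_H = 1000;   // roughly our window size (placeholder values)

    GLuint fbo, colourTex, depthTex;

    void createFbo()
    {
        // Colour attachment
        glGenTextures(1, &colourTex);
        glBindTexture(GL_TEXTURE_2D, colourTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, FBO_W, FBO_H, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);

        // Depth attachment as a texture, so the post-render effects can sample z
        glGenTextures(1, &depthTex);
        glBindTexture(GL_TEXTURE_2D, depthTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, FBO_W, FBO_H, 0,
                     GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);

        glGenFramebuffersEXT(1, &fbo);
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                  GL_TEXTURE_2D, colourTex, 0);
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                  GL_TEXTURE_2D, depthTex, 0);

        if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) != GL_FRAMEBUFFER_COMPLETE_EXT)
            /* report the problem */;

        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);  // back to the window
    }

The scene gets rendered with this FBO bound, then the FBO is unbound and the two textures are used to draw the full-screen quad with the effect shader.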
Conclusion: either I’m running out of card memory (though a quick calculation of how much space one of our scenes might need comes to about 50MB), or something in the ATI driver is broken. I’d like to test on a 256MB nVidia card, but we don’t have one here.
So I tried a few experiments. First, I changed it so that I rendered to the screen again (frame rate around 40-50), but then copied the pixels and z into two auxiliary textures using glCopyTexSubImage2D(). That was quicker, but only by a shade over the FBO case. Then I shrank the window (to less than 1024 x 768, and then to 640 x 480); if it were running out of vRAM you might expect a sudden speed-up, but instead there was just a very gradual increase in frame rate. If I take the copying code out, it runs at normal speed again.
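The copy-after-render variant is basically just the following, done once per frame straight after the scene render. Handle names are placeholders; both textures are assumed to have been allocated once up front with glTexImage2D(..., NULL), as GL_RGBA8 and GL_DEPTH_COMPONENT24 respectively:

    // Placeholder handles for the two auxiliary textures
    GLuint colourCopyTex, depthCopyTex;

    void copyFramebufferToTextures(int width, int height)
    {
        // Colour: copy the back buffer into the auxiliary colour texture
        glBindTexture(GL_TEXTURE_2D, colourCopyTex);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);

        // Depth: copy the z-buffer into the auxiliary depth texture
        // (this is the copy that turns out to be the expensive one; see the update below)
        glBindTexture(GL_TEXTURE_2D, depthCopyTex);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
    }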
So I’m baffled now. Has anyone got any bright ideas as to what might be going on, or how to work around this?
Update: it now seems that it’s the z-buffer copy that’s really dragging; the pixel copy seems fine. I’ve tried fixing the buffer at 512 x 512 and there’s no change, so it’s nothing to do with not being a power-of-two size. Bizarrely, our old shadow mapping code used this exact same technique with no drop in frame rate. So maybe it’s something to do with state? :-\
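In case it helps anyone spot a difference, this is the sort of depth-texture state I’m now comparing between the shadow mapping path and the new copy path (the compare-mode and depth-texture-mode parameters from ARB_shadow / ARB_depth_texture). The handle and the values below are purely illustrative, not our actual settings:

    // Depth-texture parameters typical of a shadow mapping setup;
    // I'm diffing this kind of state between the two code paths.
    glBindTexture(GL_TEXTURE_2D, depthCopyTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE_ARB, GL_COMPARE_R_TO_TEXTURE_ARB);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC_ARB, GL_LEQUAL);
    glTexParameteri(GL_TEXTURE_2D, GL_DEPTH_TEXTURE_MODE_ARB, GL_INTENSITY);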