Sure, that obviously works, but I just want to advertise a second pass a little more. ;D
You said you have a number of 48x48 sprites. Each of these has a pixel area of 48 × 48 = 2304 pixels. A 1920x1080 screen has 1920 × 1080 = 2073600 pixels. In theory, your method is faster as long as the total area of your sprites is lower than the area of the screen (assuming no overdraw), so you can draw 2073600/2304 = exactly 900 sprites. Once you go past that number, no matter where your sprites are, you’re sampling from your lightmap more than you would with a second pass. There’s just one thing we haven’t considered yet: your terrain rendering. I think it’s safe to assume that your terrain (or background or whatever) will cover the entire screen, meaning you’re already sampling the whole visible part of the lightmap once, and all sampling for your sprites is just sampling the same place again.
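If you want to redo the arithmetic above for another sprite size or resolution, here is a quick sketch in Python. The constant names are just mine for illustration; the numbers are the ones from this post, and it assumes no overdraw and ignores the terrain.

```python
# Break-even sprite count: how many sprites fit in one screen's worth of pixels.
# Assumes no overdraw and ignores the terrain/background layer.
SPRITE_SIZE = 48                 # sprites are 48x48 pixels
SCREEN_W, SCREEN_H = 1920, 1080  # screen resolution

sprite_area = SPRITE_SIZE * SPRITE_SIZE  # pixels covered by one sprite
screen_area = SCREEN_W * SCREEN_H        # pixels touched by one full screen pass

# Past this many sprites you sample the lightmap more often than a
# single full screen pass would.
break_even_sprites = screen_area // sprite_area

print(sprite_area, screen_area, break_even_sprites)
```

Change the three constants to match your own game to see where your limit would be.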
Of course doing a second full screen pass is more costly as it requires blending. Performance will also differ between different cards. I believe blending is done by the ROP units of your graphics card, and texture sampling is done by the texture sampling units. How many of these units your card has and how powerful they are determines the number of sprites at which a second pass becomes the more effective option. Funnily enough, blended rendering isn’t slower at all with my card. I believe the ROP units are balanced for MRT rendering / deferred rendering, which requires about 8x more frame buffer performance than normal rendering. Blending just uses one read and one write, compared to the 8 writes that deferred rendering requires. As this is a completely separate part of the graphics card that is left underused in your game anyway, the blending itself is basically free compared to normal rendering. You still need the second pass though, so it isn’t free overall, just a lot less expensive than you think it is.
By simply sampling the lightmap for each sprite pixel, you increase the cost of your fragment shader, which is already texture limited, by a certain amount. By separating the lighting from the sprite rendering, you reduce the cost of rendering each sprite and get a constant cost second pass instead.
What you do: cost = numSprites * x, where x is approximately 1.1-1.5, meaning a higher cost per sprite
Second pass: cost = numSprites + secondPassTime
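To put the two formulas side by side, here is a small Python sketch. The function names and the example numbers (x = 1.25, secondPassTime = 100) are made up for illustration and not measured on anything; costs are in units of "one sprite drawn without the lightmap lookup".

```python
def inline_cost(num_sprites, x):
    """Cost when every sprite samples the lightmap itself (x > 1 per sprite)."""
    return num_sprites * x


def second_pass_cost(num_sprites, second_pass_time):
    """Cost when sprites are drawn plain and lighting is one full screen pass."""
    return num_sprites + second_pass_time


def break_even_sprites(x, second_pass_time):
    """Sprite count where both approaches cost the same.

    Solves num_sprites * x == num_sprites + second_pass_time.
    """
    return second_pass_time / (x - 1.0)


# Hypothetical numbers, only to show the shape of the trade-off.
x = 1.25
second_pass_time = 100.0
n = break_even_sprites(x, second_pass_time)
print(n, inline_cost(n, x), second_pass_cost(n, second_pass_time))
```

Below the break-even count the per-sprite lookup wins, above it the second pass wins. Where that point actually lies depends on your card, so you would have to measure x and secondPassTime yourself.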
A second pass would be (slightly) slower for a low number of sprites, but scale better with more sprites, meaning it has a lower max FPS, but a higher min FPS. The trade-off is similar to the trade-off made in deferred shading. The idea there is to separate the cost for rendering objects and the cost for rendering each light.
Direct rendering: cost = triangles * lights
Deferred rendering: cost = triangles + lights
Of course rendering each triangle is much more expensive with deferred rendering due to the very high bandwidth required. However, this is easily made up for if you have hundreds or even thousands of lights, as you don’t have to render the same triangle more than once. I hope you can see the similarities. :)
In the end, it’s all just performance. Premature optimization is the root of all evil. You seem to have a working solution right now, so don’t change anything unless this actually turns out to be a bottleneck later, for example if you realize you want thousands of sprites.
I’m out. xD