Multiple shaders or if statements?

Scourge · September 26, 2016, 6:31pm

Hello,

I’m currently wondering whether I should reduce the number of shaders. For example, I have one for single-textured objects and one for objects with blend maps (to represent simple dirt/blood on objects, or for terrain). Sticking to this example, I could merge these two shaders into one either by adding a boolean uniform and an if statement in the fragment shader, or by simply using the blend map shader and loading a 1x1 px blend map for single-textured objects (do I need 1x1 px images for the other textures as well, or doesn’t OpenGL care if there’s no texture bound for the sampler2D?).

I would like to hear from people with more 3D experience whether they would merge such similar shaders and, if so, which approach they would prefer and why, or whether there are other ways.

theagentd · September 26, 2016, 6:57pm

It all depends on what you’re doing.

First of all, the choice between many shaders and a few uber-shaders depends on where your bottleneck is. Each shader bind has a fairly big CPU cost, so if you have 1000 shader switches per frame this could easily be your bottleneck. In this case, switching to an uber-shader will improve performance, as it trades a large reduction in CPU load for some extra GPU overhead. If your GPU is already the bottleneck, then increasing the GPU cost to eliminate a couple of shader switches will just pile more work onto the GPU, reducing performance.
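To make the CPU side of this trade-off concrete, here is a minimal Python sketch that counts how many shader binds a frame needs for a given draw order. The draw list, the shader names and the `count_shader_binds` helper are made up for illustration; in a real renderer the counter would sit where `glUseProgram` is called. Sorting by shader is included as a middle ground between many shaders and one uber-shader; that is an assumption of this sketch, not something prescribed above.

```python
def count_shader_binds(draw_calls):
    """Count how often the bound shader changes when drawing in the given order."""
    binds = 0
    current = None
    for shader, _mesh in draw_calls:
        if shader != current:
            binds += 1  # a real renderer would call glUseProgram here
            current = shader
    return binds


# Hypothetical frame: (shader name, mesh name) pairs in submission order.
frame = [
    ("single_texture", "crate"),
    ("blend_map", "terrain"),
    ("single_texture", "barrel"),
    ("blend_map", "bloody_wall"),
    ("single_texture", "door"),
]

unsorted_binds = count_shader_binds(frame)
sorted_binds = count_shader_binds(sorted(frame, key=lambda d: d[0]))
# With a single uber-shader every draw uses the same program.
uber_binds = count_shader_binds([("uber", mesh) for _shader, mesh in frame])

print(unsorted_binds, sorted_binds, uber_binds)  # 5 2 1
```

The sketch only models the number of binds. Whether the extra per-fragment work of an uber-shader is worth the saved binds still depends on where the bottleneck is.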

Some shader tips:

An if-statement is not inherently slow in shaders. It all depends on divergence and the size of the if/else blocks. If all shader invocations in a group (= all vertex shader invocations in a certain batch or all fragment shader invocations in a certain pixel area) take the same path in the if-statement (either all true or all false), then the if-statement will be cheap and only one of the two paths will be executed. If the shader invocations diverge, both sides will have to be executed for all invocations, as the group runs in lockstep. In the end, none of this matters much if the if and else blocks don’t contain a lot of code.
Simple conditional assignments, like `x = condition ? a : b;`, generally compile to conditional assignment instructions that don’t require any branching at all. You can use this to your advantage.
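The lockstep behaviour described above can be sketched with a deliberately simplified cost model in Python. The group size of 32, the instruction counts and the `group_cost` helper are assumptions chosen for illustration; real hardware has more nuances than this.

```python
def group_cost(conditions, cost_if, cost_else):
    """Estimated cost for one lockstep group under a simplified model.

    conditions: one boolean per shader invocation in the group.
    cost_if / cost_else: rough instruction counts of the two blocks.
    """
    takes_if = any(conditions)
    takes_else = not all(conditions)
    # Each side that at least one invocation needs is executed by the whole group.
    return (cost_if if takes_if else 0) + (cost_else if takes_else else 0)


# Assumed group size of 32 invocations.
uniform_group = [True] * 32              # everyone agrees, e.g. a boolean uniform
divergent_group = [True] * 31 + [False]  # a single invocation disagrees

print(group_cost(uniform_group, cost_if=40, cost_else=10))    # 40
print(group_cost(divergent_group, cost_if=40, cost_else=10))  # 50
```

Under this model, a condition driven by a boolean uniform never diverges, because a uniform has the same value for every invocation in a draw call.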

Scourge · September 27, 2016, 6:06pm

Thank you for your reply.

I don’t have any performance problems at the moment, but I’m thinking about how this will play out with further development. For example, if I implement bump mapping and material maps (specular maps, gloss maps, etc.), the number of shaders multiplies with every combination. Thinking even further, once I get to the stage where I want to implement my idea of skeletal animation, it would lead to even more shaders. At that stage, it would be a lot of work to add more light types or make other improvements, because I would have to change tons of shaders.

A preprocessor for handling includes or even dynamic shader code generation would be a good solution - but at the moment I would rather gather more experience with shaders first before implementing something like that. That’s why I’m asking which approach would be better; if it turns out to be performant enough, I might even keep it that way.
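For reference, the simplest form of the shader generation mentioned here is to keep one source file with `#ifdef` blocks and prepend a different set of `#define` lines for each variant. The Python sketch below shows the idea; the feature names, the uniforms and both helper functions are hypothetical, and the GLSL is only an illustrative fragment, not a complete material shader.

```python
from itertools import combinations


def build_shader_source(base_source, features):
    """Insert one #define per enabled feature right after the #version line."""
    lines = base_source.strip().splitlines()
    if not lines or not lines[0].startswith("#version"):
        raise ValueError("expected the source to start with a #version line")
    defines = [f"#define {name}" for name in sorted(features)]
    return "\n".join([lines[0], *defines, *lines[1:]]) + "\n"


def all_feature_sets(feature_names):
    """Every subset of the features, i.e. every variant the engine might need."""
    names = sorted(feature_names)
    return [frozenset(c) for r in range(len(names) + 1) for c in combinations(names, r)]


# Hypothetical fragment shader; feature names and uniforms are illustrative only.
BASE_FRAGMENT = """
#version 330 core
uniform sampler2D diffuseMap;
#ifdef USE_BLEND_MAP
uniform sampler2D blendMap;
uniform sampler2D detailMap;
#endif
in vec2 uv;
out vec4 fragColor;
void main() {
    vec4 color = texture(diffuseMap, uv);
#ifdef USE_BLEND_MAP
    color = mix(color, texture(detailMap, uv), texture(blendMap, uv).r);
#endif
    fragColor = color;
}
"""

features = ["USE_BLEND_MAP", "USE_NORMAL_MAP", "USE_SPECULAR_MAP"]
variants = {fs: build_shader_source(BASE_FRAGMENT, fs) for fs in all_feature_sets(features)}
print(len(variants))  # 8
```

Three optional features already give eight variants, which illustrates the multiplication of shaders described above. The variants are still separate programs to bind at runtime, so this approach addresses the maintenance problem rather than the bind cost.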

I appreciate your input regarding if statements. It means I shouldn’t run into many performance problems, since I sort the objects I render (which are also frustum culled).

I would also like some input on my other idea of just using 1 px fake textures, if that’s not asking too much.

theagentd · September 28, 2016, 2:28am

I’m using deferred shading, so for me the bottleneck is almost always the write to the massive G-buffer (4 render targets!). Due to this, I can get away with always doing normal mapping and reading all 4 optional texture maps I support. The texture units and ALU cores would otherwise simply sit idle waiting for the ROP writes, so doing some extra reads from empty 4x4 textures won’t affect performance at all. Issuing draw calls that are too small is also generally very bad for GPU performance, small being something like under ~100 triangles or so.

Normal mapping doesn’t actually have that much of an overhead, so you should probably be able to get away with just using a pass-through 4x4 texture and always using it.
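To illustrate why such placeholder textures work, here is a small Python sketch of the arithmetic a shader would perform on them. It assumes the blend is done with the same formula as GLSL’s `mix()` and that normals are stored with the common `rgb * 2 - 1` encoding; the colour values and helper names are made up for the example.

```python
import math


def mix(a, b, t):
    """Same formula as GLSL's mix(): a*(1-t) + b*t, applied per channel."""
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))


def decode_normal(rgb):
    """Map 8-bit texel channels from [0, 255] to [-1, 1] and normalize."""
    n = [c / 255.0 * 2.0 - 1.0 for c in rgb]
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


base = (0.8, 0.6, 0.4)
dirt = (0.2, 0.1, 0.05)

# A blend map filled with 0 leaves the base texture untouched.
print(mix(base, dirt, 0.0))  # (0.8, 0.6, 0.4)

# The usual "flat" tangent-space normal texel decodes to roughly (0, 0, 1),
# so lighting behaves as if there were no normal map.
nx, ny, nz = decode_normal((128, 128, 255))
print(abs(nx) < 0.01, abs(ny) < 0.01, nz > 0.999)  # True True True
```

The texture size does not matter for this arithmetic; a 1x1 or 4x4 texture filled with the neutral value gives the same result at every texture coordinate.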