Optimizing lighting in OpenGL

I am trying to figure out how to optimize lighting in OpenGL 2. I am using 4 lights per object, with the lights sorted by distance. It works well with just one light, but with any more than that it starts to slow down. I would like a way to speed up my lighting while getting similar visual results. Here is my lighting code:

#version 120
uniform vec4 ulight1;
uniform vec4 ulight2;
uniform vec4 ulight3;
uniform vec4 ulight4;

varying vec4 v_color;
varying vec3 v_position;
varying vec3 v_normal;


vec3 ambient = vec3(0.2,0.2,0.2);
vec3 lightcolor = vec3(0.6,0.6,0.6);

vec3 phong() {
	vec3 nn = normalize(v_normal);
	vec3 dtl1 = ulight1.xyz-(v_position);
	vec3 dtl2 = ulight2.xyz-(v_position);
	vec3 dtl3 = ulight3.xyz-(v_position);
	vec3 dtl4 = ulight4.xyz-(v_position);
	float dis1 = length(dtl1);
	float dis2 = length(dtl2);
	float dis3 = length(dtl3);
	float dis4 = length(dtl4);
	float att1 = (1.0 / (1.0 + (dis1 * dis1)));
	float att2 = (1.0 / (1.0 + (dis2 * dis2)));
	float att3 = (1.0 / (1.0 + (dis3 * dis3)));
	float att4 = (1.0 / (1.0 + (dis4 * dis4)));
	float d1 = max(dot(nn,normalize(dtl1)),0.0);
	float d2 = max(dot(nn,normalize(dtl2)),0.0);
	float d3 = max(dot(nn,normalize(dtl3)),0.0);
	float d4 = max(dot(nn,normalize(dtl4)),0.0);
	vec3 lw1 = att1*(ambient+lightcolor*d1)*ulight1.w;
	vec3 lw2 = att2*(ambient+lightcolor*d2)*ulight2.w;
	vec3 lw3 = att3*(ambient+lightcolor*d3)*ulight3.w;
	vec3 lw4 = att4*(ambient+lightcolor*d4)*ulight4.w;
	return lw1+lw2+lw3+lw4;
}

This function is called once per pixel in the fragment shader. The w component of each light tells the shader whether or not that light is enabled.

The obvious optimisation is that for each dtXX vector you call both length() and normalize() on it, which means you’re calculating the length twice. You can save four square roots by just dividing by the length you already have.
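A minimal sketch of that change on one of the lights, reusing the variable names from the code above (ldir1 is just a new temporary for illustration):

	vec3 dtl1 = ulight1.xyz - v_position;
	float dis1 = length(dtl1);           // one sqrt
	vec3 ldir1 = dtl1 / dis1;            // same result as normalize(dtl1), no second sqrt
	float d1 = max(dot(nn, ldir1), 0.0);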

Also, is that first normalize necessary? Shouldn't your vertex normals already be unit length?

You may want to pull out the common ‘ambient+lightcolor’ into a temporary, but I’d hope the compiler is already doing that optimisation.

Edit: Double-also, you calculate the length(), then do disX*disX to get the length squared. You could just manually calculate that, then sqrt to get the length to save you some computation. That may or may not be an improvement depending on how smart length() is handled under the hood.
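As a sketch of that idea, computing the squared length directly (here via dot()) and taking the root only when the actual distance is needed:

	vec3 dtl1 = ulight1.xyz - v_position;
	float disSq1 = dot(dtl1, dtl1);      // squared length, no sqrt
	float att1 = 1.0 / (1.0 + disSq1);   // attenuation only needs the squared length
	float dis1 = sqrt(disSq1);           // root only if the real distance is required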

Are for loops faster than static code? I thought that static code was fastest in shaders.

Sorry, I wasn't finished ^^ will post soon, I pressed the publish button by accident ^^

Double-double also: you’re going to be completely vector op bound (obviously) while the texture unit sits idle. It may be worth moving some of your computation into a texture lookup. Possibly the light attenuation calculation?

How would I do this? I am not using textures in my program.

So:

  • For loops get automatically unrolled by the compiler, if the loop count is a constant and that constant is not too big.
  • Many small loops instead of one big one prevent read-after-write delays.
  • The only real speed-up I made was getting rid of the sqrt in the length() function (there is also a distance(p0, p1) function in GLSL for the next time you need the distance between two points in a shader).
#version 120
const int LIGHT_COUNT = 4;
uniform vec4 ulight[LIGHT_COUNT];

varying vec4 v_color;
varying vec3 v_position;
varying vec3 v_normal;


vec3 ambient = vec3(0.2,0.2,0.2);
vec3 lightcolor = vec3(0.6,0.6,0.6);

vec3 lambert() {
	vec3 nn = normalize(v_normal);
	vec3 dtl[LIGHT_COUNT];
	float disSquare[LIGHT_COUNT];
	float d[LIGHT_COUNT];
	vec3 lw[LIGHT_COUNT];

	// direction from the surface point to each light
	for(int i=0; i<LIGHT_COUNT; ++i)
		dtl[i] = ulight[i].xyz - v_position;

	// squared distance, computed without length() so no sqrt is needed
	for(int i=0; i<LIGHT_COUNT; ++i)
		disSquare[i] = dtl[i].x*dtl[i].x + dtl[i].y*dtl[i].y + dtl[i].z*dtl[i].z;

	// Lambert term per light
	for(int i=0; i<LIGHT_COUNT; ++i)
		d[i] = max(dot(nn, normalize(dtl[i])), 0.0);

	// attenuated contribution, scaled by the enable flag in .w
	for(int i=0; i<LIGHT_COUNT; ++i)
		lw[i] = (lightcolor*d[i] + ambient) * ulight[i].w / (1.0 + disSquare[i]);

	vec3 result = lw[0];
	for(int i=1; i<LIGHT_COUNT; ++i)
		result += lw[i];

	return result;
}

Again, would this be faster than static code?

No, but the code is cleaner, and both will compile to the same instructions.

You would create and load a texture with your falloff ramp in it, then pass it into your shader in the normal way. Then just look up into it instead of performing your attenuation calculation.
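A rough sketch of the shader side of that idea; the sampler name u_falloff, the range constant and the ramp contents are assumptions here, with 1/(1 + d*d) baked into a 1D texture on the CPU side:

uniform sampler1D u_falloff;            // assumed ramp texture containing 1/(1 + d*d)
const float MAX_LIGHT_DIST = 32.0;      // assumed maximum distance covered by the ramp

// inside the per-light code, replacing the attenuation divide:
float dis1 = length(dtl1);
float att1 = texture1D(u_falloff, dis1 / MAX_LIGHT_DIST).r;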

Get rid of one of the normalize() calls:


...
	for(int i=0; i<LIGHT_COUNT; ++i)
		d[i] = max(dot(nn, dtl[i] / sqrt(disSquare[i])), 0.0);
...
}

PS:
Perhaps you noticed that I changed the name to lambert, because the dot product between the normal and the light direction is the Lambert term. Phong only describes how to calculate the specular reflection, not the diffuse, and Blinn-Phong should be preferred over standard Phong.
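For reference, a rough sketch of the Blinn-Phong specular term; the view-direction varying v_viewdir and the shininess constant are assumptions, not part of the code above:

	vec3 viewdir = normalize(v_viewdir);              // assumed varying: direction from the surface to the camera
	vec3 h = normalize(normalize(dtl[i]) + viewdir);  // half vector between light and view direction
	float spec = pow(max(dot(nn, h), 0.0), 32.0);     // 32.0 = assumed shininess exponent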


#define LIGHTS_NUM 4
uniform vec4 ulight[LIGHTS_NUM];

varying vec4 v_color;
varying vec3 v_position;
varying vec3 v_normal;


vec3 ambient = vec3(0.2,0.2,0.2);
vec3 lightcolor = vec3(0.6,0.6,0.6);

vec3 lights() {
	vec3 light = ambient;
	vec3 nn = normalize(v_normal);
	for (int i=0; i<LIGHTS_NUM; i++)
	{
		vec3 dtl = ulight[i].xyz - v_position;
		float dis = length(dtl);
		float att = 1.0 / (1.0 + (dis * dis));
		float d = max(dot(nn, normalize(dtl)), 0.0);
		light += (att * d * ulight[i].w) * lightcolor;
	}
	return light;
}

void main(){
	gl_FragColor = vec4(lights(), 1.0); // set alpha explicitly; writing only .rgb leaves it undefined
}

This is the best I got. It uses only 12 registers instead of the 19 the original needed. Try to profile your bottleneck and optimize it after that.