aren't simple shaders supposed to be faster?

I finally got around to adding GLSL support to my graphics code, and it seems that the shader programs give me a worse framerate. With the fixed-function pipeline I could get about 18 fps drawing 10,000 cubes, while using the shader it would give me only 12.5. The shader consisted of this:

Vertex shader:

    void main() {
        gl_FrontColor = gl_Color;      // pass the per-vertex color through
        gl_Position = ftransform();    // same transform as the fixed-function pipeline
    }

Fragment shader:

    void main() {
        gl_FragColor = vec4(.2, .4, .5, 1);   // constant color for every fragment
    }

Is there anything I need to do to make shaders run faster, or was my assumption that they would run faster incorrect?

I might be completely wrong on this, but I think the shader you apply does nothing more than set an additional color per fragment, so it would actually just add overhead.

Yes, a simple shader program like that should be faster, especially since the fixed-function pipeline has been run on the shader units since the introduction of SM2.0 cards.

I would be inclined to think the culprit is that you introduced additional API calls when adding the shader program. Are you setting uniform variables or calling glUseProgram() frequently?

Posting the drawing code might be an idea. Are you sure you’re not compiling the shader and/or rebinding it all the time (which wouldn’t happen with FF)?
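
Something like this is what I'd expect (just a sketch, not your actual code; the program handle, the uniform name and the helper calls are made up):

    // once at startup:
    int program = buildProgram(gl);                            // compile + link (hypothetical helper)
    int colorLoc = gl.glGetUniformLocation(program, "color");  // cache uniform locations once

    // per frame:
    gl.glUseProgram(program);                          // a single bind per frame is cheap
    gl.glUniform4f(colorLoc, 0.2f, 0.4f, 0.5f, 1.0f);  // set uniforms via the cached location
    drawScene(gl);                                     // all the glDrawElements() calls (hypothetical)
    gl.glUseProgram(0);                                // back to fixed function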

I only compile the shader at startup. glUseProgram() is called twice per frame (once to set the custom shader, once to revert to the fixed-function pipeline). I profiled the code with and without the shader. With the shader active, glDrawElements() runs about 1.4 times slower than with FF. This was on a MacBook Pro with a GeForce 8800. I plan on testing it on a PC soon and will post the results then.

Odd. The only thing I can suggest is checking what format your vertices are being processed in. It could be that the shader version is being calculated at a higher degree of precision than the FF version is using, but I would have expected ftransform() to avoid such differences.

What happens to performance if you remove this line?

    gl_FrontColor = gl_Color;

I would suspect that switching between the custom shader and the fixed-functionality pipeline could involve costly state changes in the GL backend. I would not expect this, since it should only affect the active shader program, but you never know. As a test, profile without reverting to the fixed-functionality pipeline, by calling glUseProgram() just once at startup. If that shows a significant difference, you could then try switching between two custom shaders to measure the cost of a plain program change, to verify the assumption that reinitializing the fixed functionality is what eats up the performance.
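
Roughly like this (a quick sketch; programA and programB stand for two already-compiled and linked custom programs, and the draw calls are placeholders for your scene):

    // Test 1: never revert to fixed function -- bind the custom program once at init
    gl.glUseProgram(programA);
    // per frame: only draw, no glUseProgram() calls at all

    // Test 2: measure the cost of a plain program switch
    // per frame:
    gl.glUseProgram(programA);
    drawFirstHalf(gl);           // placeholder draw call
    gl.glUseProgram(programB);   // switch to another custom program instead of back to FF
    drawSecondHalf(gl);          // placeholder draw call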

Regrettably I’ve tried this and there seems to be little to no performance difference between invoking glUseProgram() each frame and invoking it just once. This leads me to believe that there is some specialized rendering path for the 8800 that favors the FF pipeline.

This only seems to gain about 1 fps, so I don’t think it is the cause.

I don’t think I’ve heard about this; how would I go about looking into it?

I tested my program on a Windows XP Pro machine with an ATI X700 and the results were qualitatively the same: 26 fps with FF, 16 with the simple shader, for 10,000 cubes.
I’m at a loss as to why this is happening, so I’ve posted the code that initializes the shaders in the hope that someone notices something wrong.

Shader object creation/compile code:


    this.shader_id = gl.glCreateShader(this.type);
    gl.glShaderSource(this.shader_id, 1, new String[] {this.source},
                      new int[] {this.source.length()}, 0);
    gl.glCompileShader(this.shader_id);

    int[] temp = new int[1];   // declared here so the snippet stands on its own
    gl.glGetShaderiv(this.shader_id, GL.GL_COMPILE_STATUS, temp, 0);
    if (temp[0] == GL.GL_FALSE)
        throw new RuntimeException("ShaderObject failed to compile, type="
                + (this.type == GL.GL_VERTEX_SHADER ? "GL_VERTEX_SHADER" : "GL_FRAGMENT_SHADER")
                + ": \n" + this.info_log);
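
(I’ve left out the part that fills in info_log; it’s roughly the standard info log query, something along these lines, using JOGL’s array-based glGetShaderInfoLog overload:)

    // sketch: fetch the compile log so the exception message is useful
    int[] logLength = new int[1];
    gl.glGetShaderiv(this.shader_id, GL.GL_INFO_LOG_LENGTH, logLength, 0);
    byte[] logBytes = new byte[Math.max(logLength[0], 1)];
    gl.glGetShaderInfoLog(this.shader_id, logBytes.length, logLength, 0, logBytes, 0);
    this.info_log = new String(logBytes, 0, logLength[0]);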

Shader program compile code:

    this.program_id = gl.glCreateProgram();
    for (int i = 0; i < this.shaders.length; i++) {
        gl.glAttachShader(this.program_id, this.shaders[i].getID());
    }
    gl.glLinkProgram(this.program_id);

    int[] temp = new int[1];
    gl.glGetProgramiv(this.program_id, GL.GL_LINK_STATUS, temp, 0);
    if (temp[0] == GL.GL_FALSE)
        throw new RuntimeException("ShaderProgram failed to link correctly");
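
(The link-failure exception doesn’t include a log at the moment; if it ever trips, I’d grab the program log and validate in roughly the same way as for the shaders, something like this sketch:)

    // sketch: program info log plus an optional validation pass (debug builds only)
    int[] logLength = new int[1];
    gl.glGetProgramiv(this.program_id, GL.GL_INFO_LOG_LENGTH, logLength, 0);
    byte[] logBytes = new byte[Math.max(logLength[0], 1)];
    gl.glGetProgramInfoLog(this.program_id, logBytes.length, logLength, 0, logBytes, 0);
    String linkLog = new String(logBytes, 0, logLength[0]);

    gl.glValidateProgram(this.program_id);
    gl.glGetProgramiv(this.program_id, GL.GL_VALIDATE_STATUS, temp, 0);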

Using the program:

    gl.glUseProgram(this.program_id);
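
Per frame the usage boils down to this (simplified; the middle part stands in for my scene graph traversal):

    gl.glUseProgram(this.program_id);   // switch to the custom shader
    // ...traverse the scene graph, one glDrawElements() per cube...
    gl.glUseProgram(0);                 // revert to the fixed-function pipeline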

You probably should compile a small test app, zip it and post for download.

26 FPS with an X700? With or without the shader, it seems you already have a problem. You should be getting a lot more FPS with 10,000 cubes, no?

EDIT: could you give us the CPU usage while your demo is running, with and without the shader? It should not increase when you add the shader; if it does, there is something wrong with it.

More troubling is the X700 beating the 8800 by so much. There’s very little chance that this test is GPU limited, and consequently very little chance that the shader itself is playing a part.

It’s an 8800 mobile, if that makes a difference. And yes, the reason for most of the bottleneck is that it is CPU limited. If I had bundled all 10,000 cubes into one object at startup, it’d perform much better; currently, however, it is essentially a simple scene graph with 10,000 leaf nodes and no frustum culling, so all 10,000 have to be visited each frame. My main concern was not so much the overall speed (a raw GLEventListener setup with 10,000 separate cubes performs similarly) but that the shader appeared to slow things down.
According to my profiler glDrawElements() takes longer, and that was the only significant increase that could explain the difference in performance.
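
(For what it’s worth, the batching I mean would look roughly like this: merge all the cube geometry into one big vertex/index array at startup and issue a single draw call per frame. This is only a sketch of the idea, not code from my program; numCubes and the buffer layout are made up.)

    // at startup: concatenate every cube into two big direct buffers (java.nio)
    FloatBuffer allVerts = ByteBuffer.allocateDirect(numCubes * 8 * 3 * 4)
                                     .order(ByteOrder.nativeOrder()).asFloatBuffer();
    IntBuffer allIndices = ByteBuffer.allocateDirect(numCubes * 36 * 4)
                                     .order(ByteOrder.nativeOrder()).asIntBuffer();
    // ...fill allVerts with pre-transformed cube corners, fill allIndices with each
    //    cube's 36 indices offset by (cubeIndex * 8), then rewind both buffers...

    // per frame: one vertex array setup and one draw call instead of 10,000 node visits
    gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
    gl.glVertexPointer(3, GL.GL_FLOAT, 0, allVerts);
    gl.glDrawElements(GL.GL_TRIANGLES, numCubes * 36, GL.GL_UNSIGNED_INT, allIndices);
    gl.glDisableClientState(GL.GL_VERTEX_ARRAY);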

Is there something special going on inside JOGL’s glDrawElements() method that does something extra when a custom shader is active?

No.