Cg shaders with LWJGL

Can someone give me a quick tutorial or instructions to using Cg shaders in LWJGL?

I’m currently using GLSL but I can see that might be a mistake, so I intend to switch to Cg. Do I need to pre-compile my Cg shaders to assembly before passing them into glShaderSourceARB()?

Thanks!

[quote=“anarchotron,post:1,topic:24073”]
Using GLSL might be a mistake? Why is that? (There are a couple of reasons; I just want to hear your opinion)

[quote=“anarchotron,post:1,topic:24073”]
No, glShaderSourceARB is a GLSL function; you can only pass GLSL shaders there. Well, you can pass Cg shaders too, but only on NV cards (if EXT_Cg_shader is present). The best way to use Cg with LWJGL would be to compile your shaders to assembly and use the ARB_vertex/fragment_program interface. Which sucks IMO, even though something similar has been done in Half-Life 2 (they had thousands of precompiled low-level DirectX shaders - very difficult to manage).
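
For reference, loading one of those precompiled assembly programs from LWJGL would look roughly like this. Untested sketch: the class and method names (ARBProgram, ARBFragmentProgram, the CharSequence overload of glProgramStringARB) are what I remember of the bindings and may differ in your LWJGL version, so treat the exact signatures as assumptions.

import java.nio.IntBuffer;

import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.ARBFragmentProgram;
import org.lwjgl.opengl.ARBProgram;
import org.lwjgl.opengl.GL11;

public class ARBProgramLoader {

	// NOTE: class/method names are from memory and may differ between LWJGL versions.
	// Loads the ASCII assembly emitted by the Cg compiler (e.g. the arbfp1 profile)
	// into an ARB_fragment_program object and returns its id.
	public static int loadFragmentProgram(CharSequence assembly) {
		IntBuffer id = BufferUtils.createIntBuffer(1);
		ARBProgram.glGenProgramsARB(id);
		int program = id.get(0);

		ARBProgram.glBindProgramARB(ARBFragmentProgram.GL_FRAGMENT_PROGRAM_ARB, program);
		ARBProgram.glProgramStringARB(ARBFragmentProgram.GL_FRAGMENT_PROGRAM_ARB,
			ARBProgram.GL_PROGRAM_FORMAT_ASCII_ARB, assembly);

		// The driver reports parse errors through GL_PROGRAM_ERROR_POSITION/STRING (-1 means no error).
		int errorPos = GL11.glGetInteger(ARBProgram.GL_PROGRAM_ERROR_POSITION_ARB);
		if (errorPos != -1) {
			System.err.println("ARB program error at " + errorPos + ": "
				+ GL11.glGetString(ARBProgram.GL_PROGRAM_ERROR_STRING_ARB));
		}

		return program;
	}
}

After that you just glEnable(GL_FRAGMENT_PROGRAM_ARB) and glBindProgramARB before drawing, instead of going through the GLSL program object API.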

[quote=“Spasi,post:2,topic:24073”]

Several reasons:

  • Cg (and HLSL, which is extremely similar) is very well documented and heavily used online. That is, it is easier to find help with Cg/HLSL than with GLSL. For instance, googling gives these results (super un-scientific I know, but an indicator nonetheless):

“glsl shader forum” 939 hits
“hlsl shader forum” 8,500 hits
“cg shader forum” 56,000 hits

Similar searches on amazon.com yield similar results.

  • Cg has a nice FX framework, which I assume can be integrated with Java, allowing me to delete the hefty amount of my own code that does this very thing.

  • Cg has a nice frontend compiler that will actually output the assembly, allowing you to analyze performance problems as well as get simple information like instruction counts.

  • I fear that the current driver support for GLSL is very poor. I’m getting awful performance binding the simplest of fragment programs. I haven’t fully investigated this theory, but some co-workers and I feel that this may be the case.

So, in a nutshell, I feel that the current level of support (technology, literature, community) for GLSL is inferior to that of Cg and HLSL. Furthermore, I feel that said level of support is not going to surpass that of Cg/HLSL going forward. I feel that GLSL is a nice definition, but already obsolete.

Comments please!

[quote=“anarchotron,post:3,topic:24073”]
I disagree.

GLSL is not used as much as Cg and HLSL, but that doesn’t say anything about the language or the API. GLSL is also well documented; the language and API specs are very well written. As for help, questions (e.g. in forums) are rarely related to the language or the API (both are very simple in all three slangs). They have more to do with shading techniques, which apply equally well to all three of them (porting a Cg or HLSL shader to GLSL is easy).

[quote=“anarchotron,post:3,topic:24073”]
I agree.

But the ARB is already working on an FX framework for GLSL. I’m not sure of its status, but it will be done. I’m also not sure of its usefulness. I’ve never used an FX framework, but I’ve heard about certain bugs and limitations that make your life harder. A custom framework can be a lot more versatile and better integrated with your engine (e.g. expose or not expose anything you like). With scripting, it can be more powerful too. Yeah, it’s a lot of code and work though. :wink:

[quote=“anarchotron,post:3,topic:24073”]
I disagree.

For NV cards, the functionality is already there. Check out NVShaderPerf (for performance metrics) and NVEmulate (for assembly output and emulation modes). You may also use the Cg compiler to compile GLSL shaders (actually, the NV driver does this internally too).

[quote=“anarchotron,post:3,topic:24073”]
Hmmm.

I never had performance problems with NV cards. I’ve actually been impressed many times by the driver’s assembly output; it’s doing some crazy optimizations. With ATI cards, hitting software mode is easy, but the driver is getting better and better. Also, when everything’s fine, performance is reasonable.

I was skeptical in the past, but in the end it’s just like Java: you’ve got to let go and trust the compiler. A newer driver may boost your performance with no effort on your part, and that doesn’t apply to Cg & HLSL. Even if you’re sure the compiler is being stupid with a particular shader, you can always replace it with a hand-written low-level shader (but that needs a more general renderer).

[quote=“anarchotron,post:3,topic:24073”]
I agree that HLSL is moving faster. But that’s also true for DX vs OpenGL in general.

A couple of facts about Cg: NVIDIA is actually not recommending Cg for DX. For OpenGL, I’ve noticed a general lack of progress after GLSL was released, but Sony’s slang choice for the PS3 was Cg, so that may improve things.

Nevertheless, Cg still doesn’t have the cross-platform advantage of GLSL. Cg will never manage to produce good low-level code for ATI or 3DLabs cards; they just don’t know the hardware. GLSL has a decent language and a good API, an FX framework will be done, additional language features will be added in the future (I particularly like Cg’s interfaces), and it is the standard. I agree, it is too young right now, but I see no strong reason to switch to Cg. If your target is to release your engine/game in, say, a year from now, my advice is this: stick to GLSL.

Good points.

My primary motivation is the performance problems I am experiencing. It may be because of my ATI Radeon 9800 hardware. Is there any way to analyze the driver to see if it’s doing something like dropping part of the pipeline to software?

I’m using a flag to enable program validation (glValidateProgram) and to print the info log of each linked program. ATI’s driver returns a message saying whether the program will run in hardware or not. A more reliable way to know, though, is seeing a sudden drop to 2-5 fps. :wink:
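
In code, that check looks roughly like this. A minimal untested sketch using the GL 2.0 entry points (the ARB_shader_objects path is analogous, with glValidateProgramARB and glGetInfoLogARB); the exact LWJGL signatures are from memory, so treat them as assumptions.

import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL20;

public class ProgramValidator {

	// NOTE: exact LWJGL signatures are from memory; adjust to your version.
	// Call with the GL state you intend to render with already set, since
	// glValidateProgram checks the program against the current state.
	public static void validate(int program) {
		GL20.glValidateProgram(program);

		int status = GL20.glGetProgrami(program, GL20.GL_VALIDATE_STATUS);
		int logLength = GL20.glGetProgrami(program, GL20.GL_INFO_LOG_LENGTH);

		// On ATI this log is where the hardware/software fallback message shows up.
		String log = GL20.glGetProgramInfoLog(program, Math.max(logLength, 1));

		System.out.println("validate status: " + (status == GL11.GL_TRUE));
		System.out.println(log);
	}
}

Hook it behind your debug flag and call it right after linking and again just before rendering, since validation is done against the current GL state.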

There are other things that may affect shader execution (even if the shader would run fine on your hardware), like using gl_FragCoord in a fragment shader when you also have depth bias enabled (or AA lines/points, wide lines/points, line/polygon stipple). That’s kind of extreme though.

Are the performance problems general, or is it a particular shader (that you may post here if you want)?

I did the validation thing and they are all running in hardware mode. I am using a super-simple fragment shader with ambient+diffuse+specular lighting, a single texture, and range based linear fog.

I know I am fill limited because my framerate changes dramatically with the screen size.

As a baseline:

1280x960
18k triangles (in 33 vertex arrays, indexed primitives)
1 texture (2 MB)

Using shader as seen below (ambient+diffuse+specular, textured, fogged): 59 FPS (17 ms)
Shader only render ambient: 500 FPS (2 ms)
Shader only render ambient+diffuse: 460 FPS (2 ms)
Shader only render ambient+diffuse+specular: 200 FPS (5 ms)
Shader only render ambient+diffuse+specular, textured: 160 FPS (6 ms)

So looking at the differences in framerate, I can estimate that the following operations take:

diffuse: < 1 ms
specular: 3 ms
texture: 1 ms
fog: 11 ms

This seems like an absolutely ridiculous cost for fogging, based on my code below. The mix() function is the culprit. If I, say, fog-to-black by simply multiplying my output color by (1.0 - fog), then the cost is not measurable.

Maybe something is wrong with the way my ATI drivers implement mix(). However, even when I implement my own vector lerp, like this:


vec4 lerp(in vec4 a, in vec4 b, float t)
{
	vec4 result;
	
	result.x = (b.x - a.x) * t + a.x;
	result.y = (b.y - a.y) * t + a.y;
	result.z = (b.z - a.z) * t + a.z;
	result.w = (b.w - a.w) * t + a.w;
	
	return result;
}


It costs the same as mix(). Arrrrggghh the humanity!!! 8 ADD, 4 MUL per pixel should not cost 11 ms!

Here are my shaders:



-----------------------------------------------------------------------------------------

varying vec3  normal_c,lightDir_c, halfVector_c;
varying float fog;

void main()
{	
	//-- lighting setup
	normal_c = normalize(gl_NormalMatrix * gl_Normal);	
	lightDir_c = normalize(vec3(gl_LightSource[0].position));
	halfVector_c = normalize(reflect(-lightDir_c, normal_c));//gl_ModelViewMatrix * gl_LightSource[0].halfVector).xyz;
		
	gl_TexCoord[0] = gl_TextureMatrix[0] * gl_MultiTexCoord0;
	gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
	
	float fogEnd = 1000.0;
	float fogStart = 10.0;
	
	//-- spherical fog, linear falloff
	vec3 pos_c = vec3(gl_ModelViewMatrix * gl_Vertex);
	gl_FogFragCoord = length(pos_c);
	float fogScale = 1.0 / (fogEnd - fogStart);
	fog = (fogEnd - gl_FogFragCoord) * fogScale;
	fog = 1.0 - clamp(fog, 0.0, 1.0);
}

-----------------------------------------------------------------------------------------

varying vec3 normal_c,lightDir_c, halfVector_c;
varying float fog;
uniform sampler2D texture0;

void main()
{	
	vec4 diffuse = gl_LightSource[0].diffuse * gl_FrontMaterial.diffuse;
	vec4 ambient = gl_LightSource[0].ambient * gl_FrontMaterial.ambient + gl_LightModel.ambient * gl_FrontMaterial.ambient;
	vec4 specular = gl_LightSource[0].specular * gl_FrontMaterial.specular;

	//-- ambient
	vec4 color = ambient;

	//-- diffuse	
	float diffuseContribution = max(dot(normal_c, lightDir_c), 0.0);
	color += diffuse * diffuseContribution;

	//-- specular highlight
	float specularContribution = max(dot(normal_c, normalize(halfVector_c)), 0.0);
	color += specular * pow(specularContribution, 16.0);

	//-- texture contribution
	vec4 texel0 = texture2D(texture0,gl_TexCoord[0].st);
	color *= texel0;

	//-- vertex fog
	vec4 fogColor = vec4(0.8, 0.9, 1.0, 0.0);
	color = mix(color, fogColor, fog); 	

	gl_FragColor = color;	
}

I suppose it is possible that my scene is rendering way more fragments than I think, or that it’s doing something really stupid, or always rendering back to front, or something else lame like that. Is there any way to find out how many fragments are being processed per frame?

Thanks!

I’m not particularly up to date with optimising shaders, but your lerp() looks somewhat awkward. Can’t you do:

result = (b - a) * t + a;

Identical result, yet that should parallelise better on the GPU. Also, your last two lines of fragment shader don’t appear to do anything, as you’ve already set the frag colour? A long shot, but have you tried manually inlining the lerp call and seeing how that changes things?

You can use HP_occlusion_test to manually count how many fragments are being rendered for any given geometry.

Whoops, that gl_FragColor call was put there when I was testing the speed without fogging. My bad, I’ll edit the post to fix it :slight_smile:

Yes, and my lerp() could probably work as you described.

I’ll check on HP_occlusion_test, thank you!

  1. You’d better use ARB/NV_occlusion_query instead of HP_occlusion_test.

  2. GLSL’s mix() works fine, use it.

  3. You don’t need to normalize gl_LightSource[0].position, since it’s a directional light (apparently) and should already be normalized.

  4. The halfVector_c computation produces the Phong reflection vector, not the Blinn half-angle vector (half-way between the camera and light direction vectors). Are your specular highlights showing properly?

  5. I don’t remember why exactly, but the fog coordinate should be clamped in the fragment shader, after interpolation, to get correct results in some cases.

  6. Try not to use gl_FogFragCoord. It’s one of the problems in ATI’s GLSL implementation that never seem to go away. You’re not using it in the fragment shader anyway, so just replace it with a local float. I can’t be sure, but this may be the 11ms problem you’re seeing.

  7. Check out gl_LightModelProducts and gl_LightProducts in the GLSL specification. They should replace those multiplies in the first 3 lines of the fragment shader.

  8. You should modulate the texture before adding the specular; that’s how it’s usually done.

Tell me if you see any improvement. :slight_smile:

I may be wrong, but doesn’t the ARB/NV query only return rasterised/not rasterised, whereas HP’s original extension returns (somewhat later) an actual fragment count?

From the NV_occlusion_query spec:

[quote]The HP extension has two major shortcomings.

- It returns the result as a simple GL_TRUE/GL_FALSE result, when in
  fact it is often useful to know exactly how many pixels passed.
- It provides only a simple "stop-and-wait" model for using multiple
  queries.  The application begins an occlusion test and ends it;
  then, at some later point, it asks for the result, at which point
  the driver must stop and wait until the result from the previous
  test is back before the application can even begin the next one.
  This is a very simple model, but its performance is mediocre when
  an application wishes to perform many queries, and it eliminates
  most of the opportunities for parallelism between the CPU and GPU.

This extension solves both of those problems.

[/quote]
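
And since the counting is the interesting part, here’s roughly what it looks like with LWJGL. Untested sketch: FragmentCounter is just a name for illustration, I’m using the core GL15 entry points (ARB_occlusion_query is equivalent), and the exact LWJGL method names (e.g. glGetQueryObjecti) are from memory.

import java.nio.IntBuffer;

import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.GL15;

public class FragmentCounter {

	// NOTE: method names are from memory; adjust to your LWJGL version.
	private final int query;

	public FragmentCounter() {
		IntBuffer ids = BufferUtils.createIntBuffer(1);
		GL15.glGenQueries(ids);
		query = ids.get(0);
	}

	// Wrap the draw calls you want to measure between begin() and end().
	public void begin() {
		GL15.glBeginQuery(GL15.GL_SAMPLES_PASSED, query);
	}

	public void end() {
		GL15.glEndQuery(GL15.GL_SAMPLES_PASSED);
	}

	// Number of samples (fragments) that passed the depth test. This read-back
	// stalls if the result isn't ready yet; poll GL_QUERY_RESULT_AVAILABLE first
	// if you want to keep the CPU and GPU working in parallel.
	public int samplesPassed() {
		return GL15.glGetQueryObjecti(query, GL15.GL_QUERY_RESULT);
	}
}

For the fill-rate test above: call begin() before drawing the 18k triangles, end() after, and print samplesPassed() once a frame to see how many fragments are actually being rendered at 1280x960.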