How to compile GLSL using fp40 for nvidia 6800?

I’m experimenting with some GPGPU concepts on my GeForce 6800, and I was wondering how I can make JOGL use the fp40 profile when compiling GLSL. The compiler appears to be doing fp30-ish loop unrolling, which exceeds the allowable instruction count: the simple code below returns a frag value stuck at 254. However, my 6800 is supposed to handle fp40 code, which can be compiled using “cgc.exe -oopengl -profile arbfp40”. I tried the workaround below, but it’s actually 2-3x SLOWER than the CPU.

Original:


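void main() {
   // Loop 512 times; the last write (i = 511) should be the output.
   for (int i = 0; i < 512; ++i) {
      // gl_FragColor is a vec4, so convert the int counter explicitly.
      gl_FragColor = vec4(float(i));
   }
}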

Apparent workaround:


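void main() {
   int k = 0;
   // Two nested loops (2 x 256) cover the same 512 iterations.
   for (int i = 0; i < 2; ++i) {
      for (int j = 0; j < 256; ++j) {
         // gl_FragColor is a vec4, so convert the int counter explicitly.
         gl_FragColor = vec4(float(k));
         ++k;
      }
   }
}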
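
For comparison, here is a minimal CPU-side sketch in Java of what the two shaders above should compute if every iteration actually runs. The class and method names are hypothetical (they are not part of JOGL), and it assumes an unclamped float render target, so the last value written is what ends up in the fragment. It is only a reference for checking the GPU result, not a description of what the driver does.

```java
public class LoopReference {
    // CPU version of the single 512-iteration loop: returns the last value written.
    public static float singleLoop(int iterations) {
        float fragColor = 0.0f;
        for (int i = 0; i < iterations; ++i) {
            fragColor = (float) i;
        }
        return fragColor;
    }

    // CPU version of the nested workaround: outer * inner iterations in total.
    public static float nestedLoop(int outer, int inner) {
        float fragColor = 0.0f;
        int k = 0;
        for (int i = 0; i < outer; ++i) {
            for (int j = 0; j < inner; ++j) {
                fragColor = (float) k;
                ++k;
            }
        }
        return fragColor;
    }

    public static void main(String[] args) {
        // Both calls cover 512 iterations, so both end on the value 511.
        System.out.println(singleLoop(512));
        System.out.println(nestedLoop(2, 256));
    }
}
```

Both versions end on 511 here, so a readback stuck at 254 would suggest the single-loop shader is not running all 512 iterations on the GPU.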

The GeForce 6 was the first generation to support dynamic loops and branches in pixel shaders. I’ve tested this on a GF 6600 GT with various demos (e.g. from Humus, or my own tests), and the dynamic version was often slower than the unrolled one. While your card is more powerful, it will probably have similar characteristics. The GLSL compiler probably just chose the unrolled version because it’s faster.