GLSL speed pitfall?

Bonbon-Chan · March 17, 2010, 10:22am

Here I go again: I’m back to my shape drawing with LWJGL.
But this time there is no triangulator; I’m trying to use a shader instead. My main goals are antialiasing and a “good” speed. I’m using an Intel G45 for my tests (I was surprised to see that this card supports shaders!).

To do so, I fill a lookup texture with all the segments of one polygon. Then I use a shader to test whether each point is inside the polygon.

This is my first time writing a real shader, but everything is going well. There is still a bug and the antialiasing is wrong. But it is far slower than expected :-. With an “empty” shader I get around 150 fps. With this shader, 5 fps… After some googling, I found a suggestion to remove all “if” branches, but that doesn’t give much of a speed increase.

Is it pointless to try to make this run on a G45, or is there something wrong with my way of doing it?

The shader for a simple even-odd polygon fill:

varying vec2 coord;
varying vec4 color;

uniform sampler2D shape;

vec2 p = vec2(0,0);

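// Fetch the i-th value from the 1024-texel-wide lookup texture.
// Each RGBA texel packs one fixed-point value as 4 bytes (x = lowest byte).
// A top byte of 255 marks the end of the segment list and returns -1.
float getPoint(float i)
{
  p.x = floor(mod(i,1024.0))/1024.0;
  p.y = floor(i/1024.0)/8.0;
  vec4 c = texture2D(shape,p.xy);

  // Round each normalized channel back to its original byte value.
  float a1 = floor(0.5 + c.x * 255.0);
  float a2 = floor(0.5 + c.y * 255.0);
  float a3 = floor(0.5 + c.z * 255.0);
  float a4 = floor(0.5 + c.w * 255.0);

  return (a4>=255.0)? -1.0 : (((((a4*256.0)+a3)*256.0+a2)*256.0)+a1)/1024.0;
}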

void main()
{
  
  vec2 n = vec2(0,0);
  
  float alpha = 0.0;

  int compt = 0;

  float x1;
  float y1;
  float x2;
  float y2;

  float a1,a2,a3,a4;

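  // Walk the segment list: 4 values (x1,y1,x2,y2) per segment, at most 1024 segments.
  for(float i=0.0;i<1024.0*4.0;i+=4.0)
  {
    x1 = getPoint(i);
    if(x1<0.0) { break; }  // end-of-list marker
    y1 = getPoint(i+1.0);
    x2 = getPoint(i+2.0);
    y2 = getPoint(i+3.0);

    // Segment direction, normalized below to unit length.
    n.x = (x2-x1);
    n.y = (y2-y1);

    float dx = (coord.x-x1);
    float dy = (coord.y-y1);

    float l = sqrt(n.x*n.x+n.y*n.y);
    n = n/l;

    // dist: perpendicular distance from the pixel to the segment's line.
    // along: position of the pixel's projection along the segment (0..l).
    float dist = abs(dx*n.y-dy*n.x);
    float along = dx*n.x+dy*n.y;

    // Even-odd test: toggle compt when the pixel's row spans this segment
    // and the pixel lies on the matching side of it.
    float sig = (y2-y1);
    float c1 = (coord.x-x1)*sig;
    float c2 = (x2-x1) * (coord.y-y1);

    int d1 = ( c1 >= c2 )? 1-compt : compt;
    int d2 = ( c1 <= c2 )? 1-compt : compt;
    int temp = ( sig >= 0.0 )? d1 : d2;
    compt = ((coord.y>=min(y1,y2))&&(coord.y<=max(y1,y2))) ? temp : compt;

    // Antialiasing: alpha falls off linearly within a distance of 1.0 of the segment.
    float al = ((along>=0.0)&&(along<=l)) ? max(1.0-dist,0.0) : 0.0 ;
    alpha = max(al,alpha);
  }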

  alpha = (compt>0) ? 1.0 : alpha;
 
  gl_FragColor = vec4(color.r,color.g,color.b,alpha);
}
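
For reference, the even-odd rule that the shader applies per pixel can be sketched on the CPU. This is a minimal Python sketch of the standard crossing-number test, not the shader's exact arithmetic; the function name `inside_even_odd` and the `(x1, y1, x2, y2)` tuple layout are assumptions chosen to mirror the segment list stored in the lookup texture.

```python
def inside_even_odd(px, py, segments):
    """Return True if (px, py) is inside the shape under the even-odd rule.

    segments is an iterable of (x1, y1, x2, y2) tuples (assumed layout).
    A horizontal ray is cast from the point towards +x, and every segment
    it crosses flips the inside/outside state.
    """
    inside = False
    for x1, y1, x2, y2 in segments:
        # Half-open test so a vertex shared by two segments counts once.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0, 0, 4, 0), (4, 0, 4, 4), (4, 4, 0, 4), (0, 4, 0, 0)]
    print(inside_even_odd(2, 2, square), inside_even_odd(5, 2, square))
```

Every pixel visits every segment here, which is the same per-pixel cost pattern as the shader loop.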

Orangy_Tang · March 17, 2010, 10:40am

Er, unless I’m reading your code wrong, you’re doing 4096 texture lookups, 10240 branches, 24576 floor() calls and a boatload of math per pixel. :o That’s never going to run at a decent speed, and frankly I’m amazed you get 5fps.

I think you’re going to have to explain and change your algorithm, because a loop that big and heavy is never going to fly. I’m guessing you’re trying to scan-convert your polygons manually in the pixel shader, which makes no sense to me when the graphics card already has highly specialised hardware to do exactly that.
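
The lookup and floor() counts quoted above follow directly from the loop bounds in the posted shader. A small Python sketch of that arithmetic, assuming the worst case where the end-of-list marker is never hit; the constant and function names are illustrative only.

```python
ITERATIONS = 1024           # the loop runs i = 0, 4, ..., 4092
FETCHES_PER_ITERATION = 4   # getPoint() for x1, y1, x2, y2
FLOORS_PER_FETCH = 6        # 2 for the texture coordinate + 4 for the channels


def worst_case_cost(iterations=ITERATIONS):
    """Return (texture lookups, floor() calls) per pixel for a full loop."""
    lookups = iterations * FETCHES_PER_ITERATION
    floors = lookups * FLOORS_PER_FETCH
    return lookups, floors


if __name__ == "__main__":
    print(worst_case_cost())  # (4096, 24576)
```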

bleb · March 17, 2010, 11:03am

You might want to look at this

Bonbon-Chan · March 17, 2010, 1:33pm

Yes, in the worst case it is 4096 lookups. I stop earlier, when I reach the last segment. In the texture, one color (texel) holds one float value. I store all the segments of the polygon as (x1,y1,x2,y2), (…), …
Then I test whether the point is in the polygon. I manage to get 5 fps because I only draw the polygon’s bounding box and not the full screen.

All the floor() calls are there to remove rounding errors when converting the raw data (4 bytes) back to the original fixed-point value through floats.
The Intel G45 doesn’t support float textures :-. Well, I gave it a try without the floor() calls… no speed change???
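
The packing described here can be sketched in Python. It assumes what the shader's getPoint() implies: four 8-bit channels hold one unsigned fixed-point value with 10 fractional bits (the division by 1024), lowest byte first, and a top byte of 255 marks the end of the list. The helper names `encode` and `decode` are hypothetical.

```python
import math

SCALE = 1024.0      # 10 fractional bits, as in the shader's final division
END_MARKER = 255    # top byte value the shader treats as "no more segments"


def encode(value):
    """Pack a non-negative coordinate into 4 bytes, lowest byte first.

    The value must stay small enough that the top byte never reaches 255,
    otherwise it would be mistaken for the end marker.
    """
    fixed = int(round(value * SCALE))
    return tuple((fixed >> (8 * k)) & 0xFF for k in range(4))


def decode(texel):
    """Mimic getPoint(): channels arrive as byte/255 and are rounded back."""
    channels = [b / 255.0 for b in texel]   # normalized values, as sampled
    a1, a2, a3, a4 = (math.floor(0.5 + c * 255.0) for c in channels)
    if a4 >= END_MARKER:
        return -1.0
    return (((a4 * 256.0 + a3) * 256.0 + a2) * 256.0 + a1) / SCALE
```

The `floor(0.5 + ...)` step rounds to the nearest integer, so a channel that comes back slightly below its byte value still decodes to the right byte.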

[quote=“Orangy Tang,post:2,topic:35041”]
Hardware for antialiasing filled polygons? I know of none. What I want here is to avoid polygon conversion and a triangulator on the CPU. And I want to add radial gradient filling at some point.

Don’t point me to full-screen AA for antialiasing: the G45 doesn’t have it, and when I use it on an NVidia card it screws up what I want to do (I haven’t found how to stop it and restart it in the middle of drawing with LWJGL).

I know this. It was the starting point of my work, but:

it uses a triangulator
there are a lot of polygon transformations
it only deals with cubic curves (quadratic ones are soooo complicated)
the antialiasing is horrible :(. It is nice with low-radius curves but not with high-radius curves.

When I see people talking about GPU ray casting of voxels, I wonder why it is sooo difficult for 2D polygons :’(
How do people doing ray casting manage? There is a big loop too, isn’t there?
This is my first real shader, so I don’t really know what the limits are or what to expect.

Orangy_Tang · March 17, 2010, 1:44pm

All but the most advanced graphics cards will just unroll the loop, so odds are you’re executing the entire 1024 step loop regardless of the actual input data.

[quote]Hardware for antialiasing filled polygons? I know of none. What I want here is to avoid polygon conversion and a triangulator on the CPU. And I want to add radial gradient filling at some point.

Don’t point me to full-screen AA for antialiasing: the G45 doesn’t have it, and when I use it on an NVidia card it screws up what I want to do (I haven’t found how to stop it and restart it in the middle of drawing with LWJGL).
[/quote]
If you don’t want to use fullscreen AA for some reason, then the best option I think would be something like zShapes, where you’d tessellate on the CPU and then run an antialiasing shader on the edge polys. This way your shader would be much simpler (fixed single-edge antialiasing rather than dealing with many edges) and it’ll only run on a fraction of the pixels.

Basically I think your entire algorithm is poorly suited to running on a graphics card and you need to rethink some of your core assumptions and come up with a more practical approach. Pixel shaders need carefully massaged input data so they only have to perform the bare minimum amount of work - you seem to be just blatting the raw input data to the graphics card and crowbarring a general-purpose scanline conversion algorithm into a pixel shader (which means tons of duplicated calculations).
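
The single-edge antialiasing mentioned above can be illustrated with a coverage estimate from the signed distance to one edge. This is a minimal Python sketch of one common approach, not the zShapes implementation; `edge_coverage` and its one-pixel-wide ramp are assumptions.

```python
import math


def edge_coverage(px, py, x1, y1, x2, y2):
    """Approximate coverage of the pixel centred at (px, py).

    The shape is assumed to lie to the left of the directed edge
    (x1, y1) -> (x2, y2), with y pointing up and distances in pixels.
    """
    ex, ey = x2 - x1, y2 - y1
    length = math.hypot(ex, ey)
    if length == 0.0:
        raise ValueError("degenerate edge")
    # Signed distance: positive on the left (inside), negative on the right.
    signed = (ex * (py - y1) - ey * (px - x1)) / length
    # Linear ramp one pixel wide, centred on the edge, clamped to [0, 1].
    return min(1.0, max(0.0, 0.5 + signed))
```

Because only one edge is evaluated per pixel, the cost is constant instead of growing with the number of segments in the shape.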

Spasi · March 17, 2010, 2:05pm

I’d just like to point out that “triangulation” in zShapes means converting a curve segment to a single triangle. The amount of information you submit to the GPU for the triangle is comparable to that of submitting the segment itself, so it’s not like you pay any price for complex shapes.

I don’t understand your point about “lots of polygon transformation”. For quadratic curves, you could probably do a quadratic to cubic conversion and then pass the cubic curve through the triangulator (not sure if that would match your quality requirements). I agree that antialiasing in zShapes is crap, I’m not even sure I did the correct math there, I’d advise anyone that tries to use it to have a good look at it.
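
The quadratic-to-cubic conversion suggested above is exact: standard degree elevation turns one quadratic Bézier into one cubic that traces the same curve. A minimal Python sketch; the function names are illustrative and nothing here is taken from zShapes.

```python
def quadratic_to_cubic(q0, q1, q2):
    """Degree-elevate a quadratic Bezier (q0, q1, q2) to a cubic.

    Points are (x, y) tuples. The end points are kept, and each inner
    control point sits two thirds of the way from an end point to q1.
    """
    c1 = (q0[0] + 2.0 / 3.0 * (q1[0] - q0[0]), q0[1] + 2.0 / 3.0 * (q1[1] - q0[1]))
    c2 = (q2[0] + 2.0 / 3.0 * (q1[0] - q2[0]), q2[1] + 2.0 / 3.0 * (q1[1] - q2[1]))
    return q0, c1, c2, q2


def eval_quadratic(q0, q1, q2, t):
    """Evaluate the quadratic Bezier at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(u * u * a + 2.0 * u * t * b + t * t * c
                 for a, b, c in zip(q0, q1, q2))


def eval_cubic(c0, c1, c2, c3, t):
    """Evaluate the cubic Bezier at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(u * u * u * a + 3.0 * u * u * t * b + 3.0 * u * t * t * c + t * t * t * d
                 for a, b, c, d in zip(c0, c1, c2, c3))
```

Whether the converted curve meets the quality requirements once it goes through a cubic-only triangulator is a separate question that this sketch does not address.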

jezek2 · March 17, 2010, 3:30pm

I think that’s because you use the ddx/ddy functions; these seem to be imprecise, at least when I fiddled with zShapes before (on a GF6). I also tried faking ddx/ddy using textures by following some NVIDIA paper, but it was slightly worse. While in a 3D scene you can get away with lower-quality AA (like 4x), for 2D it’s very obvious and you need something like 16x to be usable (even more if possible, to make it smoother).

For my Graphics2D implementation in OpenGL I ended up with a brute-force stencil-buffer-based approach (plus antialiased edges using a pixel shader), as it has the lowest setup cost and works for any polygon, though it doesn’t handle AA well for small details. But for something like SVG I think it would suck performance-wise. zShapes seems like a great approach for that; it just needs some way to improve the antialiasing.

Orangy_Tang · March 17, 2010, 3:37pm

I’ve found that ddx/ddy produces highly variable results depending on hardware. When I was using them on a GF8 they gave great quality results, but running on a GF6 the quality was crap, it looked like the output only had about 4 bits of precision.

jezek2 · March 17, 2010, 3:42pm

I suspect that ddx/ddy is tightly tied to mipmapping. On pre-GF8 hardware there is quite a big approximation, whereas GF8 and newer are quite precise. See this comparison.

Bonbon-Chan · March 17, 2010, 4:13pm

Thanks everyone.

So this solution is a no-go. It is clearly not the right way of thinking for shaders… I will figure out something else ;D
At least I now have a far better knowledge of shaders and a clearer idea of what can and can’t be done with them.

[quote]If you don’t want to use fullscreen AA for some reason, then the best option I think would be something like zShapes, where you’d tessellate on the CPU and then run an antialiasing shader on the edge polys. This way your shader would be much simpler (fixed single-edge antialiasing rather than dealing with many edges) and it’ll only run on a fraction of the pixels.
[/quote]
It can give good results when there are no small areas, but when the shape ends in a long thin corner it doesn’t work.

First, I didn’t find a way to transform a quadratic curve into (at least I think) 3 cubic curves.
By polygon transformation, I mean that you have to test whether a curve cuts another… then add segments… search for holes that cut the outer polygon… then add segments… And with my (first) triangulator, to deal with holes I have to add segments between the outer and inner polygons. All this takes even more time than the triangulator itself :-\

The more I search, the more I’m amazed that modern cards can’t simply do something like that