Fill Ellipse from a fragment shader

I always rewriting some old 2D stuffs using old OpenGL glBegin/glEnd for use modern openGL statements.

I’m writing a fillOval implementation using a fragment shader instead of pushing a lot of vertices for drawing my ellipse.

The idea is to paint a classic rectangle (2 triangles composition) and use a fragment shader that discard all fragment that is not inside the ellipse.

As is, i push on OpenGL in order to draw my ellipse :

  • 4 vertices
  • 6 indices
  • 2 “vec2” uniforms
    • the center of the ellipse
    • the radius of the ellipse

my fragment shader (responsible to discard fragments) looks like this one:

#version 330

uniform vec2 u_centerOval;
uniform vec2 u_radiusOval;

layout(origin_upper_left) in vec4 gl_FragCoord;

out vec4 outputColor;

void checkFragment() {
   /**
    * Ellipse equation =  (x/a)2 + (y/b)2 = 1 
    *     a = horizontal radius
    *     b = vertical   radius 
    */
    
    float e1 =  ( gl_FragCoord.x - u_centerOval.x ) / ( u_radiusOval.x );
    float e2 =  ( gl_FragCoord.y - u_centerOval.y ) / ( u_radiusOval.y );
    
    float d  = (e1 * e1) + (e2 * e2);
    if( d > 1 ) {
       outputColor = vec4(1.0 , 0.0 , 0.0 , 1.0 ); // discard;
    }    
}

So, it works not so bad. I have an “aliasing” issue, but i will see this later.

But now, if i paint with this statement:


  g2d.setColor( Color.YELLOW );
  g2d.fillOval( 400 ,  50 ,  200 ,  100 );

i obtain this

But if i apply an affine transformation like this:


 g2d.transform( AffineTransform.identity().rotate( 0.2f , 400 + 100 , 50 + 50 ) );
 g2d.setColor( Color.YELLOW );
 g2d.fillOval( 400 ,  50 ,  200 ,  100 );
g2d.setTransform( null );

i obtain this

That’s quite normal since my fragment shader don’t take in account my affine transform.
It’s only my vertex shader that use it (and it rotate the rectangle as well).

i should inverse my transformation on gl_FragCoord in order to compute the ellipse equation ?
It’s a pain, all matrices are not invertible. I can pass it as an uniform.

Have you a better idea ?
At this point, maybe this technical approach is completely crazy but at first thinking i found it quite fun.

Best regards,
Sébastien.

I’m sorry I don’t have the time to look deep into this, but what I have been told, when trying to “optimize” the circle drawing by drawing a rectangle and determining which pixels to fill in the fragment shader I have been told that simply pushing lots of verticies is faster. You might need to benchmark this. I didn’t test it, but that’s what I have been told :slight_smile:

Generating it will be a lot faster if you use a lot of vertices. The issue you have here is you are performing thousands of expensive operations in your fragment stage when it could simply be just a color set. If you just have 1000 non overlapping vertices that have been generated outside of the shader will be much more efficient.

When using the “vertices” solution, the main problem is to determine how many vertices we should use (depending on the resolution and size of the ellipse).
Using the fragment technics, it’s openGL that determine the “size” of the fragment and you haven’t to worry about.

Another technic is to use by example the Bresenham’s algorithm and you don’t have to worry about determining how many vertices you need. but:

  • when you push only points, it’s quite ok, when you want lines, you need to set vertices in the correct order that is a pain with such algorithm.
  • using the plain old trigo algo is enough but i don’t like to compute cos/sin for every points.

i give a try to the bresenham’s and will make some benchs

Sticking with brute force rendering…the orientation of the bounding rectangle gives you the rotation information.

If you want to do this in a fragment shader, wouldn’t you be better using texture coordinates rather than gl_FragCoord and your uniforms?

https://bitbucket.org/TwoLivesLeft/core/wiki/EllipseShader

I give it a try to the Bresenham algorithm

Some result here (i know there is no reason to be so enthousiast :P)

		
g2d.setColor( Color.YELLOW );
g2d.fillOval( 400 ,  50 ,  200 ,  100 );

		
g2d.setColor( Color.YELLOW );
g2d.transform( AffineTransform.identity().rotate( 0.2f , 400 + 100 , 50 + 50 ) );
g2d.fillOval( 400 ,  50 ,  200 ,  100 );
g2d.setTransform( null );

		
g2d.setColor( Color.YELLOW );
g2d.transform( AffineTransform.identity().rotate( 0.2f , 400 + 100 , 50 + 50 ) );
g2d.drawOval( 400 ,  50 ,  200 ,  100 );
g2d.setTransform( null );


g2d.setPaint( texture ); // set a Texture instead of a color to fill it
g2d.transform( AffineTransform.identity().rotate( 0.2f , 400 + 100 , 50 + 50 ) );
g2d.fillOval( 400 ,  50 ,  200 ,  100 );
g2d.setTransform( null );

Here the implementation using Breshenham.
I just have a difficulty to pre check buffers size of my batcher. So ATM i must check a BufferOverFlowException that is not quite fun…

I don’t do any benchmarks for now but i find this solution much functionnal than the fragment shader (but less fun ^^)

/**
     *  https://sites.google.com/site/ruslancray/lab/projects/bresenhamscircleellipsedrawingalgorithm/bresenham-s-circle-ellipse-drawing-algorithm
     */
    private void paintOvalBresenham(float x0, float y0, float rx, float ry , boolean fill , Graphics2DAttributes g2dAttributes , boolean overflowed ) {
    	/**
    	 * Basics primitives perform pre check on all buffers of this batcher before pushing datas like:
    	 *      // 1 primitive 4 vertices and 6 indices.
    	 *      checkBufferSize( 1 , 4 , 6 );  
    	 *    
    	 * This implementation is quite more complicated and i can't determine how much vertices and indices i will use.
    	 * So, i don't do any pre check but if i have a buffer overflow, i discard this primitive, flush() and restart this paint process
    	 * 
    	 * I can assume at least to have 1 primitive enought
    	 */
    	checkBufferSize( 1  , 0 , 0 ); 
    	
    	/** 
    	 * Get the primitive
    	 */
    	Primitive p = getPrimitive( ( fill ? GL11.GL_TRIANGLE_FAN : GL11.GL_LINE_LOOP ) , g2dAttributes );
    	
		float a2  = rx * rx;
	    float b2  = ry * ry;
	    float fa2 = 4 * a2, fb2 = 4 * b2;
	    
	    /** Bresenham algorithm compute 2 octants and determine 6 others by symetric computations
	     *    - Theses 2 first octants are done without the use of cos/sin operations but use (* +)
	     *    - Theses 6 others octants are done only with translation ( + or - operations) 
	     * 
	     *  The difficulty is able to order vertices in the "indices buffer" sinces points are not computed sequentially...
	     *  
	     *  Which is the order of points computation:
	     *    - Each octant is numbered from 1 to 8. We compute octants 1,2,3,4 (multiplexed) and after we compute octants 5,6,7,8 (multiplexed)
	     *    - Some octants as computed in a clockwise and some others in counterclockwise
	     *    - The drawing process draw octants in this order 1 , 5 , 7 , 3 , 4 , 8 , 6 , 2 in the counterclockwise.
	     *    - Some octant are clockwise and some others in counterclockwise by nature of symetric computation used. 
	     *               - Octant 1 : counterclockwise
	     *               - Octant 5 : clockwise
	     *               - Octant 7 : counterclockwise
	     *               - Octant 3 : clockwise
	     *               - Octant 4 : counterclockwise
	     *               - Octant 8 : clockwise
	     *               - Octant 6 : counterclockwise
	     *               - Octant 2 : clockwise
	     *                 
	     *  
	     *   - Octants topology: 
	     *   
	     *                   ***************
	     *              *****   \ 4 | 3  /  *****
	     *            **    8     \ |  /    7    **
	     *          **             \|/             **
	     *          *---------------X---------------*
	     *          **      6     / | \     5      **
	     *            **         /  |  \         **
	     *              *****  /  2 | 1  \  *****
	     *                   ***************
	     */
	    int sizeOctants1234 = 0;
	    int sizeOctants5678 = 0;
	    
	    try {
		    /* first half */
		    float x, y, sigma;	
		    for (x = 0, y = ry, sigma = 2*b2+a2*(1-2*ry); b2*x <= a2*y; x++)
		    {
		    	p.pushVertex( x0 + x , y0 + y ); // Octant 1
		    	p.pushVertex( x0 - x , y0 + y ); // Octant 2
		    	p.pushVertex( x0 + x , y0 - y ); // Octant 3
		    	p.pushVertex( x0 - x , y0 - y ); // Octant 4
		    	sizeOctants1234++;
		        
		        if (sigma >= 0)
		        {
		            sigma += fa2 * (1 - y);
		            y--;
		        }
		        sigma += b2 * ((4 * x) + 6);
		    }
	
		    /* second half */
		    for (x = rx, y = 0, sigma = 2*a2+b2*(1-2*rx); a2*y <= b2*x; y++)
		    {
		    	p.pushVertex( x0 + x , y0 + y ); // Octant 5
		    	p.pushVertex( x0 - x , y0 + y ); // Octant 6
		    	p.pushVertex( x0 + x , y0 - y ); // Octant 7
		    	p.pushVertex( x0 - x , y0 - y ); // OCtant 8
		    	sizeOctants5678++;
	
		    	if (sigma >= 0)
		        {
		            sigma += fb2 * (1 - x);
		            x--;
		        }
		        sigma += a2 * ((4 * y) + 6);
		    }
	    }
	    catch( BufferOverflowException e ) {
	    	/** Ouch, we have not enough place for put all vertices, we flush this batcher without the primitive
	    	 */
	    	if( overflowed ) throw new IllegalStateException("OpenGLSupport with too small buffer size for this method");
	    	primitiveCount--; // discard this primitive
	    	flush();
	    	paintOvalBresenham(x0, y0, rx, ry, fill, g2dAttributes , true);
	    	return;
	    }
	    
	    /** Now we put all vertices, we know how many indices we must have
	     *  We can do a pre check in order to avoid the BufferOverflowException...
	     */
	    if( ! checkBufferSize( 0 , 0 , ( 4 * sizeOctants1234 ) + ( 4 * sizeOctants5678 )  , false ) ) {
	    	if( overflowed ) throw new IllegalStateException("OpenGLSupport with too small buffer size for this method");
	    	primitiveCount--; // discard this primitive
	    	flush();
	    	paintOvalBresenham(x0, y0, rx, ry, fill, g2dAttributes , true);
	    	return;
	    }
	    
	    /** Push indices
	     */
	    IntBuffer buffer = indicesBuffer.getBuffer();
	    
	    final int strideOctants1234 = 4 * sizeOctants1234;
	    final int strideOctants5678 = 4 * sizeOctants5678;
	    
	    /** 
	     * Computing where start each octants
	     */
	    final int offsetOctant1 = 0;                                                      // forward
	    final int offsetOctant4 = 3;					                                  // forward
	    final int offsetOctant6 = strideOctants1234 + 1;  						          // forward
	    final int offsetOctant7 = strideOctants1234 + 2; 					              // forward
	    final int offsetOctant5 = strideOctants1234 + strideOctants5678 - 4;              // backward
	    final int offsetOctant3 = strideOctants1234 - 2; 						          // backward
	    final int offsetOctant8 = strideOctants1234 + strideOctants5678 - 1;	          // backward
	    final int offsetOctant2 = strideOctants1234 - 3;								  // backward
	    
	    // octant 1
	    int offset = offsetOctant1;
	    int stride = 4;
	    for(int i = 0 ; i < strideOctants1234 ; i += stride ) {
	    	buffer.put( offset + i );
	    }
	    // octant 5
	    offset = offsetOctant5;
	    for(int i = 0 ; i < strideOctants5678 ; i += stride ) {
	    	buffer.put( offset - i );
	    }
	    // octant 7
	    offset = offsetOctant7;
	    for(int i = 0 ; i < strideOctants5678 ; i += stride ) {
	    	buffer.put( offset + i );
	    }
	    // octant 3
	    offset = offsetOctant3;
	    for(int i = 0 ; i < strideOctants1234 ; i += stride ) {
	    	buffer.put( offset - i );
	    }
	    // octant 4
	    offset = offsetOctant4;
	    for(int i = 0 ; i < strideOctants1234 ; i += stride ) {
	    	buffer.put( offset + i );
	    }
	    // octant 8
	    offset = offsetOctant8;
	    for(int i = 0 ; i < strideOctants5678 ; i += stride ) {
	    	buffer.put( offset - i );
	    }
	    // octant 6
	    offset = offsetOctant6;
	    for(int i = 0 ; i < strideOctants5678 ; i += stride ) {
	    	buffer.put( offset + i );
	    }
	    // octant 2
	    offset = offsetOctant2;
	    for(int i = 0 ; i < strideOctants1234 ; i += stride ) {
	    	buffer.put( offset - i );
	    }
	    p.end();
	}

Sébastien.

The easiest way to do this in a fragment shader is to NOT rely on gl_FragCoord. If you rotate the coordinates, the result stays the same. A better idea is to assign “texture” coordinates to each corner and have the shader rely on those instead, since those coordinates will be and interpolated correctly after rotation.

Another way would be to use rotated ellipse equation but that would be just nuts. http://www.maa.org/external_archive/joma/Volume8/Kalman/General.html

Thanks all, it’s the right way with the fragment shader. I don’t found it alone… damn :stuck_out_tongue:
It’s not yet naturals for me to write shaders and understood all subtles with theses !!!

The most hard things with openGL is the debugging !! .
When it don’t work as you wish, it’s a reallly a painfull to find the cause.

Thanks all, for your advices, but il will stick with the Bresenham implementation.

With modern GPU’s I doubt that nothing beats pixel shader variant in speed for dynamics ellipses. If you try to use a lot of triangles you end up with badly shaped thin/small triangles that are not efficient to rasterize. Baking shape to texture does not work well for small or really big shapes.(you also need special distance texture instead of pure shape. With alpha test like shader.). Also nothing can beat analytical anti-aliasing.


void checkFragment() {
   /**
    * Ellipse equation =  (x/a)2 + (y/b)2 = 1 
    *     a = horizontal radius
    *     b = vertical   radius 
    */
    
    vec2 e =  ( v_texCoord - 0.5 ) * u_invRadius.xy; // can be moved to vert shader.
    
    float d  = dot(e, e);
    if( d > 1.0 ) {
       outputColor = vec4(1.0 , 0.0 , 0.0 , 1.0 ); // discard;
    }    
}

So optimized code is just one vec2 dot operation and logic for handling what to do with the result.

All the information is in the rectangle…didn’t I say that already. :wink:

i’m completing the fragment shader from advices.

The result is here :

The shader look’s like

#version 330

varying vec2 v_boundingBoxPosition;

void checkFragment() {

   /**
    * Ellipse equation =  (x/a)2 + (y/b)2 = 1 
    *     a = horizontal radius
    *     b = vertical   radius 
    *
    *
    * Bounding box is a vec2 that correponds to this rectangle coordinate 
    *        (-1,-1)            (+1,-1)
    *                  (0,0)
    *        (-1,+1)            (+1,+1)
    *
    *  The varying result in this fragment shader correspond to [ (x/a) , (y/b) ]
    *  We can perform a dot() for doing the pow(2) 
    */
    float d  = dot( v_boundingBoxPosition , v_boundingBoxPosition ); 
    if( d > 1 ) {
       discard;
    }    
}

The fragment result looks much nicer.

But in term of performance, i bench this for 600 ellipses per frame (i use large buffer for prevent implicit flush while pushing datas):

  • Bresenham : 10 ms for pushing datas on the batcher. 6.8 ms for flush it
  • Shader : 0.1 ms for pushing datas on the batcher. 3.5ms for flush it.

In both of them i take the worst case but in average it’s most:

  • Bresenham : 10 ms for pushing datas on the batcher. 6 ms for flush it.
  • Shader : 0.1 ms for pushing datas on the batcher. 2.8 ms for flush it.

I must check if i have not some performance issues in the push data method. but in all cases, the flush is also faster in the shader version.

So, ATM the shader way seems the way to go…

EDIT: The bresenham algorithm push 6 floats per vertices (color + 2d position) that mean 24 bytes per vertices
The ellipse drawn use 584 vertices that means 14 016 bytes for an ellipse (14 Ko)
When i draw 600 ellipses, it’s come up to 8 409 600 bytes (8.02 Mo).
the FloatBuffer.put(...) take me the 50% of total time of the push sequence. It seems strange that it’s longer to fill it in RAM than sending it to the GPU…

EDIT : I updated the fragment shader, i don’t use anymore an uniform. the varying vec2 is ready to use and minimize computation inside the fragment shader like pitbuller said.

The shader version is beautiful at it simplicity.

with AA :

#version 120

varying vec2 v_boundingBoxPosition;

void main() {
    
    float d  = dot( v_boundingBoxPosition , v_boundingBoxPosition ) * 0.5; 

    //
[quote=""]
    // fwidth was a bad idea for a function...we should all pretend like it doesn't exist.
    //
[/quote]
   // float edge = fwidth(d);

    vec2 pw = fwidth(v_boundingBoxPosition);
    float edge = max(pw.x, pw.y);
 
    // float alpha = step(0.5, d);
    float alpha = smoothstep(0.5 - edge, 0.5 + edge , d);

    //
[quote=""]
    // You still want to do a discard so the depth buffer isn't written to.
    //
[/quote]
    
    //
[quote=""]
    // Using equality with floating points is a bad idea
    //
[/quote]
   
    // if(alpha == 0.0) discard;
    if(alpha < 0.01) discard;

    gl_FragColor = vec4(1.0,1.0,0.0, alpha);

}

You still want to do a discard so the depth buffer isn’t written to.

fwidth was a bad idea for a function…we should all pretend like it doesn’t exist.

discard on this type of AA isn’t useful…has to be back to front anyway.

yea, it’s not msaa. still better then nothing, even with zero alpha.

Using equality with floating points is a bad idea :point:


if(alpha <= 0.01) discard; // much better