Alpha blending

Dx4 · July 20, 2011, 5:39am

I’ve been writing a software renderer, but alpha blending isn’t working properly.

My algorithm for Porter-Duff SrcOver:


    void alphaBlend(int[] pixels, int offset, int source, int alpha) {
        int destRGB = pixels[offset];
        int destA = destRGB >>> 24;
        int destR = (destRGB >> 16) & 0xff;
        int destG = (destRGB >> 8) & 0xff;
        int destB = destRGB & 0xff;
        int srcA = source >>> 24;
        int srcR = (source >> 16) & 0xff;
        int srcG = (source >> 8) & 0xff;
        int srcB = source & 0xff;

        srcA *= alpha;
        srcR *= alpha;
        srcG *= alpha;
        srcB *= alpha;

        int oneMinusSrcA = 0xff - (srcA >> 8);

        destR = (srcR + destR * oneMinusSrcA) >> 8;
        destG = (srcG + destG * oneMinusSrcA) >> 8;
        destB = (srcB + destB * oneMinusSrcA) >> 8;
        destA = (srcA + destA * oneMinusSrcA) >> 8;
        pixels[offset] = (destA << 24) | (destR << 16) | (destG << 8) | destB;
    }

pixels = array of ARGB pixels
offset = position in the array of the pixel to blend
source = color in ARGB format to blend with the destination (pixels[offset])
alpha = extra alpha to apply while blending (0-255)

The problem is that sometimes it doesn’t blend the alpha properly and instead just replaces the destination alpha with the source alpha. Can anyone see why this happens?

It happens when the source color has a very small alpha value, e.g. 1.

thanks

nsigma · July 20, 2011, 8:06am

I’ve also written my own software renderer. Just had my first morning coffee so it’s a bit early for bitshifting - let’s hope this is right … ;D

This bit (the four lines that multiply srcA, srcR, srcG and srcB by alpha) looks wrong. Assuming your alpha value is in the range 0…255, then that needs to be shifted down.

eg.


  srcA = (srcA * alpha) >> 8;

In my code I’m also doing the equivalent of


  srcA = (srcA * (alpha + 1)) >> 8;

which seems to be more accurate.
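To make the accuracy difference concrete, here is a minimal sketch comparing the plain shift, the "+1" variant, and an exact division by 255. The class and method names are made up for illustration and are not taken from Praxis.

```java
// Compares three ways of scaling an 8-bit channel by an 8-bit alpha.
// All names here are hypothetical, chosen for this illustration only.
final class FixedPointMul {

    // Plain shift: divides by 256 rather than 255, so 255 * 255 lands on 254.
    static int mulShift(int value, int alpha) {
        return (value * alpha) >> 8;
    }

    // The "+1" variant: alpha 255 becomes a multiplier of 256,
    // so a fully opaque alpha leaves the value unchanged.
    static int mulPlusOne(int value, int alpha) {
        return (value * (alpha + 1)) >> 8;
    }

    // Reference result: exact division by 255, rounded to nearest.
    static int mulExact(int value, int alpha) {
        return (value * alpha + 127) / 255;
    }

    public static void main(String[] args) {
        System.out.println(mulShift(255, 255));   // 254
        System.out.println(mulPlusOne(255, 255)); // 255
        System.out.println(mulExact(255, 255));   // 255
    }
}
```

The "+1" form is still an approximation for intermediate values, but it gets both end points right: alpha 0 gives 0 and alpha 255 returns the value unchanged.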

The second issue may be that Porter-Duff algorithms expect pre-multiplied colour data. You haven’t mentioned whether the data is pre-multiplied or not (I’d recommend doing so, as it makes a lot of things easier for both you and the CPU).

If your data isn’t pre-multiplied then I’d also change the code above to be


        srcA = alpha == 255 ? srcA : (srcA * (alpha + 1)) >> 8;
        srcR = (srcR * (srcA + 1)) >> 8;
        srcG = (srcG * (srcA + 1)) >> 8;
        srcB = (srcB * (srcA + 1)) >> 8;

ie. make sure to multiply colour values by srcA not just alpha.
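A small sketch of why this matters, using only the red channel. The first method repeats the arithmetic from the original post. The second scales the colour by the source alpha first, and its destination term uses a +1 rounding that is an assumption made here. The class and method names are hypothetical.

```java
// Shows how a channel can exceed 255 when the colour is scaled by `alpha`
// only, and stays in range once it is also scaled by the source alpha.
final class PremultiplyOverflow {

    // Red channel as computed in the original post.
    static int redAsPosted(int srcR, int srcA, int destR, int alpha) {
        int a = srcA * alpha;                    // 16-bit fixed point
        int oneMinusSrcA = 0xff - (a >> 8);
        return (srcR * alpha + destR * oneMinusSrcA) >> 8;
    }

    // Same blend, but the colour is multiplied by the (scaled) source alpha.
    static int redPremultiplied(int srcR, int srcA, int destR, int alpha) {
        int a = (alpha == 255) ? srcA : (srcA * (alpha + 1)) >> 8;
        int r = (srcR * (a + 1)) >> 8;           // premultiplied source red
        // Destination term with +1 rounding (an assumption of this sketch).
        return r + ((destR * (0xff - a + 1)) >> 8);
    }

    public static void main(String[] args) {
        // Bright, almost transparent source over a bright destination.
        System.out.println(redAsPosted(255, 1, 255, 255));      // 508
        System.out.println(redPremultiplied(255, 1, 255, 255)); // 255
    }
}
```

A value such as 508 no longer fits in 8 bits, so once it is shifted into place it spills into the neighbouring channel when the pixel is packed with OR.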

Hopefully that’s all correct - I haven’t actually tried it!

If you’re interested in looking at my blending mode code in Praxis, have a look here http://code.google.com/p/praxis/source/browse/ripl/src/net/neilcsmith/ripl/rgbmath/RGBComposite.java. SrcOver is actually called Normal. Praxis as a whole is GPL3, but feel free to use anything in this file, or the RGBMath file (which it also needs).

Hope that helps.

Best wishes, Neil

Dx4 · July 20, 2011, 9:03am

thanks for the reply.

nsigma:

I’ve also written my own software renderer. Just had my first morning coffee so it’s a bit early for bitshifting - let’s hope this is right … ;D
Dx4:
        srcA *= alpha;
        srcR *= alpha;
        srcG *= alpha;
        srcB *= alpha;
This bit looks wrong. Assuming your alpha value is in the range 0…255, then that needs to be shifted down.

This should be right, as I’m doing a >> 8 shift after, to move it back into the range of a byte.

eg.
  srcA = (srcA * alpha) >> 8;
In my code I’m also doing the equivalent of
  srcA = (srcA * (alpha + 1)) >> 8;
which seems to be more accurate.

I will try this, thanks for spotting that.

The second issue may be that Porter-Duff algorithms expect pre-multiplied colour data. You haven’t mentioned whether the data is pre-multiplied or not (I’d recommend doing so, as it makes a lot of things easier for both you and the CPU).

If your data isn’t pre-multiplied then I’d also change the code above to be
        srcA = alpha == 255 ? srcA : (srcA * (alpha + 1)) >> 8;
        srcR = (srcR * (srcA + 1)) >> 8;
        srcG = (srcG * (srcA + 1)) >> 8;
        srcB = (srcB * (srcA + 1)) >> 8;
ie. make sure to multiply colour values by srcA not just alpha.

Hopefully that’s all correct - I haven’t actually tried it!

This looks interesting. I will take a look into it.

If you’re interested in looking at my blending mode code in Praxis, have a look here http://code.google.com/p/praxis/source/browse/ripl/src/net/neilcsmith/ripl/rgbmath/RGBComposite.java. SrcOver is actually called Normal. Praxis as a whole is GPL3, but feel free to use anything in this file, or the RGBMath file (which it also needs).

I will definitely take a look at your Praxis project. It looks quite interesting.

Hope that helps.

Best wishes, Neil

Thanks for your time, Neil

EDIT:

It still results in pixels with less alpha than expected when low alpha values are used (try 1).

This is the code I used:


    public static final int ALPHA_MASK = 0xff000000;
    public static final int RED_MASK = 0x00ff0000;
    public static final int GREEN_MASK = 0x0000ff00;
    public static final int BLUE_MASK = 0x000000ff;

     public static int premultiply(int argb) {
        int a = argb >>> 24;

        if (a == 0) {
            return 0;
        }
        else if (a == 255) {
            return argb;
        }
        else {
            return (a << 24) | multRGB(argb, a);
        }
    }

	public static int multRGB(int src, int multiplier) {
        multiplier++;
        return ((src & RED_MASK) * multiplier) >> 8 & RED_MASK |
                ((src & GREEN_MASK) * multiplier) >> 8 & GREEN_MASK |
                ((src & BLUE_MASK) * multiplier) >> 8;
    }


	private void blendPixelEXT(int pixels[], int offset, int srcPx, int alpha) {
		int destPx = pixels[offset];
		destPx = premultiply(destPx);
		srcPx = premultiply(srcPx);
		int srcA = (alpha == 255) ? (srcPx & ALPHA_MASK) >>> 24
				: mult((srcPx & ALPHA_MASK) >>> 24, alpha);
		int srcR = (alpha == 255) ? (srcPx & RED_MASK) >>> 16
				: mult((srcPx & RED_MASK) >>> 16, alpha);
		int srcG = (alpha == 255) ? (srcPx & GREEN_MASK) >>> 8
				: mult((srcPx & GREEN_MASK) >>> 8, alpha);
		int srcB = (alpha == 255) ? (srcPx & BLUE_MASK)
				: mult(srcPx & BLUE_MASK, alpha);
		int destA = (destPx & ALPHA_MASK) >>> 24;
		int destR = (destPx & RED_MASK) >>> 16;
		int destG = (destPx & GREEN_MASK) >>> 8;
		int destB = (destPx & BLUE_MASK);

		pixels[offset] = unpremultiply(blend(srcA, destA, srcA) << 24 |
				blend(srcR, destR, srcA) << 16 |
				blend(srcG, destG, srcA) << 8 |
				blend(srcB, destB, srcA));
	}

    public static int blend(int src, int dest, int alpha) {
        return src + (((0xFF - alpha) * dest) >> 8);
    }

    public static int mult(int val, int multiplier) {
        return (val * (multiplier + 1)) >> 8;
    }

I’m thinking of just capping the alpha so the resultant alpha can NEVER be less than the destination alpha… seems like the easiest way out.

nsigma · July 20, 2011, 1:12pm

These are probably still rounding errors caused by the bitshifting. You could try:


    public static int blend(int src, int dest, int alpha) {
        return src + mult(dest, 0xFF - alpha);
    }

    public static int mult(int val, int multiplier) {
        return (val * (multiplier + 1)) >> 8;
    }

ie. force it to add 1 to the multiplier.

Haven’t been able to test it, but I think it might work. Should probably evaluate that fix for my code too.

Best wishes, Neil

Dx4 · July 20, 2011, 2:53pm

Thanks so much Neil! That did the trick
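For anyone wondering why it did the trick, the rounding can be checked in isolation. This is a minimal sketch: mult and the two blend variants are copied from the posts above, while the class name and the names blendOld and blendNew are made up for the comparison.

```java
// Compares the two blend() variants from the thread on the failing case:
// a source alpha of 1 over a fully opaque destination.
final class BlendRounding {

    static int mult(int val, int multiplier) {
        return (val * (multiplier + 1)) >> 8;
    }

    // Earlier version: plain shift, loses one step at the top of the range.
    static int blendOld(int src, int dest, int alpha) {
        return src + (((0xFF - alpha) * dest) >> 8);
    }

    // Fixed version: goes through mult(), which adds 1 to the multiplier.
    static int blendNew(int src, int dest, int alpha) {
        return src + mult(dest, 0xFF - alpha);
    }

    public static void main(String[] args) {
        System.out.println(blendOld(1, 255, 1)); // 254: opaque pixel loses alpha
        System.out.println(blendNew(1, 255, 1)); // 255
        System.out.println(blendOld(0, 255, 0)); // 254: even a transparent source darkens
        System.out.println(blendNew(0, 255, 0)); // 255
    }
}
```

With the old version every low-alpha stamp shaves a little off the destination alpha, which would add up over a brush stroke.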

BTW: Slight optimization:

instead of


int destA = (destPx & ALPHA_MASK) >>> 24;

use


int destA = (destPx) >>> 24;

There’s no need to mask it: >>> is an unsigned shift, so the upper 24 bits of the result are always zero. The same applies to srcA.
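A quick sketch to confirm the two forms agree, and that the signed shift would not. The class and method names are hypothetical.

```java
// Extracting the alpha byte from a packed ARGB int in three ways.
final class AlphaShift {
    static final int ALPHA_MASK = 0xff000000;

    // Mask first, then unsigned shift (the redundant form).
    static int alphaMasked(int argb) {
        return (argb & ALPHA_MASK) >>> 24;
    }

    // Unsigned shift only: zero-fills from the left, so no mask is needed.
    static int alphaUnsigned(int argb) {
        return argb >>> 24;
    }

    // Signed shift: copies the sign bit, so alpha >= 128 comes out negative.
    static int alphaSigned(int argb) {
        return argb >> 24;
    }

    public static void main(String[] args) {
        int argb = 0x80112233;
        System.out.println(alphaMasked(argb));   // 128
        System.out.println(alphaUnsigned(argb)); // 128
        System.out.println(alphaSigned(argb));   // -128
    }
}
```

The mask is only redundant for the top byte. The red and green channels still need it, or the equivalent (x >> n) & 0xff.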

nsigma · July 20, 2011, 6:11pm

No problem. Thanks for testing my fix for me! ;D

Nice catch! Must have been a bit too quick on the copy, paste, replace there.

There are a few other potential optimisations too. The immediate one that comes to mind is to replace the 3 checks for alpha==255 with a single if statement. In fact, I’d done that in the first Add composite but none of the others. I must go back to this class and tidy it up. I haven’t looked at it for a while and was just glad it worked at all!

Incidentally, what are you writing a software renderer for? A project of its own or something bigger?

And I know I mentioned it before, but I’d really recommend moving to premultiplied data throughout - it’s easier and better performing - trust me, I found out the hard way! ;D

Dx4 · July 20, 2011, 6:55pm

I’m writing a software renderer to handle the brushes for my project:

http://www.java-gaming.org/index.php/topic,24478.0.html

I wasn’t happy with the performance of Java2D, especially in regards to colorizing images on the fly; therefore I had to devise something much faster.

Now that I have switched to a software renderer, I get the following benefits:

Able to use any blend mode I want for brushes (Add, Multiply, Over, Overlay, Screen, Darken, etc.)
Able to save the state of any pixels I modify without doing a scan beforehand (VERY useful for undo states)
Able to colorize brush pixels on the fly using a simple OR operation: (source << 24) | color. Previously, in Java2D, I had to create a new image, get a graphics context, set the AlphaComposite, set the color and call drawLine(x,y,x,y) for each pixel I wanted to recolor. Huge speed increase in brush rendering once I made the switch.
Uses much less memory. Each brush image (with colors, etc.) took about 4 KB of memory before, and about 1000 would be created for a single stroke, which adds up to 4 MB. My new software renderer uses a constant 400 KB per brush (for 100 sizes).
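The colorize step can be sketched like this, assuming the brush table stores only a 0-255 coverage value per pixel and the color is a plain RGB value. The class and method names are hypothetical, and the mask on the color is an extra safety step added for this sketch.

```java
// Builds a packed ARGB pixel from a brush coverage value and an RGB color.
final class BrushColorize {

    // coverage: 0..255 alpha taken from the brush shape.
    // rgb: color; any alpha bits are masked off so they cannot leak in.
    static int colorize(int coverage, int rgb) {
        return (coverage << 24) | (rgb & 0x00ffffff);
    }

    public static void main(String[] args) {
        // Half-covered brush pixel painted with 0xff3366.
        System.out.println(Integer.toHexString(colorize(128, 0xff3366))); // 80ff3366
    }
}
```

Because the color is applied at draw time, one table of coverage values can serve every color, which is where the constant memory use comes from.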

Overall, the software renderer is about 15x faster than Java2D and also supports blend modes, state saving, etc. It’s a win-win situation.

I might switch to TYPE_ARGB_PRE later, if required, but I’ve actually tried it before and it results in weird artifacts when I zoom in. I’ll save that for investigation another day.

BTW: Brush class is now:


package as.internal.unnatural;

import as.internal.UndoManager.UndoState;
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

/**
 *
 * @author David
 */
public class Brush {

	private int[][] brushData;
	private int maxPixelValue;
	public static int TypeRound = 1,
			TypeSquare = 2;

	public static Brush getSquareBrush(int count) {
		return new Brush(count, TypeSquare, false);
	}

	public static Brush getRoundBrush(int count) {
		return new Brush(count, TypeRound, true);
	}

	private Brush(int numSizes, int type, boolean antialias) {
		brushData = new int[numSizes][];
		for (int i = 1; i <= numSizes; i++) {
			BufferedImage brush = new BufferedImage(i, i, BufferedImage.TYPE_INT_ARGB);
			Graphics2D g = brush.createGraphics();
			if (antialias) {
				g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
				g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
			}
			g.setColor(Color.BLACK);
			g.setComposite(AlphaComposite.Src);
			if (type == TypeRound) {
				g.fillOval(0, 0, i, i);
			} else if (type == TypeSquare) {
				g.fillRect(0, 0, i, i);
			}
			g.dispose();
			int array[] = ((DataBufferInt) brush.getRaster().getDataBuffer()).getData();
			brushData[i - 1] = Arrays.copyOf(array, array.length);
			int[] data = brushData[i - 1];
			for (int j = 0; j < data.length; j++) {
				data[j] >>>= 24;
				if (data[j] > maxPixelValue) {
					maxPixelValue = data[j];
				}
			}
			brush.flush();
		}
	}

	public Brush(BufferedImage image, int numSizes) {
		brushData = new int[numSizes][];
		for (int i = 1; i <= numSizes; i++) {
			BufferedImage brush = new BufferedImage(i, i, BufferedImage.TYPE_INT_ARGB);
			Graphics2D g = brush.createGraphics();
			g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
			g.drawImage(image, 0, 0, i, i, null);
			g.dispose();
			int array[] = ((DataBufferInt) brush.getRaster().getDataBuffer()).getData();
			brushData[i - 1] = Arrays.copyOf(array, array.length);
			int[] data = brushData[i - 1];
			for (int j = 0; j < data.length; j++) {
				data[j] >>>= 24;
				if (data[j] > maxPixelValue) {
					maxPixelValue = data[j];
				}
			}
			brush.flush();
		}
	}

	public int getPixelMaxAlpha() {
		return maxPixelValue;
	}

	public int getMaxBrushSize() {
		// brushData[i] holds size i + 1, so the largest valid size is brushData.length
		return brushData.length;
	}

	int clamp(int value) {
		return value > 255 ? 255 : value;
	}

	public static int premultiply(int rgbColor, int alpha) {

		if (alpha <= 0) {
			return 0;
		} else if (alpha >= 255) {
			return 0xff000000 | rgbColor;
		} else {
			int r = (rgbColor >> 16) & 0xff;
			int g = (rgbColor >> 8) & 0xff;
			int b = rgbColor & 0xff;

			r = (alpha * r + 127) / 255;
			g = (alpha * g + 127) / 255;
			b = (alpha * b + 127) / 255;
			return (alpha << 24) | (r << 16) | (g << 8) | b;
		}
	}

	public static int unpremultiply(int preARGBColor) {
		int a = preARGBColor >>> 24;

		if (a == 0) {
			return 0;
		} else if (a == 255) {
			return preARGBColor;
		} else {
			int r = (preARGBColor >> 16) & 0xff;
			int g = (preARGBColor >> 8) & 0xff;
			int b = preARGBColor & 0xff;

			r = 255 * r / a;
			g = 255 * g / a;
			b = 255 * b / a;
			return (a << 24) | (r << 16) | (g << 8) | b;
		}
	}

	void alphaBlend(int[] pixels, int offset, int source, int alpha) {
		int destRGB = pixels[offset];
		int destA = destRGB >>> 24;
		//destRGB = premultiply(destRGB, destA);
		int destR = (destRGB >> 16) & 0xff;
		int destG = (destRGB >> 8) & 0xff;
		int destB = destRGB & 0xff;
		int srcA = source >>> 24;
		//source = premultiply(source, srcA);
		int srcR = (source >> 16) & 0xff;
		int srcG = (source >> 8) & 0xff;
		int srcB = source & 0xff;

		srcA *= alpha;
		srcR *= alpha;
		srcG *= alpha;
		srcB *= alpha;

		int oneMinusSrcA = 0xff - (srcA >> 8);

		destR = (srcR + destR * oneMinusSrcA) >> 8;
		destG = (srcG + destG * oneMinusSrcA) >> 8;
		destB = (srcB + destB * oneMinusSrcA) >> 8;
		destA = (srcA + destA * oneMinusSrcA) >> 8;
		//pixels[offset] = (destA << 24) | (destR << 16) | (destG << 8) | destB;
		pixels[offset] = //unpremultiply
				((destA << 24) | (destR << 16) | (destG << 8) | destB);
	}
	
        /*thanks nsigma*/
	public static final int ALPHA_MASK = 0xff000000;
	public static final int RED_MASK = 0x00ff0000;
	public static final int GREEN_MASK = 0x0000ff00;
	public static final int BLUE_MASK = 0x000000ff;

	public static int premultiply(int argb) {
		int a = argb >>> 24;

		if (a == 0) {
			return 0;
		} else if (a == 255) {
			return argb;
		} else {
			return (a << 24) | multRGB(argb, a);
		}
	}

	public static int premultiplyEXT(int argb, int a) {
		if (a == 0) {
			return 0;
		} else if (a == 255) {
			return argb;
		} else {
			return (a << 24) | multRGB(argb, a);
		}
	}

	public static int multRGB(int src, int multiplier) {
		multiplier++;
		return ((src & RED_MASK) * multiplier) >> 8 & RED_MASK
				| ((src & GREEN_MASK) * multiplier) >> 8 & GREEN_MASK
				| ((src & BLUE_MASK) * multiplier) >> 8;
	}

	private void blendPixelEXT(int pixels[], int offset, int srcPx, int alpha) {
		boolean alpha255 = alpha == 255;
		int destPx = pixels[offset];
		destPx = premultiplyEXT(destPx, (destPx) >>> 24);
		int srcA = srcPx >>> 24;
		srcPx = premultiplyEXT(srcPx, srcA);
		if (alpha255 == false) {
			srcA = mult(srcA, alpha);
		}
		int srcR = (alpha255) ? (srcPx & RED_MASK) >>> 16
				: mult((srcPx & RED_MASK) >>> 16, alpha);
		int srcG = (alpha255) ? (srcPx & GREEN_MASK) >>> 8
				: mult((srcPx & GREEN_MASK) >>> 8, alpha);
		int srcB = (alpha255) ? (srcPx & BLUE_MASK)
				: mult(srcPx & BLUE_MASK, alpha);
		int destA = (destPx) >>> 24;
		int destR = (destPx & RED_MASK) >>> 16;
		int destG = (destPx & GREEN_MASK) >>> 8;
		int destB = (destPx & BLUE_MASK);

		pixels[offset] = unpremultiply(blend(srcA, destA, srcA) << 24
				| blend(srcR, destR, srcA) << 16
				| blend(srcG, destG, srcA) << 8
				| blend(srcB, destB, srcA));
	}

	public static int blend(int src, int dest, int alpha) {
		return src + mult(dest, 0xFF - alpha);
	}

	public static int mult(int val, int multiplier) {
		return (val * (multiplier + 1)) >> 8;
	}

	public void drawBrush(int x, int y, int size, int alpha, int color, int[] pixels, int width, int height, UndoState state) {
		if (alpha == 0) {
			return;
		}
		int brushIndex = 0;
		int[] brushPixels = brushData[size - 1];
		for (int i = 0; i < size; i++) {
			final int ycoord = y + i;
			if (ycoord < 0 || ycoord >= height) {
				continue;
			}
			for (int j = 0; j < size; j++) {
				int source = (brushPixels[brushIndex++]);
				if (source > 0) {
					final int xcoord = x + j;
					if (xcoord < 0 || xcoord >= width) {
						continue;
					}
					final int pos = ycoord * width + xcoord;
					state.putPixel(pixels[pos], xcoord, ycoord);
					blendPixelEXT(pixels, pos, (source << 24) | color, alpha);
				}
			}
		}
	}

	public void drawBrush(int x, int y, int size, int alpha, int color, UserLayer layer) {
		if (size <= 0) {
			return;
		}
		int brushIndex = 0;
		int[] brushPixels = brushData[size - 1];
		int limit = (alpha * maxPixelValue) / 255;
		for (int i = 0; i < size; i++) {
			for (int j = 0; j < size; j++) {
				int source = (brushPixels[brushIndex++]);
				if (source > 0) {
					final int xcoord = x + j, ycoord = y + i;
					if (xcoord < 0 || ycoord < 0 || xcoord >= layer.width || ycoord >= layer.height) {
						continue;
					}
					final int xcoordScaled = xcoord >> 5, ycoordScaled = ycoord >> 5;
					UserLayer.Tile tile = layer.getTileForNoScaling(xcoordScaled, ycoordScaled);
					int offset = ((ycoord - (ycoordScaled << 5)) << 5) + xcoord - (xcoordScaled << 5);
					int destA = tile.pixels[offset] >>> 24;
					destA = source + ((destA * (0xff - source)) >> 8);
					destA = Math.min(limit, destA);
					tile.pixels[offset] = color | (destA << 24);
				}
			}
		}
	}
}

just in case anyone else finds it useful…

David

nsigma · July 20, 2011, 7:09pm

That project looks pretty sweet! Though MySQL seems to be down on your server at the moment.

Your experience on switching from Java2D is similar to mine - huge speed up - lots of things become faster (or even possible!)

Dx4 · July 20, 2011, 7:11pm

It is down. The subscription on my dedicated server expired and I have yet to renew it.

But you can still view the gallery - my artist drew those images in UxPaint: http://uxpaint.com/gallery

This is the most I’ve optimized the algorithm so far:


	private void blendPixelEXT(int pixels[], int offset, int srcPx, int alpha) {
		int destPx = pixels[offset];
		int srcA = srcPx >>> 24;
		int srcR = (srcPx & RED_MASK) >>> 16;
		int srcG = (srcPx & GREEN_MASK) >>> 8;
		int srcB = (srcPx & BLUE_MASK);
		if (alpha != 255) {
			srcR = mult(srcR, alpha);
			srcG = mult(srcG, alpha);
			srcB = mult(srcB, alpha);
			srcA = mult(srcA, alpha);
		}
		int destA = (destPx) >>> 24;
		int destR = (destPx & RED_MASK) >>> 16;
		int destG = (destPx & GREEN_MASK) >>> 8;
		int destB = (destPx & BLUE_MASK);

		pixels[offset] = (blend(srcA, destA, srcA) << 24
				| blend(srcR, destR, srcA) << 16
				| blend(srcG, destG, srcA) << 8
				| blend(srcB, destB, srcA));
	}
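Note that this version drops the premultiply and unpremultiply steps, so the channels only stay within 0-255 if the color values going in are already premultiplied. For example, blend(255, 255, 1) gives 509. Below is a minimal sketch of the same math on premultiplied pixels, with a few sanity checks. mult and blend are copied from above; the class name, premultiply and srcOver are hypothetical helpers written for this sketch.

```java
// SrcOver on premultiplied ARGB pixels, using the mult/blend helpers from the thread.
final class SrcOverSketch {

    static int mult(int val, int multiplier) {
        return (val * (multiplier + 1)) >> 8;
    }

    static int blend(int src, int dest, int alpha) {
        return src + mult(dest, 0xFF - alpha);
    }

    // Scales each color channel by the pixel's own alpha.
    static int premultiply(int argb) {
        int a = argb >>> 24;
        int r = mult((argb >> 16) & 0xff, a);
        int g = mult((argb >> 8) & 0xff, a);
        int b = mult(argb & 0xff, a);
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Both arguments and the result are premultiplied ARGB.
    static int srcOver(int src, int dest) {
        int srcA = src >>> 24;
        int a = blend(srcA, dest >>> 24, srcA);
        int r = blend((src >> 16) & 0xff, (dest >> 16) & 0xff, srcA);
        int g = blend((src >> 8) & 0xff, (dest >> 8) & 0xff, srcA);
        int b = blend(src & 0xff, dest & 0xff, srcA);
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // Nearly transparent white over opaque white stays opaque white.
        int src = premultiply(0x01ffffff);
        System.out.println(Integer.toHexString(srcOver(src, 0xffffffff))); // ffffffff
    }
}
```

Two handy invariants to test any blend routine against: an opaque source should come back unchanged, and a fully transparent source should leave the destination unchanged.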