Fast color manipulation

Archive · July 4, 2016, 6:13am

First off, sorry for the amount of threads i’ve been posting lately.

Anyways, the issue is this:
I have a method


	private static int manipulate(int t, int[] c) {
		int r = (int) ((t >> 16) & 0xFF) * (c[0] / 255.0f);
		int g = (int) ((t >> 8) & 0xFF) * (c[1] / 255.0f);
		int b = (int) ((t & 0xFF)) * (c[2] / 255.0f);
		return ((r << 16) | (g << 8) | b);
	}

This method does the following, it takes in a RGB packed color ‘t’ and manipulates its brightness.
The integer array ‘c’ contains three integers ranging from 0-255 and these represent the brightness for each color channel (R, G, B)
The integers in ‘c’ get converted to their fractional values by dividing by 255.0f and then get multiplied to each color channel.
This creates a color that is either darkened, left alone, or set to black.

My problem is that this method gets called per-pixel (and is actually called even more often since the z-buffer creates overdraw) and I believe that it can be optimized.

Bit manipulation to this caliber are not something I am good with, but the general solution that I want to look towards is keeping the integers in ‘c’ as integers and somehow, using bit manipulation, adjusting the channels’ brightness based on a value from 0-255 instead of a fraction.

Thank you

HeroesGraveDev · July 4, 2016, 6:40am

First thing to do is get rid of the floating point math.


int r = (t >> 16 & 0xFF) * c[0] / 255;
int g = (t >> 8 & 0xFF) * c[1] / 255;
int b = (t & 0xFF) * c[2] / 255;

If the images are sufficiently large enough (I have no idea how large ‘large’ is and it could be too high for practical purposes), you could split the image up for a number of threads to do the work concurrently (make sure each thread has a contiguous section of the image’s backing array to avoid cache issues).

Other options would be using JNI to run the algorithm in native code, where you can take advantage of things like SIMD.

CoDi_R · July 4, 2016, 7:53am

No need to use floating point arithmetic.


	private static int manipulate(int t, int[] c) {
		int r = ((t >> 16) & 0xFF) * c[0] / 255;
		int g = ((t >> 8) & 0xFF) * c[1] / 255;
		int b = (t & 0xFF) * c[2] / 255;
		return ((r << 16) | (g << 8) | b);
	}

If you can live with a slight “error” your brightness calculation, you can avoid the division:


	private static int manipulate(int t, int[] c) {
		int r = ((t >> 16) & 0xFF) * c[0];
		int g = ((t >> 8) & 0xFF) * c[1];
		int b = (t & 0xFF) * c[2];
		return ((r & 0xFF00) << 8) | (g & 0xFF00) | ((b & 0xFF00) >> 8);
	}

This is the part I do not understand. Are you doing software rendering?

edit: oh I see, just checked your other threads.

abcdef · July 4, 2016, 10:13am

That last optimisation won’t work because you will either get 0, the same value, or potentially a number bigger than 255.

A potential fast way would be to not read an int value but to read the byte values directly. That way you don’t need to bit shift

Can you maybe have type brightness go to 256 then just bit shift the division instead of dividing by 255?

I suspect the Java compiler will do quite a bit of optimisation behind the scenes

EgonOlsen · July 4, 2016, 11:17am

You could convert the values in c[] to the fixed-point variant of the reciprocal of c once. Like so:


for (int i = 0; i < c.length; i++)
{
    c[i] = (int) ((c[i] / 255f) * 65536);
}

And then use fixed-point math in the method:


private static int manipulate2(int t, int[] c)
{
    int r = (((t >> 16) & 0xff) * c[0]) & 0xff0000;
    int g = ((((t >> 8) & 0xff) * c[1]) >> 8) & 0xff00;
    int b = (((t & 0xff) * c[2]) >> 16) & 0xff;
    return r | g | b;
}

Archive · July 4, 2016, 3:36pm

Thanks for the replies,

CoDi^R, I just thought of this one, is this the same as what you posted?


	private static int manipulate(int t, int[] c) {
		int r = (((t >> 16) & 0xFF) * c[0]) >> 8;
		int g = (((t >> 8) & 0xFF) * c[1]) >> 8;
		int b = (((t & 0xFF)) * c[2]) >> 8;
		return ((r << 16) | (g << 8) | b);
	}

EgonOlsen, I think that those require more calculations than we started with since ‘c’ is not an array that is constant, it’s different for every pixel

EgonOlsen · July 4, 2016, 3:45pm

You could do it in fixed point math anyway, you ‘just’ have to modify all your other code as well…

Archive · July 4, 2016, 3:47pm

Ahah yeah

CoDi_R · July 4, 2016, 6:39pm

Ah yes, it is. I just merged the “divisions by 256” into last line.

HeroesGraveDev · July 4, 2016, 8:13pm

Please keep in mind that x/255 is not the same as x >> 8, and that the latter will not give correct results all the time (in fact, it’s impossible to get the maximum value of 255).

Archive · July 4, 2016, 8:14pm

How bad could the results be? The graphics still look fine to me

basil · July 4, 2016, 9:23pm

just a tiny glitch bit

HeroesGraveDev · July 4, 2016, 9:31pm

Each time you pass the image data through the operation, each component will decrease by 1. So if you pass it through 255 times, the entire image will be black.

As long as you’re not storing the output anywhere and reusing it, you are probably fine (your colour range will be reduced to 0-254).

abcdef · July 5, 2016, 4:51am

what is the value range for c[i]? without any clamping you are liable to get strange results. For example, making a blue pixel become green. Your int r, int g, int b should really be clamped to the 0-255 range.

CoDi_R · July 5, 2016, 7:35am

Brightness range 0…255 vs 0…256: No big deal in this case, I suppose. But it gave me an idea:

Do you really need separate “brightness” factors for R, G and B? Is this actually used as “brightness” calculation, or for per-pixel diffuse lighting?
Do you have a check for “fullbright”? Would probably be a waste if this is done per pixel, but if this is a fullscreen effect, you can skip the whole pass.

Clamping: As part of a render pipeline, I’d expect RGB being in range already. This function would be the wrong place for clamping.

Archive · July 5, 2016, 3:03pm

by the time that ‘c’ has gotten to this function, it has been clamped to 0-255 by the renderer.

@ CoDi^R :
This calculation takes in the interpolated vertex brightness (which takes into account ambient, diffuse, specular, and emissive lighting)

ddyer · July 5, 2016, 5:52pm

use a pre-computed lookup table with 256 elements, or if you really need them, 3 lookup tables.
Bonus; this gives you arbitrary color remapping.

J01 · July 5, 2016, 7:11pm

Maybe you could achieve similar results with a transparent grey layer rendered above your initial components? That way it would probably be a lot less expensive.

theagentd · July 5, 2016, 10:21pm

LUTs are surprisingly overlooked in many valid cases.

Hydroque · July 6, 2016, 4:06am

Archive:

	private static int manipulate(int t, int[] c) {
		int r = (((t >> 16) & 0xFF) * c[0]) >> 8;
		int g = (((t >> 8) & 0xFF) * c[1]) >> 8;
		int b = (((t & 0xFF)) * c[2]) >> 8;
		return ((r << 16) | (g << 8) | b);
	}

Archive, I can see your goal was to strip out the divisor and float. Not a bad approach, but floats aren’t much but ints. Java uses IEEE 754 floats which are 4 bytes itself. Since an int is already to be allocated by shifting and crap, we are fine with using floats. The divisor is always the most straining part. We need to focus on taking that out and not on taking floats out of question. We need floats for that normalization, too. 0.73 isn’t possible inside an int.

OP, It seems you are storing in either ARGB or don’t even have A. You never stated. Also, you only need to mask bits when you are compiling them back down to one int. You will get some funky numbers if you don’t mask your numbers before ORing. Note that I am doing with >>>. Use it. Also, note java works from left to right and the precedence of the operators. Therefore the only thing you need in parenthesis in the return are the ANDs. Check out precedence if ur interested. It makes for a more legible code when you don’t be like modern languages with too many parenthesis and can be found here It doesn’t matter if you know it or not, though. The way Archive did it, he was just showing a more logical side of it… or maybe not. Still doesn’t matter.

	private static int manipulate(int t, int[] c) {
		int r = t >>> 16;
		int g = t >>> 8;
		int b = t >>> 0; // this does nothing. >>> 0  should be excluded.
		return (r & 0xFF) << 16 | (g & 0xFF) << 8 | (b & 0xFF) << 0; // this does nothing. >>> 0 should be excluded.
	}

The next step is to figure out how we are going to go about replacing the division sequence for rgb. You were suggested other means of doing this manipulation. I am sorry to say there is no other optimization you can do with normalizing to 0-1 to multiply. Maybe the >>> 8 and 256 has the answer you are seeking.

	private static int manipulate(int t, int[] c) {
		int r = t >>> 16 * (c[0] / 255f);
		int g = t >>> 8 * (c[1] / 255f);
		int b = t  * (c[1] / 255f);
		return (r & 0xFF) << 16 | (g & 0xFF) << 8 | (b & 0xFF);
	}

Just make sure a side of * is a float. The resulting type of (int / float) is float. (float / int) is float. Any result from a bitshift is an int, so you have to cast from int to float in that case, but this isn’t the case. So to summerize the int r… Right hand of the * is a float because (int / float). The left hand side is an int. (int * float) is float. int r; determines the trailing decimals will be cut off and it is now int.