I was trying to speed up some bounds checks on data that had a power-of-two (POT) dimension.
So I needed to clamp integers between 0…63, 0…127, 0…255, 0…511, etc.
So clamp(val, 0, 511) becomes clampPOT(val, 9):
public static final int clampPOT(int val, int bits)
{
   // every negative value becomes 0
   val = ((val >> 31) | (val - 1)) + 1;
   // every value >= POT becomes (POT-1)
   return (((0 - (val & (-1 << bits))) >> 31) | val) & (~(-1 << bits));
}
public static final int clamp(int val, int min, int max)
{
   return val < min ? min : (val > max ? max : val);
}
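As a sanity check (this harness is my own sketch, not from the original post), the two clamps can be verified to agree over the whole interesting range:

```java
public class ClampCheck {
    static int clampPOT(int val, int bits) {
        val = ((val >> 31) | (val - 1)) + 1;          // negatives -> 0
        return (((0 - (val & (-1 << bits))) >> 31) | val) & (~(-1 << bits)); // >= 2^bits -> 2^bits - 1
    }

    static int clamp(int val, int min, int max) {
        return val < min ? min : (val > max ? max : val);
    }

    public static void main(String[] args) {
        // compare against the branching version for every value in [-1024, 1024]
        for (int v = -1024; v <= 1024; v++) {
            if (clampPOT(v, 9) != clamp(v, 0, 511)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println("ok");
    }
}
```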
I had this assumption that branching was kinda slow, and fiddling with bits was ultra super fast. I was going to proudly present my bit fiddler on here… Turns out the non-branching code is slower on my Intel Core 2 Quad.
Benchmark:
clamp(val, 0, 511)  ==> 19ms
clampPOT(val, 9)    ==> 28ms
It’s not a benchmark flaw, as my code gets slower in the real algorithm too.
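The post doesn’t show the timing harness that produced those numbers; a minimal nanoTime loop along these lines reproduces the setup (class name, iteration count, and the sink trick are my own assumptions, and absolute numbers will differ per machine):

```java
public class ClampBench {
    static int clampPOT(int val, int bits) {
        val = ((val >> 31) | (val - 1)) + 1;
        return (((0 - (val & (-1 << bits))) >> 31) | val) & (~(-1 << bits));
    }

    static int clamp(int val, int min, int max) {
        return val < min ? min : (val > max ? max : val);
    }

    public static void main(String[] args) {
        final int N = 10000000;
        int sink = 0; // accumulate results so the JIT can't discard the loops

        long t0 = System.nanoTime();
        for (int i = -N / 2; i < N / 2; i++) sink += clamp(i, 0, 511);
        long t1 = System.nanoTime();
        for (int i = -N / 2; i < N / 2; i++) sink += clampPOT(i, 9);
        long t2 = System.nanoTime();

        System.out.println("clamp:    " + (t1 - t0) / 1000000 + "ms");
        System.out.println("clampPOT: " + (t2 - t1) / 1000000 + "ms");
        System.out.println("sink=" + sink); // keep sink live
    }
}
```

Note that a single untimed warm-up pass before measuring would let the JIT compile both methods first; without it the first loop measured pays the compilation cost.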
Let this be a lesson for… all of us.