Fastest get2fold algorithm

Hia, this is the fastest algorithm which will find the nearest, larger, 2^n number, given an integer. For example: 94 will yield 128. 265 will yield 512. etc.

I got it from here: http://mindprod.com/jgloss/widthinbits.html

The original author is Dirk Bosmans. His original algorithm returned the minimum number of bits required to store a given number. The only change I made is that the algorithm now returns 2^n instead of n.

The commented out algorithm is a slower, but cleaner version.

I will use this algorithm in my next post, the fastest texture-generating algorithm. but I thought this was worth mentioning by itself.

** edit: fixed bug in code. now the “branching method” gives equal results to the java.Math method.


/*public static double invln2 = 1.0 / Math.log(2.0);

public static int get2Fold(int num) {
      return (int) Math.pow(2,Math.ceil(Math.log(num) * invln2));
}
*/

/**
* Calculate how many bits wide a number is,
* i.e. position of highest 1 bit.
* Fully unraveled binary search method.
* @return p where 2**p is first power of two >= n.
* e.g. binary 0001_0101 -> 5, 0xffffffff -> 32,
* 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 2, 4 -> 3
* 
* (Vincent) Changed to return 2^n instead of n
* 
* @author Dirk Bosmans Dirk.Bosmans@tijd.com 
* @author Vincent Vollers*/
static public final int get2Fold( int n )
   {
   if ( n < 0 ) return 32;
   if ( n > 0x00010000 )
      {
      if ( n > 0x01000000 )
         {
         if ( n > 0x10000000 )
            {
            if ( n > 0x40000000 )
               {
               // if ( n > 0x7fffffff )
               // return 32
               // else
              // return 2147483648; is too much, so return -1
                        return -1;
               }
            else
               {
               // !( n > 0x3fffffff )
               if ( n > 0x20000000 ) return 1073741824;
               else return 536870912;
               }
            }
         else
            {
            // !( n > 0x0fffffff )
            if ( n > 0x04000000 )
               {
               if ( n > 0x08000000 ) return 268435456;
               else return 134217728;
               }
            else
               {
               // !( n > 0x03ffffff )
               if ( n > 0x02000000 ) return 67108864;
               else return 33554432;
               }
            }
         }
      else
         {
         // !( n > 0x00ffffff )
         if ( n > 0x00100000 )
            {
            if ( n > 0x00400000 )
               {
               if ( n > 0x00800000 ) return 16777216;
               else return 8388608;
               }
            else
               {
               // !( n > 0x003fffff )
               if ( n > 0x00200000 ) return 4194304;
               else return 2097152;
               }
            }
         else
            {
            // !( n > 0x000fffff )
            if ( n > 0x00040000 )
               {
               if ( n > 0x00080000 ) return 1048576;
               else return 524288;
               }
            else
               {
               // !( n > 0x0003ffff )
               if ( n > 0x00020000 ) return 262144;
               else return 131072;
               }
            }
         }
      }
   else
      {
      // !( n > 0x0000ffff )
      if ( n > 0x00000100 )
         {
         if ( n > 0x00001000 )
            {
            if ( n > 0x00004000 )
               {
               if ( n > 0x00008000 ) return 65536;
               else return 32768;
               }
            else
               {
               // !( n > 0x00003fff )
               if ( n > 0x00002000 ) return 16384;
               else return 8192;
               }
            }
         else
            {
            // !( n > 0x00000fff )
            if ( n > 0x00000400 )
               {
               if ( n > 0x00000800 ) return 4096;
               else return 2048;
               }
            else
               {
               // !( n > 0x000003ff )
               if ( n > 0x00000200 ) return 1024;
               else return 512;
               }
            }
         }
      else
         {
         // !( n > 0x000000ff )
         if ( n > 0x00000010 )
            {
            if ( n > 0x00000040 )
               {
               if ( n > 0x00000080 ) return 256;
               else return 128;
               }
            else
               {
               // !( n > 0x0000003f )
               if ( n > 0x00000020 ) return 64;
               else return 32;
               }
            }
         else
            {
            // !( n > 0x0000000f )
            if ( n > 0x00000004 )
               {
               if ( n > 0x00000008 ) return 16;
               else return 8;
               }
            else
               {
               // !( n > 0x00000003 )
               if ( n > 0x00000002 ) return 4;
               
                     return n;
               }
            }
         }
      }
   } // end widthInBits

Profiling method & results? I’m surprised that a method with that many branches is actually faster on current CPUs.

To be honest, it just said so on the page, I haven’t tested both out for speed…

but I will do so now… on a pentium 4 2.4ghz…

ok, the branched method is faster:


-- Test run 1
Empty loop: 219
Rolled out binary search method: 203
Java.Math() method: 1469
-- Test run 2
Empty loop: 187
Rolled out binary search method: 203
Java.Math() method: 1844
-- Test run 3
Empty loop: 172
Rolled out binary search method: 203
Java.Math() method: 1703
-- Test run 4
Empty loop: 172
Rolled out binary search method: 219
Java.Math() method: 1687
-- Test run 5
Empty loop: 172
Rolled out binary search method: 219
Java.Math() method: 1719

Testing code:


public class TestProgram {
      public static int NUM = 1000000;
      
      public void testRun(int run) {
            System.out.println("-- Test run " + run);
            Random r = new Random();
            long time = System.currentTimeMillis();
            
            for(int i=0;i<NUM;i++) {
                  int n = r.nextInt(5000);
                  int result = n;
            }
            time = System.currentTimeMillis()-time;
            System.out.println("Empty loop: " + time);
            time = System.currentTimeMillis();
            
            for(int i=0;i<NUM;i++) {
                  int n = r.nextInt(5000);
                  int result = TextureTool.get2Fold(n);
            }
            time = System.currentTimeMillis()-time;
            System.out.println("Rolled out binary search method: " + time);
            time = System.currentTimeMillis();
            
            for(int i=0;i<NUM;i++) {
                  int n = r.nextInt(5000);
                  int result = TextureTool.get2Fold2(n);
            }
            time = System.currentTimeMillis()-time;
            System.out.println("Java.Math() method: " + time);
      }
      
      public TestProgram() {
            for(int i=0;i<5;i++) {
                  testRun(i+1);
            }
      }
      public static void main(String[] args) {
            new TestProgram();
      }
}

If you really want to squeeze out every drop of performance, start with the small numbers, as they are the common range (for textures)

Doesn’t Math implementation vary from platform to platform depending whats available on the processor? The Math version might be quicker on some platforms.

Kev

[quote]which will find the nearest, larger, 2^n number, given an integer.
[/quote]
When passing 1024 as argument, it returns 2048.

why always larger ??

actually that wasn’t intentional, I just said “larger” because just saying “nearest” would be incorrect, since 516 would result in 512 (since 512 is nearer).

but it should output 1024. mhhh… thank you for finding this bug. I’ll update the function (will post it here later) and notify the author.

[quote]Doesn’t Math implementation vary from platform to platform depending whats available on the processor? The Math version might be quicker on some platforms.

Kev
[/quote]
Possibly true, I dont have access to other platforms, but maybe someone is willing to test this…

I’ve updated the code, the following program tests the output of both functions, and it’s now always the same.


public class TestProgram {
      public void test2Fold(int i) {
            if(TextureTool.get2Fold2(i) != TextureTool.get2Fold(i)) {
                  System.out.println("i: " + Integer.toString(i , 16) + " m1: " + TextureTool.get2Fold(i) + " m2: " + TextureTool.get2Fold2(i));
            }
      }
      
      public void testRun() {
            System.out.println("-- Testing consistency");
            for(int i=1;i<30;i++) {
                  int n = (int) Math.pow(2, i);
                  
                  test2Fold(n-1);
                  test2Fold(n);
                  test2Fold(n+1);
            }
            System.out.println("-- Done");
      }
      
      public TestProgram() {
            testRun();
      }
      public static void main(String[] args) {
            new TestProgram();
      }
}

I know what my mistake was. The original algorithm was to calculate the number of bits neccesary to contain a given number.

so, for example

seven:

111b
1+2+4 = 7
result: 3. 2^3 = 8, and this is, for our texture mapping, a good value.

but when we take “eight”

0001b
0+0+0+8 = 8
result: 4. 2^4 = 16, the “wrong” answer, since 8 is already aligned, but it is the right “minimum amount of bits to store the number eight”.

In my test cases, I was working with images size not exactly 2^n, so I never noticed.

the new algorithm, as updated above, works fine though.

Offtopic:

We write binary in the same order as in the decimal-system:

7 = 0111b
8 = 1000b

:wink:

Sorry to repeat this code block here. I added a couple other methods that use bit shifting. All results are certified accurate up to 0x4000000.

More performance is possible by doing a bottom-up search for a power of two instead of the top-down that I used. It might even prove much faster if the starting point was chosen using a few if clauses, As it is, I start with the max power of two and shift down until I am lower than the target number.

public class PowerOfTwo {


	public static final double invln2 = 1.0 / Math.log(2.0);

	public static int get2FoldLog(int num) {
		return (int) Math.pow(2, Math.ceil(Math.log(num) * invln2));
	}

	/**
	 * Calculate how many bits wide a number is,
	 * i.e. position of highest 1 bit.
	 * Fully unraveled binary search method.
	 *
	 * @return p where 2**p is first power of two >= n.
	 *         e.g. binary 0001_0101 -> 5, 0xffffffff -> 32,
	 *         0 -> 0, 1 -> 1, 2 -> 2, 3 -> 2, 4 -> 3
	 *         <p/>
	 *         (Vincent) Changed to return 2^n instead of n
	 * @author Dirk Bosmans Dirk.Bosmans@tijd.com
	 * @author Vincent Vollers
	 */
	public static final int get2FoldUnrolled(int n) {
		if (n < 0) return 32;
		if (n > 0x00010000) {
			if (n > 0x01000000) {
				if (n > 0x10000000) {
					if (n > 0x40000000) {
						// if ( n > 0x7fffffff )
						// return 32
						// else
						// return 2147483648; is too much, so return -1
						return -1;
					} else {
						// !( n > 0x3fffffff )
						if (n > 0x20000000) return 1073741824;
						else return 536870912;
					}
				} else {
					// !( n > 0x0fffffff )
					if (n > 0x04000000) {
						if (n > 0x08000000) return 268435456;
						else return 134217728;
					} else {
						// !( n > 0x03ffffff )
						if (n > 0x02000000) return 67108864;
						else return 33554432;
					}
				}
			} else {
				// !( n > 0x00ffffff )
				if (n > 0x00100000) {
					if (n > 0x00400000) {
						if (n > 0x00800000) return 16777216;
						else return 8388608;
					} else {
						// !( n > 0x003fffff )
						if (n > 0x00200000) return 4194304;
						else return 2097152;
					}
				} else {
					// !( n > 0x000fffff )
					if (n > 0x00040000) {
						if (n > 0x00080000) return 1048576;
						else return 524288;
					} else {
						// !( n > 0x0003ffff )
						if (n > 0x00020000) return 262144;
						else return 131072;
					}
				}
			}
		} else {
			// !( n > 0x0000ffff )
			if (n > 0x00000100) {
				if (n > 0x00001000) {
					if (n > 0x00004000) {
						if (n > 0x00008000) return 65536;
						else return 32768;
					} else {
						// !( n > 0x00003fff )
						if (n > 0x00002000) return 16384;
						else return 8192;
					}
				} else {
					// !( n > 0x00000fff )
					if (n > 0x00000400) {
						if (n > 0x00000800) return 4096;
						else return 2048;
					} else {
						// !( n > 0x000003ff )
						if (n > 0x00000200) return 1024;
						else return 512;
					}
				}
			} else {
				// !( n > 0x000000ff )
				if (n > 0x00000010) {
					if (n > 0x00000040) {
						if (n > 0x00000080) return 256;
						else return 128;
					} else {
						// !( n > 0x0000003f )
						if (n > 0x00000020) return 64;
						else return 32;
					}
				} else {
					// !( n > 0x0000000f )
					if (n > 0x00000004) {
						if (n > 0x00000008) return 16;
						else return 8;
					} else {
						// !( n > 0x00000003 )
						if (n > 0x00000002) return 4;

						return n;
					}
				}
			}
		}
	} // end widthInBits


	public static final int get2FoldShifty(int n) {
		if (n == 0) return 0;
		int power = 0;
		n--;
		while (n > 0) {
			n = n >> 1;
			power++;
		}
		return 1 << power;
	}

	public static final int get2FoldLogical(int n) {
		int maxPower = 0x40000000;
		while (maxPower > 0) {
			if (n == maxPower) return maxPower;
			if (n > maxPower) return maxPower << 1;
			maxPower = maxPower >> 1;
		}
		return 0;
	}


	public void testPerformance(int run) {
		System.out.println("-- Test run " + run);

		long time = System.currentTimeMillis();
		for (int i = 0; i < NUM; i++) get2FoldLog(i);
		time = System.currentTimeMillis() - time;

		System.out.println("Math.log method: " + time);

		time = System.currentTimeMillis();
		for (int i = 0; i < NUM; i++) get2FoldUnrolled(i);
		time = System.currentTimeMillis() - time;

		System.out.println("Rolled out binary search method: " + time);

		time = System.currentTimeMillis();
		for (int i = 0; i < NUM; i++) get2FoldShifty(i);
		time = System.currentTimeMillis() - time;

		System.out.println("Shifty method: " + time);

		time = System.currentTimeMillis();
		for (int i = 0; i < NUM; i++) get2FoldLogical(i);
		time = System.currentTimeMillis() - time;

		System.out.println("Logical method: " + time);
	}

	public void testAccuracy() {
		System.out.println("-- Accuracy run ");

		for (int i = 0; i < 0x4000000; i++) {
			int i1 = get2FoldLog(i);
			int i2 = get2FoldUnrolled(i);
			int i3 = get2FoldShifty(i);
			int i4 = get2FoldLogical(i);
			if (i1 != i2 || i2 != i3 || i3 != i4) {
				System.err.println("Error in results: " + i +
						" Log:" + i1 + ", Unrolled: " + i2 +
						", Shifty: " + i3 + ", Logical: " + i4);
				System.err.println("Stopping test, there may be other inaccuracies.");
				return;
			}
		}
		System.out.println("All methods return the exact same results.");
	}

	public PowerOfTwo() {
		for (int i = 0; i < 5; i++) {
			testPerformance(i + 1);
		}
		testAccuracy();
	}

	public static int NUM = 1000000;

	public static void main(String[] args) {
		new PowerOfTwo();
	}
}

Interesting results. Here are my results of your (slightly tweaked) benchmark: (numbers in seconds, lower is better)


Total Math.log method time: 35.26507531
Total Rolled out BS method time: 0.453305379
Total Shifty method time: 1.676443794
Total Logical method time: 0.957726191

(basically increased NUM to 5000000, changed System.currentTimeMillis() to System.nanoTime(), formatted the output to be in seconds and added all times from all runs of which you see the result here)

fastest algorithm?!

I do something like the following to find nereast power of two

function findpow2(int val)
{
int result=1;
while(result<val) result<<=1;
return result;
}

Which is much much slower
because: which is much much slower

:slight_smile:

I trust you, I did not benchmark anything for now,

also, it is so simple that it can be inlined

slightly cleanedup version of the code:

   public static final int get2FoldUnrolled(int n)
   {
      if (n < 0)
         return 0xFFFFFFFF;

      if (n > 0x00010000)
      {
         if (n > 0x01000000)
         {
            if (n > 0x10000000)
            {
               if (n > 0x40000000)
                  return 0xFFFFFFFF;
               return (n > 0x20000000) ? 0x40000000 : 0x20000000;
            }

            if (n > 0x04000000)
               return (n > 0x08000000) ? 0x10000000 : 0x08000000;
            return (n > 0x02000000) ? 0x04000000 : 0x02000000;
         }

         if (n > 0x00100000)
         {
            if (n > 0x00400000)
               return (n > 0x00800000) ? 0x01000000 : 0x00800000;
            return (n > 0x00200000) ? 0x00400000 : 0x00200000;
         }

         if (n > 0x00040000)
            return (n > 0x00080000) ? 0x00100000 : 0x00080000;
         return (n > 0x00020000) ? 0x00040000 : 0x00020000;
      }

      if (n > 0x00000100)
      {
         if (n > 0x00001000)
         {
            if (n > 0x00004000)
               return (n > 0x00008000) ? 0x00010000 : 0x00008000;
            return (n > 0x00002000) ? 0x00004000 : 0x00002000;
         }

         if (n > 0x00000400)
            return (n > 0x00000800) ? 0x00001000 : 0x00000800;
         return (n > 0x00000200) ? 0x00000400 : 0x00000200;
      }

      if (n > 0x00000010)
      {
         if (n > 0x00000040)
            return (n > 0x00000080) ? 0x00000100 : 0x00000080;
         return (n > 0x00000020) ? 0x00000040 : 0x00000020;
      }

      if (n > 0x00000004)
         return (n > 0x00000008) ? 0x00000010 : 0x00000008;
      return (n > 0x00000002) ? 0x00000004 : n;
   }

Who will turn this code into a massive (()?a: (?b:c))?d:e block? ;D

Interesting that you say this is the fastest algorithm, which it is not. While it may be in the general case when one uses it in a certain context knowing the upper and lower bounds then a lookup table will be the fatest method (3-4x faster than BS) :slight_smile:

i.e.

static final int[] lookupTable;
static {
	lookupTable = new int[MAX_LOOKUP];
	for(int i = 0; i < MAX_LOOKUP; i++)
		lookupTable[i] = get2FoldUnrolled(i);
}

public final static int get2FoldLookup(int val) {
	return lookupTable[val];
}

Your application will probably hit a certain input range 90% of the time, in which case you use the lookup table method. For all other input values you use the BS method.

Its always a space/time tradeoff.

hum your idea of lookup table make me think about a solution that will be faster than all other, I did not try to implement it yet but it should work and also use very low memory for lookup.

first create a lookup table with 256 entries

our lookup table will give near power of values ranging from 0 to 255
basically
lookup[0]=1
lookup[1]=1
lookup[2]=2
lookup[3]=4
lookup[4]=4
lookup[5]=8
lookup[6]=8
lookup[7]=8
lookup[8]=8
etc…

than use a simple scale on your input&output value like that
function(val)
{
if(val>0 && val <256)
scale=0;
else
if(val>=256 && val <65536)
scale=8;
else
if(val>=65536 && val < 16777216 )
scale=16;
else
if(val>=16777216 && val<=2147483647)
scale=24;
return lookup[val>>scale]<<scale;
}

or something like that

you can use smaller range to use less memory : like 4 bits rather than height for this sample
or use higher range to get better performance like 12 bits

if someone have some spare time to test i will be interrested to know bench result on this function

edit: also “if” must be rearanged to get faster

if(val>65536)
{if(val<256)
{scale=8}
else
{scale=16}
}else[…etc}

EDIT2:

something again better

function(val)
{
if(val>0 && val <256)
return lookup[val];
else
if(val>=256 && val <65536)
return lookup[val>>8]<<8;
else
if(val>=65536 && val < 16777216 )
return lookup[val>>16]<<16;
else
if(val>=16777216 && val<=2147483647)
return lookup[val>>24]<<24;
return -1
}

Looking again to my previous post it seems to me that I made stupid error, here is a new version:

function(val)
{
if(val<0)
return -1;
else
if(val <256)
return lookup[val];
else
if(val <65536)
return lookup[val>>8]<<8;
else
if(val < 16777216 )
return lookup[val>>16]<<16;
else
if(val<2147483647)
return lookup[val>>24]<<24;
return -1
}