storing limited range floating point numbers efficiently?

I want to store a set of floating points which have a maximum of 8 digits, sign and decimal point.

i.e
values between -99999999.0 and 99999999.0

e.g.

these would be valid
-78.2234
6733.2225

these would be invalid and truncated

224155.33366 tunc to 224155.33
-52.436627882 tunc to -52.436629

My current thought will only save a few bytes:

write sign (1 bit)
write decimal point position (3bit)
write digits (variable: 24 - 32 bits) digits 0-6 = 3 bits, digits 7-9 = 4 bits

so a number would be between 28 and 36 bits which is very similar to that of a normal float…

I might be able to get away with six digit values, however that only brings the bits needed to between 22 and 28bits which is not a large saving.

Anyone have any idea of a better way to store such values?

That are 1799999991 unique values. (99999999*2+1)*9, as the decimal can be at 9 places…

This is like 31 bits, so I’d just use ints.

int encoded = …;
double decoded = (val/9 - 99999999) / Math.pow(10, val%9)

or something like that… oh, and you should cache the results of Math.pow in a double[9] ofcourse.

I like your thinking.

If it turns out i can use fewer digits then your idea should give some savings… i.e.

for 6 digits the range is from -999999.0 to 999999.0 which should give 13999993 values and able to be put in 24bits

I am a little lost at how you are proposing to convert the input floating point to an integer represenation? I am trying to reverse engineer it from your decoding formula but with little avail…

I just have to say, elegant solution, Riven! :smiley:

Using some Math™, I figured out there was space for an exponent of up to 21, still fitting it all in 32 bits, allowing for silly small numbers and silly high numbers.
Here’s a quick implementation: (the exponent is offset by 7 to allow for negative exponents)

	public static final int EXPONENT_OFFSET = 7;
	
	private static int doubleToInt(double in)
	{
		if (in==0) return 0;
		if (in==Double.POSITIVE_INFINITY) return Integer.MAX_VALUE;
		if (in==Double.NEGATIVE_INFINITY) return Integer.MIN_VALUE;

		int sign=1;
		if (in<0)
		{
			sign = -1;
			in = -in;
		}
		
		// Horrible exponent counting ahead, at O(log n)
		int exponent = EXPONENT_OFFSET;
		for (;in<1.0; in*=10) exponent--;
		for (;in>=10.0; in/=10) exponent++;
		
		if (exponent<0) return 0; // Number is too close to zero, return 0.
		if (exponent>=21) return sign==1?Integer.MAX_VALUE:Integer.MIN_VALUE; // Number is too large, pretend it's infinitely large.
		
		int digits = (int)(in*10000000);
		int out = digits*21+exponent;
		return out*sign;
	}

	private static double intToDouble(int in)
	{
		if (in==Integer.MAX_VALUE) return Double.POSITIVE_INFINITY;
		if (in==Integer.MIN_VALUE) return Double.NEGATIVE_INFINITY;
		if (in==0) return 0;

		int sign=1;
		if (in<0)
		{
			sign = -1;
			in = -in;
		}
		
		int exponent = in%21;
		int digits = in/21;
		double value = digits/10000000.0;
		
		return value*Math.pow(10, exponent-EXPONENT_OFFSET)*sign;
	}

Test code:

	public static void test(double in)
	{
		int encoded = doubleToInt(in);
		double out = intToDouble(encoded);
		System.out.println(in + " -> " + out);
	}

	public static void main(String[] args)
	{
		test(1);
		test(0);
		
		test(-99999999.0);
		test(99999999.0);

		test(-0.00000099999999);
		test(0.00000099999999);
		test(-99999999000000.0);
		test(99999999000000.0);
	}

nice work Markus!

While driving home i had thought more on how to represent a floating point number as an integer and i have come up with this: (untested yet as i need to get some sleep :stuck_out_tongue: )

for simplicity to demonstate the algorith i am going to encode a floating point number which can consist of 4 digits + sign + decimal point…

to encode:

float input =33.12F;

int count=4;

int temp=Math.abs((int) input);
while (temp>0)
{
temp>>>1;
count–;
}

int output=((int) (inputcount)+9999)+(99992+1)*count;

to decode:

int input = previous output

int count = input/((99992+1);
int digits = input%(9999
2+1);

float output = ((float) (digits-9999))/ Math.pow(10,count);

You might want to read this. That’s how pretty much everyone stores floating point numbers these days.

I’ll explain a bit:

there are N possible values, with 9 decimal positions, so there are N*9 combinations.
there are M positive values in N, where M = (N-1)/2
when we have a value N and we multiply it by 9, we have the result R


double decoded = (R/9 - M) / Math.pow(10, R%9)

now the question is, how to encode a value into R…

my first inclination is:


double value = 567.4321;
   int decimals = (9-1)-(int)Math.log10(value); // 8-3 = 5 decimal places even if we use only 4
   int asDigitsInM = (int)Math.round(value*Math.pow(10, decimals)); // 56743210 out of M
   int asDigitsInN = asDigitsInM + M; // 56743210 + M
int encoded = asDigitsInN * 9 + decimals; // (56743210 + M) * 9 + {0..8}

// in short:

double value = 567.4321;
   int d = (9-1)-(int)Math.log10(value);
int encoded = ((int)Math.round(value*Math.pow(10, d)) + M) * 9 + d;

if you have 3 decimals, and you [i]decode[i] it, you will divide it by pow(10, 3) = 1000, which makes sense, so I guess it’s reverse engineered…? :slight_smile:

anyway, this is all just writing it without checking, with LOTs of room to optimize, so give it a try!

You always gotta steal the show, don’t you? :slight_smile:

what about ?

int Float.floatToRawIntBits(float value);
float Float.intBitsToFloat(int bits);

EDIT:
having a look in the Float class JavaDoc you can easily pick the Sign, Mantissa and exponent

[quote]Bit 31 (the bit that is selected by the mask 0x80000000) represents the sign of the floating-point number.
Bits 30-23 (the bits that are selected by the mask 0x7f800000) represent the exponent.
Bits 22-0 (the bits that are selected by the mask 0x007fffff) represent the significand (sometimes called the mantissa) of the floating-point number.
If the argument is positive infinity, the result is 0x7f800000.
If the argument is negative infinity, the result is 0xff800000.
[/quote]

float doesn’t have 8 significant/guaranteed digits.

Turned out there were quite a few corner cases:


   public static final int encodeSlow(double d)
   {
      int e = 8 - (int) Math.max(0, Math.ceil(Math.log10(Math.abs(d))));
      return (int) ((Math.floor(d * Math.pow(10, e)) + 99999999L) * 9 + e);
   }

   public static final double decodeSlow(int i)
   {
      return (i / 9 - 99999999) / Math.pow(10, i % 9);
   }

A version that is 18.5x (!!) faster:


   private static double[] pows = new double[] { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000 };

   public static final int encodeFast(double d)
   {
      int e = 0;
      double pow = 0.1;
      double abs_d = Math.abs(d);
      while (abs_d >= (pow *= 10.0) && e++ < 8);
      e = Math.max(0, 8 - e);
      return (int) ((Math.floor(d * pows[e]) + 99999999L) * 9 + e);
   }

   public static final double decodeFast(int i)
   {
      return (i / 9 - 99999999) / pows[i % 9];
   }

I pretty much tested all possible values.

[quote]float doesn’t have 8 significant/guaranteed digits
[/quote]
yes that’s right but that’s dont seems to be the goal:

[quote]I like your thinking.

If it turns out i can use fewer digits then your idea should give some savings… i.e.

for 6 digits the range is from -999999.0 to 999999.0 which should give 13999993 values and able to be put in 24bits
[/quote]
I mean storing a float with less bits knowing Sign Mantissa and exponent is quite simple
let’s say
M=M>>7; //M 23 to 15 bit
E=E>>4; //E 8 to 4 bits
S=S; // 1 to 1 bits

or any other combination…

12 bits saved in the previous sample, ofcourse reducing precision:
S 1 bits
E 4 bits
M 15 bits

and than a new float format that fit in 20bits, just an idea…

@moogie: I may have missunderstood :-, do you want to store Fixed Decimal or real Float ?

float ususally use Base 2 not 10 : F = SM2^E not SM10^E

Lol! i should have looked at this thread during the day!

I have made an algorithm based on your original post and does seem to be quite similar



public class IntFloat {

	static public void main(String[] arg)
	{
		Random random=new Random(-1);
		
		int j=5;
		for (int i=0;i<50;i++)
		{
			float val=(float) (random.nextDouble()*(maxNum[j]*2)-maxNum[j])/10000;
			
			for (int k=1;k<9;k++)
			{
				System.out.println("digits: "+k+ " bits needed: "+maskBitCount[k]+" rounding: false input: "+val +" -> "+decode(encode(val,k,false),k));
				System.out.println("digits: "+k+ " bits needed: "+maskBitCount[k]+" rounding: true input: "+val +" -> "+decode(encode(val,k,true),k));
			}
			
		}
	}
	
	static final private int[] multiplier = {1,10,100,1000,10000,100000,1000000,10000000,100000000};
	static final private int[] maxNum = {1,9,99,999,9999,99999,999999,9999999,99999999,999999999};
	static final private int[] maskBitCount = {0,6,10,13,17,21,24,28,31};
	static final private int[] mask={0,63,1023,8191,131071,2097151,16777215,268435455,2147483647};

	static public int encode(float val,int digitsCount,boolean round)
	{
		int max=maxNum[digitsCount];
		int count=digitsCount;

		// test for outliers
		if (val>max) return max*2;
		if (val<-max) val=-max;

		
		// get the integer component of the input value
		int temp=Math.abs((int) val);
		
		// count the decimal places
		while (temp>0)
		{
			temp/=10;
			count--;
		}
		
		if (count<0)
			count=0;

		if (round ) return (Math.round(val*multiplier[count])+max)+(max*2+1)*count;
		return ((int) (val*multiplier[count])+max)+(max*2+1)*count;
	}
	
	static public float decode(int val,int digitsCount)
	{
		// get the maximum number able to be represented by the number of digits
		final int max=maxNum[digitsCount];
		
		// calculate the total possible number of numbers able to represented
		final int temp=(max*2+1);

		// extract the number of decimal places
		int count = val/temp;
		
		// extract the digits
		int digits = val%temp;

		return ((float)(digits-max))/multiplier[count];
	}


with the results as follows:


digits: 1 rounding: false input: -4.6211014 -> -4.0 bits needed: 6
digits: 1 rounding: true input: -4.6211014 -> -5.0 bits needed: 6
digits: 2 rounding: false input: -4.6211014 -> -4.6 bits needed: 10
digits: 2 rounding: true input: -4.6211014 -> -4.6 bits needed: 10
digits: 3 rounding: false input: -4.6211014 -> -4.62 bits needed: 13
digits: 3 rounding: true input: -4.6211014 -> -4.62 bits needed: 13
digits: 4 rounding: false input: -4.6211014 -> -4.621 bits needed: 17
digits: 4 rounding: true input: -4.6211014 -> -4.621 bits needed: 17
digits: 5 rounding: false input: -4.6211014 -> -4.6211 bits needed: 21
digits: 5 rounding: true input: -4.6211014 -> -4.6211 bits needed: 21
digits: 6 rounding: false input: -4.6211014 -> -4.6211 bits needed: 24
digits: 6 rounding: true input: -4.6211014 -> -4.6211 bits needed: 24
digits: 7 rounding: false input: -4.6211014 -> -4.621101 bits needed: 28
digits: 7 rounding: true input: -4.6211014 -> -4.621101 bits needed: 28
digits: 8 rounding: false input: -4.6211014 -> -4.6211014 bits needed: 31
digits: 8 rounding: true input: -4.6211014 -> -4.6211014 bits needed: 31

[truncated to fit word limit]

digits: 1 rounding: false input: -2.8565629 -> -2.0 bits needed: 6
digits: 1 rounding: true input: -2.8565629 -> -3.0 bits needed: 6
digits: 2 rounding: false input: -2.8565629 -> -2.8 bits needed: 10
digits: 2 rounding: true input: -2.8565629 -> -2.9 bits needed: 10
digits: 3 rounding: false input: -2.8565629 -> -2.85 bits needed: 13
digits: 3 rounding: true input: -2.8565629 -> -2.86 bits needed: 13
digits: 4 rounding: false input: -2.8565629 -> -2.856 bits needed: 17
digits: 4 rounding: true input: -2.8565629 -> -2.857 bits needed: 17
digits: 5 rounding: false input: -2.8565629 -> -2.8565 bits needed: 21
digits: 5 rounding: true input: -2.8565629 -> -2.8566 bits needed: 21
digits: 6 rounding: false input: -2.8565629 -> -2.85656 bits needed: 24
digits: 6 rounding: true input: -2.8565629 -> -2.85656 bits needed: 24
digits: 7 rounding: false input: -2.8565629 -> -2.856562 bits needed: 28
digits: 7 rounding: true input: -2.8565629 -> -2.856563 bits needed: 28
digits: 8 rounding: false input: -2.8565629 -> -2.8565629 bits needed: 31
digits: 8 rounding: true input: -2.8565629 -> -2.8565629 bits needed: 31

interesting! i initally went to the Float class source code to see if i could use the source to make a smaller bit depth float but found that it was implemented as a native method :frowning:

I will investigate this option as it may be faster/ more robust than rolling my own :stuck_out_tongue:

True, it does not gurantee 8 digits but it might actually better represent my data… I will have to compare both methods to see which introduces the least error.

Definitely not fixed decimal as the values can go between the full range.

[quote=“Riven,post:8,topic:31600”]
Pardon, that wasn’t my intent. I thought you had come up with a very elegant solution and went on to test it. =)

No prob, I took it as a compliment! :wink:

It was simply to be expected you would come up with something (better) with lots of bit-shifts. :slight_smile:

I have made a specialised version of the method i created which is purely for 8 digit (or less) numbers… numbers higher than 99999999 will be set to 99999999 and numbers below -99999999 will be set to -99999999.


public class MoogieIntFloat {

	static final private int[] multiplier = {1,10,100,1000,10000,100000,1000000,10000000,100000000};
	static public int encode8Digits(double val)
	{
		int count;

//		// test for outliers
		val=Math.max(val,-99999999);
		val=Math.min(val,99999999);
		
		double temp=Math.abs(val);
		if (temp>9999999) count=0;
		else if (temp>999999) count=1;
		else if (temp>99999) count=2;
		else if (temp>9999) count=3;
		else if (temp>999) count=4;
		else if (temp>99) count=5;
		else if (temp>9) count=6;
		else if (temp>=1) count=7;
		else count=8;

		return (int) (val*multiplier[count])+99999999+199999999*count;
	}
	
	static public double decode8Digits(int val)
	{
		return ((double)(val%199999999-99999999))/multiplier[val/199999999];
	}
}

It is very fast as well! I have made a test to time this specialsed version using Riven’s method as a comparision.


Averages:
Riven Avg encode time=391
Moogie Avg encode time=123
Riven Avg decode time=66
Moogie Avg decode time=65


import java.util.Random;

public class Tester {

	static final int numbers=999999;
	static final int VALUE2 = numbers;
	static final int VALUE = VALUE2/2;
	static public void main (String[] args)
	{
		Random random=new Random(-1);
		
		final double[] inputValues=new double[numbers];
		int[] outputValues=new int[numbers];
		
		System.out.print("making values... ");
		for (int i=0;i<numbers;i++)
		{
			inputValues[i]=random.nextDouble()*VALUE2-VALUE;
		}
		System.out.println("done");
		
		long[] moogieEncodeTimes=new long[100];
		long[] moogieDecodeTimes=new long[100];
		
		long[] rivenEncodeTimes=new long[100];
		long[] rivenDecodeTimes=new long[100];
		
		for (int j=0;j<100;j++)
		{
			long startTime=System.currentTimeMillis();
			for (int i=0;i<numbers;i++)
			{
				outputValues[i]=RivenIntFloat.encodeFast(inputValues[i]);
			}
			
			long endTime=System.currentTimeMillis();
			
			rivenEncodeTimes[j]=(endTime-startTime);
			endTime=System.currentTimeMillis();
			
			System.out.println("Riven encode time=" +rivenEncodeTimes[j]);
			
			startTime=System.currentTimeMillis();
			for (int i=0;i<numbers;i++)
			{
				RivenIntFloat.decodeFast(outputValues[i]);
			}
			
			endTime=System.currentTimeMillis();
			
			rivenDecodeTimes[j]=(endTime-startTime);
			endTime=System.currentTimeMillis();
			
			System.out.println("Riven decode time=" +rivenDecodeTimes[j]);
			
			startTime=System.currentTimeMillis();
			for (int i=0;i<numbers;i++)
			{
				outputValues[i]=MoogieIntFloat.encode8Digits(inputValues[i]);
			}
			
			endTime=System.currentTimeMillis();
			
			moogieEncodeTimes[j]=(endTime-startTime);
			
			
			System.out.println("Moogie encode time=" +moogieEncodeTimes[j]);
			
			startTime=System.currentTimeMillis();
			for (int i=0;i<numbers;i++)
			{
				MoogieIntFloat.decode8Digits(outputValues[i]);
			}
			
			endTime=System.currentTimeMillis();
			
			moogieDecodeTimes[j]=(endTime-startTime);
			endTime=System.currentTimeMillis();
			
			System.out.println("Moogie decode time=" +moogieDecodeTimes[j]);
		}
		
		System.out.println("Averages:");
		
		long temp=0;
		for (int i=0;i<100;i++)
		{
			temp+=rivenEncodeTimes[i];
		}
		
		System.out.println("Riven Avg encode time=" +temp/100);
		
		temp=0;
		for (int i=0;i<100;i++)
		{
			temp+=moogieEncodeTimes[i];
		}
		
		System.out.println("Moogie Avg encode time=" +temp/100);

		temp=0;
		for (int i=0;i<100;i++)
		{
			temp+=rivenDecodeTimes[i];
		}
		
		System.out.println("Riven Avg decode time=" +temp/100);
		
		temp=0;
		for (int i=0;i<100;i++)
		{
			temp+=moogieDecodeTimes[i];
		}
		
		System.out.println("Moogie Avg decode time=" +temp/100);		
		

	}
	
}



public class RivenIntFloat {
	  private static double[] pows = new double[] { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000 };

	   public static final int encodeFast(double d)
	   {
	      int e = 0;
	      double pow = 0.1;
	      double abs_d = Math.abs(d);
	      while (abs_d >= (pow *= 10.0) && e++ < 8);
	      e = Math.max(0, 8 - e);
	      return (int) ((Math.floor(d * pows[e]) + 99999999L) * 9 + e);
	   }

	   public static final double decodeFast(int i)
	   {
	      return (i / 9 - 99999999) / pows[i % 9];
	   }
}

Generally, shifts and mults are quicker than divides… so try this one:


class TimfodenIntFloat
{
	private static double[] muls = new double[]
	{
		1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000
	};
	
	private static double[] divs = new double[]
	{
		1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001,
	};

	public static final int encodeFast( double d )
	{
		// restrict range of d.
		d = d < -99999999 ? -99999999 :
			d >  99999999 ?  99999999 : d;

		// find exponent.
		double abs_d = Math.abs(d);
//		double abs_d = d < 0 ? -d : d;
		int e = 8;
		if(      abs_d >= 10000000 ) e = 0;
		else if( abs_d >=  1000000 ) e = 1;
		else if( abs_d >=   100000 ) e = 2;
		else if( abs_d >=    10000 ) e = 3;
		else if( abs_d >=     1000 ) e = 4;
		else if( abs_d >=      100 ) e = 5;
		else if( abs_d >=       10 ) e = 6;
		else if( abs_d >=        1 ) e = 7;

		// encode.
		return (int)(d * muls[e]) + 0x08000000 + (e << 28);
	}
	
	public static final double decodeFast( int i )
	{
		int e = (i >> 28) & 0x0F;
		int m = i & 0x0FFFFFFF - 0x08000000;
		return m * divs[e];
	}
}

And have a quick think about how (d > 9) != (d >= 10) for real numbers! :slight_smile:
(and thus how the exponent would be effected.)

Cheers, Tim.