Bits, Bytes and Insanity

Greetings JGO!

I’ve come here to share some code which I’ve been using to feed my own personal arrogance and the need for exploration (in code of course).

So what is this code?

It’s essentially a naive compact storage method which I’ve used in my own experiments. It’s not really optimized but I’ve used it and continue to use it. It’s become something of a small passion of mine. It did take some thought and time to put together properly.

Let’s look at an example where something like this might come in handy.

Let’s say you need to store a bunch of values into a file. These values range between 0 and 120. For this example it’s simple, you can instantiate an array of bytes (8 bits) with each position in the array capable of holding all your values as long as they are between the original design of 0 and 120.

But what if?

Lets say you need to store a bunch of values into a file. These values range between 0 and 555. For this example it’s also simple, you can instantiate an array of shorts (16 bits) with each position in the array capable of holding all your values as long as they are between the original design of 0 and 120.

In general, this will be of no problem, but there are scenarios where you may wish to squeeze every little saving in file size as you can, especially for cases where it will be transferred over a network.

The Binary Compactor was created to solve this problem. It essentially allows you to write and read values with variable bit lengths. In the case of the 2nd example, you can fit values of 0-555 into 10 bits, which will save you 6 bits of information per array position.

Let’s look at some code on how this works.


// first, we initialize a new writer with default write position at 0
// can also be used to load a previous stream from file
BitWriter writer = new BitWriter();

// let's write 3 boolean values, these will take a total storage of 3 bits
writer.write(true);
writer.write(false);
writer.write(true);

// let's write some integers. First argument is the value to be written, 
// 2nd argument is the number of bits it will take.
writer.write(555, 10);
writer.write(62, 6);
writer.write(1792, 12);
writer.write(57892, 17);
writer.write(7658, 14);

// and just for fun, dump this stream into console
System.out.println("BITS: " + writer.getBitString());

// prints
// BITS: 10111010 10001011 11100000 00011100 01001000 10001110 01010111 10111000


Now to iterate those values back.


// define a new iterator from the writer
BitIterator it = writer.getIterator();

// start iterating the previous variables. Notice how you have to tell
// the iterator how many bits to read

// the boolean values from before
System.out.println(it.getNext32(1));
System.out.println(it.getNext32(1));
System.out.println(it.getNext32(1));

// the integer values from before
System.out.println(it.getNext32(10));
System.out.println(it.getNext32(6));
System.out.println(it.getNext32(12));
System.out.println(it.getNext32(17));
System.out.println(it.getNext32(14));

// prints
// 1
// 0
// 1
// 555
// 62
// 1792
// 57892
// 7658


Unfortunately the entire method is too large to post in here however it is available free of charge (along with a C# implementation) at my github repository https://github.com/DavidArayan/EzyBits

Will most of you have any use for this? NO
Should you use this instead of normal int/short/longs? NO

However there will come a time, at some place, somewhere in your own experiments where you would say “I wish I had an easy method of reading and writing any variable in any bit length I want, cause of reasons”.

Contributions, Feature Requests and Comments all welcome!

It is important to note that we CAN reduce file sizes by compacting numbers into eachother, but due to a lot of computer float issues, its hard.

However, I’ve managed to use a procedural fake system in my programming environment that square roots a whole file as one big number (converts bytes to number). Then I made a procedural multiplier which multiplies it out. It’s pretty fast too. Takes the file size down a great 60%-80%.

A bit input / output stream is surprisingly useful when you’re dealing with native code (eg. reading codecs and such).

Cas :slight_smile:

Or realtime networking code, where you try to fit everything+dog in ~1490 bytes.

Implement Huffman compression and decompression with it!

I have my own version of this too! But its just a reader rather than a reader and writer. Certain file formats are bit orientated data sets (ogg for example) and its very nice to read the data as if you are using a ByteBuffer. My implementation used ByteBuffers rather than arrays. The other useful it does is read Little or Big Endian numbers from the bits. If you are interested I can post the code, its just a simple class.

I’ve used this to great effect in generating image masks to be transferred over a network. Use this and a simple quad-tree encoding gives a best case scenario (for a mask) of 22 bits. 10 bit for width, 10 bits for height and 2 bits for the data (1 bit for quadtree node and 1 bit for the mask state). Best case for masks with either all 0’s or all 1’s for the mask.

Worst case would be something like a checker pattern down to each pixel, where the overall size would be worse than using a bit array mask, mainly due to the extra bit’s required for the quad-tree subdivision.

Obviously best cases are rare, still good to think about it!

http://dsiutils.di.unimi.it/ comes with a pretty complete input/output bit-stream. some of the more exotic encodings are interesting.

http://dsiutils.di.unimi.it/docs/it/unimi/dsi/io/package-summary.html

http://dsiutils.di.unimi.it/docs/it/unimi/dsi/io/InputBitStream.html
http://dsiutils.di.unimi.it/docs/it/unimi/dsi/io/OutputBitStream.html