Clearing up how data is stored/read

I was wondering if you guys can clear up a few questions for me concerning how data is stored.

  1. Say I have a txt file that is 78,390 bytes long. If I wish to view and manipulate the raw binary data, I have to convert the byte data into binary data and then write the binary data to a txt file. This binary data txt file is 8 times larger than the original txt file. Is there a way to access and/or manipulate the original txt file’s binary data without having to write it to a file?

  2. The previous is a compression-related question, so I was also wondering if anyone can explain the principle on which WinZip or WinRAR work. How do these programs compress the data to a smaller size? What level of access do they have to the file they are compressing?

Edit: I have found an article on the Huffman compression algorithm and have answered the second question myself. I am still interested in how I can access the data at that level with Java, though. I have found a Java applet that shows the process, but it doesn't actually write the compressed file.

Any answers to these questions are greatly appreciated.

Hi,
How are you opening the file? With a Reader?

This may help:

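String filename = "myFile.txt";
// Requires java.io.*; try-with-resources closes the stream when done
try (DataInputStream dis = new DataInputStream(new FileInputStream(filename))) {
  byte b = dis.readByte();
  // Do what you need with the byte
  if ((b & 0x01) != 0) {
    // The lowest bit is set, so it's an odd number!
  }
}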
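
To answer the "without writing it to a file" part more directly, here is a minimal sketch, assuming the whole file fits in memory. The class and method names (BitView, toBits, flipBit) are made up for illustration. It loads the file into a byte array and looks at or changes individual bits with shifts and masks, so the "8 times larger" text file is never needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BitView {
    // Returns the 8 bits of one byte as text, most significant bit first.
    static String toBits(byte b) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 7; i >= 0; i--) {
            sb.append((b >> i) & 1);
        }
        return sb.toString();
    }

    // Returns a copy of b with bit number 'index' (0 = least significant) flipped.
    static byte flipBit(byte b, int index) {
        return (byte) (b ^ (1 << index));
    }

    public static void main(String[] args) throws IOException {
        // Pass a file name as the first argument; without one, two sample bytes are used.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "Hi".getBytes(StandardCharsets.US_ASCII);
        // Show the first few bytes as bits; the data itself stays in memory as bytes.
        for (int i = 0; i < Math.min(data.length, 4); i++) {
            System.out.println(toBits(data[i]));
        }
    }
}
```

The string of 0s and 1s is only built for display here. Any real manipulation can stay on the byte array.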

Rafael.-

[quote]I was wondering if you guys can clear up a few questions for me concerning how data is stored.

  1. If I have a txt file that is 78,390 bytes long. If I wish to view and manipulate the raw binary data, I have to convert byte data into binary data and then write the binary data to a txt file. This binary data txt file is 8 times larger than the original txt file.
[/quote]
I'm not sure I understand this. Do you mean to say you are taking each bit sequentially in the file and writing it out as a 0- or 1-valued byte?

If so, why? :)

Well, Huffman is just one of many compression schemes, and a fairly simple one.
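
For reference, here is a rough sketch of the core Huffman idea, assuming the symbols are the characters of a String. The names (HuffmanSketch, buildCodes, encode) are invented for this example. It counts symbol frequencies, repeatedly merges the two least frequent nodes into a tree, and reads the bit codes off the tree. The output is a String of '0' and '1' characters for readability only, so it is not yet packed into bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {
    // A tree node: leaves carry a symbol, internal nodes carry two children.
    static final class Node implements Comparable<Node> {
        final int weight;
        final char symbol;
        final Node left, right;

        Node(int weight, char symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }

        @Override
        public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    // Builds a symbol -> bit string table for the given text.
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        // Repeatedly merge the two lightest nodes until one tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.weight + b.weight, '\0', a, b));
        }
        Map<Character, String> codes = new HashMap<>();
        if (!queue.isEmpty()) assign(queue.poll(), "", codes);
        return codes;
    }

    // Walks the tree: going left appends a 0, going right appends a 1.
    private static void assign(Node n, String prefix, Map<Character, String> codes) {
        if (n.isLeaf()) {
            // A text with a single distinct symbol still needs a one-bit code.
            codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(n.left, prefix + "0", codes);
        assign(n.right, prefix + "1", codes);
    }

    // Encodes the text as a string of '0'/'1' characters (not yet packed into bytes).
    static String encode(String text) {
        Map<Character, String> codes = buildCodes(text);
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    public static void main(String[] args) {
        String text = "aaaabbc";
        System.out.println(buildCodes(text));
        System.out.println(encode(text).length() + " bits instead of " + text.length() * 8);
    }
}
```

Frequent symbols end up with shorter codes. A real compressor would also have to store the code table (or the frequencies) so the decoder can rebuild the tree.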

But if you want to work with Zip-style compressed files in Java, the heavy lifting is all done for you by the Zip* classes :) (ZipFile, ZipInputStream and ZipOutputStream). A nice Java feature.


Well, my point of confusion is this: if I take a txt file and run a Huffman algorithm on it, how would I write the compressed binary back to a smaller file? Just convert it back into text (it would be all jumbled, of course)?

[quote]Well, my point of confusion is this: if I take a txt file and run a Huffman algorithm on it, how would I write the compressed binary back to a smaller file? Just convert it back into text (it would be all jumbled, of course)?
[/quote]
How is your binary data stored in the program? In a byte array, or a char array?

If each element in your array represents 1 bit, you should pack the bits into bytes before writing them to the output file.

This code packs the data from an array of ints (with values 0 or 1) into bytes and writes them to a file.


  // Requires: import java.io.*;
  private void save(String filename) throws IOException {
    int[] compressedData = new int[1000];
    // here you fill the array with 0/1 values
    int total = 58; // this is the total number of bits
    // try-with-resources closes the stream even if a write fails
    try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(filename))) {
      for (int pos = 0; pos < total; pos += 8) {
        dos.writeByte(encode(compressedData, pos));
      }
    }
  }

  // Packs up to 8 bits starting at pos into one byte, most significant bit first.
  // Positions past the end of the array are treated as 0 (padding).
  private byte encode(int[] data, int pos) {
    int b = 0;
    for (int i = 0; i < 8; i++) {
      b <<= 1;
      if (pos + i < data.length)
        b |= data[pos + i] & 1;
    }
    return (byte) b;
  }
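
One thing the code above leaves open is reading the data back. With 58 bits, the last byte carries 6 padding bits, so a reader needs to know the real bit count. Here is a small sketch of packing and unpacking in memory. The names BitPacking, pack and unpack are made up, and how the bit count gets stored is an assumption; one option is to write it with DataOutputStream.writeInt ahead of the packed bytes.

```java
import java.util.Arrays;

public class BitPacking {
    // Packs 0/1 values into bytes, most significant bit first.
    // The last byte is padded with 0 bits if the count is not a multiple of 8.
    static byte[] pack(int[] bits) {
        byte[] out = new byte[(bits.length + 7) / 8];
        for (int pos = 0; pos < bits.length; pos++) {
            if ((bits[pos] & 1) != 0) {
                out[pos / 8] |= (byte) (0x80 >> (pos % 8));
            }
        }
        return out;
    }

    // Reverses pack(): bitCount tells us how many bits are real data, not padding.
    static int[] unpack(byte[] packed, int bitCount) {
        int[] bits = new int[bitCount];
        for (int pos = 0; pos < bitCount; pos++) {
            bits[pos] = (packed[pos / 8] >> (7 - pos % 8)) & 1;
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] bits = {1, 0, 1, 1, 0, 0, 1, 0, 1, 1};   // 10 bits -> 2 bytes
        byte[] packed = pack(bits);
        System.out.println(packed.length + " bytes");
        System.out.println(Arrays.toString(unpack(packed, bits.length)));
    }
}
```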

[quote]Well, my point of confusion is this: if I take a txt file and run a Huffman algorithm on it, how would I write the compressed binary back to a smaller file? Just convert it back into text (it would be all jumbled, of course)?
[/quote]
As the man says, it is totally dependent on how the data is stored and what exactly you are “huffman-ing.”

If your goal is to reduce individual bytes down to sub-byte strings of bits (fewer than 8 bits per symbol), then you need to pack them into bytes or you will get no benefit.

That seems fairly obvious, so maybe I'm missing something?

Is there a particular reason why you need a particular encoding, though?

This seems a lot of work to avoid using the Zip classes, which will give you much better compression than a simple Huffman on the individual bytes.

[quote]This seems a lot of work to avoid using the Zip classes, which will give you much better compression than a simple Huffman on the individual bytes.
[/quote]
I found the following code on jGuru:

import java.io.*;
import java.util.zip.*;

public class zipper {
    public static final void main(String [] args){
      try {
          ZipOutputStream outStream = new ZipOutputStream (new 
            FileOutputStream("out.zip"));
              
          writeEntry(outStream, "f1.txt", "this is some text");
          writeEntry(outStream, "f2.txt", "more text");
          writeEntry(outStream, "f3.txt", "text, text, and yet more text");
          
          outStream.close();
      }
      catch(Exception e){
          e.printStackTrace();
      }
    }

    static void writeEntry(ZipOutputStream stream,
                     String          filename,
                     String          text){
      try {
          ZipEntry e1 = new ZipEntry(filename);
          e1.setMethod(ZipEntry.DEFLATED);
          stream.putNextEntry(e1);
          stream.write(text.getBytes());
          stream.closeEntry();
      }
      catch(IOException e){
          e.printStackTrace();
      }
    }
}

This way the zip stream compresses the data for you, and you can put more than one file in the zip.
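
Reading the entries back works the same way in reverse with ZipInputStream. Here is a rough sketch that does the whole round trip in memory, so it needs no files. The class and method names (ZipRoundTrip, zip, unzip) are invented for the example, and UTF-8 is assumed for the text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipRoundTrip {
    // Compresses name -> text pairs into a zip archive held in a byte array.
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                out.putNextEntry(new ZipEntry(f.getKey()));
                out.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads every entry of the archive back into name -> text pairs.
    static Map<String, String> unzip(byte[] archive) {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] chunk = new byte[4096];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = in.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                files.put(entry.getName(),
                          new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("f1.txt", "this is some text");
        files.put("f2.txt", "more text");
        System.out.println(unzip(zip(files)));
    }
}
```

For files on disk, swapping the byte array streams for FileOutputStream and FileInputStream should be enough.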

Rafael.-

No reason, I am just interested in how compression works and am trying to write an example of it. Thank you for your answers, they were very helpful.

Question: what kind of operator is |= ?

a |= b

is equivalent to:

a = a | b

i.e. a bitwise OR.

The |= operator is a short version of a = a | b.
The | is the bitwise OR operator.
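
A tiny demo of |= in action, with made-up names (OrAssign, setBit). One detail worth knowing: for a byte, the compound form also casts the result back to byte, which is why `b |= data[pos + more]` compiles in the packing code earlier in the thread.

```java
public class OrAssign {
    // Sets bit 'index' (0 = least significant) in 'value' using |=.
    static int setBit(int value, int index) {
        value |= (1 << index);   // same as: value = value | (1 << index)
        return value;
    }

    public static void main(String[] args) {
        int a = 0b0101;
        a |= 0b0011;             // a bit is 1 if it is 1 in either operand
        System.out.println(Integer.toBinaryString(a));   // prints 111

        byte b = 0;
        b |= 1;                  // compiles: compound assignment casts back to byte
        // b = b | 1;            // would not compile without a cast, since b | 1 is an int
        System.out.println(b);   // prints 1
    }
}
```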

Hehe, OK, I don't know why I didn't see that in the first place :)

Oh! Well, if you are trying to learn about compression, might I suggest you google “LZW”?

Lempel-Ziv is well documented. Unfortunately it's also patented, so you can’t use it in a product without a license. (Unisys’s terms actually used to be pretty reasonable: they’d give you a free license for free stuff and charge you for commercial use.)

It's quite a ways beyond a simple Huffman, though not so far that I think you'd have any problems grasping it. I actually wrote an LZW decompressor in Java years ago in order to read PDF files, which use it.
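
To give a feel for it, here is a bare-bones sketch of the textbook LZW encoder, assuming input characters below 256. The names LzwSketch and compress are made up. It is not the variant used by any particular file format; real formats add details such as variable code widths, which are left out here. The dictionary starts with all single characters and learns longer strings as it goes, emitting one integer code per longest known match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwSketch {
    // Returns the list of dictionary codes for the text (characters assumed < 256).
    static List<Integer> compress(String text) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i);   // codes 0..255: single characters
        }
        List<Integer> out = new ArrayList<>();
        String w = "";                               // longest match found so far
        for (char c : text.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                              // keep extending the match
            } else {
                out.add(dict.get(w));                // emit the code for the match
                dict.put(wc, dict.size());           // learn the new, longer string
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));      // flush the final match
        return out;
    }

    public static void main(String[] args) {
        String text = "TOBEORNOTTOBEORTOBEORNOT";
        List<Integer> codes = compress(text);
        System.out.println(text.length() + " characters -> " + codes.size() + " codes");
        System.out.println(codes);
    }
}
```

Codes from 256 upward stand for multi-character strings, so they need more than 8 bits each. Writing them out compactly brings back the bit-packing question from earlier in the thread.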