suggestions for optimization

moogie · November 9, 2003, 10:12pm

I have finally reached a point in my development my current pet project of making a lossless video codec where i need to start thinking of optimizations.

Currently i amd working on the ‘key frame’ image compressor and have made a stable build which does compress reasonably well, however it is not as fast as i would like as i will eventually need it to be able to uncompress the image at 30 images per second.

I would like to hear some peoples opinons on how to go about optimizing my code.

according to JProfiler, 70% of the time is spend in my ‘encode’ method, which is to be expected. So optimizing this code would speed up the compression significantly.

Here is the encode method:


      // encodes a given pixel (w,h)
      public void encode(int w,int h,int k,int width,byte[][] frame, boolean verbose,boolean[][] ODCodes,boolean [][] IDCodes,boolean[][] huffCodes, OutputStream fos) throws IOException
      {
                  
            int Rdiff=0;
            int Gdiff=0;
            int Bdiff=0;
            int RGdiff;
            int RBdiff;
            int red;      
            int bitMask=0x01;            
            int innerBitCount;
            
                  
                  

            int j=h*width+w;
            {
                  if (k>-1)
                  {
                        Rdiff=256+((frame[j][0] & 0xFF)-(frame[k][0] & 0xFF));
                        Gdiff=256+((frame[j][1] & 0xFF)-(frame[k][1] & 0xFF));
                        Bdiff=256+((frame[j][2] & 0xFF)-(frame[k][2] & 0xFF));
                  }

                  innerBitCount=0;
                  // Determine the number of bits compressing via the InnerDifferences will give.
                              
                  innerBitCount+=huffCodes[((int)(frame[j][0]) & 0xff)].length+1;
                              
                  RGdiff= 256+((frame[j][1] & 0xFF)-(frame[j][0] & 0xFF));
                  if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(frame[j][1] & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(frame[j][1]) & 0xff)].length+1;
                  }
                              
                  RBdiff= 256+((frame[j][2] & 0xFF)-(frame[j][0] & 0xFF));
                  if (IDCodes[RBdiff].length>0 && IDCodes[RBdiff].length<huffCodes[(frame[j][2] & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(frame[j][2]) & 0xff)].length+1;
                  }
                        
                  // if the pixel we are comparing the current pixel to is in range and all the RGB components of the current pixel are in
                  // the Outer Differences and coding using the OuterDifferences yields better compression than via the InnerDifferences write the OD codes.
                  if (k>-1 && ODCodes[Rdiff].length>0 && ODCodes[Gdiff].length>0 && ODCodes[Bdiff].length>0 && (innerBitCount>ODCodes[Rdiff].length+ODCodes[Gdiff].length+ODCodes[Bdiff].length+1))
                  {
                              
                        value=value<<1 |  bitMask; count++;
                        addToBuffer(fos);

                        for (int i=0;i<ODCodes[Rdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Rdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                        
                        for (int i=0;i<ODCodes[Gdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Gdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                        
                        for (int i=0;i<ODCodes[Bdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Bdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                  }
                  else
                  { 
                  
                        // set Red component to its huffman code
                        red=((int)(frame[j][0]) & 0xff);

                        value=value<<1; 
                        count++;
                        addToBuffer(fos);

                              
                        for (int i=0;i<huffCodes[red].length;i++)
                        {
                              value=value<<1; count++;
                              if (huffCodes[red][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
            
                              
                        // if the Red-Green Difference is in the Inner Differences add the ID huff code else add the Green component huffman code
                        // unless the Green component huffman code is less than the difference code in which case the Green component huffman code is written.
                              
                        
                        RGdiff= 256+((frame[j][1] & 0xFF)-(frame[j][0] & 0xFF));
                        if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(frame[j][1] & 0xFF)].length)
                        {
                              value=value<<1 |  bitMask; count++;
                              addToBuffer(fos);
                              
                              for (int i=0;i<IDCodes[RGdiff].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (IDCodes[RGdiff][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                        else
                        {
                              value=value<<1; count++;
                              addToBuffer(fos);
                              for (int i=0;i<huffCodes[(frame[j][1] & 0xFF)].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (huffCodes[(frame[j][1] & 0xFF)][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                                    
                        }
                              
                        // if the Red-Blue Difference is in the Inner Differences add the ID huff code else add the Blue component huffman code
                        // unless the Blue component huffman code is less than the difference code in which case the Blue component huffman code is written.

                        RBdiff= 256+((frame[j][2] & 0xFF)-(frame[j][0] & 0xFF));
                        if (IDCodes[RBdiff].length >0 && IDCodes[RBdiff].length<huffCodes[(frame[j][2] & 0xFF)].length)
                        {
                              value=value<<1 |  bitMask; count++;
                              addToBuffer(fos);
                              
                              for (int i=0;i<IDCodes[RBdiff].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (IDCodes[RBdiff][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                        else
                        {
                              value=value<<1; count++;
                              addToBuffer(fos);
                              for (int i=0;i<huffCodes[(frame[j][2] & 0xFF)].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (huffCodes[(frame[j][2] & 0xFF)][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                  }
            }
      }

similarily, in the decompressor, the decode method takes most of the time and i would like to optimize it:


      // decodes a given pixel at (w,h)      
      public void decode(int w,int h,int k,Read read,byte[][] frame) throws IOException
      {
            
                  
            
                  
            boolean bit=bit=read.read();
            int count=h*width+w;
                  

                  ///// re-assemble pixel using Outer Differences /////

                  if (bit)
                  {
                        // Red component
                        frame[count][0]=(byte) (-256+traverseTree(ODroot,read) + frame[k][0]);

                        // Green component
                        frame[count][1]=(byte) (-256+traverseTree(ODroot,read) +  frame[k][1]);

                        // Blue Component
                        frame[count][2]=(byte) (-256+traverseTree(ODroot,read) + frame[k][2]);

                  }
                  else
                  {
                              
                        ///// re-assemble using Inner Differences /////

                        // Red component - from huffman code
                        frame[count][0]=traverseTree(root,read);
                        //int red=((int)(frame[count][0]) & 0xff);

                        bit=read.read();
                              
                        // Green component - from Inner Differences else from huffman code
                        if (bit)
                        {
                              frame[count][1]=(byte) ((traverseTree(IDroot,read) & 0xFF) + frame[count][0]);
                        }
                        else
                        {
                              frame[count][1]=traverseTree(root,read);
                              //red=((int)(frame[count][1]) & 0xff);
                        } 

                        bit=read.read();
                              
                        //  Blue component - from Inner Differences else from huffman code
                        if (bit)
                        {
                              frame[count][2]=(byte) ((traverseTree(IDroot,read) & 0xFF) + frame[count][0]);
                        }
                        else
                        {
                              frame[count][2]=traverseTree(root,read);
                              //red=((int)(frame[count][2]) & 0xff);
                        }
                  }
      }

The current byte code of the compressor/decomressor can be found at: www.geocities.com/budgetanime/avid4.zip.txt (Note: you will have to remove the .txt extension once downloaded)

moogie · November 9, 2003, 10:14pm

I can provide full source if it is necessary.

I do have some comments in my code, however it is not the best

Jeff · November 9, 2003, 10:34pm

Suggestion 1: Move this to the performanc tuning topic. Your likely to get a lot more help

JK

Jeff · November 9, 2003, 10:37pm

Next suggestion: You are doing a LOT of arbitrary (non-deterministic) indexing into your frame array. Thatrs likely costing you a lot.

My suggestion: define some local variables to hold values like frame[k][0] so you are only doing the indexing and resulting bounds check once per value.

moogie · November 9, 2003, 11:25pm

[quote]Suggestion 1: Move this to the performanc tuning topic. Your likely to get a lot more help

JK
[/quote]
originally i was going to put it there, however after viewing the topics there it seemed that people were giving optimized methods, not people asking for optimization help… but please do move this thread if i put it in the wrong place.

moogie · November 9, 2003, 11:28pm

[quote]Next suggestion: You are doing a LOT of arbitrary (non-deterministic) indexing into your frame array. Thatrs likely costing you a lot.

My suggestion: define some local variables to hold values like frame[k][0] so you are only doing the indexing and resulting bounds check once per value.
[/quote]
good point! I will do so. On the same lines, i think i might change all the local instance variables that are created each time encode is called to class member variables, so i dont have to create them each time.

Jeff · November 10, 2003, 12:04am

If you are actaually creating objects with new then yeah you might want to reuse them.

If they are just local primatives they they are just getting placed on the stack and theres no big deal there.

moogie · November 10, 2003, 12:33am

hmm, i modifed the code, however it seems to have little to no impact on performance… i wonder if the profiler misled me and the problem exists else where…

I have managed to speed up the decompressor a little bit by instead of recusive tree traversal i have made the traveral iterative.

However it is still takes twice as long to decompress as to compress… which i put down to the cost of tree traversal. I need to think of a way to convert the tree into an array me thinks.

here is the modified encode:


      class Encode
      {
            int Rdiff;
            int Gdiff;
            int Bdiff;
            int RGdiff;
            int RBdiff;
            int red;      
            int bitMask=0x01;            
            int innerBitCount;
            byte[] currentPixel;
            byte[] comparatorPixel;
            
            public Encode()
            {
                  currentPixel=new byte[3];            
                  comparatorPixel=new byte[3];
            }
            
            // encodes a given pixel (w,h)
            public void encode(int w,int h,int k,int width,byte[][] frame, boolean verbose,boolean[][] ODCodes,boolean [][] IDCodes,boolean[][] huffCodes, OutputStream fos) throws IOException
            {
                  currentPixel=frame[h*width+w];
                  if (k>-1)
                  {
                        comparatorPixel=frame[k];
                        Rdiff=256+((currentPixel[0] & 0xFF)-(comparatorPixel[0] & 0xFF));
                        Gdiff=256+((currentPixel[1] & 0xFF)-(comparatorPixel[1] & 0xFF));
                        Bdiff=256+((currentPixel[2] & 0xFF)-(comparatorPixel[2] & 0xFF));
                  }
                  

                  innerBitCount=0;
                  // Determine the number of bits compressing via the InnerDifferences will give.
                        
                  innerBitCount+=huffCodes[((int)(currentPixel[0]) & 0xff)].length+1;
                        
                  RGdiff= 256+((currentPixel[1] & 0xFF)-(currentPixel[0] & 0xFF));
                  if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(currentPixel[1] & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(currentPixel[1]) & 0xff)].length+1;
                  }
                              
                  RBdiff= 256+((currentPixel[2] & 0xFF)-(currentPixel[0] & 0xFF));
                  if (IDCodes[RBdiff].length>0 && IDCodes[RBdiff].length<huffCodes[(currentPixel[2] & 0xFF)].length)
                  {
                        innerBitCount+=IDCodes[RGdiff].length+1;
                  }
                  else
                  {
                        innerBitCount+=huffCodes[((int)(currentPixel[2]) & 0xff)].length+1;
                  }
                  
                  // if the pixel we are comparing the current pixel to is in range and all the RGB components of the current pixel are in
                  // the Outer Differences and coding using the OuterDifferences yields better compression than via the InnerDifferences write the OD codes.
                  if (ODCodes[Rdiff].length>0 && ODCodes[Gdiff].length>0 && ODCodes[Bdiff].length>0 && (innerBitCount>ODCodes[Rdiff].length+ODCodes[Gdiff].length+ODCodes[Bdiff].length+1))
                  {
                              
                        value=value<<1 |  bitMask; count++;
                        addToBuffer(fos);

                        for (int i=0;i<ODCodes[Rdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Rdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                        
                        for (int i=0;i<ODCodes[Gdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Gdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                        
                        for (int i=0;i<ODCodes[Bdiff].length;i++)
                        {
                              value=value<<1; count++;
                              if (ODCodes[Bdiff][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
                  }
                  else
                  { 
                  
                        // set Red component to its huffman code
                        red=((int)(currentPixel[0]) & 0xff);

                        value=value<<1; 
                        count++;
                        addToBuffer(fos);

                              
                        for (int i=0;i<huffCodes[red].length;i++)
                        {
                              value=value<<1; count++;
                              if (huffCodes[red][i]) value=value| bitMask;
                              addToBuffer(fos);
                        }
            
                              
                        // if the Red-Green Difference is in the Inner Differences add the ID huff code else add the Green component huffman code
                        // unless the Green component huffman code is less than the difference code in which case the Green component huffman code is written.
                              
                        
                        RGdiff= 256+((currentPixel[1] & 0xFF)-(currentPixel[0] & 0xFF));
                        if (IDCodes[RGdiff].length>0 && IDCodes[RGdiff].length<huffCodes[(currentPixel[1] & 0xFF)].length)
                        {
                              value=value<<1 |  bitMask; count++;
                              addToBuffer(fos);
                              
                              for (int i=0;i<IDCodes[RGdiff].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (IDCodes[RGdiff][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                        else
                        {
                              value=value<<1; count++;
                              addToBuffer(fos);
                              for (int i=0;i<huffCodes[(currentPixel[1] & 0xFF)].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (huffCodes[(currentPixel[1] & 0xFF)][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                                    
                        }
                              
                        // if the Red-Blue Difference is in the Inner Differences add the ID huff code else add the Blue component huffman code
                        // unless the Blue component huffman code is less than the difference code in which case the Blue component huffman code is written.

                        RBdiff= 256+((currentPixel[2] & 0xFF)-(currentPixel[0] & 0xFF));
                        if (IDCodes[RBdiff].length >0 && IDCodes[RBdiff].length<huffCodes[(currentPixel[2] & 0xFF)].length)
                        {
                              value=value<<1 |  bitMask; count++;
                              addToBuffer(fos);
                              
                              for (int i=0;i<IDCodes[RBdiff].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (IDCodes[RBdiff][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                        else
                        {
                              value=value<<1; count++;
                              addToBuffer(fos);
                              for (int i=0;i<huffCodes[(currentPixel[2] & 0xFF)].length;i++)
                              {
                                    value=value<<1; count++;
                                    if (huffCodes[(currentPixel[2] & 0xFF)][i]) value=value| bitMask;
                                    addToBuffer(fos);
                              }
                        }
                        
                  }
            }
      }

Jeff · November 10, 2003, 2:26am

Um. you didn’t fix the array access.

Your still accessing an array everywhere AFAICT, you just copied it into a smaller array.

The costs are the index add and the bounds check. You havent gotten rid of either.

Put the values in FLAT variables, NOT arrays.

moogie · November 10, 2003, 3:02am

[quote]Um. you didn’t fix the array access.

Your still accessing an array everywhere AFAICT, you just copied it into a smaller array.
[/quote]
?! i was under the impression that the cost to extract a value from an primitive array is the same as from a primitive as all the array consits of is pointers.

but i guess i am incorrect! :-/ ah, well it happens enough ;D

So you would suggest the following?:

currentPixelRed=frame[j][0];
currentPixelGreen=frame[j][1];
currentPixelBlue=frame[j][2];

which bound checking are you specifically talking about?

all the “##Codes[RBdiff].length >0” are needed as is the inital “if (k>-1)”

the count++ is necessary as it is used elsewhere in the program with the saving of the bit stream

basically, i cannot see any checking which i can remove without effecting the algorithm.

what does this mean?

Jeff · November 10, 2003, 3:36am

[quote]which bound checking are you specifically talking about?
[/quote]
sigh

Okay I see I have to start at the VERY beginning.

First off, forget about C. Java is different. It has differnt behaviors and different performance characteristics.

EVERY time you use an index into an array in Java there is a potential bounds check as Java, unlike C, does not allow you to over-run or under-run arrays.

When you are looping through indexes in an obvious way like this…



for(i=0;i<max;i++){
     System.out.println(foo[i]);
}

Then the VM can do something called “bounds check hoisting” and only check the start and end values of i. But otherwise EVERY index into an array has to be checked and IS checked by the VM.

This is what I was talking about.

For more information on Java performance characteristics I’d strongly suggest you read Steve and my book as you will learn a lot: http://java.sun.com/docs/books/performance
The whole text of the book is there at the web site if you can’t afford to buy a copy.

JK

P.S. Just as an aside, even in C x = foo[i] is more expensive then x = bar. Fetchign the value of bar is
a single memory access., resolving foo[i] is the EXACTLY the same as saying x= *(foo+i).

In C if you have an array foo[] then foo is the base address of that array. You find an index into it by adding the index to the bae and then dereferencing that resultant address. Thus there is an extra addition in every access. In c this…


int i = 0;
int v;
for(i=0;i<MAX;i++){
    v += foo[i];
}

Is much more efficiently written as


int i=0;
int v;
int *fooptr = foo;
for(int i=0;i<MAX;i++)
    v += *fooptr++;
}

moogie · November 10, 2003, 4:24am

ah, i see where we were having issues our interpetations of what consitutes bound checking differed. I understand that java does not allow overruns, however i guess i never knew the process in which java checks the requested element was called bound checking, rather i thought that the checks the programmer puts in is what is called bound checking.

It is all clear now

Thanks!