Array itteration speed up + other tips needed

moogie · February 19, 2004, 7:01pm

There is no real goal in mind… it is just a personal programming project. I have semi-given up on programming games… i wanted to try making some thing useful

The compressor i have made works best in the same domain as the PNG format, i.e. cartoons, CGI, etc.

In some areas, my key frame compressor beats PNG and some areas it is worse.

rreyelts · February 19, 2004, 7:29pm

Umm… what are you using for the OutputStream? If you’re doing any real disk i/o here, it’s going to dwarf the time that you’re spending on the CPU - which would not be a good benchmark for how fast you can actually encode or decode.

Some more questions - What JVM are you using and are you using -server when running?

God bless,
-Toby Reyelts

moogie · February 19, 2004, 8:15pm

I am running jdk 1.4.01 using the normal client setting.

The stream is ultimately a byte buffer as seen below:



                        FileOutputStream fos2=new FileOutputStream(outFile);
                        GZIPOutputStream fos1=new GZIPOutputStream(fos2);
                        DataOutputStream fos=new DataOutputStream(fos1);
                        
                  
                        ByteArrayOutputStream controlS=new ByteArrayOutputStream();
                        ByteArrayOutputStream IDS=new ByteArrayOutputStream();
                        ByteArrayOutputStream ODS=new ByteArrayOutputStream();
                        ByteArrayOutputStream CCS=new ByteArrayOutputStream();
                        
                        BitOutputStream controlW=new BitOutputStream(controlS);
                        BitOutputStream IDW=new BitOutputStream(IDS);
                        BitOutputStream ODW=new BitOutputStream(ODS);
                        BitOutputStream CCW=new BitOutputStream(CCS);

moogie · February 19, 2004, 9:12pm

here are the results of a profiling session:

http://www.geocities.com/budgetanime/profile1.txt

http://www.geocities.com/budgetanime/profile2.txt

http://www.geocities.com/budgetanime/profile3.txt

The source code and a sample compressed sequence can be found at: http://hdiff.mystarship.com

The sample is probably too biased, however i do not have the webspace to upload a more taxing sample.

rreyelts · February 19, 2004, 9:19pm

Comments:

About 2D arrays vs 1D arrays:

1D arrays are faster for several reasons.

a) They are guaranteed to be contiguous. This means that they are much more likely to work well with your processor’s cache prefetch mechanism. In order to take the best advantage of this, you should process the pixel data in the order it is stored in memory. The importance of coding to the cache can’t be overstated.

b) Accessing a single array only costs 1 bounds check. In your case, you’re paying six (3*2) bounds checks each time you pull out a pixel when you should only be paying for one check - and if you write your code better, 0.

c) 2D arrays cost linear time to allocate in the 2nd dimension. 1D arrays cost O(1) to allocate.

How you write optimizer friendly code for array bounds:

Write your for loops so that they process array elements linearly, i.e:

for ( int i = 0; i < myArray.length; ++i ) {
// array indexing in here…
}

Don’t play with the index in the loop. Do the actual array indexing in the same function as you’re doing the looping. If you must put the actual array indexing in a different function, make the function small enough that Hotspot will inline it.

You’re not setting an intial size for ByteArrayOutputStream, so it’s growing needlessly. In fact, you should probably get rid of ByteArrayOutputStream altogether and just allocate an array that you can guarantee is large enough for a compressed frame. This will get rid of a lot of needless copying and processing going on.
In BitOutputStream.write() you’re doing a for loop. Practically anywhere you’re performing a loop like that, you should go ahead and perform loop unrolling. You’ll have to play around with the amount of unrolling to see what works best.
In general, you want to be careful about making function calls. I know Jeff told you that function calls in Java are just as fast as than those in C++, but, even in C++, you avoid trying to make millions of function calls per second. (i.e. a function call per pixel in a 300K pixel image). HotSpot may or may not inline the function. but it’s less likely to do so if the function is large (because, in general, it’s bad policy - you end up overflowing the instruction cache much more often).
On comparing Java codec performance vs C++ codec performance:

A well written C++ codec is going to get all of its performance gains by managing the processor’s cache very carefully and exploiting parallelism through loop unrolling and SIMD instructions. It may also be able to avoid some array bounds checks that you may not.

God bless,
-Toby Reyelts

moogie · February 19, 2004, 9:34pm

!

that is very informative!

They dont teach you this sort of thing at uni… i had no idea!

Jeff · February 19, 2004, 9:57pm

Common sense here. Java arrays are bounds checked. 2 dimensions means twice as many checks.

Yeah, i noticed this almost imediately after i posted the code, i had changed it such that the conversion only occurs once at the beginning.

However if there was any speed increase doing this… it seems to be negligable.


                              j=h*width+w;
                              currentPixelRed=frame[j][0] & 0xFF; 
                              currentPixelGreen=frame[j][1] & 0xFF; 
                              currentPixelBlue=frame[j][2] & 0xFF;

how do i let the optimizer know that j is never out of bounds?

At least that way you cant because there is no deterministic guarantee that it is. Its dependent on other values. You get a bad value in width and BOOM you are out of range.

The only way the compielr can know an index stays within a given range is if it has full control over its value and it has well known end points. The most basic example is a for lopp where the index looped on is never assigned to inside of the loop body.

yah thats what they are telling you in their shorthand. The flow and organization of the code is really really hard to follow. I suspect this has to do with either attempts to prematurely optimize or attempts to over-factor your code and avoid all redundancies.

Re write it so it is simple and easy to read and follow. Clean and clear code is ALWAYS easier to tune.

moogie · February 19, 2004, 10:15pm

hmm, i guess i never really thought enough into it to ever think about the bound checking increases for each dimension… however that makes perfect sense!

I am having difficulty visualizing such a loop. Is there an online resource that explains this and related topics?

would this loop be of a type where the optimizer knows that it will be always inrange?


Final Static Max=20;
int[] array = new array[20];

for (int i=0;i<Max;i++)
{
    array[i]++;
}

heh, i thought i had. I am not sure how i would go about cleaning it even more… again are there any good resources to help me there?

rreyelts · February 19, 2004, 10:18pm

[quote]I am running jdk 1.4.01 using the normal client setting.
[/quote]
Oops, forgot to mention that you should use -server. It’s going to perform much stronger optimizations. You’ll also need to run your code for a little while to let the optimizer get a feel for what functions it needs to inline, etc… before performing any timings. You should probably look for and read some postings in this forum about the benchmarking process.

God bless,
-Toby Reyelts

rreyelts · February 19, 2004, 10:33pm

[quote]!

that is very informative!

They dont teach you this sort of thing at uni… i had no idea!
[/quote]
Probably because they don’t want their students going around writing a lot of terrible code. The techniques I talked about should only be used in very performance intensive situtations - i.e. developing a codec. You’re more likely to understand these kind of things if you are deep into computer architecture.

God bless,
-Toby Reyelts

moogie · February 19, 2004, 11:46pm

Yeah, you are probably correct, I know i programmed the codec as i was taught, i.e. OO and easy to maintain. But it does have a speed penalty.

Maybe i will continue to develop the codec as i currently am, i.e. OO and understandable such that i can then use it as reference code for when i or someone else converts it to C.

I will be learning C++ second semester this year so hopefully i will be able to start the conversion process.

hmm… maybe i should make it open source when i have a finished working version in java, so more adept C/C++ programmers can peform the conversion.