Speed

g666 · March 5, 2006, 3:08pm

Are larger arrays more expensive to access? Because I find this:

int npp = ((yi<<widthBits)+xi)<<2;
int pixP1 = np[npp];
int pixP2 = np[++npp];
int pixP3 = np[++npp];
int pixP4 = np[++npp];

to be ~0.75 times the speed of this, even though the code above is doing less work:

int pixP1 = getTexel(xi, yi);
int pixP2 = getTexel(xi+1, yi);
int pixP3 = getTexel(xi, yi+1);
int pixP4 = getTexel(xi+1, yi+1);

the method used above:

public final int getTexel(int x, int y){
	return imgData[((y & heightMask) << widthBits) + (x & widthMask)];
}
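For readers unfamiliar with the masking in getTexel, here is a minimal sketch of how it wraps coordinates. It assumes the image dimensions are powers of two and that widthMask and heightMask are the dimension minus one; the thread does not state this. The class name, the heightBits parameter, and the test data are hypothetical.

```java
// Hypothetical demo class; only getTexel's body is taken from the post above.
final class TexelLookupDemo {
    final int widthBits;
    final int widthMask;
    final int heightMask;
    final int[] imgData;

    TexelLookupDemo(int widthBits, int heightBits) {
        this.widthBits = widthBits;
        // Assumption: masks are (power-of-two dimension - 1), so "& mask" acts as a wrap-around.
        this.widthMask = (1 << widthBits) - 1;
        this.heightMask = (1 << heightBits) - 1;
        this.imgData = new int[1 << (widthBits + heightBits)];
        // Fill each texel with its own index so lookups are easy to check.
        for (int i = 0; i < imgData.length; i++) {
            imgData[i] = i;
        }
    }

    // Same expression as in the post: two ANDs, one shift, one add, one array read.
    int getTexel(int x, int y) {
        return imgData[((y & heightMask) << widthBits) + (x & widthMask)];
    }

    public static void main(String[] args) {
        TexelLookupDemo img = new TexelLookupDemo(3, 2); // 8 x 4 texels
        // (9, 5) wraps to (1, 1), so both calls read the same element.
        System.out.println(img.getTexel(9, 5) == img.getTexel(1, 1));
    }
}
```

Under these assumptions, out-of-range coordinates never throw; they wrap back into the image, which is extra work per call that the direct np[] reads skip.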

jbanes · March 5, 2006, 3:28pm

Try this version for me, and tell me if you’re still seeing a speed difference:


   int npp1 = ((yi<<widthBits)+xi)<<2;
   int npp2 = ((yi<<widthBits)+(xi+1))<<2;
   int npp3 = ((yi<<widthBits)+(xi+2))<<2;
   int npp4 = ((yi<<widthBits)+(xi+3))<<2;
   int pixP1 = np[npp1];
   int pixP2 = np[npp2];
   int pixP3 = np[npp3];
   int pixP4 = np[npp4];

I have a feeling that Hotspot is optimizing the starch out of your routine. As a result, the latter version is completely inlined. Once inlined, it then shows up as faster due to the CPU’s ability to execute out-of-order instructions. As long as you rely on the same variable for every line, the CPU is required to execute the lines one after another.
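The dependency argument can be illustrated with a small sketch. It contrasts a chained form, where each index comes from incrementing the same variable, with a form where every index is a fixed offset from one base value. The class name, method names, and test data are hypothetical; np, widthBits, xi and yi mirror the names in the thread. The sketch only shows that the two forms read the same elements. Whether the JIT already performs this rewrite on its own, and whether either form is faster, is not something it can demonstrate.

```java
// Hypothetical demo class; not code from the original posts.
final class TexelReadDemo {
    // Chained form: each read uses an index produced by mutating npp.
    static int sumChained(int[] np, int widthBits, int xi, int yi) {
        int npp = ((yi << widthBits) + xi) << 2;
        int p1 = np[npp];
        int p2 = np[++npp];
        int p3 = np[++npp];
        int p4 = np[++npp];
        return p1 + p2 + p3 + p4;
    }

    // Independent form: every index is a constant offset from an unchanging base.
    static int sumIndependent(int[] np, int widthBits, int xi, int yi) {
        final int base = ((yi << widthBits) + xi) << 2;
        return np[base] + np[base + 1] + np[base + 2] + np[base + 3];
    }

    public static void main(String[] args) {
        int widthBits = 3;                    // assumed 8-texel-wide image
        int[] np = new int[(8 * 8) << 2];     // four ints per texel
        for (int i = 0; i < np.length; i++) {
            np[i] = i;
        }
        // Both forms should read exactly the same four elements.
        System.out.println(sumChained(np, widthBits, 2, 1) == sumIndependent(np, widthBits, 2, 1));
    }
}
```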

Jeff · March 6, 2006, 2:38am

Uh huh.

As usual, a microbenchmark fails to measure what it thinks it is.

Microbenchmarking a VM with a JIT and getting meaningful results is very hard unless you really understand the details of how the VM does its compilation.

Otherwise, the best benchmarks are real code performing complete functions over a reasonably long period of time.
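As a rough illustration of why timing JIT-compiled code is tricky, here is a minimal sketch of a hand-rolled timer that runs the code untimed first, so the compiler has a chance to kick in, and only then measures. The class and method names are hypothetical, and the run counts are arbitrary. It does not remove the pitfalls described above (dead-code elimination, inlining decisions that differ from the real call site); a dedicated harness such as JMH is built to handle those.

```java
import java.util.function.IntSupplier;

// Hypothetical helper; a sketch, not a substitute for a real benchmark harness.
final class WarmupTimer {
    // Runs the task untimed for warmupRuns, then returns elapsed nanoseconds for timedRuns.
    static long nanosPerBatch(IntSupplier task, int warmupRuns, int timedRuns) {
        int sink = 0;
        for (int i = 0; i < warmupRuns; i++) {
            sink += task.getAsInt();          // warm-up: results are accumulated but not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            sink += task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the calls above are less likely to be optimized away.
        if (sink == Integer.MIN_VALUE) {
            System.out.println("unlikely sink value");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        IntSupplier task = () -> {
            int sum = 0;
            for (int v : data) {
                sum += v;
            }
            return sum;
        };
        long nanos = nanosPerBatch(task, 20_000, 20_000);
        System.out.println("timed batch took " + nanos + " ns");
    }
}
```

Even with warm-up, numbers from a loop like this are best treated as a hint and checked against the real application.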

g666 · March 6, 2006, 8:53pm

What jbanes suggested does yield an improvement. Thank you.

PS. This isn’t a benchmark; it is real code being called in a real function millions of times a second. XD

jbanes · March 6, 2006, 9:26pm

g666:

That’s pretty much what I figured. On a modern, superscalar, out-of-order CPU, the more instructions you can untangle, the better the performance. It’s quite possible that each array access is executing in parallel with the others. For the absolute fastest code, try this version:

   int npp = ((yi<<widthBits)+xi)<<2;
   int pixP1 = np[npp];
   int pixP2 = np[npp+1];
   int pixP3 = np[npp+2];
   int pixP4 = np[npp+3];

There are no guarantees, but that might eliminate some of the extra processing you were trying to get rid of.

[quote]Thank you.[/quote]

You’re welcome.