Speed

Are larger arrays more expensive to access? Because I find this:

int npp = ((yi<<widthBits)+xi)<<2;
int pixP1 = np[npp];
int pixP2 = np[++npp];
int pixP3 = np[++npp];
int pixP4 = np[++npp];

to be about 0.75 times the speed of this, even though the above is doing less:

int pixP1 = getTexel(xi, yi);
int pixP2 = getTexel(xi+1, yi);
int pixP3 = getTexel(xi, yi+1);
int pixP4 = getTexel(xi+1, yi+1);

The getTexel method used above:

public final int getTexel(int x, int y){
	return imgData[((y & heightMask) << widthBits) + (x & widthMask)];
}
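For readers unfamiliar with the masking trick in getTexel, here is a minimal sketch of how it behaves, assuming a power-of-two texture stored row by row. The class name, the 4x4 size, and the fill pattern are made up for illustration; only the index expression comes from the post.

```java
// Hypothetical demo class; field names mirror the snippet above.
final class TexelWrapDemo {
    static final int widthBits = 2;                        // assumed: texture is 4 texels wide
    static final int heightBits = 2;                       // assumed: texture is 4 texels high
    static final int widthMask = (1 << widthBits) - 1;     // 0b11
    static final int heightMask = (1 << heightBits) - 1;   // 0b11
    static final int[] imgData = new int[1 << (widthBits + heightBits)];

    static {
        // Store each texel's own index so lookups are easy to check.
        for (int i = 0; i < imgData.length; i++) {
            imgData[i] = i;
        }
    }

    // Same index expression as in the post: the masks wrap x and y into range,
    // and the shift multiplies the row number by the (power-of-two) width.
    static int getTexel(int x, int y) {
        return imgData[((y & heightMask) << widthBits) + (x & widthMask)];
    }

    public static void main(String[] args) {
        System.out.println(getTexel(1, 2));   // row 2, column 1 -> 9
        System.out.println(getTexel(5, 2));   // x wraps: 5 & 3 == 1 -> 9
        System.out.println(getTexel(-1, 0));  // -1 & 3 == 3 -> 3
    }
}
```

Because of the masks, the index can never leave the array, which is one reason this form costs a little extra arithmetic per lookup compared with a raw index.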

Try this version for me, and tell me if you’re still seeing a speed difference:


   int npp1 = ((yi<<widthBits)+xi)<<2;
   int npp2 = ((yi<<widthBits)+(xi+1))<<2;
   int npp3 = ((yi<<widthBits)+(xi+2))<<2;
   int npp4 = ((yi<<widthBits)+(xi+3))<<2;
   int pixP1 = np[npp1];
   int pixP2 = np[npp2];
   int pixP3 = np[npp3];
   int pixP4 = np[npp4];

I have a feeling that Hotspot is optimizing the starch out of your routine. As a result, the latter version is completely inlined. Once inlined, it then shows up as faster due to the CPU’s ability to execute instructions out of order. As long as you rely on the same variable for every line, the CPU is required to execute the lines one after another.
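One caveat about the snippet above: ((yi<<widthBits)+(xi+1))<<2 lands four elements past npp1, not one, so it does not read the same elements as the ++npp version. A minimal sketch of the same idea that keeps the original indices, assuming np really holds four consecutive ints per texel, might look like this. The class and method names are hypothetical, and whether the JIT compiles the two forms differently is not something this sketch can show.

```java
// Hypothetical demo class comparing two ways to read four consecutive ints.
final class IndependentLoadsDemo {

    // Original style: each load uses the index produced by the previous increment.
    static int sumChained(int[] np, int base) {
        int npp = base;
        int pixP1 = np[npp];
        int pixP2 = np[++npp];
        int pixP3 = np[++npp];
        int pixP4 = np[++npp];
        return pixP1 + pixP2 + pixP3 + pixP4;
    }

    // Alternative: every index is a constant offset from an unchanged base,
    // so no load depends on an earlier update of the index variable.
    static int sumOffsets(int[] np, int base) {
        int pixP1 = np[base];
        int pixP2 = np[base + 1];
        int pixP3 = np[base + 2];
        int pixP4 = np[base + 3];
        return pixP1 + pixP2 + pixP3 + pixP4;
    }

    public static void main(String[] args) {
        int[] np = {10, 20, 30, 40, 50, 60, 70, 80};
        // Both forms read elements 4..7.
        System.out.println(sumChained(np, 4));
        System.out.println(sumOffsets(np, 4));
    }
}
```

Both methods return the same value for the same input, so either can be dropped into the inner loop and timed in place.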

Uh huh.

As usual, a microbenchmark fails to measure what it thinks it is.

Microbenchmarking a VM with a JIT and getting meaningful results is very hard unless you really understand the details of how the VM does its compilation.

Otherwise, the best benchmarks are real code performing complete functions over a reasonably long period of time.
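If you do want to time a small piece of code anyway, a rough sketch is to run it untimed for a while first so the JIT has a chance to compile it, and only then measure. Everything below (the class name, the workload, the round counts) is a made-up stand-in, not code from this thread, and a dedicated harness such as JMH handles the pitfalls far more rigorously.

```java
// Hypothetical timing sketch with a warm-up phase.
final class WarmupTimingDemo {
    // Results are written here so the JIT cannot discard the work as unused.
    static volatile long sink;

    // Stand-in workload: sums the array the given number of times.
    static long workload(int[] data, int passes) {
        long sum = 0;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
            }
        }
        return sum;
    }

    // Runs untimed warm-up rounds, then returns the fastest timed round in nanoseconds.
    static long bestNanos(int[] data, int passes, int warmupRounds, int timedRounds) {
        for (int r = 0; r < warmupRounds; r++) {
            sink = workload(data, passes);
        }
        long best = Long.MAX_VALUE;
        for (int r = 0; r < timedRounds; r++) {
            long start = System.nanoTime();
            sink = workload(data, passes);
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 16];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println("best ns: " + bestNanos(data, 100, 20, 10));
    }
}
```

Even with warm-up, the numbers only describe this loop in isolation; the same code inlined into a larger method can behave differently.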

What jbanes suggested does yield an improvement. Thank you.

PS. This isn’t a benchmark; it is real code being called in a real function millions of times a second. XD

You’re welcome. :)