Hi all,
I was toying around with optimizing a few things in my emulator and found something that apparently has quite an impact on performance.
In an inner loop of the rendering code, I do this:
vdp.pixels[p++] = ((b & 0x80) == 0x80) ? fc : bc;
vdp.pixels[p++] = ((b & 0x40) == 0x40) ? fc : bc;
vdp.pixels[p++] = ((b & 0x20) == 0x20) ? fc : bc;
vdp.pixels[p++] = ((b & 0x10) == 0x10) ? fc : bc;
vdp.pixels[p++] = ((b & 0x08) == 0x08) ? fc : bc;
vdp.pixels[p++] = ((b & 0x04) == 0x04) ? fc : bc;
vdp.pixels[p++] = ((b & 0x02) == 0x02) ? fc : bc;
vdp.pixels[p] = ((b & 0x01) == 0x01) ? fc : bc;
This renders one line (8 pixels) of a character, testing each bit to decide whether the pixel gets the foreground color (fc) or the background color (bc).
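For readers who want to experiment with this, here is a minimal standalone sketch of the same per-bit test written as a loop. It assumes the pixel buffer is an int[] and that fc and bc are int color values; the class and method names (RowRenderSketch, renderRow) are made up for illustration and are not from the emulator.

```java
import java.util.Arrays;

public class RowRenderSketch {
    /**
     * Writes 8 pixels for one character row, starting at index p.
     * Bit 7 of b is the leftmost pixel. Types and names are assumptions.
     */
    static void renderRow(int[] pixels, int p, int b, int fc, int bc) {
        // Walk the mask from 0x80 down to 0x01, one bit per pixel.
        for (int mask = 0x80; mask != 0; mask >>= 1) {
            pixels[p++] = ((b & mask) == mask) ? fc : bc;
        }
    }

    public static void main(String[] args) {
        int[] pixels = new int[8];
        renderRow(pixels, 0, 0xA5, 1, 0); // 0xA5 = 10100101 in binary
        System.out.println(Arrays.toString(pixels));
        // prints [1, 0, 1, 0, 0, 1, 0, 1]
    }
}
```

The unrolled form in the post does the same work without the loop counter, which is presumably why it was written out by hand.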
Then, as a test, I tried to change it to the following code (guessing that somehow comparing to 0 might be slightly faster):
vdp.pixels[p++] = ((b & 0x80) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x40) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x20) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x10) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x08) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x04) != 0) ? fc : bc;
vdp.pixels[p++] = ((b & 0x02) != 0) ? fc : bc;
vdp.pixels[p] = ((b & 0x01) != 0) ? fc : bc;
I didn’t really expect much of a change (if any), but this code made the whole emulator a whopping 8% slower! Considering there’s a lot more going on in the emulator, and that this code wasn’t even the main bottleneck, the slowdown struck me as quite extreme.
Furthermore, I think this little change should (in a perfect world) not have made any difference in performance at all, because imho HotSpot should be able to compile both versions to something optimal.
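To rule out any behavioral difference between the two versions, here is a small sketch that checks they produce identical pixels for every possible byte. For a single-bit mask m, (b & m) == m and (b & m) != 0 are the same condition, so any speed difference would have to come from how the code gets compiled. The names (MaskCompareSketch, renderEq, renderNe, sameForAllBytes) and the int types are assumptions for illustration only.

```java
import java.util.Arrays;

public class MaskCompareSketch {
    // Variant 1: compare the masked value against the mask itself.
    static int[] renderEq(int b, int fc, int bc) {
        int[] out = new int[8];
        int p = 0;
        for (int mask = 0x80; mask != 0; mask >>= 1) {
            out[p++] = ((b & mask) == mask) ? fc : bc;
        }
        return out;
    }

    // Variant 2: compare the masked value against zero.
    static int[] renderNe(int b, int fc, int bc) {
        int[] out = new int[8];
        int p = 0;
        for (int mask = 0x80; mask != 0; mask >>= 1) {
            out[p++] = ((b & mask) != 0) ? fc : bc;
        }
        return out;
    }

    // Returns true if both variants agree for all 256 byte values.
    static boolean sameForAllBytes(int fc, int bc) {
        for (int b = 0; b < 256; b++) {
            if (!Arrays.equals(renderEq(b, fc, bc), renderNe(b, fc, bc))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameForAllBytes(0xFFFFFF, 0x000000)); // prints true
    }
}
```

This only confirms the results match; it says nothing about timing. Measuring the speed difference in isolation would need a warmed-up benchmark, since the JIT may treat a tiny test loop differently from the full emulator.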
What do you think: should I file a (fairly low-priority ;)) bug report, or is there a good explanation for this?
BTW, I tested this on the Java 6 server VM.
-edit-: the change in code made it not 8% slower but almost 10%.