So I’ve had a question for a long time that I haven’t found the answer to yet… or at least not an answer that I’m 100% sure I’m interpreting correctly…
TL;DR at the bottom btw!
First off:
My latest framework for a software renderer I’m writing in Java uses a very fast sprite drawing system with efficient clipping, instantaneous orthogonal transformations, and now fast integral alpha compositing built right into the draw function pretty seamlessly (I’m trying to waste as few cycles as I possibly can, since it’s not hardware accelerated). The only thing is… I try to avoid repeating code as much as possible, but when I first implemented the transform portion of the draw function I had to make some rather ugly-looking exceptions in favor of speed (or perhaps more realistically, the illusion of speed? I don’t know, that’s what I’m here to find out). So basically (very basically) it’s set up like this at the moment:
public void drawSprite(blah... blah)
{
    //precalculate clipping offsets
    ...
    //interpret transforms
    ...
    switch(transform)
    {
        case 1:
            for(i)
                //precalculate y index stuff blah blah
                for(j)
                    //interesting stuff; actual blending and setting of pixels
            break;
        case 2:
            for(i)
                //precalculate y index stuff blah blah
                for(j)
                    //interesting stuff; actual blending and setting of pixels
            break;
        case 3:
            for(i)
                //precalculate y index stuff blah blah
                for(j)
                    //interesting stuff; actual blending and setting of pixels
            break;
        default:
            for(i)
                //precalculate y index stuff blah blah
                for(j)
                    //interesting stuff; actual blending and setting of pixels
            break;
    }//so basically the same exact stuff in each case but some expressions are
     //A) changed around and have different operations done on them and
     //B) results correlate DIRECTLY to the loop variables so a pre-loop
     //switch won't work
}
You can imagine how ugly it gets every time I add functionality to this method, because I have to add it 4 times. The only other way I can see to do it is to put the switch inside the outer loop; then I’d only need to write the loops once and add whatever I need to them. The thing is that I’m afraid it will affect performance later. I’ve run some tests and it doesn’t seem to have too big of a footprint with 64 24x24 sprites onscreen simultaneously. However, I don’t know to what extent performance may degrade with bigger sprites and more of them, as a result of those extra couple thousand or so instructions.
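To make the alternative concrete, here is a minimal sketch of the "switch inside the outer loop" layout. It is not my actual code: the class name, the transform constants, and the parameter list are all made up for illustration, it assumes the four transforms are none, horizontal flip, vertical flip, and both, and it leaves out clipping and blending entirely. The switch only picks a starting source index and a step for each row, so the inner loop is written once.

```java
public final class SpriteBlit {
    // Hypothetical transform codes, chosen only for this sketch.
    public static final int NONE = 0, FLIP_X = 1, FLIP_Y = 2, FLIP_XY = 3;

    /**
     * Copies a w*h sprite (row-major int pixels) into dst at (dx, dy).
     * Assumes the sprite fits inside dst; clipping and blending are omitted.
     */
    public static void draw(int[] dst, int dstW, int[] src, int w, int h,
                            int dx, int dy, int transform) {
        for (int i = 0; i < h; i++) {
            int srcIndex; // first source pixel read for this destination row
            int srcStep;  // +1 reads left to right, -1 reads right to left

            // This switch runs once per row, not once per pixel.
            switch (transform) {
                case FLIP_X:
                    srcIndex = i * w + (w - 1);
                    srcStep = -1;
                    break;
                case FLIP_Y:
                    srcIndex = (h - 1 - i) * w;
                    srcStep = 1;
                    break;
                case FLIP_XY:
                    srcIndex = (h - 1 - i) * w + (w - 1);
                    srcStep = -1;
                    break;
                default:
                    srcIndex = i * w;
                    srcStep = 1;
                    break;
            }

            int dstIndex = (dy + i) * dstW + dx;
            for (int j = 0; j < w; j++) {
                dst[dstIndex++] = src[srcIndex]; // blending would go here
                srcIndex += srcStep;
            }
        }
    }
}
```

In this layout the extra cost is one switch per row rather than per pixel, and the transform value never changes during a single call, which is the situation I’m asking about below.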
TL;DR:
SO! My MAIN question to you, the Java community:
Do you think it is worth it to just stick with my current format and rewrite everything multiple times just to save on a few unnecessary cycles?
OR do you think that the technology I have at my disposal is intelligent enough to alleviate such an obvious bottleneck through hardware magic such as branch prediction? Specifically, wouldn’t this be an ideal situation for the branch prediction in our modern CPUs to help with? The branch predictor can detect long sequences of identical conditional results, right? Is my understanding of how that works even accurate? Any help to point me in the right direction is appreciated, thank you!