micro optimizations

By the way, what is the op for the ‘compare to 0’
of x86?

cmp OPERAND,0
jz LABEL

cmp is the compare instruction

jz is “Jump if zero”. There is also jnz for “Jump if not zero”, etc. Both of these check the zero flag. Other jump instructions check different flags like carry, etc.

OPERAND can be a register or memory.

LABEL is just where you want to jump to.

Correct, but (as I mentioned above) this is not an optimized way of doing it in a loop, because a compare to 0 isn’t different from a compare to X in this example. The actual optimization is to get rid of the “cmp” in the loop.
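As a sketch of what “getting rid of the cmp” looks like at the source level (my own illustration, not from the thread): if the decrement and the test are fused into one expression, the compiler can use the fact that the decrement instruction already sets the flags, so no separate compare is needed before the conditional jump.

```java
public class CountdownLoop {
    // Countdown idiom: the decrement and the zero test are one expression,
    // which maps naturally onto "dec + conditional jump" with no "cmp".
    public static long sum(int[] a) {
        long total = 0;
        int i = a.length;
        while (--i >= 0) {  // decrement and test in a single step
            total += a[i];
        }
        return total;
    }
}
```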

I may be looking at this the wrong way, but are you actually checking for zero in this loop? It would seem that you are looking for i to be less than 0 which is a different check. “i >= 0” is much different than “i == 0” or “i != 0”.

The loop has to stop when i gets lower than 0. If we were testing for equality, the iteration where i is 0 would never happen.

That’s my point. You were talking about optimizing for “i == 0” when 0 needs to be included in the indexing. If the whole optimization is about “i == 0” then the whole way the code is written would need to be changed. Unless you were just being brief when talking about chips being able to test for 0 quickly. “i >= 0” doesn’t test for that equality, rather it’s a different test entirely. If I’m remembering my Assembly correctly (I only worked with it for a few months), then the test “i >= 0” results in a call to “GE” and a subsequent conditional jump rather than the clean “JZ”. To halt execution of the loop when the counter equals 0 would make something odd looking like:


int[] a = new int[1000000];
for (int i = a.length; i != 0; i--)
{
    // Do stuff (while remembering that you
    // have to subtract one to index the array)
}

Now I’m either off of my nut or it would be very hard to implement this optimization even in a language where you’d be guaranteed that the resulting compile would use the specific chip optimization.
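For what it’s worth, the two loop shapes being argued about do visit exactly the same elements; a small sketch (my own, not from the thread) comparing the “i != 0” countdown with its off-by-one indexing against the plain “i >= 0” form:

```java
public class LoopForms {
    // Countdown using "i != 0": must index with i - 1, as described above.
    public static long sumNotZero(int[] a) {
        long total = 0;
        for (int i = a.length; i != 0; i--) {
            total += a[i - 1];
        }
        return total;
    }

    // Countdown using "i >= 0": start at length - 1 and index directly.
    public static long sumGreaterEqual(int[] a) {
        long total = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            total += a[i];
        }
        return total;
    }
}
```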

Oh, I see.
Look at answer number 16 in this thread. Both operations can be done the same at assembly level.
In fact, the topic seemed to have quickly slipped away from “i == 0” tests to “i and special cases (0 or -1)” tests.

chrisbliss18: The optimization is based on “jump if greater than or equal to zero” described at:
http://mrl.nyu.edu/~meyer/jvmref/ref-_ifge.html

On http://mrl.nyu.edu/~meyer/jvmref/ check the following for java opt codes that are somehow related to a boolean test where one part of the test is zero.

ifeq
ifge
ifgt
ifle
iflt
ifne
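Those opcodes are the single-operand “compare against zero” branches. As a hedged illustration (exact bytecode varies by compiler version): a countdown loop whose exit test is against 0 lets javac use one of these one-operand branches, such as iflt, instead of the two-operand if_icmp family needed for comparing two arbitrary values. You can inspect the result with `javap -c`.

```java
public class Countdown {
    // The loop test "i >= 0" compares against the constant zero, so the
    // compiler can emit an iflt-style branch (jump out when i < 0) rather
    // than an if_icmpge comparing two stack operands.
    public static int iterations(int n) {
        int count = 0;
        for (int i = n - 1; i >= 0; i--) {
            count++;
        }
        return count;
    }
}
```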

leknor: Okay… I’ve got you now. I got hung up on the whole “JZ” and “JNZ” thing. I’ll look at those references. I’m new here and you seem to know your stuff. I’ll have to watch what you put up.

chrisbliss18: no problem dude. I almost couldn’t find those urls and I was about to get unsure that my facts were straight. It’s good for me to go back to my sources every once in a while to make sure I’m not dreaming up things that aren’t there.

Off topic, but isn’t prefix (++i) faster than postfix (i++)?
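Short answer, hedged: for a primitive int whose expression result is discarded (as in a for-loop update), prefix and postfix compile to the same iinc bytecode, so there is no speed difference; the two only behave differently when the value of the expression is actually used. A sketch (my own, not from the thread):

```java
public class IncrementDemo {
    // "i++" vs "++i" as a standalone loop update: the expression's value is
    // thrown away, so javac emits the same iinc instruction for both.
    public static int countPostfix(int n) {
        int c = 0;
        for (int i = 0; i < n; i++) c++;
        return c;
    }

    public static int countPrefix(int n) {
        int c = 0;
        for (int i = 0; i < n; ++i) c++;
        return c;
    }

    // The difference only shows up when the expression's value is used:
    public static int[] usedValues() {
        int i = 5, j = 5;
        int post = i++; // post gets the OLD value: 5 (i becomes 6)
        int pre = ++j;  // pre gets the NEW value: 6 (j becomes 6)
        return new int[] { post, pre };
    }
}
```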

I’m sorry to pour cold water on your efforts, but for improving performance this is nearly all pointless (it’s rather useful for other things, like certain one-off cases where the optimization wouldn’t be “safe” but you have some special knowledge, like perhaps knowing that the program always throws an exception at some point before hitting i=0, and can do something about it). This isn’t an opinion, it’s fact, and it’s all to do with compilers.

One of the major reasons for using a compiled language is so that the compiler will do precisely the kind of trick you suggest here for you, and you can carry on writing in high-level code. In the bad old days when compilers still sucked (performance wise) it was worth doing such low-level optimizations, trading on the fact that a really good human programmer was better at optimization than a typical compiler.

Nowadays, you will find in general that such optimizations have more of a negative effect (when you try them across all main platforms) than a positive one - this is because you are not only second-guessing the compiler, and assuming that it wouldn’t have made the same optimization itself, but you are reducing the number of additional optimizations it could have been doing, but which are way too clever for you to ever understand (unless you are a first-rate maths PhD or above).

With Java especially, it’s not even possible to make statements like “this is one less instruction”. You can make that kind of statement with C family languages, where you have a much better knowledge of what’s going to happen to your code before it gets executed (compiled straight to assembler, for instance). The fundamental way that a Java VM works is to compile Java byte code into code optimized for the machine it’s running on. As someone suggested (I can’t remember if this is correct, but it sounds pretty likely given the 486 arch), this “optimization” might work best on a 386/486 class processor. But it would work slower than the “unoptimized” version on e.g. any processor family that has instructions for doing for loops with increasing loop-variables (the Java VM for that machine would have auto-recompiled the byte code to use the native CPU instruction, but now it can’t).

It’s better in general (unless you have a poor compiler) not to try this kind of clever optimization. When you factor in the additional loss to the programmer of the code now being less clear (unless you are willing to put 3 lines of comments explaining the bizarre implementation in front of each and every for loop!), the weight of evidence is in favour of avoiding this.

Even things like the good idea to reduce the number of times the “length” of an array is referenced are actually a complete waste of time. Every compiler today uses “live variable analysis” and “reachability” (feel free to google for them - should be slides on them in any really good Computer Science undergrad course) to automatically remove every single unnecessary calculation. Although sometimes (as part of other optimizations) it can actually be better performance to deliberately compute something twice (and that’s the kind of complex decision that the compiler will make for you - if you only give it the chance!).
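To make the array-length point concrete (my own sketch, not from the thread): both loops below compute the same thing, and a JIT doing loop-invariant analysis can hoist the `a.length` load out of the loop itself, so the hand-cached version typically buys nothing once the JIT has run.

```java
public class LengthHoist {
    // Reads a.length in the loop condition on every iteration; the JIT can
    // recognize the load as loop-invariant and hoist it automatically.
    public static long sumPlain(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) total += a[i];
        return total;
    }

    // Hand-hoisted version; typically no faster after JIT compilation.
    public static long sumCached(int[] a) {
        long total = 0;
        int len = a.length;
        for (int i = 0; i < len; i++) total += a[i];
        return total;
    }
}
```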

Finally, it’s worth remembering that something which is well-suited to java byte code is often slower on most real machines than something that isn’t. Java byte code was deliberately designed to be a really rubbish processor architecture for one of two reasons (depending on who you talk to in Sun):

  1. They wanted it to be extremely easy to create a “hardware Java chip” that could execute java byte code. MAJC (the java chip) was much cheaper and easier for them to design and manufacture using 15-year-old techniques in processor design.

  2. Java 1.0 came out when machines were a heck of a lot slower than today (I still have somewhere a copy of a Java VM for DOS!). Making java byte code a shitty inefficient architecture made it very easy for lots of different manufacturers and developers to write the first VM for each of the different platforms. Sun were rather worried that they might not achieve the critical mass of Java VM’s for enough platforms (and it took quite a few years to happen, even with the very easy to implement java 1.0 VM’s).

[quote]I’m sorry to pout cold water on your efforts, but for improving performance this is nearly all pointless
[/quote]
Yes, that’s what came out after the first benchmark at the beginning of the thread. Nevertheless, some things came out of the tests, and while the optimizations themselves were pointless, the exercise was useful. At least, I think so.

[quote]Even things like the good idea to reduce the number of times the “length” of an array is referenced are actually a complete waste of time. Every compiler today uses “live variable analysis” and “reachability” (feel free to google for them - should be slides on them in any really good Computer Science undergrad course) to automatically remove every single unecessary calculation.
[/quote]
For array.length this is true… but for String.length(), where you have a method invocation that possibly has side effects, it is likely a different story.
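The distinction is that `a.length` is a plain field read, while `s.length()` is a method invocation; unless the compiler can inline it or prove the call side-effect-free, it must re-invoke it each iteration. A hedged sketch (my own) of the manual hoist being described:

```java
public class StringLoop {
    // Calls s.length() in the loop condition; the compiler must inline the
    // call or prove it has no side effects before it can hoist it.
    public static int countVowelsPlain(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if ("aeiou".indexOf(s.charAt(i)) >= 0) count++;
        }
        return count;
    }

    // Manually hoisted: the method is invoked once and the result reused.
    public static int countVowelsCached(String s) {
        int count = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            if ("aeiou".indexOf(s.charAt(i)) >= 0) count++;
        }
        return count;
    }
}
```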

I agree in principle that optimizations should be left to the compiler, particularly for Java where the ‘final’ compiler is often not under your control. But there are still some low-level optimizations that are useful.