Are static non member methods faster?

[quote]As for the warmup - I have tried to do it by making the first two calls outside of the measuring loop. It turned out to be not enough.

As for commenting out the print line, that is not fair, because:

  1. This way you make the tested routine a no-op - a very bad mistake with modern compilers
    [/quote]
    Oh, is THIS what you were trying to do?

Nice assumption. Totally wrong.

Here are the results for a real noop test.

Results:


Instance method 0ms
Static method 0ms
Instance method 0ms

Code:


package benchmarks;


public class AbiesNoopTest {

public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }

  public void itest() {
    /*
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = iadd(i,1);
      v = iadd(v,a);
      v = iadd(v,i);
      v = iadd(v,5);
      v = iadd(v,v);
      v = iadd(i,v);
    }
    // return v;
//System.out.println(v);*/
  }

  public void stest() {
    /*
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = sadd(i,1);
      v = sadd(v,a);
      v = sadd(v,i);
      v = sadd(v,5);
      v = sadd(v,v);
      v = sadd(i,v);
    }
    return v;
    //System.out.println(v);*/
  }



  public static void main(String[] argv) {
    AbiesNoopTest t = new AbiesNoopTest();
    int acc;
    t.itest();
    t.stest();
    long start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
       t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.stest();
    }
    System.out.println("Static method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
  }
}

As you can see a REAL noop situation makes this execute almost immediately. I knew this instinctively from all the benchmarks I’ve run in the past, but this proves the point.

In any event, a print is the WRONG way to solve that. If the compiler were truly smart enough to figure out that the loop did nothing (which would be quite a feat when you consider that it’s calling a sub-function, plus all the possibilities of side effects), THEN what you should do is return the calculated value rather than void. That moves the question of whether the value is used out of the scope of what you are testing.

NEVER NEVER NEVER do system IO in the midst of a test. I can’t make that point strongly enough.

Minor bug in “AbiesTest” which may be causing you to get values from a previous iteration (apologies).

Try this and see if you still get differing values on Windows…


package benchmarks;


public class AbiesTest {

public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }

  public void itest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = iadd(i,1);
      v = iadd(v,a);
      v = iadd(v,i);
      v = iadd(v,5);
      v = iadd(v,v);
      v = iadd(i,v);
    }
//System.out.println(v);
  }

  public void stest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = sadd(i,1);
      v = sadd(v,a);
      v = sadd(v,i);
      v = sadd(v,5);
      v = sadd(v,v);
      v = sadd(i,v);
    }
    //System.out.println(v);
  }



  public static void main(String[] argv) {
    AbiesTest t = new AbiesTest();
    t.itest();
    t.stest();
    long start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.stest();
    }
    System.out.println("Static method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
  }
}


JK

Then Windows ain’t normal, my friend. We had BIG problems with this in benchmarks on the JDK performance team until we figured it out.

(I should add that we saw exactly the same thing in CD-I, which was OS-9, no relation to OSX. I think you will find that printing in general takes MUCH longer than you think it does.)

In the end, it’s an uncontrollable variable, which is BAD juju when microbenchmarking (or benchmarking, period).

Rule #1: Make no assumptions
Rule #2: Test THOROUGHLY all the assumptions you end up making anyway.

Remember, science is the act of proving yourself WRONG not right. If, after exhaustive work, you fail to prove your assumptions incorrect, then you may have something.

So in the spirit of testing assumptions, I decided to see what time it takes to println on OSX.

I have a feeling that OSX, being a truly multitasking OS, doesn’t block waiting for the print. The end result is that you are right: on OSX, returning from a print takes under 1ms in the situation I happen to be running in. But, as I said, it’s unpredictable since you are turning control over to the OS, and I would hesitate to say that it’s under 1ms in all circumstances.

My past experience with Windows NT was that it DOES block in the kernel and prints take a whole lot longer even in the best case, but I’ll check that out vs. 2K in the lab tomorrow.
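A check like the println timing described above could be sketched as follows. This is not the code that was actually run; the class name PrintlnTiming and the helper timePrintln are made up for illustration, and System.nanoTime needs Java 5 or later (on 1.4 you would fall back to System.currentTimeMillis and a larger count).

```java
// Sketch only: PrintlnTiming and timePrintln are illustrative names, not from the thread.
final class PrintlnTiming {

    // Returns the elapsed nanoseconds for `count` println calls on the given stream.
    static long timePrintln(java.io.PrintStream out, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            out.println(i);
        }
        out.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] argv) {
        int count = 1000;
        long nanos = timePrintln(System.out, count);
        // Report on stderr so the summary is not mixed into the measured stream.
        System.err.println(count + " printlns: " + (nanos / 1000000.0) + " ms total, "
                + (nanos / (double) count) + " ns per call");
    }
}
```

The number will depend heavily on where stdout goes (console, pipe, file), which is exactly why it makes a poor thing to leave inside a timed loop.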

[quote]Minor bug in “AbiesTest” which may be causing you to get values from a previous iteration (apologies).
[/quote]
Lol! I didn’t notice the mistake and I was wondering why I got the same results with the two tests. ;D

The new AbiesTest gives me :
Instance method 7080ms
Static method 3130ms
Instance method 7090ms

which is roughly equivalent to the previous one.

Also server JVM gives me 0ms too so abies is sort of right with that print thing : a statement that really uses the result is needed or the JVM may notice the operations are dumb and bypass them. Maybe we could return v, store it in an array and print this array at the end of the benchmark ? (shouldn’t take that much time).
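A minimal sketch of that suggestion, assuming the test methods are changed to return v and take the loop size as a parameter. The class name ResultSinkBench, the `sink` array and the `size` parameter are illustrative and not part of the posted tests.

```java
// Sketch only: ResultSinkBench and its method signatures are illustrative names,
// not code from the thread.
final class ResultSinkBench {

    static final int RUNS = 10;

    int iadd(int a, int b) {
        return a + b;
    }

    static int sadd(int a, int b) {
        return a + b;
    }

    // Same loop body as itest(), but the accumulated value is returned to the caller.
    int itest(int size) {
        int v = 0;
        for (int i = 0; i < size; i++) {
            int a = iadd(i, 1);
            v = iadd(v, a);
            v = iadd(v, i);
            v = iadd(v, 5);
            v = iadd(v, v);
            v = iadd(i, v);
        }
        return v;
    }

    // Same loop body as stest(), using the static method.
    int stest(int size) {
        int v = 0;
        for (int i = 0; i < size; i++) {
            int a = sadd(i, 1);
            v = sadd(v, a);
            v = sadd(v, i);
            v = sadd(v, 5);
            v = sadd(v, v);
            v = sadd(i, v);
        }
        return v;
    }

    public static void main(String[] argv) {
        ResultSinkBench t = new ResultSinkBench();
        int size = 100000000;
        int[] sink = new int[2 * RUNS];

        // Warm-up calls, as in the posted test.
        t.itest(size);
        t.stest(size);

        long start = System.currentTimeMillis();
        for (int r = 0; r < RUNS; r++) {
            sink[r] = t.itest(size);
        }
        long instanceMs = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int r = 0; r < RUNS; r++) {
            sink[RUNS + r] = t.stest(size);
        }
        long staticMs = System.currentTimeMillis() - start;

        // All I/O happens after both timed regions have finished.
        System.out.println("Instance method " + instanceMs + "ms");
        System.out.println("Static method " + staticMs + "ms");
        System.out.println(java.util.Arrays.toString(sink));
    }
}
```

Because every result ends up in the printed array, the JIT cannot treat the loops as dead code, and the printing itself stays outside the measured time.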

[quote]Also server JVM gives me 0ms too so abies is sort of right with that print thing : a statement that really uses the result is needed or the JVM may notice the operations are dumb and bypass them.
[/quote]
Hmm. Then the optimizer is much smarter on Windows, which is odd since OSX’s VM is a variant of Hotspot.

I’ll give it a try on Win2k myself. If that happens, then all that should be needed to goose it is to return the value. I doubt you even need to store it, though you might need to assign it.

JK

Okay, this fixes it for Win32 while adding no random elements.



package benchmarks;


public class AbiesTest {

public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }

  public int itest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = iadd(i,1);
      v = iadd(v,a);
      v = iadd(v,i);
      v = iadd(v,5);
      v = iadd(v,v);
      v = iadd(i,v);
    }
    return v;
//System.out.println(v);
  }

  public int stest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = sadd(i,1);
      v = sadd(v,a);
      v = sadd(v,i);
      v = sadd(v,5);
      v = sadd(v,v);
      v = sadd(i,v);
    }
    return v;
    //System.out.println(v);
  }



  public static void main(String[] argv) {
    AbiesTest t = new AbiesTest();
    t.itest();
    t.stest();
    long start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.stest();
    }
    System.out.println("Static method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
  }
}


Okay, the proper code as above, on my Win2K box, jdk 1.4.1

Run in both cases with -Xcompile to “warm up” the VM.

(AMD Athlon 1.2)

Server results:


Instance method 4126ms
Static method 4146ms
Instance method 4206ms


Client Results:


Instance method 25757ms
Static method 6018ms
Instance method 25958ms

So I can concur that in Win32 Hotspot JDK 1.4.1 there is no difference in the server case, but in the client case there is a significant one (about a factor of 4).
This is NOT true under OSX.

This looks an awful lot like a client-compiler bug, but it’s possible that on Windows the client compiler is deliberately sacrificing aggressive inlining. I’ll need to go talk to the VM guys.

Yes, to be honest, if this were deliberate then I’d have to say that the client compiler doesn’t meet my definition of a “modern jit”.

Just out of curiosity, has anyone tried this on IBM’s latest? Maybe I will…

I did almost the same changes, except that I store v values in an int[10] and print them (outside the time measuring block).

With server JVM I got :
Instance method 770ms
Static method 830ms
Instance method 770ms

Definitely JVM/OS dependent. Well, unless there is evidence that some JVMs behave better with instance methods, I would say: use what fits best in terms of design, and if you have to choose between static and instance (e.g. the singleton vs. static class discussion), go with static.

Could someone post results obtained with other JVMs/OSes, so that we have more material?

The other possibility is that there is an -XX flag for aggressive inlining in the client VM. I’ll look into that too…

I’m DLing Sun 1.4.2 right now to test against.

Does anyone know a way to get IBM’s Win32 1.3 WITHOUT downloading the whole damn WebSphere development kit???

Client numbers for 1.4.2_02.

Less drastic by far, but still significant:


Instance method 12137ms
Static method 8843ms
Instance method 12127ms

I definitely need to ask the VM guys what’s going on here. It looks kinda like some ‘work in progress’ under the hood of the client VM…

Just for completeness, here is server. No surprises here.
It’s worth noting, though, that server in either configuration (the two are the same) takes about half the time of client’s best numbers.

So it looks to me like, if you are concerned about speed of calls, you probably want to avoid the client VM ANYWAY…


Instance method 4026ms
Static method 4026ms
Instance method 4016ms

Interesting link here for anyone interested in performance:

http://java.sun.com/j2se/1.4.2/1.4.2_whitepaper.html

I’m perusing it now for any clues to this client compiler behavior.

Hmm. This would suggest that inlining is supposed to be happening…

"The final phase does peephole optimization on the LIR and generates machine code from it. Emphasis is placed on extracting and preserving as much information as possible from the bytecodes. It focuses on local code quality and does very few global optimizations, since those are often the most expensive in terms of compile time. It supports inlining any function that has no exception handlers or synchronization, and also supports deoptimization for debugging and inlining. "

http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html#client

JUST to confuse things further…
I was a bit disturbed by the two levels of calls in the test as given because it complicates the call chain. Again, in the interests of reducing variables, I simplified it to direct calls to the static and non-static functions.

For this test I still kept Abies’ multiple operations per loop, though I’m a bit concerned about that complication (I’ll factor that out in the next set of tests).

I also increased the number of times we run each test, just as a sanity check.

The results were interesting though not too out of the ordinary. I had the labels backwards (Instance is static and static is instance), so don’t let that throw you.

The results:

Client:


java -Xcompile -client benchmarks.SimplifiedTest
Instance method (-705032721) 9734ms
Static method (-705032721) 11136ms
Instance method (-705032721) 9714ms
Static method (-705032721) 11136ms
Instance method (-705032721) 9704ms
Static method (-705032721) 11146ms

Server:


java -Xcompile -server benchmarks.SimplifiedTest
Instance method (-705032721) 4136ms
Static method (-705032721) 4076ms
Instance method (-705032721) 4096ms
Static method (-705032721) 4356ms
Instance method (-705032721) 4036ms
Static method (-705032721) 4347ms

here’s the test:



package benchmarks;


public class SimplifiedTest {

public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }




  public static void main(String[] argv) {
    SimplifiedTest t = new SimplifiedTest();
    for (int q = 0; q < 3; q++) {
      int v = 0;
      long start = System.currentTimeMillis();
      for (int i = 0; i < SIZE * 10; i++) {
        int a = t.iadd(i, 1);
        v = t.sadd(v, a);
        v = t.sadd(v, i);
        v = t.sadd(v, 5);
        v = t.sadd(v, v);
        v = t.sadd(i, v);
      }
      // NOTE: label is backwards; the loop above calls the static sadd for all but its first call
      System.out.println("Instance method (" + v + ") " +
                         (System.currentTimeMillis() - start) + "ms");
      v = 0;
      start = System.currentTimeMillis();
      for (int i = 0; i < SIZE * 10; i++) {
        int a = t.iadd(i, 1);
        v = t.iadd(v, a);
        v = t.iadd(v, i);
        v = t.iadd(v, 5);
        v = t.iadd(v, v);
        v = t.iadd(i, v);
      }
      // NOTE: label is backwards; the loop above calls the instance method iadd
      System.out.println("Static method (" + v + ") " +
                         (System.currentTimeMillis() - start) + "ms");
    }
  }
}


1.4.1_01, Athlon/win2000

Client:
Instance method 17875ms
Static method 4235ms
Instance method 17828ms

Server:
Instance method 2906ms
Static method 2906ms
Instance method 2906ms

IBM JRE 1.4.0:
Instance method 3516ms
Static method 3875ms
Instance method 3531ms

Well… yes… what can I say…

Interesting thing: if I add printouts to the test, the static test on the IBM JRE has the same speed as the instance test (3890ms). Strange, very strange. I really doubt that waiting on the printout is causing this; I rather suspect that most of these 10-20% differences come from different pairing of the generated instructions rather than from a real difference in JIT quality. I have removed the printouts and added 3 ?add(i,v) instructions to both methods, and the results are the same (6300ms).

At the moment it seems to me that for IBM and Hotspot server the speed is the same, with any umpteen-percent differences in either direction caused by random things like instruction pairing or cache line boundaries. So I give up trying to prove that static methods are generally faster. I was proven wrong, and I’m just left with the statement that client Hotspot cannot manage to inline instance methods in a reasonable way.

Tried -XX:CompileThreshold=?

Still waiting for that 2 stage compiling VM :wink: Our games depend on it!

Cas :slight_smile:

Okay,

I simplified it one step further and now I’m getting BIZARRE results from the server compiler.

I definitely need to go talk to the VM guys. What you get for instance vs. static calls seems very situation-dependent…

In an over-simplified example I’m now getting a MASSIVE difference between the two for the server VM. But I’m not sure how real it is yet.

Be sure to use SimplifiedTest.sadd instead of t.sadd: calling a static method through an instance is not really the ‘proper’ way and generates a few extra bytecodes. And the first iadd should be sadd.

SimplifiedTest on the IBM JRE still gives better results for the instance method… (I have corrected the printed names, so I’m sure of it.)
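With those corrections applied, SimplifiedTest might look roughly like the sketch below. It is a reconstruction from the remarks above and not the exact file that was run; the class name SimplifiedTestFixed and the runOnce helper are made up here. The two loops keep direct calls inside the timed regions, and runOnce returns both checksums so they can be compared.

```java
// Sketch only: the class name and the runOnce helper are illustrative, not from the thread.
final class SimplifiedTestFixed {

    static final int SIZE = 100000000;

    int iadd(int a, int b) {
        return a + b;
    }

    static int sadd(int a, int b) {
        return a + b;
    }

    // Times both loops once, prints the results after timing,
    // and returns {static checksum, instance checksum}.
    static int[] runOnce(SimplifiedTestFixed t, int n) {
        int v = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            // Static calls go through the class name, not through the instance.
            int a = SimplifiedTestFixed.sadd(i, 1);
            v = SimplifiedTestFixed.sadd(v, a);
            v = SimplifiedTestFixed.sadd(v, i);
            v = SimplifiedTestFixed.sadd(v, 5);
            v = SimplifiedTestFixed.sadd(v, v);
            v = SimplifiedTestFixed.sadd(i, v);
        }
        long staticMs = System.currentTimeMillis() - start;
        int staticSum = v;

        v = 0;
        start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            int a = t.iadd(i, 1);
            v = t.iadd(v, a);
            v = t.iadd(v, i);
            v = t.iadd(v, 5);
            v = t.iadd(v, v);
            v = t.iadd(i, v);
        }
        long instanceMs = System.currentTimeMillis() - start;

        System.out.println("Static method (" + staticSum + ") " + staticMs + "ms");
        System.out.println("Instance method (" + v + ") " + instanceMs + "ms");
        return new int[] { staticSum, v };
    }

    public static void main(String[] argv) {
        SimplifiedTestFixed t = new SimplifiedTestFixed();
        for (int q = 0; q < 3; q++) {
            runOnce(t, SIZE * 10);
        }
    }
}
```

Both loops do the same arithmetic, so the two checksums printed in parentheses should always match; if they ever differ, one of the loops is not doing what its label says.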

I think that I have found a bug in the server JVM:

Static method (-1.0E8) 2578ms
Instance method (-1.0E8) 5968ms
Static method (-1.0E8) 6016ms
Instance method (-1.0E8) 5875ms
Static method (-1.0E8) 5969ms
Instance method (-1.0E8) 5890ms

This is after changing int to float, dividing the number of iterations by 10 and changing the last two lines to
v = t.iadd(v, -v);
v = t.iadd(-i, v);
to avoid Infinity values.

It seems that server Hotspot makes the same kind of wrong optimization given enough time.
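For anyone who wants to reproduce the float variant, the loop body described above can be sketched like this. It is a reconstruction from the description only; the class name FloatVariantTest and the two loop helpers are made up, and the timing in main uses the same currentTimeMillis approach as the rest of the thread.

```java
// Sketch only: FloatVariantTest and its helper methods are illustrative names.
final class FloatVariantTest {

    float iadd(float a, float b) {
        return a + b;
    }

    static float sadd(float a, float b) {
        return a + b;
    }

    // Instance-call version of the modified loop.
    static float instanceLoop(FloatVariantTest t, int n) {
        float v = 0;
        for (int i = 0; i < n; i++) {
            float a = t.iadd(i, 1);
            v = t.iadd(v, a);
            v = t.iadd(v, i);
            v = t.iadd(v, 5);
            v = t.iadd(v, -v);  // v + (-v) is 0 for any finite v, so v cannot overflow
            v = t.iadd(-i, v);  // leaves v equal to -i converted to float
        }
        return v;
    }

    // Static-call version of the same loop.
    static float staticLoop(int n) {
        float v = 0;
        for (int i = 0; i < n; i++) {
            float a = FloatVariantTest.sadd(i, 1);
            v = FloatVariantTest.sadd(v, a);
            v = FloatVariantTest.sadd(v, i);
            v = FloatVariantTest.sadd(v, 5);
            v = FloatVariantTest.sadd(v, -v);
            v = FloatVariantTest.sadd(-i, v);
        }
        return v;
    }

    public static void main(String[] argv) {
        FloatVariantTest t = new FloatVariantTest();
        int n = 100000000; // the original SIZE * 10, divided by 10
        for (int q = 0; q < 3; q++) {
            long start = System.currentTimeMillis();
            float s = staticLoop(n);
            long staticMs = System.currentTimeMillis() - start;

            start = System.currentTimeMillis();
            float v = instanceLoop(t, n);
            long instanceMs = System.currentTimeMillis() - start;

            System.out.println("Static method (" + s + ") " + staticMs + "ms");
            System.out.println("Instance method (" + v + ") " + instanceMs + "ms");
        }
    }
}
```

With n = 100000000 the final value is -(float)99999999, which rounds to the nearest float and prints as -1.0E8, consistent with the checksum in the results above.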

Just ran SimplifiedTest with message correction :wink:

  • client :
    Static method (-705032721) 3960ms
    Instance method (-705032721) 7250ms
    Static method (-705032721) 3790ms
    Instance method (-705032721) 7190ms
    Static method (-705032721) 3850ms
    Instance method (-705032721) 7250ms

  • server :
    Static method (-705032721) 830ms
    Instance method (-705032721) 2470ms
    Static method (-705032721) 1760ms
    Instance method (-705032721) 2360ms
    Static method (-705032721) 1540ms
    Instance method (-705032721) 2410ms