Are static non-member methods faster?

With the “return 0” change, I get:

  • client JVM : instance=1760ms ; static=1260ms
  • server JVM : instance=0ms ; static=0ms

And now, something a little more interesting…
When I replace the nested iadd/sadd calls with a single call (for example, v = iadd(i,1);), I get no difference between static and instance:

  • client JVM : instance=1650ms ; static=1650ms
  • server JVM : instance=0ms ; static=0ms

So on my JVM version, the performance difference is due either to int overflow or to the nested calls.
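A note on the overflow hypothesis: Java defines `int` addition to wrap around on overflow (two's-complement arithmetic) rather than throw, so overflow in these loops does not by itself add an exception check. The minimal sketch below, using a hypothetical class name `OverflowCheck`, confirms the wrap-around. `Math.addExact` appears only as the checked contrast; it requires Java 8 or later and is not available on the 1.4 VMs discussed in this thread.

```java
class OverflowCheck {
    // Same shape as the iadd/sadd helpers used in this thread.
    static int sadd(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE + 1 silently wraps to Integer.MIN_VALUE; nothing is thrown.
        System.out.println(sadd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // prints "true"

        // The checked alternative (Java 8+) does throw on overflow.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("addExact threw: " + e.getMessage());
        }
    }
}
```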

Yes, I had obtained similar results with i/sadd(1,2);. So, would this breakdown be reasonable?

  • Overhead simply due to declaring a function instance or static: looks like there is none.
  • Any difference comes from extra optimizations the JVM can make use of when the function does a little more non-trivial work.

I don’t want to get into areas where I’m not an expert :slight_smile:

Edit: Of course, it is hard to quantify what would happen in a real, full-fledged application with a heavily loaded JVM. Standard disclaimer!

The fewer nested calls you make, the less difference you will see between the two. The loop overhead is quite large, so you need at least a few calls per iteration to measure a difference.

As far as nesting is concerned, please try the following code:


  public void itest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = iadd(i,1);
      v = iadd(v,a);
      v = iadd(v,i);
      v = iadd(v,5);
      v = iadd(v,v);
      v = iadd(i,v);
    }
    System.out.println(v);
  }

  public void stest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = sadd(i,1);
      v = sadd(v,a);
      v = sadd(v,i);
      v = sadd(v,5);
      v = sadd(v,v);
      v = sadd(i,v);
    }
    System.out.println(v);
  }

It gives 19828ms for instance and 5750ms for static (this is not comparable to my previous results, as I am using a different computer now).

A nearly 3.5x difference shows the problem even better, so maybe we can start talking about a ‘corrected’ benchmark now :slight_smile:

I included the printout on purpose: without it, the method is trivial enough to be optimized away entirely. Even with the printout, the result could be computed statically, but I hope that, thanks to the overflow, this is not something a normal compiler will try to do.

This benchmark is deliberately trivial, to show that even in the simplest cases Hotspot shows a difference between such calls. With more complicated ones, it can only get worse.

As for making the loop shorter, please be sure to use both v and i inside it. If you don’t modify v, it is possible to run just the last iteration of the loop. If you don’t use i, all iterations give the same result. So v = iadd(v,i) is the absolute bare minimum for anything reasonable, but you need to keep adding instructions until server Hotspot stops optimizing the loop away completely (we are not testing Hotspot in general; we are testing the difference between static and instance method invocation speed).
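A minimal sketch of that bare-minimum loop, assuming a hypothetical `MinimalLoop` class and a smaller `SIZE` than the thread's 100000000. Instead of printing, each method returns v so the caller can consume it (print or compare it); a value nobody uses would let the JIT drop the loop. Note that a sufficiently clever compiler could still reduce this particular sum to a closed form, which is why the benchmarks in this thread keep adding instructions.

```java
class MinimalLoop {
    static final int SIZE = 1000000; // hypothetical; the thread uses 100000000

    int iadd(int a, int b) { return a + b; }

    static int sadd(int a, int b) { return a + b; }

    // v depends on the previous iteration (so iterations cannot be skipped)
    // and on i (so iterations are not all identical).
    int itest() {
        int v = 0;
        for (int i = 0; i < SIZE; i++) {
            v = iadd(v, i);
        }
        return v; // the caller must use this value, or the loop is dead code
    }

    static int stest() {
        int v = 0;
        for (int i = 0; i < SIZE; i++) {
            v = sadd(v, i);
        }
        return v;
    }
}
```

For example, `System.out.println(new MinimalLoop().itest() == MinimalLoop.stest());` prints `true`, since both variants perform the same `int` additions.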

Looks like I have a lot to learn about performance tuning and How To Write Correct Benchmarks ;D

With those new methods, I got the same results as in my first post:

  • client JVM : instance= 8680ms ; static=2520ms
  • server JVM : instance= 980ms ; static=940ms

???

One more data point for your second version (P4 1.6GHz, j2sdk1.4.2beta1):

instance: 10380ms; static: 5330ms

Keep at it !

Now I get the following (widely varying) results:

1.4.1_01 client:
Instance method 28311ms
Static method 12287ms

1.4.1_01 server:
Instance method 4316ms
Static method 4286ms

GCJ compiled .exe:
Instance method 133893ms
Static method 123427ms
(oh my god :slight_smile: Must be a bug)

With the latest routines posted above, on OS X 10.3 with Java 1.4.1_01:

Instance method 13527ms
Static method 12005ms

That is about a 12% difference.

CAVEAT REMINDER: -XX flags are non-standard, unpublished flags, mostly intended for the Sun development guys, and subject to change or elimination without notice.

Having said that, here’s your answer:

The format is undocumented and unsupported (-XX flag).
It will change as more/different properties are tracked, and please
don’t ask why ‘*’ is used for a native method instead of ‘n’.

    ID             compile ID, printed using at least 3 columns
    %              compile for On Stack Replacement (OSR) of an interpreter frame
    *              method is native
    s              method is synchronized
    !              method has exception handler
    b              interpreter has been blocked until compile completes
    1              compile without full optimization, tier1
    method_name..  method name, without signature
    @ bci %d       for OSR compiles, bytecode index at which replacement will happen
    (%d bytes)     bytes of bytecodes in method (does not include inlined methods)

[quote] :o The gap is even bigger for me: I got 8900ms with the instance method vs 2530ms with the static method!!

I’m running Java Hotspot client VM ( 1.4.2-b28 ) under Win98 on a P4 2.8C.

So what is the explanation now ;D ?
[/quote]
Well, to begin with, I see NO VM warmup. That is a potentially fatal flaw in ANY VM benchmark.

Are you running this with the -compile flag?

Jeff

Okay,

I did 2 things to fix this benchmark (code below).

(1) I removed the prints in the middle of the tests. No no no, bad bad bad. IO blocks in the system and will dominate your results and destroy their meaning.

(2) I put ANOTHER “non-static” test AFTER the static test. This illustrates exactly what I thought was going on: your test was order-dependent because you weren’t fully warming up the VM before starting your tests.

Results on my Mac show these calls ARE identical, within a pretty meaningless margin of error:

results:


Instance method 10366ms
Static method 9154ms
Instance method 9142ms

Gee, based on your benchmark static methods are SLOWER!

(Just kidding; it’s obviously noise, or the tail end of the warm-up issue affecting the second test.)

Code:


package benchmarks;

public class Test {

  public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }

  public void itest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      v = iadd(i,iadd(v,iadd(5,iadd(iadd(v,iadd(i,1)),i))));
    }
    //System.out.println(v);
  }

  public void stest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      v = sadd(i,sadd(v,sadd(5,sadd(sadd(v,sadd(i,1)),i))));
    }
    //System.out.println(v);
  }

  public static void main(String[] argv) {
    Test t = new Test();
    t.itest();
    t.stest();
    long start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.stest();
    }
    System.out.println("Static method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
  }
}


next?

JK

Oh, and here is Abies’ test with the same fixes. Same results.

I warned you guys, VM benchmarking is NOT simple.

Results:


Instance method 9142ms
Static method 9274ms
Instance method 9153ms

Code:


public class AbiesTest {

  public static final int SIZE = 100000000;

  public int iadd(int a, int b) {
    return a + b;
  }

  public static int sadd(int a, int b) {
    return a+b;
  }

  public void itest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = iadd(i,1);
      v = iadd(v,a);
      v = iadd(v,i);
      v = iadd(v,5);
      v = iadd(v,v);
      v = iadd(i,v);
    }
    //System.out.println(v);
  }

  public void stest() {
    int v = 0;
    for ( int i =0; i < SIZE; i++ ) {
      int a = sadd(i,1);
      v = sadd(v,a);
      v = sadd(v,i);
      v = sadd(v,5);
      v = sadd(v,v);
      v = sadd(i,v);
    }
    //System.out.println(v);
  }



  public static void main(String[] argv) {
    AbiesTest t = new AbiesTest();
    t.itest();
    t.stest();
    long start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.stest();
    }
    System.out.println("Static method " + (System.currentTimeMillis()-start) + "ms");
    start = System.currentTimeMillis();
    for ( int i =0; i < 10; i++ ){
      t.itest();
    }
    System.out.println("Instance method " + (System.currentTimeMillis()-start) + "ms");
  }
}


Now, the problem with the original logic that started this whole silliness is this:

(1) Statement: It takes more CPU to set up non-static monomorphic calls than static ones.

Answer: Not really true in Hotspot. All call sites are initially assumed to be monomorphic. They stop being monomorphic only as a result of class loading (see below).

(2) Statement: It takes CPU to watch for a call site becoming non-monomorphic.

Answer: Again, not really true in Hotspot. Hotspot detects a call site potentially becoming non-monomorphic by watching class loads and checking whether a newly loaded class potentially overrides a method that is currently being called in a monomorphic fashion. There are some pretty tricky data structures inside HS to track all this.

This is part of class loading for any class, and it is a one-time cost per class load, which makes it pretty insignificant at run time.
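To illustrate point (2), here is a minimal sketch with hypothetical class names (`Adder`, `LoggingAdder`, `CallSiteDemo`). While only `Adder` is loaded, the `adder.add(v, i)` call site has a single possible target; loading `LoggingAdder` adds a second one, and that class-load event is where the VM has to re-check its assumption, not each call. Whether the JIT actually inlines the call and later invalidates that code depends on the VM version; on recent HotSpot builds, running with `-XX:+PrintCompilation` may show "made not entrant" lines around the second phase, but that output is not guaranteed.

```java
class Adder {
    int add(int a, int b) { return a + b; }
}

// Hypothetical subclass that overrides add(). Until it is loaded, Adder.add
// has a single implementation, so the JIT may compile the call in sum()
// as a direct (even inlined) call with no per-call type check.
class LoggingAdder extends Adder {
    int calls;

    @Override
    int add(int a, int b) {
        calls++;
        return a + b;
    }
}

class CallSiteDemo {
    static int sum(Adder adder, int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            v = adder.add(v, i); // the virtual call site under discussion
        }
        return v;
    }

    public static void main(String[] args) throws Exception {
        int n = 10000000;
        // Phase 1: only Adder is used, so the call site looks monomorphic.
        System.out.println(sum(new Adder(), n));
        // Phase 2: load LoggingAdder by name (assumes the default package), so the
        // class load happens here rather than possibly earlier, e.g. during verification.
        Adder second = (Adder) Class.forName("LoggingAdder")
                .getDeclaredConstructor().newInstance();
        System.out.println(sum(second, n));
    }
}
```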

(3) New statement: Static calls take less memory.
This MIGHT be true, but it is very much dependent on what your VM does and how much it tries to optimize memory usage.
In practice I don’t think it significantly affects footprint, which today is actually dominated by reflection information.

Can we put this to bed now? The answer to the original question, in any modern VM, is an unambiguous NO!

Thanks Jeff, one more question, just 'cause I’m curious…

[quote] b interpreter has been blocked until compile completes
[/quote]
I’ve never seen a line that doesn’t have a ‘b’ (I’ve only tried the client VM). Is the interpreter always blocked when compiling? I’ve tried on a dual-CPU machine as well, and I still always get the ‘b’. Maybe this is only ever different with the server VM?

Dunno. I’ll ask Mike (my answer man on the VM team) and get back to you :slight_smile:

Jeff: I have never heard of the -compile flag, and it is not recognized by java or javac on my system.

java -client Test :
Instance method 6430ms
Static method 2360ms
Instance method 6420ms

java -client AbiesTest :
Instance method 6310ms
Static method 2370ms
Instance method 6200ms

And I don’t think it is silliness. IF there is a SIGNIFICANT difference, then it does make sense to use static methods rather than instance methods when it is possible. :slight_smile:

As for the warmup, I did try to do it, by making the first two calls outside the measuring loop. It turned out not to be enough.

As for commenting out the println, that is not fair, because:

  1. This way you make the tested routine a no-op, which is a very bad mistake with modern compilers.
  2. I do not agree that printing to stdout will dominate the benchmark: we are talking about 10 lines every few seconds, and I’m not testing single-ms differences.
  3. The println is the same in both methods, so even with its overhead, it should have the same effect on both.

Even with your correction, I got:

Instance method 6594ms
Static method 5516ms
Instance method 6609ms
Static method 5500ms

For client Hotspot.

So your statement (1) is not true for client Hotspot on many computers. In some cases it is very wrong, in some cases slightly wrong, but it is always wrong.

As for the server JVM, at least my benchmark was showing some numbers (showing that both cases are equally fast); with your version I got only 0ms from top to bottom… Indeed, writing benchmarks is a tricky business; the printout was there for a reason…

One more point about printout overhead: you can change SIZE to 0 to check it. It is 0ms (which means less than 10ms, which in turn means it does not affect this benchmark, as the margin of error is around 50ms anyway).

As for statement (2), I agree: it is mostly a one-time cost, incurred only during class loading, and even taking into account that class loading is the worst slowdown during Java startup, I doubt it makes any difference.

As for statement (3), I’m not sure I said anything to that effect; if I did, I take it back, as it was not my point. I’m talking only about time performance.

It seems that client Hotspot is not ‘any modern VM’ :wink:

Can we agree on the statement that static calls are faster in the client JVM and make no difference in the server JVM? It can be clearly seen from everybody’s results except yours, so I think it affects enough people out there…

[quote]Jeff : never heard of the -compile flag and it is not recognized by java and javac on my system

java -client Test :
Instance method 6430ms
Static method 2360ms
Instance method 6420ms

java -client AbiesTest :
Instance method 6310ms
Static method 2370ms
Instance method 6200ms

And I don’t think it is silliness. IF there is a SIGNIFICANT difference, then it does make sense to use static methods rather than instance methods when it is possible. :slight_smile:
[/quote]
Sorry, I meant -Xcompile.

It forces compilation of all methods immediately, thereby removing some of the effect of the warmup. But it’s best just
to properly warm up the VM with a significantly long run before you run your test.

I find it very odd that you are getting such divergent results from the same code I’m running. What is your platform and VM? I’ll see if I can match it in the lab.

In the end, though, this is EXACTLY the problem with micro-benchmarks. Because they do not behave like real code, they are prone to getting hung up on, and over-reacting to, some inner complexity of the VM process. That is what I suspect is happening to you here.

JK

[quote]Can we agree on statement, that static calls are faster in client jvm and do not make difference in server jvm ? It can be clearly seen from everybody results except yours, so I think it affect enough people out there…
[/quote]
I don’t follow your comment on the prints; maybe you could explain further. The fact of the matter is that a print (or any system IO) within a test is effectively a sleep(random()) and will screw up your results unless your test lasts so long that this is in the noise (for “so long”, figure hours). If you explain better what you were trying to do, maybe I can suggest a solution that won’t mess up your readings.

In re client/server: on MacOSX I saw no perceivable difference between -client and -server. It would be good for whoever else reported OS X numbers to check my results and see if they line up. As I say, taken literally, my numbers show that static is (nominally) SLOWER on MacOSX.

Meanwhile I’ll run both -client and -server on a Win2K box in the lab tomorrow. It would help to get stats on these test machines. It would actually be a more proper and reliable test if, in addition, we broke it into two tests, one for each case, so that the ONLY difference between the two situations is what we are trying to test. When really trying to micro-benchmark, control of variables is pretty critical.
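A minimal sketch of the one-case-per-run idea, reusing the loop body from AbiesTest. The class name `SingleCaseBench`, the `instance`/`static` command-line argument, and the round counts are hypothetical choices. Each JVM invocation exercises only one call style, runs untimed warm-up rounds first, and prints only after the clock has been read; results are accumulated in a checksum printed at the end, so the JIT cannot treat the loops as dead code.

```java
class SingleCaseBench {
    static final int SIZE = 100000000; // same as SIZE in the thread's Test/AbiesTest

    int iadd(int a, int b) { return a + b; }

    static int sadd(int a, int b) { return a + b; }

    // Loop body from AbiesTest; v is returned instead of printed.
    int itest() {
        int v = 0;
        for (int i = 0; i < SIZE; i++) {
            int a = iadd(i, 1);
            v = iadd(v, a);
            v = iadd(v, i);
            v = iadd(v, 5);
            v = iadd(v, v);
            v = iadd(i, v);
        }
        return v;
    }

    static int stest() {
        int v = 0;
        for (int i = 0; i < SIZE; i++) {
            int a = sadd(i, 1);
            v = sadd(v, a);
            v = sadd(v, i);
            v = sadd(v, 5);
            v = sadd(v, v);
            v = sadd(i, v);
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical CLI: pass "static" or "instance"; one call style per JVM run.
        boolean useStatic = args.length > 0 && args[0].equals("static");
        SingleCaseBench t = new SingleCaseBench();
        int sink = 0; // consumed at the end so the loops are not dead code

        for (int r = 0; r < 5; r++) { // untimed warm-up rounds
            sink += useStatic ? stest() : t.itest();
        }
        for (int r = 0; r < 5; r++) { // timed rounds
            long start = System.currentTimeMillis();
            sink += useStatic ? stest() : t.itest();
            long elapsed = System.currentTimeMillis() - start;
            // Printing happens after the clock is read, outside the timed region.
            System.out.println((useStatic ? "static " : "instance ") + elapsed + "ms");
        }
        System.out.println("checksum " + sink);
    }
}
```

Run it as `java -client SingleCaseBench instance` and, in a separate JVM, `java -client SingleCaseBench static` (likewise with `-server`), then compare the timed rounds.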

JVM : Java Hotspot client VM ( 1.4.2-b28 )
OS : Win98 ver 4.10.2222
Hardware : P4 2.8C with 1024 MB RAM

[quote] In the end this is EXACTLY the problem with micro-benchmarks though. As they do not behave like real code they are prone to getting hung up on, and over-reacting to, some inner complexity of the VM process. Which is what I suspect is happenign to you here.
[/quote]
Agreed, the differences found here might not reflect reality. And chances are high that it is JVM/OS dependent, which would explain why you got different results.

I agree that a printout introduces a random wait into the benchmark. It is indeed equivalent to sleep(random()), but this random() is quite small: on a normal machine it should be less than a ms per printout. So we can add an error of +/- 1ms per printout to the error of the benchmark. With a seconds-long benchmark, a few printouts do not change the results, especially if you run the benchmark several times.

Why printouts at all? To make sure that the result of the method is actually needed. Without the printout, the 1.4.2 server VM on my home computer optimizes this code to a no-op, because it knows the computation is never used. With the printout, Hotspot cannot ‘cheat’, because it HAS to print the correct value to the screen. I suppose that with enough levels of indirection you can trick Hotspot into believing that you need the value anyway, but there is no guarantee you will succeed. Printouts, on the other hand, are foolproof: the only way to spoil them is to precompute the result of the entire function, which a JIT can hardly do with such a long loop.