Will escape-analysis-based optimizations be integrated into Mustang?

Hi,

As far as I know, escape-analysis-based optimizations would make at least two closed-world optimizations possible (both illustrated in the sketch below), at the price of a very high analysis overhead (= expensive):
1.) Lock elimination (especially interesting on multiprocessor servers)
2.) Stack allocation
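
For illustration, here is a small example of my own (not taken from any Sun material) showing the kind of method both optimizations target: the StringBuffer below never escapes, so its monitor operations could be removed and the object itself could live on the stack.


public static String label(int value) {
  // StringBuffer.append() and toString() are synchronized, but no other
  // thread can ever see this instance -> lock elimination would apply.
  // The buffer also dies when the method returns -> a candidate for
  // stack allocation (or no allocation at all).
  StringBuffer sb = new StringBuffer();
  sb.append("value=");
  sb.append(value);
  return sb.toString();   // only the resulting String escapes the method
}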

Will escape-analysis-based optimizations be part of Mustang? As far as I can see, no work is being done on this task at all, although several groups argue that it could be helpful (one engineer stated that Mustang has support for lock widening but not lock elimination, because that would require escape analysis; a GC engineer stated that somewhere in Java's future short-lived objects may become even faster, but …).
Or will we have to wait until Dolphin, or even longer?

lg Clemens

It was said it would be implemented in Mustang, but like most good things it will probably not make it into the upcoming version of Java.

I’d put my money on Dolphin.

It seems something is brewing under the covers:

Excerpted from http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6325352

call record_for_igvn(phi) only when escape analysis is enabled.

Excerpted from http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html

Escape analysis is an optimization that has been talked about for a long time, and it is finally here – the current builds of Mustang (Java SE 6) can do escape analysis and convert heap allocation to stack allocation (or no allocation) where appropriate. The use of escape analysis to eliminate some allocations results in even faster average allocation times, reduced memory footprint, and fewer cache misses. Further, optimizing away some allocations reduces pressure on the garbage collector and allows collection to run less often.

OH MY GOODNESS THANK YOU SUN!!!
The wait is finally over. :smiley:

I am excited over this. :slight_smile:

Well, the IBM article is definitely wrong; I already knew about it, and I also know that escape-analysis-based stack allocation is not enabled in current Mustang builds. It is/was(?) planned for Mustang, which I guess is why the author mentioned it.

lg Clemens

I posted this topic in the IBM developerWorks > Java technology > Forums > Java theory and practice forum:

"I have read last column of Java theory and practice: “Urban performance legends, revisited”.
I have been playing with all recent builds of mustang (java 6), but there is no evidence that escape analysis is implemented or enabled.

Best Regards
ARM"

And Brian Goetz replied:

“What evidence would you expect to find? The operation is invisible to the program. Also, optimizations like this are dynamic and may not kick in until a program runs for a long time.
Also, it is often common that advanced optimizations are not all “turned on” by default for pre-release JVMs.”

See http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?message=13757590&cat=10&thread=96130&treeDisplayType=threadmode1&forum=181#13757590

I wonder how this guy knows whether escape analysis is already integrated into Mustang, since he isn't even a Sun engineer.
However, the Mustang code is free (for somebody who does not plan to work on a free JVM project), so it should be possible to find out whether it's already implemented.

Well, there are JVM analysis tools, and if Mustang did stack allocation (which it currently doesn't) the heap would not grow at all, and there would be at least a 25-??% performance gain for tons of small allocations.
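
A crude check of my own (not one of those analysis tools): measure the used heap around the allocation loop. With heap allocation the delta should be on the order of the million Integers allocated (unless a collection runs in between); with real stack allocation it should stay near zero.


public class HeapGrowth {
  public static void main(String[] args) {
    // warm up so the loop is compiled before we measure
    for (int i = 0; i < 10; i++) allocate();

    Runtime rt = Runtime.getRuntime();
    rt.gc();
    long before = rt.totalMemory() - rt.freeMemory();
    allocate();
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.println("used-heap delta: " + (after - before) + " bytes");
  }

  static int allocate() {
    int sum = 0;
    for (int i = 0; i < 1000000; i++) {
      sum += new Integer(i % 2).intValue();
    }
    return sum;
  }
}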

This is really what I'm hoping for.
However, they'll want to test it first, won't they?

lg Clemens

java -server -XX:+DoEscapeAnalysis

Enjoy :slight_smile:

I don’t even see any difference on something like

int sum = 0;
for ( int i = 0; i < 10000000; i++ ) {
  sum += new Integer(i%2).intValue();
}

I put it into a method and called it 10 times to give HotSpot a chance to do its optimizations.
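
For reference, this is roughly the shape of the harness I mean (a sketch reconstructed from the description above, not the exact code I ran): time each call and compare runs started with and without -XX:+DoEscapeAnalysis.


public class AllocBench {
  public static void main(String[] args) {
    for (int run = 0; run < 10; run++) {
      long start = System.currentTimeMillis();
      test();
      System.out.println("run " + run + ": "
          + (System.currentTimeMillis() - start) + " ms");
    }
  }

  static int test() {
    int sum = 0;
    for (int i = 0; i < 10000000; i++) {
      sum += new Integer(i % 2).intValue();
    }
    return sum;
  }
}


Run it once as "java -server AllocBench" and once as "java -server -XX:+DoEscapeAnalysis AllocBench" and compare the later runs, after the JIT has kicked in.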

Wow, that means they are at least actively working on it. Thanks SUN!
Well, maybe they've just implemented the analysis part but not stack allocation or lock removal?

Anyway, thanks a lot for letting me know about that switch, really interesting!

lg Clemens

If you download debug binaries, also try -XX:+PrintEscapeAnalysis

I’m decoding the output now…

For the program


public class Test {
  public static void main(String[] argv) {
    for ( int i =0; i < 10; i++ ) {
      test();
    }
  }

  public static void test() {
    int sum = 0;
    for ( int i = 0; i < 1000000; i++ ) {
      sum += new Integer(i%2).intValue();
    }
  }
}

The output is as follows:


  1       java.lang.Integer::<init> (10 bytes)
  2       java.lang.Number::<init> (5 bytes)
  1%      Test::test @ 4 (33 bytes)
======== Connection graph for  Test::test
  60  JavaObject  NoEscape       [[ 129F]]   60	Allocate	===  46  47  48  8  1 ( 53  54 _  57  59  54  1  1  50  49 ) [[ 61  62  63  70  71  72 ]]  rawptr:NotNull ( int+, java/lang/Object:NotNull *, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull ) Test::test @ bci:11 
  72  LocalVar  NoEscape       [[ 60P]]   72	Proj	===  60  [[ 73 ]] #5
  3       Test::test (33 bytes)
======== Connection graph for  Test::test
  45  JavaObject  NoEscape       [[ 113F]]   45	Allocate	===  31  32  33  8  1 ( 38  39 _  42  44  39  1  1  35  34 ) [[ 46  47  48  55  56  57 ]]  rawptr:NotNull ( int+, java/lang/Object:NotNull *, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull, rawptr:NotNull ) Test::test @ bci:11 
  57  LocalVar  NoEscape       [[ 45P]]   57	Proj	===  45  [[ 58 ]] #5 

Edit:
After playing with
java -server -XX:+PrintOptoAssembly -XX:+PrintIdeal -XX:+DoEscapeAnalysis -XX:+PrintEscapeAnalysis

it seems that there is no difference in the generated code, but the escape analysis correctly detects that the Integer allocation is local. Most probably you are right: they have implemented the analysis, but the part which performs stack allocation is missing.

I wonder if monitor synchronization is skipped for method-local objects; I will have to investigate.
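
A sketch of the test I have in mind (my own code): every append() below synchronizes on a StringBuffer that never leaves the method, so if monitors on non-escaping objects are elided, the run with -XX:+DoEscapeAnalysis should be noticeably faster than the run with -XX:-DoEscapeAnalysis (and close to the same code written with an unsynchronized StringBuilder).


public class LockTest {
  public static void main(String[] args) {
    for (int run = 0; run < 10; run++) {
      long start = System.currentTimeMillis();
      test();
      System.out.println("run " + run + ": "
          + (System.currentTimeMillis() - start) + " ms");
    }
  }

  static int test() {
    int length = 0;
    for (int i = 0; i < 1000000; i++) {
      StringBuffer sb = new StringBuffer();  // never escapes test()
      sb.append(i);                          // synchronized append
      length += sb.length();                 // synchronized length
    }
    return length;
  }
}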

Hats off to the HotSpot team. I'm playing with various options, looking at the compiler output. I just noticed that String.hashCode has its loop unrolled (four steps). While it is not the most complicated method in the world, the fact that HotSpot unrolls loops for you is a nice thing. And to be honest, it unrolls it in a good way: it fetches all the data from memory in one place and then performs the 4 steps on registers only (as opposed to the more obvious fetch/compute/fetch/compute/etc.).
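
Roughly what that unrolling looks like written out by hand (my sketch, not HotSpot's actual output): a 4-way unrolled version of the hash loop that fetches four chars up front, does the multiply-accumulate steps on registers, and handles the remaining characters in a small tail loop.


// original loop, as in String.hashCode():  h = 31 * h + value[i]
static int hash(char[] value) {
  int h = 0;
  int i = 0;
  for (; i + 3 < value.length; i += 4) {
    // fetch the next four chars in one go...
    char c0 = value[i];
    char c1 = value[i + 1];
    char c2 = value[i + 2];
    char c3 = value[i + 3];
    // ...then do the four multiply-accumulate steps on registers only
    h = 31 * h + c0;
    h = 31 * h + c1;
    h = 31 * h + c2;
    h = 31 * h + c3;
  }
  for (; i < value.length; i++) {  // leftover characters
    h = 31 * h + value[i];
  }
  return h;
}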

Lock elimination is now present in Mustang build 63. It still needs -server -XX:+DoEscapeAnalysis.

Edit: Here’s an interesting article on Mustang’s synchronization optimizations.

I was just on my way to this forum to make the same post ;D

any performance numbers for “typical” server-side code and games?

anything new yet?

Yep, I found a statement from a HotSpot engineer saying that stack-based allocation won't make it into Mustang :frowning:

I'll stay a JRockit fanboy then.

Hey, it's only going to eke out a few tiny % of performance, and only in certain situations. No big deal.

Cas :slight_smile:

But those are exactly the situations where developers normally resort to object pools or other strange workarounds.
For server-side programs this would really help, I think…

lg Clemens

No, you use object pools for precisely the opposite reason: they are for objects that are expensive to construct.

Cas :slight_smile:
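
For context, a minimal sketch of the kind of pool Cas means (my own illustration, the class and names are made up): it only pays off when create() is genuinely expensive, e.g. it opens a connection or parses a big resource, not merely because it allocates an object.


import java.util.ArrayList;
import java.util.List;

abstract class SimplePool<T> {
  private final List<T> free = new ArrayList<T>();

  // the expensive part; pooling is about avoiding this, not the allocation
  protected abstract T create();

  public synchronized T acquire() {
    if (free.isEmpty()) {
      return create();
    }
    return free.remove(free.size() - 1);
  }

  public synchronized void release(T obj) {
    free.add(obj);
  }
}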