Escape Analysis?

I’m a bit confused. From what I remember, escape analysis was supposed to automagically solve many of our temporary small object woes (as per http://www.ibm.com/developerworks/java/library/j-jtp09275.html) in the Java 6 release. But I’m not seeing the improvements. I’m working on JBox2d (a physics engine: http://www.jbox2d.org), and we are overwhelmingly bottlenecked by Vec2 creation costs, which was one of the things I was under the impression escape analysis would fix, as these objects are extremely short lived, mainly just used for shuffling pairs of numbers from place to place. Frankly, things seem to be performing just about the same as in 1.5. There still seems to be almost a 2:1 performance difference when I fully inline temp vector creations in realistic benchmarks.
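Concretely, the pattern I'm talking about looks like this (a hypothetical `Vec2` mirroring JBox2D's - just a pair of floats; both methods compute the same thing, but the first allocates two short-lived temporaries per call):

```java
// Hypothetical Vec2 mirroring JBox2D's: just a pair of floats.
final class Vec2 {
    float x, y;
    Vec2(float x, float y) { this.x = x; this.y = y; }
    Vec2 add(Vec2 o) { return new Vec2(x + o.x, y + o.y); } // allocates a temp
}

public class TempVecDemo {
    // Allocation-heavy style: every step creates a short-lived Vec2.
    static float lengthSqViaTemps(float ax, float ay, float bx, float by) {
        Vec2 sum = new Vec2(ax, ay).add(new Vec2(bx, by)); // two temps, both dead immediately
        return sum.x * sum.x + sum.y * sum.y;
    }

    // Hand-inlined style: same math, zero allocations.
    static float lengthSqInlined(float ax, float ay, float bx, float by) {
        float sx = ax + bx, sy = ay + by;
        return sx * sx + sy * sy;
    }

    public static void main(String[] args) {
        System.out.println(lengthSqViaTemps(1f, 2f, 3f, 4f)); // 52.0
        System.out.println(lengthSqInlined(1f, 2f, 3f, 4f));  // 52.0
    }
}
```

It's the first style that's roughly 2:1 slower in my benchmarks, and exactly the style that article says the JIT should handle for free.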

What’s going on here? Was escape analysis cut from the JVM, or am I misinterpreting what it should be doing for me? I’m quite happy to inline all that stuff by hand if need be, or reuse a few static vectors, but that IBM article seemed to strongly advise against that type of stuff (the author all but implied that anyone that would consider it is a freaking idiot).

AFAIK, it’s not been activated by default yet

Apparently it’s in, but it looks like you need to add some VM arguments (and maybe it’s only for the server VM?):
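Something like this, I’d guess (flag name taken from the TS-3412 slides linked later in this thread - unverified, and your main class will obviously differ):

```shell
# Guess at the invocation: server VM plus the EA flag named in Sun's TS-3412 slides.
# Flag availability varies by JDK build - check your JDK's release notes first.
java -server -XX:+UseEscapeAnalysis YourMainClass
```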

Riven did some good benchmarks showing the problems:
http://www.java-gaming.org/forums/index.php?topic=14940.0
He tested on Java 6 (Mustang), but didn’t use the VM args given in the above link (I think…)

By the way, it’s great that you’re working on optimisations to JBox2D

That guy should be really more careful about using silly abbreviated variable names…

Cas :slight_smile:

Interesting (even if it’s not used to speed up object deallocation).
I recently asked in the comments of this blog entry whether the server compiler would provide enough information to recognise “cheap garbage”, and got the answer that this technique [escape analysis] won’t have any performance gain with the current memory model (read the comments).

It seems even Sun VM devs are not up to date with the -XX: flags :-\

It was mostly autogenerated code. :stuck_out_tongue:

Thanks, guys - going through some of those links, I came across http://developers.sun.com/learning/javaoneonline/2006/coreplatform/TS-3412.pdf, where I found the following quick summary:

[quote]Implementation Status

  • Java SE 6 has escape analysis and lock elision in the server compiler.
  • It is off by default, it can be enabled with the -XX:+UseEscapeAnalysis flag.
  • Java SE 7 will have further optimizations.
  • There are currently no plans to release a client compiler with escape analysis.
[/quote]

That last one is a shame - that means that as advanced and helpful as these optimizations are, they will be absolutely useless as far as game programming goes, right?

Well, unless you ship with the server VM, and then suffer horrible startup.

But… where does the tiered compiler fit in?

Cas :slight_smile:

That was a great set of slides, thanks ewjordan.

Cas: the slides have info about escape analysis. Also Clemens (LinuxHippy) posted this, apparently it’s in java 7: http://www.java-gaming.org/forums/index.php?topic=16653.0

Yeah, I know - but if they’re not adding EA to the client VM, yet they’re axing the separate client and server VMs, what’s actually going on in the roadmap? I mean, we all know the ideal situation would be one single VM with EA, but it’s not really clear if or when that’s going to happen.

Cas :slight_smile:

I’m starting to feel like a standard refrain is emerging when it comes to problems of any sort in Java, especially the ones that tend to plague game development:

  1. “There is no problem. Your microbenchmarks are just wrong”
  2. “Okay, there was a problem, but it’s already fixed.”
  3. “No, really, it’s fixed - and your new microbenchmarks suck, too.”
  4. “Alright, it’s not quite fixed, but it’s in the next JVM.”
  5. “On second thought, never mind. It’s not a problem again. Look, we’ve got a great new feature for processors with 64 cores, isn’t that better?”
  6. “Screw you guys, real Java programmers don’t care about this. Go use C# if you want to make games.”

Maybe I’m being a bit cynical and overly harsh (definitely am, sorry!), but looking at the relative performance increases between the Flash player and the JVM over the past few years, I’m starting to cross my fingers and hope that the Flash VM does in fact start supporting Java code as was rumored on another thread, if only because it’s a platform where the desktop consumer is considered worthy of optimizing the VM for. I’m getting ahead of myself, for sure, because the Flash VM is still quite a bit slower than the Java one (and it definitely doesn’t have escape analysis), though the gap is closing (when I get around to running them I’ll post some current test results in another thread).

Sorry to vent, I’m just getting frustrated from the constant choosing between maintainable and fast code, especially when this particular problem (small object overhead) has several different solutions and Sun seems reluctant to implement any of them. Maybe what they say is true, and I keep trying to write Java code to do things that are better done in C++ (math, physics, and finance), but I’m really hoping that’s not the case, because Java is a lot more fun to code and a lot easier to deploy!

Does anyone know much about the internals of EA and why it might be more difficult or expensive to implement on the client VM?

Judging from Bug Parade you’re not too far off the mark; then again, James Gosling said

http://www.parleys.com/display/PARLEYS/The+Closures+Controversy
10. Don’t fix it until it chafes
“Just say no until threatened with bodily harm”

OK, later he said this:
http://blogs.sun.com/jag/entry/closures :wink:

Ken Russell might be able to give us some pointers. By the looks of the slides you posted, though, it seems like the problem is with mixing VM enhancements: with method inlining, it’s harder to figure out whether a variable escapes or not. At least that’s what I thought the slides said…

I thought it’d be easier… after you’ve performed all your inlining to whatever depth, you’d then perform escape analysis on the resulting code. If any methods have already had EA performed on them before inlining, so much the better because then that code doesn’t need analysing.
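Roughly what I mean, with a made-up Vec2-style example (not real HotSpot output, just the idea):

```java
final class Vec2 {
    final float x, y;
    Vec2(float x, float y) { this.x = x; this.y = y; }
}

public class InlineThenEA {
    // Viewed in isolation, the Vec2 escapes this method via the return value,
    // so EA on midpoint() alone can't remove the allocation.
    static Vec2 midpoint(Vec2 a, Vec2 b) {
        return new Vec2((a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f);
    }

    // But once midpoint() is inlined here, the Vec2 never leaves midX():
    // EA run on the inlined code can see that, and could in principle
    // reduce the whole thing to plain float arithmetic.
    static float midX(Vec2 a, Vec2 b) {
        Vec2 m = midpoint(a, b); // after inlining: a non-escaping allocation
        return m.x;
    }

    public static void main(String[] args) {
        System.out.println(midX(new Vec2(0f, 0f), new Vec2(4f, 2f))); // 2.0
    }
}
```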

Cas :slight_smile:

Hi again,

I’ll try to answer some questions to the best of my knowledge … but if I am wrong, I am wrong :wink:

The idea of the client compiler is to do cheap optimizations so that code can be compiled fast - it wouldn’t make much sense to add an expensive (and optimistic) optimization like EA. The server compiler is where it belongs, and with the tiered compilers you should already be able to benefit from it in the JDK7-ea builds.

EA != stack allocation. EA is a step that gathers information, which can then be used to do stack allocation and several other optimizations.
So EA is implemented, along with some simpler optimizations that use the information it gathers, but stack allocation itself isn’t done yet.

Currently there seems to be some work on using EA for scalar replacement, something which should help relieve some of the memory pressure 64-bit systems suffer from (also a win for 32-bit systems) :slight_smile:
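To illustrate what scalar replacement means (my own sketch, not HotSpot internals): the JIT can in principle rewrite the first method below into something like the second, so the object never exists at all:

```java
final class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class ScalarReplacementSketch {
    // What the source says: allocate a Point, read its fields, throw it away.
    static int manhattan(int x, int y) {
        Point p = new Point(x, y); // never escapes this method
        return Math.abs(p.x) + Math.abs(p.y);
    }

    // What scalar replacement would (conceptually) turn it into:
    // the fields become plain locals - no allocation, no GC work.
    static int manhattanScalarized(int x, int y) {
        int px = x, py = y;
        return Math.abs(px) + Math.abs(py);
    }

    public static void main(String[] args) {
        System.out.println(manhattan(-3, 4));           // 7
        System.out.println(manhattanScalarized(-3, 4)); // 7
    }
}
```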

lg Clemens

They can be sceptical all they like - it’s already implemented in Excelsior JET and it works.

Cas :slight_smile:

They didn’t say it wouldn’t work. But they are sceptical about whether the “cheap to collect object detection” would provide any gain in overall throughput compared to the compacting technique.

The only advantage I see is the decreased pause time (even if it may run slower or at equal speed), but it may result in longer compile times too (-> pauses).

IMO the main reason there is an attempt to implement EA in the server compiler is concurrency. A lot of tricks become possible if you can detect thread-private resources (e.g. allocation directly into registers, removal of locks…).

But don’t get me wrong, I am also very curious how it would perform.

Well - I seem to recall the JET guys got EA working and stack allocation in linear time, meaning its impact on actual compilation performance should be minimal. I’m rather hoping that EA and stack allocation make it possible to start using the enhanced for loop and so on without creating tons of unnecessary garbage.
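For reference, the garbage I mean in the enhanced for loop is the hidden Iterator the compiler inserts - it never escapes the method, so it’s exactly the kind of thing EA ought to be able to kill:

```java
import java.util.Arrays;
import java.util.List;

public class ForEachGarbage {
    // The enhanced for loop desugars to xs.iterator() / hasNext() / next(),
    // so every call allocates an Iterator (plus unboxing of the Integers).
    // The Iterator never escapes sum(), making it a prime EA candidate.
    static int sum(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3))); // 6
    }
}
```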

Cas :slight_smile:

Well, I don’t know how seriously I should take this post.
Have you ever benchmarked Flash 9’s VM?
Sure, there have been GREAT improvements compared to older versions of Flash - but only because older versions of Flash did not have a JIT at all.
So in fact Flash 9 adds to Flash what Java has had since about … well, I guess it was 1.1.7, when Sun shipped it with the Symantec just-in-time compiler.

If you had seriously benchmarked the Flash VM, you would see that it cannot even compete with the client VM, let alone the server VM.
The client JVM does not do too many fancy optimizations, but it has undergone major tuning over the past years - and it also benefits from improvements made to the rest of the runtime. And the server compiler generates better code than the .NET JIT, without any question.

I don’t know whether the tiered compilers will be the default anytime soon, but I think this direction is right.
A focus could be making the client compiler compile even faster (at the expense of the quality of the generated code), because the server compiler will be there anyway to optimize the really hard stuff.

lg Clemens

Sure, the question just seems to be who will implement it in Java.
I see ongoing bashing of Sun for not implementing stack allocation - but on the other hand, nobody else is taking on the work now that Java is open source. It seems it’s just not important enough :wink:

By the way, the team that implemented the initial EA and stack allocation is located in Linz, Austria (40 km from my home) … it seems they already implemented a prototype: http://www.usenix.org/events/vee05/full_papers/p111-kotzmann.pdf

lg Clemens