But how do you know what will be worth compiling?
Just compile the method with the most invocations so far. If all that’s left is invocations == 1, do nothing.
If you’ve got spare CPU and spare memory there’s no reason not to… after all, Hotspot could always decompile stuff it thinks isn’t relevant any more, couldn’t it? (Another optimisation.)
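A minimal sketch of that heuristic, assuming a hypothetical per-method invocation counter map (none of these names are real HotSpot internals):

    import java.util.Map;

    // Hypothetical idle-time policy: when the compile queue is empty, compile the
    // most-invoked interpreted method so far, and skip anything invoked only once.
    final class IdleCompilePolicy {
        static String pickNext(Map<String, Integer> invocationCounts) {
            String best = null;
            int bestCount = 1;                   // invocations == 1 -> not worth compiling
            for (Map.Entry<String, Integer> e : invocationCounts.entrySet()) {
                if (e.getValue() > bestCount) {
                    bestCount = e.getValue();
                    best = e.getKey();
                }
            }
            return best;                         // null means "do nothing"
        }
    }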
Cas
But there are other threads running in the background besides the compiler thread: GC, user threads, the interpreter, etc. It would require a lot of effort to figure out that nothing is going on and then force compiles. One idea that I’ve had (it was my master’s thesis, actually) was to compile methods as logical units rather than discrete ones. It required keeping track of methods that were invoked together in the interpreter, and when one of those methods was compiled, the whole logical unit was compiled. Startup suffered, but the advantage was that more things got compiled, and GUI apps improved the most (I think it was something like 10% on SwingMark).
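Roughly, the bookkeeping could look like the sketch below (all names are hypothetical; the real thesis implementation lived inside the interpreter, not in Java code):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of "logical unit" compilation: remember which methods
    // the interpreter saw invoked together, and when one of them gets compiled,
    // queue the whole group.
    final class LogicalUnitTracker {
        private final Map<String, Set<String>> invokedTogether = new HashMap<>();

        // Called by the (hypothetical) interpreter when caller invokes callee.
        void recordInvocation(String caller, String callee) {
            invokedTogether.computeIfAbsent(caller, k -> new HashSet<>()).add(callee);
            invokedTogether.computeIfAbsent(callee, k -> new HashSet<>()).add(caller);
        }

        // When 'method' crosses the compile threshold, compile its whole unit.
        Set<String> logicalUnitOf(String method) {
            Set<String> unit = new HashSet<>();
            collect(method, unit);
            return unit;
        }

        private void collect(String method, Set<String> unit) {
            if (!unit.add(method)) return;       // already visited
            for (String m : invokedTogether.getOrDefault(method, Set.of())) {
                collect(m, unit);
            }
        }
    }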
Both compilers have background compilation (what you describe above). The Server compiler has 2 threads, while the Client compiler has 1 (and for various reasons can’t have more).
I know that particular tidbit… but those threads are not sensitive to actual computer idle time. If they detected low CPU usage they could conceivably get to work compiling things pre-emptively rather than waiting for some invocation threshold to be reached. Just a thought experiment. Might be interesting to try, and I suspect it wouldn’t be too much work.
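A sketch of the thought experiment (the compile-queue hook is invented; getSystemLoadAverage() is a real API, but far coarser than whatever a VM would actually use internally):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    // Hypothetical idle-time trigger: if the machine looks idle, hand warm (but
    // below-threshold) methods to the compiler instead of waiting for them to
    // cross the invocation threshold on their own.
    final class IdleCompileTrigger implements Runnable {
        private final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        public void run() {
            while (true) {
                double load = os.getSystemLoadAverage();   // -1 if the platform can't report it
                if (load >= 0 && load < 0.5) {
                    compileWarmMethodsPreemptively();      // hypothetical hook into the compile queue
                }
                try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
            }
        }

        private void compileWarmMethodsPreemptively() { /* not a real HotSpot hook */ }
    }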
Cas
It’s the same problem as making sure your timer doesn’t block your game. Compilation could be planned ahead, and some scheduler could coordinate it with the high-priority tasks.
Fast precompilation on load, then extensive background compilation in small time slices, would probably work best, especially if the results of the optimizations were saved. (Just wait for all the new bugs: the compiler optimized away the graphics surface… it killed the events from the system… jerky jerky jerky, you like jerky for food? ~_^)
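A rough sketch of the time-slicing part, with made-up budget numbers and method names:

    // Hypothetical time-sliced background optimizer: do at most a few milliseconds
    // of optimizing work per frame so the high-priority (game) tasks never stall.
    final class SlicedOptimizer {
        private static final long BUDGET_NANOS = 2_000_000; // ~2 ms per slice, invented number

        // Called once per frame/tick by a (hypothetical) scheduler.
        void runSlice() {
            long deadline = System.nanoTime() + BUDGET_NANOS;
            while (System.nanoTime() < deadline && hasPendingWork()) {
                optimizeNextChunk();             // hypothetical unit of compiler work
            }
            persistResultsIfAny();               // the "save the optimization results" part
        }

        private boolean hasPendingWork() { return false; }  // stubs for illustration only
        private void optimizeNextChunk() { }
        private void persistResultsIfAny() { }
    }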
I would love to see something like the following; however, Sun’s engineers will say it’s not needed at all (just as they said Swing isn’t slow and showed us benchmarks pointing out how fast JButtons can be instantiated):
A virtual machine that:
- Uses the interpreter on code that has not been cached.
- Generates cacheable code for methods that are called more than x/y times and are not yet in the cache. This isn’t as hard as engineers make it sound; even gcj is able to do it. Just leave inlining and the other tricky optimizations out of this phase and keep the code small. Execution would still be roughly 3x (or more) faster than interpreter-only, which would especially help large applications (like Swing-based desktop apps).
- Optimizes the real hotspots using the server engine and replaces the cached compiled code with the new, highly optimized code. In this stage you could also apply the optimizations that would be hard to cache (escape analysis, inlining of virtual functions, …). A rough sketch of this tiering follows below the list.
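Purely as an illustration, the decision logic of such a scheme could look like the sketch below; the thresholds and names are invented, and this is not how HotSpot itself is structured:

    // Hypothetical three-stage policy matching the proposal above (not real HotSpot code).
    enum Tier { INTERPRETED, QUICK_CACHED, SERVER_OPTIMIZED }

    final class TieredPolicy {
        private static final int QUICK_THRESHOLD  = 100;    // the "x/y times" above, value invented
        private static final int SERVER_THRESHOLD = 10_000; // "real hotspot" threshold, value invented

        static Tier tierFor(int invocations, boolean alreadyCached) {
            if (invocations >= SERVER_THRESHOLD) return Tier.SERVER_OPTIMIZED;            // full optimization, hard-to-cache tricks
            if (alreadyCached || invocations >= QUICK_THRESHOLD) return Tier.QUICK_CACHED; // small, cacheable code, no inlining
            return Tier.INTERPRETED;                                                       // cold code stays interpreted
        }
    }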
However, for Mustang we shouldn’t expect major improvements as things look now; it’s just 9 months to the planned release date and I think they want to get it stable. Like so many improvements, two-phase compilation (which doesn’t solve the slow-startup / slow-after-startup problem at all) has been pushed to Dolphin, and who knows whether it will be realised even then. From this point of view Mustang hasn’t had really impressive improvements in terms of the runtime.
lg Clemens
The problem is that compiling a method which has not been run many times yet will not allow all the optimizations (not enough runtime data is present, since it is gathered during interpreted runs). So the only thing you could do in the background is a ‘generic’ compilation of such methods, with the profiling instructions still embedded inside, so they can be recompiled later with full optimization. Which again brings us back to the need for tiered compilation.
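In other words, the ‘generic’ version would still have to carry its own counters. A minimal sketch of what such an instrumented method might look like (a hypothetical Java stand-in for code the VM would actually generate):

    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical shape of a "generically" compiled method: the profiling
    // instructions the interpreter would normally provide stay embedded, so the
    // VM can recompile with full optimization once enough data has been gathered.
    final class GenericallyCompiledMethod {
        private final AtomicInteger invocations = new AtomicInteger();
        private final AtomicInteger takenBranch = new AtomicInteger(); // example branch-frequency counter

        int body(int x) {
            invocations.incrementAndGet();       // embedded profiling instruction
            if (x > 0) {
                takenBranch.incrementAndGet();   // data a later, full recompilation could use
                return x * 2;
            }
            return -x;
        }
    }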
Well, that was exactly the idea.
Actually I was interested in something Azeem once claimed in here, which was that by setting -XX:CompileThreshold=500 we’d get suboptimal compilation in the server VM. However, it also occurred to me that he might be incorrect. Our games tend to do one thing over and over again, and once they’ve done that thing, they pretty much never do anything else. It should only take a few frames of a game loop to determine just about every possible code path in one of my games.
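For reference, that experiment just means launching with something like java -server -XX:CompileThreshold=500 -jar mygame.jar (the jar name is made up), i.e. compiling a method after 500 invocations instead of the server compiler’s default of 10,000, so the compiler sees correspondingly less profiling data when it does compile.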
Cas
Compilation with traps could also gather statistics. Sometimes -XX:+alwaysCompileLoops is enough.
2700 seemed to be nicer than the default with my library.
I always get the best performance with -Xcomp (+25%)… even in complex code.
Not that I use it, because the startup time is kinda scary.