Garbage collector tuning

Interestingly though, watching the GC and heaps in jvisualvm on various games, including the one I’ve got here, I don’t ever see any evidence of escape analysis actually doing anything. The heap still fills at the same rate.

Cas :slight_smile:

I’ve never been motivated enough to test it… too much work. But even if it worked ideally (note: this is why contracts rule: @NoReference would be awesome), it’s still only useful in a subset of cases (notably tuples).

Escape analysis might have eliminated the need for GC of your actual object (though not the strings) in the sans-pool approach (messing up the results completely). To really compare them you would want to create a benchmark that better reflects the object link structures you will have in a game situation.

The basic essence of how GC works is by testing reachability from root nodes. Root nodes in this case are object references declared in currently executing functions, plus static class variables. The state of all object pointers needs to be frozen while this process is carried out, hence the use of stop-the-world GCs. That is costly, because increasing the size and complexity of object structures increases the cost of carrying out a GC mark and sweep.
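The mark phase described above can be sketched as a toy graph traversal. This is purely an illustration of the reachability idea, not how HotSpot actually implements it; the `Node` type and field names here are made up:

```java
import java.util.*;

// Toy illustration of the mark phase: reachability from root references.
// (Hypothetical Node type; real collectors work on raw heap objects.)
class Node {
    final List<Node> refs = new ArrayList<>();
    boolean marked;
}

public class MarkSketch {
    // Mark everything transitively reachable from the roots.
    static void mark(Collection<Node> roots) {
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;
            n.marked = true;
            stack.addAll(n.refs);
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), orphan = new Node();
        a.refs.add(b);        // a -> b: both reachable from the root 'a'
        orphan.refs.add(b);   // orphan points at b, but nothing points at orphan
        mark(List.of(a));     // 'a' plays the role of a root (local/static ref)
        System.out.println(a.marked + " " + b.marked + " " + orphan.marked);
        // a and b survive the sweep; 'orphan' would be reclaimed
    }
}
```

Note that `orphan` referencing a live object doesn’t save it: only paths *from* the roots count, which is also why the cost scales with the size of the live object graph.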

Now pooling wouldn’t affect GC that much in that case, but it would eliminate the memory-management cost. Of course, as you’ve seen, implementations and technologies are a lot different:

  • there are incremental garbage collectors that don’t need to stop the world and can operate in a separate thread, adjusting the time spent GC’ing based on the behaviour of the program
  • there are generational garbage collectors that in a sense partition the structure, traversing some partitions more often than others - reducing the cost of GC’ing significantly, since they traverse far fewer nodes and the nodes-traversed/nodes-deleted ratio is much lower (and therefore more efficient)
  • we now have fancy algorithms that JIT compilers use to pretty much rewrite your code, sometimes eliminating your `new` call completely (exhibiting pool-like behaviour, depending on how you look at it)

Either way, we’ve not yet reached a point where we don’t need to manage memory or objects, especially for real-time applications where predictability* is crucial. So it’s not as cut and dried as to whether pooling will or will not give you a boost. As your structures become more complex, you will be able to see things in your system that G1 will not be privy to. And most important of all: trust the real-world results more than theory.

In fact, I have a funny story about using floats on an embedded system. Despite all available documentation on a certain 100% integer-based processor, a co-worker of mine insisted on using floating point rather than integers. He was not aware of, or bothered by, the technical workings of the device and ignored my warnings, so to show him the penalty and risks associated with using floats I made some benchmarks performing various arithmetic on random numbers. To my surprise, the benchmarks showed the floating-point operations to be faster than the integer operations in most (if not all) cases. I was baffled. I shared this with other developers and they could not understand it either, as it should simply be impossible. There was no fault in the code, and others could reproduce the behaviour. Well, the moral here is that sometimes all theory and knowledge are trumped, so don’t be too presumptuous. I might add (for reference) that we were using a fast floating-point library with associated cache risks.

So, because of the documentation, I’d hazard a guess that none of the developers on that system ever used floating point in their physics code, or even tested it. Dogma is a dangerous thing, so I’d say it’s good to listen to the real-world results - or better yet, keep improving your benchmark and perhaps you’ll be able to share some interesting referential results with us.

*: I understand G1 has high predictability for suspending its GC thread, but that still doesn’t change the fact that there are associated GC costs and allocation costs that can be reduced

Aside from stack allocation, which is a different thing entirely, I’ve only ever seen this behavior with String constants and small values of Integer or other boxed types. I really don’t believe this behavior is even possible with most objects: Java isn’t referentially transparent, and even if there were a Sufficiently Smart Compiler™ that could infer it by way of noticing an aliased object is never mutated, such a compiler couldn’t support separate compilation, which is a cornerstone of Java.

I ripped a smaller test case out of a 3d demo to see if scalar replacement works. In my case it did an amazing job. Here’s the link to my blog entry: http://www.stefankrause.net/wp/?p=64

I was alluding to Escape Analysis there. Google’s V8 is an excellent example of what can be achieved with this technology. In fact they’ve taken it a step further and even get a fair amount of code compiled directly into integer operations (despite Javascript’s weak typing).

The main point, though, is just to draw attention to what’s going on, highlight the dangers of dogmatically dismissing pooling, and encourage a benchmark much more accurate to the actual case.

Excelsior JET does a very efficient job of escape analysis, replacing heap allocations with stack allocations. The problem is that JDK7 doesn’t seem to be doing this yet - or if it is, it doesn’t have any noticeable effect on garbage that I’ve seen in some game code. Stefan’s blog entry is pretty interesting though - it’s clearly doing it there. I wonder whether increasing the compiler’s inlining size would make it more effective. I might try that on Project Zomboid later.

Cas :slight_smile:

Well, in the article’s defense, it did say “object pooling is a performance loss for all but the most heavyweight objects on modern JVMs”. :stuck_out_tongue:

I have an OpenGL based game which creates tonnes of particles and bullets without any pooling, and the worst GC pause I ever saw was about 1/100th of a second, with most pauses in the 1/1000th - 1/10000th second range. That’s using -Xincgc.

It’s important to note that all reported “findings” should be taken with a large dash of salt. And even when the findings are not suspect, they are only useful in the context in which they were written. Most Java writings will be in terms of application & server programming, which has drastically different needs from computationally expensive programming (such as games or scientific code), soft real-time programming (such as games), etc.

Also, we seem to be mixing up terms (unless HotSpot is using non-standard terminology)… in Java terms:

escape analysis: determine whether a reference cannot escape its creating call frame. If so, it can be stack-allocated and some other optimizations can be performed.
scalar replacement: determine whether the object itself may be broken apart into scalar components. In the extreme, the object itself can be removed and its fields live on the stack and/or in registers.
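A minimal sketch of the kind of allocation both optimizations target. Whether HotSpot actually applies them here depends on inlining and the JVM version, and the `Vec2` type is made up for illustration:

```java
// A short-lived temporary that never escapes its method: a candidate for
// escape analysis (stack allocation) and scalar replacement (the object
// disappears entirely; x and y end up in registers).
public class EscapeSketch {
    static final class Vec2 {
        final double x, y;
        Vec2(double x, double y) { this.x = x; this.y = y; }
    }

    static double lengthSq(double x, double y) {
        Vec2 v = new Vec2(x, y);   // the reference never leaves this frame
        return v.x * v.x + v.y * v.y;
    }

    public static void main(String[] args) {
        double sum = 0;
        // In a hot loop the JIT may eliminate the allocation completely,
        // so the heap fill rate need not reflect the source-level 'new'.
        for (int i = 0; i < 1_000_000; i++) sum += lengthSq(i, i);
        System.out.println(sum);
    }
}
```

If instead `v` were stored in a field or returned, it would escape and the allocation would have to stay on the heap - which may be why some game code shows no visible effect.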

@kaffiene: I’m not sure what you’re saying here: 1/100 of a second is terrible.

Indeed, 10ms is awful; we’d be looking at juddering frames with that sort of pause. The maximum we want to spend on GC in a frame is maybe 3ms on a 1.6GHz-class single-core sort of system. This is one area where the G1 GC excels: when you give it a target collection time parameter, it’s really pretty good at achieving it, which means that with a bit of pooling and careful coding not to generate too much crap in a frame, we’re guaranteed a more or less rock-steady 60Hz even on low-end systems. Awesome.
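The “bit of pooling” approach for per-frame objects like particles can be sketched like this (the names and structure are illustrative, not from any particular engine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A trivial free-list pool: reuse dead particles instead of allocating
// fresh ones each frame, so steady-state frames generate no garbage.
public class ParticlePool {
    public static final class Particle {
        public float x, y, dx, dy, life;
    }

    private final Deque<Particle> free = new ArrayDeque<>();

    public Particle obtain() {
        Particle p = free.poll();
        return (p != null) ? p : new Particle(); // allocate only when the pool is cold
    }

    public void release(Particle p) {
        p.life = 0;     // reset state so stale values can't leak into reuse
        free.push(p);
    }
}
```

Usage is `Particle p = pool.obtain(); … pool.release(p);` when the particle dies; after warm-up, `obtain()` hands back recycled instances and the per-frame allocation rate drops to zero.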

Cas :slight_smile:

The sweet spot of trouble is having (hundreds of) millions of objects that mostly need to be retained, while also generating many objects of which a small percentage makes it out of eden. The ‘full GC’ that eventually occurs will have to move most of the data around the different heaps and rewrite all pointers to the moved data. This is relatively slow.

(offtopic)

[quote=“Roquen,post:50,topic:39412”]
Classic misinterpretation of the expression… ‘a grain of salt’ is insignificant, a ‘large dash of salt’ is more significant, not less.

Haha, pedant :slight_smile: One more observation about games and GC: irregular, sometimes even long, GC pauses are acceptable, even in realtime arcade games. What is not acceptable is regular GC pauses, because they seriously ruin the experience by breaking the brain’s ingenious ability to correctly predict motion.

And of course, for games that don’t rely on constant realtime action… who cares about GC :slight_smile:

Cas :slight_smile:

@Riven - with a “large dash of salt” you’re gonna think about it more before swallowing it…at least you should. :slight_smile:

If you need to be a pedantic ahole to be right, then wear the badge with pride!

EDIT: eeeeh, not that I’m calling anyone a ahole. Just speaking in general.

Hence it’s more significant. :clue:

Why can’t we just have structures?

Out of interest, has anyone found -XX:MaxGCPauseMillis to be of any use whatsoever?

Only really with the G1 GC.

Cas :slight_smile:

To be read as: it’s worth doing? I was under the impression it was meant to work elsewhere too (with CMS, I think), though I don’t recall it having much effect the last time I tried.
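For reference, the flag is set on the command line alongside the collector choice. The pause target below is purely illustrative, and (as noted above) it is mostly honoured by G1 rather than the other collectors:

```shell
# Ask G1 for short max pauses - a soft goal, not a guarantee.
java -XX:+UseG1GC -XX:MaxGCPauseMillis=8 -jar game.jar

# CMS accepts the flag too, but in practice it seems to have little effect there.
java -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=8 -jar game.jar
```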