Are people overreacting about the negative performance impact of GC?

Note we are talking about game dev now.

I frequently read here people recommending to:

  • stay away from for-each loops because their iterators are not cached
  • pool objects rather than delete them, because of the negative impact of garbage collection.

I am creating a 60 fps game (desktop) that uses lots of for-each loops every frame, as well as creating new objects and nullifying them frequently.
I’ve never noticed any slowdowns, even if I flood my map with entities.

So my question is: are people overreacting in regards to this? Maybe the JIT compiler is caching for-each iterables, and GC may not have the performance penalty we think?

Questions like these can only be answered by specific benchmarks on specific platforms. You say your game works fine on your desktop. If you’re just worried about your game, and you’re just worried about your desktop, then you’re set.

But there are other systems out there. Older, slower desktops. Less powerful laptops. Mobile devices. Games that work fine on your system might run much more slowly on those systems.

And there are other benchmarks out there. Your game might be fine, but another game with different code might be too slow.

So there isn’t a one-size-fits-all rule, and we can’t really say whether somebody is over-reacting or not. Using things like object pooling might make a difference on some systems. Then again, spending a lot of time on premature optimization might just be overkill if you don’t have a problem.

My own experience is that GC was no problem at all on desktop, but should be avoided at all costs on Android.

As KevinWorkman says, you shouldn’t worry too much about early optimization that might eventually be overkill, but if you’re going to run on mobile, you might want to err on the side of caution from the very beginning.

Object pooling is inefficient except in a few particular cases since Java 1.4. Android is another story…

The particular cases however can be quite interesting. But that seems beside the point as I don’t see too many people advocating their use, nor worrying very much about GC pauses.

What are you smoking? Pooling objects is perfectly valid if you want to avoid garbage even on desktop. The goal isn’t to improve average performance, it’s to reduce stuttering.
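To illustrate the kind of pooling being discussed, here is a minimal sketch (not from the thread; the `Bullet` class and its fields are made up for the example). The point is to reuse instances instead of allocating per frame, so no garbage accumulates between collections:

```java
import java.util.ArrayDeque;

// Minimal object pool sketch. "Bullet" is a hypothetical example class;
// reusing instances avoids per-frame allocation and the GC pressure
// (and potential stutter) that comes with it.
public class BulletPool {
    public static class Bullet {
        public float x, y;
        Bullet reset(float x, float y) { this.x = x; this.y = y; return this; }
    }

    private final ArrayDeque<Bullet> free = new ArrayDeque<>();

    public Bullet obtain(float x, float y) {
        Bullet b = free.poll();                        // reuse an instance if one is available
        return (b != null ? b : new Bullet()).reset(x, y);
    }

    public void release(Bullet b) {
        free.push(b);                                  // return to the pool instead of dropping the reference
    }
}
```

The trade-off is well known: pooled objects can carry stale state, so a `reset` step like the one above is essential.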

It was really efficient for this purpose before Java 1.4, but not now in most cases. OpenJDK and Oracle Java can handle short-lived objects much more efficiently than in the nineties, but that’s not the case for Android’s Dalvik virtual machine.

P.S: [quote]For most applications, explicit nulling, object pooling, and explicit garbage collection will harm the throughput of your application, not improve it[/quote]

Java theory and practice: Garbage collection and performance, Brian Goetz.

Quite a lot (if not the majority) of high-performance Java software today uses object pooling… and off-heap allocations. The fact that HotSpot can sometimes detect non-escaping objects and occasionally perform scalar replacement, coupled with GCs sucking slightly less, means that getting a performance win out of pooling actually requires a bit of thinking.

Well, in some cases it is to get an average performance win: linear memory layout for user code, fewer roots for the GC to scan. A lot of people completely ignore the work-stealing side of the GC and the compiler/decompiler. Stop-the-world is just part of the picture.

I agree with you concerning off-heap allocations, but not about object pooling. The few maintained forks of JBullet I found on GitHub stopped using StackAlloc, and the object pool became completely optional in Ardor3D several years ago. Do you have any example of high-performance open-source Java software using object pooling extensively?

Not a direct example, but I’ve had some interesting conversations at conferences with people doing low-latency Java work in the financial sector, with whom I bizarrely seem to have most in common. That’s definitely one of the tricks in use. Just found this link from 2013 which is quite an interesting read http://www.infoq.com/articles/low-latency-vp, Q5 in particular. Obviously noting the “The worst thing you can do is prematurely optimize this area of your applications without knowing whether it is actually a problem or not”! :wink:

Because I do low-latency audio coding, which has tighter latency constraints than graphics (60fps vs 600fps), not to mention that audio ‘frame’ drops are a hell of a lot more noticeable, Praxis LIVE actually supports the ability to run the audio in a separate VM to the UI. This is an interesting approach which is probably overkill for most usages, but has a demonstrable performance improvement for my use simply from reducing GC in performance critical code.

page 24: http://www.azulsystems.com/sites/www.azulsystems.com/Azul_Systems_Trading_Chicago_2013_presentation.pdf

I can link some specific examples if you wish.

page 31 is funnier! ;D

I had some major issues with the GC at runtime in one of my own projects. Managed to fix those by preventing a lot of instantiating in my main loop. I bought my laptop somewhere in February this year and got some nice hardware. I can play Tomb Raider Underworld at full settings without lag, so I was surprised to see how badly my own project, which had 1000s of cubes and a gigantic floor, was lagging, even when Lara Croft’s tits had more vertices than all the models in my game combined. When I ran the JVM with the option -verbose:gc it became clear what the problem was.

Like someone said somewhere above me, you may be able to run that game without problems while others can have real difficulties with it. Since laptops and desktop computers differ so much in hardware, I actually do recommend taking care of it as soon as possible, and not only for Android projects. I had these same problems with Varkas’s RPG Jewel Hunt. He made some changes and things work fine now. But if he hadn’t, who knows how many players would have had the same problem once he published it as a full version? That would cause bad reviews and less popularity. So yes, it is something crucial in my opinion. Not something to take lightly. Java isn’t C++.

That being said, I do not see how people can overreact about it, since it is a part of Java and, in my humble opinion, simply important. If I, as a player, experience lag while playing your game because of the GC and you do nothing about it, I simply quit, give a bad review and move on. That is how most of your players will react.

It can’t just be answered without further information. HotSpot (the JVM’s just-in-time compiler) gets better with every single version. Often the JVM can figure out that the for-each loop (and creating the underlying Iterator) is unnecessary, and the emitted native code then uses an index-based loop. HotSpot does this for arrays and ArrayLists. It also does a lot more optimization at runtime to prevent generating garbage at all (like escape analysis). There are worse things to do than using for-each loops, for example looping over huge lists containing just a few objects you’re interested in. For those cases it makes sense to split the elements into smaller lists.
Object pooling only makes sense if you know that creating your object generates a huge amount of garbage, or that creating / initializing the object takes a long time (database connections, for example).
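A small sketch of the two loop styles being compared (my own illustration, not from the thread): a for-each loop over an `ArrayList` compiles to a call to `list.iterator()`, which allocates a small `Iterator` object each time unless HotSpot’s escape analysis happens to eliminate it, while an index-based loop allocates nothing:

```java
import java.util.ArrayList;
import java.util.List;

public class LoopStyles {
    // For-each desugars to list.iterator(); the Iterator is a small
    // heap allocation per loop unless HotSpot removes it.
    static long sumForEach(List<Integer> list) {
        long sum = 0;
        for (int v : list) sum += v;
        return sum;
    }

    // Index-based loop: no iterator object is ever created.
    static long sumIndexed(ArrayList<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) sum += list.get(i);
        return sum;
    }
}
```

Both produce the same result; whether the difference matters in practice depends on the JVM and how hot the loop is, which is exactly the point being made above.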

I agree.

C/C++ programmers know that there is no such thing as HotSpot for them that might eliminate memory allocations, and they know they have no garbage collector to clean up after them. So C/C++ programmers are simply, and generally, more careful about memory allocations and about how they write their programs, so as not to scatter malloc’s and new’s all around their code base. That requires real thought and a lot of discipline.
This is not to say that programming in C/C++ requires more discipline than programming in Java. C++ programmers just “have” to think about memory allocations, because no one will take care of it for them.
On the flip side, it’s just easier for Java programmers to “shoot themselves in the foot” by relying on the magic that may or may not happen in HotSpot and the GC. I would say this is the inherent danger of using a garbage-collected language: you have a tendency to lean back in comfort, telling yourself “the GC will handle this.”
It might one time, and it might not other times. And with every new JVM version it becomes more difficult to really know the cases in which HotSpot does you good and the cases in which it won’t.

My point: Writing well-behaving and efficient programs requires intense thought and a lot of discipline which no tool/language can handle for you in all cases, …however, escape analysis and intelligent garbage collection is a good way to relieve the programmer of thinking about allocations in some very easy to detect cases.

If you have a target frame rate F (in 1/sec) and some cost x in milliseconds, then each frame’s budget is 1000/F ms, so the percentage of the budget consumed by x is

pcost(x) = 100 · x / (1000/F) = Fx/10

if F=60, then this is pcost(x)=6x.

pcost(5) = 30%

So a 5 msec stop-the-world is stopping all of your cores for 30% of a frame’s cycle budget.
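The arithmetic above, as a one-liner in code:

```java
public class FrameBudget {
    // Percentage of a frame's time budget consumed by a pause of `millis` ms
    // at a target rate of `fps` frames per second:
    //   budget per frame = 1000 / fps milliseconds
    //   pcost(x) = 100 * x / (1000 / fps) = fps * x / 10
    static double pcost(double fps, double millis) {
        return fps * millis / 10.0;
    }
}
```

At 60 fps, `pcost(60, 5)` gives 30.0, matching the 30% figure above.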

This is over stressed. With a bit of practice you don’t really think about it at all. Stack allocate small non-escaping, zone-allocate larger non-escaping and the vast majority you allocate before you start doing anything and don’t think about it any more. Oh you’ll have some pool allocators for dynamic data-structures, but stuff like that is hidden away in the implementation.

My view on this is: all languages and compilers are stupid. Sure, there are some excellent transforms that can happen… some are even pretty amazing if you think about them. But they’re still stupid… they have to be. They must obey what you’ve told them to do and can only transform legal code into legal code. They cannot break the rules of the language or of how you’ve expressed some code. That restricts them (on the upper end) to about 10% of the problem.

On escape analysis: I’d really like to know what the current set of limitations is. They were rather strict the last time I looked. Likewise for scalar replacement… which boiled down to: it could only happen if the call to “new” is in a function that is a leaf from the allocation’s perspective, i.e. the object could not be passed to any method; any methods it was logically passed to must have been inlined.

[quote=“Roquen,post:16,topic:55568”][/quote]
I encountered this problem in LWJGL recently and it’s still exactly like what you describe. The usual failure case looks like:

  • User code allocates an object and passes it to a method
  • which calls a simple method
  • which calls a simple method
  • which calls a complex method that uses the allocated object

Assuming that all these methods are pure and do not leak the reference anywhere else, the allocation will still occur if the final method is too big to be inlined. If the final method is in library code, one thing you could do, without affecting user code, is:

// before
public static void complexMethod(..., MyClass object) {
    // lots of bytecode...
}

// after
public static void complexMethod(..., MyClass object) {
    complexMethodImpl(..., object.fieldA, object.fieldB, ...);
}

private static void complexMethodImpl(..., MyClassFieldA fieldA, MyClassFieldB fieldB, ...) {
    // lots of bytecode...
}

This lets complexMethod get inlined, and escape analysis will then eliminate the MyClass object allocation.

As an example, the case I’m currently optimizing in LWJGL is [icode]GL20.glShaderSource(int shader, CharSequence… strings)[/icode] and “complexMethod” is UTF-8 encoding the input strings. With the above trick, allocation of the intermediate ByteBuffers is eliminated.

(not that you’d ever call glShaderSource so many times that it’d be a problem, but the same solution can be applied to similar scenarios)