In case you haven’t seen this:
http://misfit.wox.org/jvm-options-list.html
Whoa!! :o
Very cool! It has some really nice links as well. I was hoping that this one, “Improving Java Application Performance and Scalability by Reducing Garbage Collection Times and Sizing Memory Using JDK 1.4.1”, would be helpful… but then I realized it’s about the length of a good-sized book. Has anyone written a tutorial on how best to tune garbage collection for high-FPS gaming?
[quote]but then I realized it’s about the length of a good-sized book. Has anyone written a tutorial on how best to tune garbage collection for high-FPS gaming?
[/quote]
Yes - don’t generate garbage in the first place. Gaming is no different to any other high-performance application, whether that be realtime systems (hard or soft), sci-vis, or a server. All the same principles apply.
How can I manage that when I constantly generate new ships and ammo whose only purpose is to be destroyed soon after?
If not generating garbage means maintaining your own reuse lists, you might be wasting more cycles and time than anything else, not counting the mess you’re adding to your program.
In fact, it all depends on how you manipulate and allocate objects. If you have lots of rapidly dying objects, a generational GC does wonders (4% of cycles to handle 60MB/s of sustained garbage is what I call wonders).
If you have long-lived objects, object pooling might be a technique to use.
Mostly, you would have to mix both methods where appropriate.
The only long-lasting things I have are screen indicators, and I have a resizing array that grows as I need more indicators. I never nullify them or call new Indicator() over them; if I want one set to a new value, I just change its members.
Is that what object pooling is?
Sort of. But in game terms, nothing is ever short-lived - you have to put geometry into some sort of rendering pipeline to get it rendered, and hand objects to the physics and AI models to compute.
Object pooling basically means returning your objects to a global collection when you no longer need them. Later on, when you need an object of type “foo”, you go to your pool and reuse an existing instance, or create a new one if needed.
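A minimal sketch of that pattern in pre-generics Java (to match the JDK 1.4 era of this thread) - Bullet and its fields are hypothetical, not from any project mentioned here:

[code]
import java.util.ArrayList;

class Bullet {
    float x, y, dx, dy;
    void reset() { x = y = dx = dy = 0f; }
}

/** A trivial free-list pool. */
final class BulletPool {
    private final ArrayList free = new ArrayList();

    /** Reuse a pooled Bullet, or allocate one only if the pool is empty. */
    Bullet obtain() {
        int n = free.size();
        return n == 0 ? new Bullet() : (Bullet) free.remove(n - 1);
    }

    /** Hand a Bullet back once it is no longer in play. */
    void release(Bullet b) {
        b.reset(); // clear state so stale data can't leak into the next user
        free.add(b);
    }
}
[/code]

Note that release() must only be called once the object really is out of play - a pooled object that is still referenced elsewhere is a classic source of bugs.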
Pepe, I strongly disagree that any GC is good. We’ve entirely eliminated all our own garbage generation in our applications, but the JOGL implementation has a small set of objects it’s creating every frame (a duplicate view of a NIO FloatBuffer) and that is causing visible stutters in our rendering loop when the GC kicks in. 60MB/s is an astounding amount of junk that should never be allowed to be created in the first place. What sort of horribly implemented algorithms are you using that generate that sort of traffic? With that level of junk generation I’d be lucky to maintain 20 fps on a modern system, let alone something 2-3 years old. Not only that, but the GC works whenever it feels like it and it’s impossible to maintain smooth framerates, which is one of the most problematic issues with GC.
[quote]Pepe, I strongly disagree that any GC is good. … With that level of junk generation I’d be lucky to maintain 20 fps on a modern system, let alone something 2-3 years old.
[/quote]
I invite you to test Gosub, then. Go to my home page, read the pdf, get the zip or launch the jnlp and do your own tests. That benchmark really does generate that amount of garbage, and it does not stutter. It runs faster than 20 fps on modern machines (55 to 65), and over 100 using the OpenGL pipeline.
Take care to read the pdf for the reasoning behind the test and its implementation. (Don’t imply I’m that dumb just because of the junk I generate.)
I certainly can’t see a GC stutter on my Asus 1.6Mhz laptop (with typically crappy laptop video card). Although it was as jerky as hell until I turned off throttling.
1.6 mhz? huhuhuhu…
Can you tell me what JRE version you were using?
[edit] oh, and what is your framerate?
java version “1.4.2_05”
Java™ 2 Runtime Environment, Standard Edition (build 1.4.2_05-b04)
Java HotSpot™ Client VM (build 1.4.2_05-b04, mixed mode)
Running on Gentoo linux.
Frame rate is 16fps in non-fullscreen mode. Fullscreen has some weird artifacts (it doesn’t take up the full screen - the background covers only 3/4 of it) and only runs at 5fps.
I discovered that using that graphical realtime memory analysis tool worked absolute wonders for me when trying to figure out garbage collection tuning.
And then I discovered -XX:MaxGCTime, which more or less figures it out for you.
I advocate an approach between Pepe’s and Mithrandir’s extremes. Object pooling actually significantly slows down some of the GC algorithms on the latest JREs, due to the way the generational collectors track references from old-generation objects back to young-generation objects. And the hassle of pooling is nearly always more of a pain in the arse than just letting Java do it. A look under the hood reveals that the way the JVM actually allocates memory in the first place is by maintaining a whole bunch of little pools of slots of different sizes - basically doing the work for you.
Mostly what you need to do is tune the young generation collector to ensure that all your garbage gets zapped very frequently and hardly any of it ends up in the old generation.
And then you need to stop worrying about GC pauses - because so long as they happen irregularly and infrequently, no-one will notice over general system noise in a normal game.
60MB of garbage is a terribly large amount, by the way. You would definitely be wise to tune that down to 600KB and set the heap parameters appropriately, or you’ll be needlessly over-specifying the machine to achieve the same results! 60MB of garbage has to fit in 60MB of free RAM, remember!
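For instance, era-appropriate HotSpot flags for that sort of young-generation tuning look like this (values purely illustrative, and MyGame is a stand-in for your main class - measure before copying):

[code]
java -Xms64m -Xmx64m -Xmn16m -XX:SurvivorRatio=8 -verbose:gc MyGame
[/code]

-Xmn fixes the young generation size so that short-lived garbage dies in eden, and -verbose:gc lets you confirm that almost nothing is being promoted to the old generation.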
Cas
True. That’s exactly what I meant when answering Mithrandir. More words make it clearer; I should try to do that too.
Not so sure. Having a larger young generation means that you will get longer GCs and larger memory use (which is not something you want, true?).
If you have really rapidly dying and reused data, a small eden looks like the better choice: fewer live instances take less time to scan for survivors.
That’s exactly what happened to me: I had small but real GC pauses that I could not see because I was playing; I had to look at the GC log to see them.
The 60MB-per-second garbage was intentional, and is not meant to be reduced, as it was one point of my tests. Nevertheless, it’s not a 60MB chunk that gets collected each second, but over a hundred smaller chunks, which is something very different. All of that is explained in the pdf I wrote about the experience.
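For anyone who wants to see those pauses in their own game, the standard HotSpot flags of that era print every collection to stdout (exact output format varies by VM version; MyGame is again a placeholder):

[code]
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyGame
[/code]

Each line shows how much memory was reclaimed and how long the pause took, which makes it easy to tell a young-generation blip from a full collection.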
I tried adding the VM argument: -XX:MaxGCTime and got:
Unrecognized VM option ‘MaxGCTime’
Edit: In this list I found:
-XX:MaxGCPauseMillis=
so I put in
-XX:MaxGCPauseMillis=10
and after a couple of pauses at the beginning of play, I never got any pauses later! Very cool. I stared as hard as I could at the screen and didn’t spot any pausing in the 3 minutes I watched - I couldn’t detect them, and that was all I was trying to do. I doubt anyone just playing for fun will notice them now.
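For reference, the whole command line looks something like this (mygame.jar is a placeholder; note that, as far as I know, -XX:MaxGCPauseMillis is a goal the collector’s ergonomics try to honour, not a guarantee, and which collectors respect it varies by VM version):

[code]
java -XX:+UseParallelGC -XX:MaxGCPauseMillis=10 -jar mygame.jar
[/code]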
Yes, sorry, didn’t type it right
Cas
[quote]Object pooling actually significantly slows down some of the GC algorithms on the latest JREs due to the way the generational collectors track references back from old generation objects to new generation objects.
[/quote]
That’s the entire idea - you arrange things so that the GC never has to run at all! Every time you create an object you use CPU time, and every time the GC runs you use CPU time. Neither is, in general, under your control, and the result is visible stuttering of the display.
An example: we recently took some networking code that was developed according to Pepe’s recommendations - lots of very short-lived objects, wrappers, etc. It was barely able to run 25 entities at 10 FPS. After removing every single object allocation and using object pooling, we can now comfortably handle 5000 entities at 25 FPS.
Advising people that creating objects is OK because the GC will take care of it is precisely the reason Java has such a terrible reputation in the first place. You’re doing far more harm than good for Java (and your potential jobs) with these sorts of statements. I’ve seen so much awful code out there precisely because people made assumptions like the ones you’re making here, and I’ve seen people write Java off as a development language because of it - when, if they had just stuck to the principles they learnt for C/C++ programming, the Java code would have run just as well. I’ve seen two academic research papers come to exactly this misconstrued result for precisely this reason, both about distributed networking systems. Taking the code they had and applying proper design strategies that eliminated garbage generation in the first place had the Java code running faster (more packets/sec handled) than their optimised C++ code. Just because you can do something doesn’t mean that you should.
Unfortunately, I’ve seen completely the opposite problem: code that is over-cautiously pre-optimized to avoid GC and to pool, which didn’t need it, but ends up MUCH harder to maintain - and which, when you rip the pooling out, is actually faster in the long run. It also tends to turn out that the feared GC problems were being caused by something more subtle - or even by a completely different bit of code :(.
Not that I’m disagreeing, just pointing out that it’s a problem both ways. I’ve got into the habit of saying “never pool … unless you really know what you’re doing, and your reason is better than “to stave off the GC” (i.e. something more like: “I happen to know what the GC will do about this and I happen to know - incontrovertibly - better”)”.
I’m intrigued by your statements on networking, where in particular there are a LOT of big costs in object creation/destruction that have more to do with the hidden costs of those particular objects - usually because of subtle ties to OS data structures, e.g. the implicit closing of a socket, or not-so-subtle ones, e.g. creating direct ByteBuffers (although we tend to see negligible speed penalties there despite Sun’s warnings ???). IMHO, in general, “intelligent” (i.e. architect/design-led) pooling in networking code leads to dramatic improvements more because of the intricacies of network stacks than anything to do with GC.
Although I know what you mean with the app-layer penalties, e.g. DOA’s and the awful amount of extra garbage that can transparently spring up from innocent actions on semi-distributed objects or their interactions…But that is a well-known side-effect of transparent distribution (it’s listed right here in my 10-year-old off-the-shelf distributed systems book ;D, under “Disadvantages:”).
/me still hankers after javadoc keywords @destructioncost and @constructorcost as warnings to say “this object has non-trivial creation / destruction costs or secondary-effects that you may need to be aware of, especially if creating or destroying in large quantities”
As a simple example of the problems we came across: a lot of networking uses unsigned shorts or bytes, and this class was creating wrapper objects for every unsigned data type. So, for example, with a class that represented the message to be sent to the wire, as it was being read or written they would create a new instance of this unsigned wrapper class (i.e. semantically equivalent to Integer/Float etc.). The class was used for a very simple thing - fetch data from inside the object, write bytes to disk. In lifetime terms it was a sum total of 3 lines of code: the create call, the fetch method, and a single write. Basically a classic example of what Pepe is recommending we should all do, because the object has a very short lifespan and so the generational GC should kick in and remove it quickly. Unfortunately, in practice it fails. Removing this object creation and just using straight ints with the appropriate bit masking gained us an order of magnitude speed improvement (we went from the initial 25 entities to around 300 at 10 FPS with no other changes).
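A sketch of the masking trick (a hypothetical helper, but the idea is exactly this - a plain int in, a plain int out, and no wrapper object ever allocated):

[code]
final class Unsigned {
    /** Read a 16-bit unsigned value from a byte array into a plain int. */
    static int readUShort(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 8) | (buf[off + 1] & 0xFF);
    }

    /** Write it back out; only the low 16 bits of value are used. */
    static void writeUShort(byte[] buf, int off, int value) {
        buf[off]     = (byte) (value >>> 8);
        buf[off + 1] = (byte) value;
    }
}
[/code]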
The real-world example I have is a direct comparison between the particle systems in Alien Flux and Super Elvis.
Alien Flux maintains a huge pool of thousands and thousands of particles ready to leap into action; Super Elvis just new()s them and lets them die of their own accord.
Alien Flux’s code, as anyone who’s looked at it will know, is rather hugely complex.
And when, not long after its release, I show everyone the source code to Super Elvis, you’ll see just how vastly simplified the particle system is for not having to worry about pooling.
I used pools for lasers and enemy bullets in Alien Flux too. Now I know that was basically a waste of effort, as the GC is perfectly capable of handling this level of garbage on its own. There are no pools at all in Super Elvis, it hardly stutters at all, and I haven’t even tuned the GC on it yet!
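For contrast, here is roughly what the no-pool approach looks like (a sketch, not the actual Super Elvis code - Particle and its fields are made up for illustration):

[code]
import java.util.ArrayList;
import java.util.Iterator;

class Particle {
    float x, y, dx, dy;
    int life = 60;
    Particle(float x, float y, float dx, float dy) {
        this.x = x; this.y = y; this.dx = dx; this.dy = dy;
    }
    void update() { x += dx; y += dy; life--; }
    boolean isDead() { return life <= 0; }
}

class ParticleSystem {
    private final ArrayList particles = new ArrayList();

    /** Spawn: just new it up - no pool, no bookkeeping. */
    void spawn(float x, float y, float dx, float dy) {
        particles.add(new Particle(x, y, dx, dy));
    }

    /** Update: dead particles simply become young-generation garbage. */
    void update() {
        for (Iterator i = particles.iterator(); i.hasNext();) {
            Particle p = (Particle) i.next();
            p.update();
            if (p.isDead()) {
                i.remove();
            }
        }
    }
}
[/code]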
Cas