GC Implementation Specifics

swpalmer · January 17, 2004, 3:19am

I’m trying to understand the generational collector and I don’t seem to have it quite right.

I think in general it works like this:
When the Eden is filled GC runs on it… any objects in Eden that are still alive with a generation count that exceeds some threshold are promoted to the next generation and the Eden space is compacted - those objects that remain in Edne have their generation count incremented. The new object is created in the available space that is reclaimed. If the new object still doesn’t fit in Eden it goes straight to the older generation which usually has a larger max size.

Is that correct? If so…

What happens when you allocate several large objects in a row… such that they are too large to fit in Eden after it is collected but too small for the VM to know to place them in the larger heap right away. Does the 2nd large object and those after it cause GCs in Eden that basically do redundant collections that don’t reclaim any/enough new Eden space yet still bump the generation count on the young objects so that they are prematurely promoted?

Or after a Eden GC does the VM track the amount of KNOWN free space in Eden… hmm no that isn’t possible, everything left in Eden could be garbage by the time the next allocation happens, so there just may be enough space, it will have to try to collect.

I’m trying to learn how to best tune both the GC parameters and object creation patterns.

E.g. I have an app that generates lots of short lived, large objects (images from motion-JPEG video frames). I had some issues with GC pauses. these mostly went away when I tried -XX:+UseConcMarkSweepGC but the video is still not 100% smooth. I wonder how I might tune the generation sizes to optimize this.
I don’t use JMF because last time I tried it didn’t work well enough and is fairly complex.

gregorypierce · January 17, 2004, 2:16pm

You’re thinking along the lines of where I’ve been headed. I keep wondering to myself - if I don’t have large dynamic memory needs… why not just create one big ass eden or eden + generation one where I know everything is going to fit, thus meaning that the garbage collector would never need to be invoked?

Jeff · January 17, 2004, 7:20pm

Okauy,

First off thre is no edne “compaction”.

It works mroe like this.

Obejcts are allcoated till the eden is filled. When it is filled, some very trickey datastructures that track use of eden obejcts are scanned. Any obejcts that are still in use at that point in time are moved to the next generation. The eden is then wiped clean.

The next generation s one of two things depending on whetehr or not you have incremental collection turned on.

If the incr collector is on then the next generation is what are called the “trains”. In simpel terms they are a set of linked lists that “bubble up” object through them in progression during aprtial collects.

Finally if they get to the end of the trains without going unreferenced they move to the final generation.

Each generation’s code/algortihym is 'tuned" to be better for the kidn fo obejct its likely to encounter. The eden is easy to collect but costs you a bit mreo in space and processor for live objects. In contrast, the final generation is the cehapest cost for keeping thinsg alive but the msot expensive to collect.

There really IS a reason for mant generations.

gregorypierce · January 17, 2004, 11:36pm

Right, but the question is - what happens if eden is never full. What if I increase the size of eden to the point where it never fills up.

a) can I do that

b) what are the implications of doing that

Jeff · January 17, 2004, 11:53pm

[quote]Right, but the question is - what happens if eden is never full. What if I increase the size of eden to the point where it never fills up.

a) can I do that
[/quote]
Well in order to do that you would have to have an app that had static allocation needs. (Did not do allocation in its runnign but only for set up.)

Which means that all your obejcts are really long lived.

Which means you DONT want them in the eden. You are paying CPU and memory for tracking those. In fact yo uwant them to hit the old generation as fast as possible, which means you want a SMALL eden, not a large one.

[quote
b) what are the implications of doing that
[/quote]
See above.

swpalmer · January 18, 2004, 1:08am

[quote]Obejcts are allcoated till the eden is filled. When it is filled, some very trickey datastructures that track use of eden obejcts are scanned. Any obejcts that are still in use at that point in time are moved to the next generation. The eden is then wiped clean.
[/quote]
I see an obvious flaw in this, well not a flaw really - but in some cases an undesirable characteristic. If eden is wiped clean when it is filled it means that the most recently allocated objects are promoted to the next generation (i.e. non-eden) prematurely. By that I mean after a VERY short life relative to the objects that were put in eden shortly after it was last “wiped clean”.

My concern with this is that in my case I have relatively big objects (video frames) and that for animation smoothness I will have one or two of these buffered that are about to be displayed. Soon as they are painted over (approx 1/30 sec apart) they are garbage. So every time one image is allocated that doesn’t fit in Eden one or two images would be promoted to a generation in which they don’t really belong. Thus generating short lived objects in the wrong generation.

Am I making sense? The GC algorithms could be much more complicated and handle this case. Can you tell me if they do?

princec · January 18, 2004, 10:30am

The lifetime of objects is not actually measured in “time” but in GC generations. It doesn’t matter how long the object has been in Eden for in physical time; it’ll still be cleared out if it’s not referenced when the collector comes knocking. The trick is to tune the size of Eden so that during the course of your tight inner loops, the Eden space is just over completely filled by the time the video frame ends, which will cause at least one collection per frame producing almost no garbage whatsoever and costing very little to perform. The very little garbage that is still referenced when the GC occurs will end up going into the incremental collector where it is purged on a less frequent basis, but purged nonetheless, also at very little cost.

Therefore, for video games, I think you need a very small Eden indeed, and even then, you should avoid constructing objects if you can help it during those inner loops.

Cas

gregorypierce · January 18, 2004, 7:36pm

Anyone know of a profiler that will be able to tell me how long my objects spend in each generation and when they move from one generation to the next?

While I think its cheesy to have to wait for my objects to reach tenure since I know I want them to end up there (any time that it takes for them to reach the old generation is just a waste), I guess if that’s the only way to do it I’l lhave to deal with it.

Jeff · January 18, 2004, 7:47pm

Hmm. Iits possivble actually that the built in profiler might give you soem of that info if you do -verbose:gc

I dont kwno if the info is avauilable to external profilers or not, the way to find out is to look up the JVMPI docs on java.sun.com

Jeff · January 18, 2004, 7:49pm

On tuning the edben. The VM will try to tune it for you based on whats going on in run-time but in a case like this (one set of allocs per level) it might take it a long time to get enough info.

There is a -XX flag to set the initial eden size. I poinetd it out to Chris when he did his benchmarks and it may be in his article. If not, I can dig it up again if you’ld like.

gregorypierce · January 18, 2004, 7:56pm

I would appreciate it. This is the sort of thing I would love to see in a whitepaper:

“Tuning the JVM for games and other multimedia applications”

I think this is a very common and highly efficient practice for most games. At least it is for me coming from a console background where you just didn’t have any choice.

“You have a certain amount of memory and that’s it - suck it up and make it fit.” (me to a junior cohort back in the day)

Jeff · January 18, 2004, 8:31pm

I’ll look for it.

The problem is we CAN’T publish the -XX flags in a white paper because they are -XX. The whole point of -XX is that they are VM specific and subject to change from VM version to VM version.

swpalmer · January 18, 2004, 10:21pm

[quote]The trick is to tune the size of Eden so that during the course of your tight inner loops, the Eden space is just over completely filled by the time the video frame ends, which will cause at least one collection per frame producing almost no garbage whatsoever and costing very little to perform. The very little garbage that is still referenced when the GC occurs will end up going into the incremental collector where it is purged on a less frequent basis, but purged nonetheless, also at very little cost.
[/quote]
Well this doesn’t address my concern. I will always have a reference to a relatively large object (a bitmap image or 2 representing a couple video frames), so such objects will always be live when the eden is filled. I know they are “short-lived” in a real time sense, but it seems that there is nothing I can do about the fact that they will get promoted to the next generation.

The thing I get out of this GC explanation is that it impossible to not have “short-lived” objects get moved from Eden. If you have only a single live object when you attempt to allocate another and that allocation won’t fit in eden, then the one live object that is in eden, regardless of chronological age, is moved out of eden.

If that object is large then there is a significant cost to copy it out of eden, even if the next generation will efficiently collect it because that generation never really gets a lot of work to do. That means a slight blip in the GC usage when eden fills, even though my algorithm is designed to produce only very short lived objects - specifically to take advantage of the efficient collection that the young generation performs for such objects.

Is it best to simply reduce the size of eden in this case so that my large object is immediately placed in the next generation so that it never needs to be copied out of eden?
I then have to worry about how efficient the next generation is with the many short lived objects I will be generating out side of the very space that treats them most efficiently.

Comments? Suggestions?

Gregory, and others… there is a really nice program that I mentioned in this forum a few months ago. It shows the sizes of the various generations, the CPU time taken by GC… it’s very nice. I can’t remember the name at the moment. I’ll find the thread and give it a bump.

Jeff · January 18, 2004, 10:46pm

Hmm. So at this point I have to ask…

Do you KNOW there is a problem or is this all just supposition? The VM inernally tries to tune all the memory spaces for over-all efficiency.

This seems to me to be pretty deep into abstract logic which may or may not actually be causing any significant issues…

gregorypierce · January 18, 2004, 11:09pm

Yes it is definitely a problem - one that is more and more pronounced the slower the machine is. Basically what’s happening when I do a verbose gc is that I have a lot of little gcs and then some long stop the world painful gc - but I’m not allocating jack. If I’m not allocating anything, the gc needs to leave my stuff alone.

swpalmer · January 19, 2004, 12:31am

In my case I know that GC is effecting my programs animation because if all I change is to add -XX:+UseConcMarkSweepGC the pauses are significantly reduced.

During this phase of my program the most significant object allocation is images created from JPEG data. I could reduce this with ImageIO reading the JPEG into an existing image buffer. However then there is a memory leak caused by ImageIO. The fix for that leak is to call some sort of re-init method on some ImageIO objects… but the side effect of that is a MASSIVE slowdown (it triggers a call to System.gc() in the ImageIO code, which in turn is a result of the evil use of finalizers) so I’ve settled for the older garbage creating method.java
The ImageIO bug in question is 4868479 - apparently it is fixed in Tiger… but that won’t help me for a while yet… the Mac platform is where this matters most to me now.

Jeff · January 20, 2004, 10:35pm

yes thats very odd. if you aren’t allocating then you really shouldnt see any GC activity unless youa re running incrmental gc.

if you are, try shutting that off.

ChrisRijk · January 21, 2004, 2:54pm

http://java.sun.com/docs/hotspot/VMOptions.html

Has some detail on a lot of non-standard flags.

The -XX:CompileThreshold, -XX:MaxInlineSize and -XX:FreqInlineSize are interesting since they’re about the only flags that affect code optimisation, apart from -client and -server. Almost everything else is GC related (compare and contrast to C compilers!) However, in my experiance they almost never make much of a difference when -server is on, though one of my benchmarks in one particular setting did get a 30% boost if I remember correctly… This isn’t a criticism of HS - I think it’s really cool that I don’t have to bother with this stuff. Let’s hear it for automatic optimisation!

Also see the docs on GC tuning with various VMs:
http://java.sun.com/docs/hotspot/index.html

NVaidya · January 22, 2004, 3:49pm

ChrisRijk wrote:
The -XX:CompileThreshold, -XX:MaxInlineSize and
-XX:FreqInlineSize are interesting since they’re about the only flags
that affect code optimisation, apart from -client and -server.

OK ! The official description for these flags appears to be rather
cryptic for my intelligence ???. Anyone know more to elaborate…

TIA

Jeff · January 23, 2004, 7:15am

Compile Threshold is the number of times the interpreter has to pass over the same code befoer it decides its worth compiling.

One of the key differences between clietn and sevrer VM is that the CompileThreshold default is much lower for client thus reducing apparent start-up slowness of GUIs.

MaxInlineSize refers to the size of code produced by linlining. Im not 100% sure if this is the maximum size of a rotuine to be inliend or the maximum size code is allowed to grwo to by inlining. Chris may know.

This is important because inlining to the point of over-flowing your instruction cache on a modern CPu is a de-optimization. Again, Hotspot does its best to set this to a good number based on what it can know/guess about your system by probing it but ist always possible that you know more about a particualr unusual system then it does.

FreqInlineSize I don’t know, I’ll try to peek at the docs and see if they mean anything to me.