Garbage collector tuning

Currently reading a nice article on Java performance:

I thought I would share, because it talks about a subject I’ve heard a lot about around here: object pooling. I guess it’s nice to understand why it is a bad idea to use object pooling :slight_smile:

[quote]Object pooling
Object pooling is a straightforward concept – maintain a pool of frequently used objects and grab one from the pool instead of creating a new one whenever needed. The theory is that pooling spreads out the allocation costs over many more uses. When the object creation cost is high, such as with database connections or threads, or the pooled object represents a limited and costly resource, such as with database connections, this makes sense. However, the number of situations where these conditions apply is fairly small.
In addition, object pooling has some serious downsides. Because the object pool is generally shared across all threads, allocation from the object pool can be a synchronization bottleneck. Pooling also forces you to manage deallocation explicitly, which reintroduces the risks of dangling pointers. Also, the pool size must be properly tuned to get the desired performance result. If it is too small, it will not prevent allocation; and if it is too large, resources that could get reclaimed will instead sit idle in the pool. By tying up memory that could be reclaimed, the use of object pools places additional pressure on the garbage collector. Writing an effective pool implementation is not simple.
In his “Performance Myths Exposed” talk at JavaOne 2003, Dr. Cliff Click offered concrete benchmarking data showing that object pooling is a performance loss for all but the most heavyweight objects on modern JVMs. Add in the serialization of allocation and the dangling-pointer risks, and it’s clear that pooling should be avoided in all but the most extreme cases.
[/quote]
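For anyone who hasn’t seen the pattern, here is a minimal sketch of the kind of pool being discussed (all names hypothetical): acquire from a free list, release back into it, and only allocate when the pool runs dry.

import java.util.ArrayDeque;

// Minimal single-threaded object pool sketch (hypothetical names).
public class ParticlePool {

	private final ArrayDeque<Particle> free = new ArrayDeque<Particle>();

	public Particle acquire(){
		Particle p = free.poll();                 // reuse one if available...
		return (p != null) ? p : new Particle();  // ...otherwise fall back to allocation
	}

	public void release(Particle p){
		p.reset();     // clear stale state before reuse
		free.push(p);  // back onto the free list
	}

	public static class Particle {
		float x, y, dx, dy;
		void reset(){ x = y = dx = dy = 0f; }
	}

}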
Here are two other links that give the background needed to understand this article :slight_smile:


Note that pooling can be interesting for dataflow optimizations.

That is an article from 2003. Things have changed, and not only for the better.

If you want semi-realtime behavior (a stable framerate), then object pooling can still be a good idea, as the GC won’t have to clean up those tens of thousands of objects created every frame.

Let alone Android, where the GC is poor and object allocation is slow.

I used to be really careful not to generate any garbage in the main loop of a game, but I became more relaxed about it after reading on JGO that garbage collection is not such a big deal anymore. And usually, on PCs, it isn’t.

Some ideas on this:

  • Games are a special case, in that usually the “smoothness” and real-time character of a game is a critical aspect, much more so than with most other application types. Stutter caused by garbage collection ruins a game much faster than, for example, a word processor.
  • Synchronization issues are irrelevant if you do not write a multithreaded game (which most indie developers don’t, I guess).
  • “Tuning the pool size” is not that hard if you have an idea of how many game elements you want to support. And, of course, you can increase the pool size by a step whenever needed (see the sketch just below).
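A sketch of that growth step, under the same single-threaded assumption (names hypothetical): when the free list runs dry, allocate a whole batch at once rather than one object per request.

import java.util.ArrayDeque;

// Pool that grows by a fixed step when empty, so a too-small initial
// size degrades gracefully instead of reintroducing per-use allocation.
public class GrowableBulletPool {

	private static final int GROW_STEP = 64;
	private final ArrayDeque<Bullet> free = new ArrayDeque<Bullet>();

	public Bullet acquire(){
		if(free.isEmpty()){
			for(int i=0; i<GROW_STEP; i++) free.push(new Bullet()); // grow by a step
		}
		return free.pop();
	}

	public void release(Bullet b){
		free.push(b);
	}

	public static class Bullet {
		float x, y;
	}

}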

The main reason I don’t use object pools is that I usually don’t run into performance issues with garbage collection (making it an unnecessary optimization), and pooling makes the code a bit more complicated. But in rare cases it may be useful, e.g. when the garbage collector is very inefficient (Android? dunno) or when massive numbers of objects are created and disposed each update (e.g. a gazillion bullets on-screen).

Pooling is great, especially for games, especially for single-threaded games, even more especially if you’ve got the G1 collector. The things I pool are sprites and particles, which are used and reused at a pretty frightening rate. I also pool DirectByteBuffers, which I use for loading images (so I basically only end up allocating one rather large one and reusing it over and over, as it should be), and certain hacks in my sprite engine for rendering arbitrary geometry are pooled - particularly things like dynamically sized arrays which would otherwise be created every frame, which is expensive.

“Always be pooling”

The degenerate form of pooling is of course a static final scratch object. I use those a lot too…
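Something like this, presumably - a single preallocated object reused for every intermediate result (hypothetical example; only safe single-threaded, and callers must not hold on to the returned array):

// Degenerate pooling: one static final scratch object, reused on every call.
public class VecMath {

	private static final float[] SCRATCH = new float[3];

	public static float[] add(float[] a, float[] b){
		SCRATCH[0] = a[0] + b[0];
		SCRATCH[1] = a[1] + b[1];
		SCRATCH[2] = a[2] + b[2];
		return SCRATCH; // copy out what you need before the next call
	}

}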

Cas :slight_smile:

Just to be pedantic, if you’ve got audio in your game, then it’s multi-threaded! :wink: Not that you’re likely to be sharing pooled objects between audio and video renderers.

Why especially with G1? Out of interest, has anyone found it to be any better? I’m still using -Xincgc, and that still works better for me. It handles audio and GC well without much pooling - I mention audio because missing an audio frame is far more noticeable than a dip from the ~700fps you might be pushing in video, not because the application is audio-only.
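For reference, and assuming the Sun JVMs of that era (flags may differ on newer releases): the incremental collector is switched on with -Xincgc, while G1 was still experimental and needed unlocking. Something like (mygame.jar being a placeholder, of course):

java -Xincgc -jar mygame.jar
java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -jar mygame.jar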

OpenAL handles async streaming just fine :persecutioncomplex:

With a bit of highly unscientific testing I found G1 causing regular hiccups when I run my game on only one CPU core, which does not happen with the default GC.

Almost everyone has a dual-core system now (almost - here’s your up-to-date typical gamer hardware survey - this is truly the gospel with regards to games and what’s out there and who’s using what). G1’s chief advantage, allegedly, is that it concerns itself with throwing away easy garbage before it goes looking at the rest of the heap, or so I am led to believe. This means that things like object pools, which basically fill your heap with old, live objects (the worst sort of object for ordinary GC performance purposes), are much cheaper under the G1 GC, as they aren’t generally looked at.

Experiments with the G1 GC on Project Zomboid are interesting. The heap barely grows - in eden - and is collected very frequently and pauselessly (they use a lot of threads too, mind). The incremental GC suffers a noticeably annoying pause every minute as eden fills up considerably more. Unfortunately, for reasons we are still trying to determine, the G1 GC causes permanent graphical corruption after its first collection (incgc also causes it, but it seems less obtrusive, oddly).

Cas :slight_smile:

btw OpenAL is an interesting case, as behind the scenes OpenAL has its own mixing thread running at high priority. Then in Java you generally need a thread separate from your game thread to stream data to OpenAL. But seeing as you’re just using the same set of buffers over and over and not creating all that many objects on a frame-by-frame basis, you’re never going to drop a frame of audio. Well, you might once in a while due to circumstances beyond your control, but we’re not talking realtime OSes here.

Cas :slight_smile:

Which is not the same as saying the whole thing is single threaded!

hmm, that’s assuming that objects created in your graphics thread and GC’d don’t affect the audio thread - GC can be ‘stop the world’!

Indeed they absolutely will. However, the cunning part is that you queue up several milliseconds of data in advance to OpenAL, and as OpenAL’s threads are unaffected by a Java GC pause, they carry on mixing and playing your sounds. Only if you’re particularly unlucky do you get sound stuttering, when Java has been unable to queue up enough sound and OpenAL runs out.

FWIW I queue up about 2 seconds of audio on each music stream playing (usually 1 or 2), probably overkill as a GC pause is rarely ever more than 20ms.

Cas :slight_smile:
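The refill loop looks roughly like this - a sketch assuming LWJGL’s OpenAL bindings, with readPcm() standing in for whatever decodes your stream into a reused DirectByteBuffer (both the class and helper names are hypothetical):

import java.nio.ByteBuffer;
import org.lwjgl.BufferUtils;
import org.lwjgl.openal.AL10;

// Sketch of an OpenAL streaming refill loop. The Java streaming thread keeps
// the source's buffer queue topped up; OpenAL's own mixing thread carries on
// playing the already-queued buffers even while Java sits in a GC pause.
public class StreamRefill {

	private final ByteBuffer pcm = BufferUtils.createByteBuffer(4 * 1024); // reused, never reallocated

	public void refill(int source){
		int processed = AL10.alGetSourcei(source, AL10.AL_BUFFERS_PROCESSED);
		while(processed-- > 0){
			int buffer = AL10.alSourceUnqueueBuffers(source); // reclaim a played buffer
			pcm.clear();
			readPcm(pcm);                                     // hypothetical: decode the next chunk
			pcm.flip();
			AL10.alBufferData(buffer, AL10.AL_FORMAT_STEREO16, pcm, 44100);
			AL10.alSourceQueueBuffers(source, buffer);        // hand it back to OpenAL
		}
	}

	private void readPcm(ByteBuffer dst){ /* fill dst from your decoder here */ }

}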

@Cas - with 2s of audio you shouldn’t ever hear issues - because I’m doing the DSP in Java I’m usually working with about 6ms and GC isn’t an issue!

Now you mention it, it is a bit daft :slight_smile: I’ve cut it down to 4 buffers of 4kb each (about 0.1sec).

Cas :slight_smile:

That is wrong… There is absolutely no cost for the GC to clear tens of thousands of objects created every frame. For all those who didn’t bother to read the three articles, here is the idea:

By default, in JDK 1.4 (ok, it’s old), the GC is a generational garbage collector. What does that mean? Simply that the GC distinguishes between young objects and old objects. Young objects are objects that don’t live long. There are a lot of those in your programs; in fact, on average about 98% of objects are young. As an example of a young object, consider a variable you create only for the scope of a method. Old objects are usually static class fields or instance variables held by another object.

Now, the GC uses two different algorithms, one for young objects and one for old ones. For old objects, it uses a mark-sweep-compact GC. For young objects, it uses a copying collector. It achieves this by splitting the memory the JVM uses in two. In the first part, the GC stores the old objects, and in the second part it stores the young objects. Furthermore, the second part is itself split in two: one half holds the young objects, and the other half is left empty, to copy the live young objects into when the first half fills up.

With the copying collector, when one half is full you copy everything that is still live into the other half. There is indeed a cost to copying everything across. The good part is that only objects that are still live need to be copied. In fact, objects that are no longer live will not even be visited, as the copying collector only visits live objects.

So there is absolutely no cost in clearing the thousands of objects created every frame.

In fact, you can think of the copying collector for young objects as automatic object pooling. You don’t even have to think about it, and if you try to do it by hand you will fight against the GC.
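If you want to watch this happen, run an allocation-heavy loop with GC logging turned on (-verbose:gc, or -XX:+PrintGCDetails for more detail) and you will see frequent, cheap young-generation collections - a sketch:

// Allocation churn demo. Run with: java -verbose:gc GcChurn
// Every Junk instance dies immediately, so young-generation collections
// have almost nothing live to copy, however much garbage is created.
public class GcChurn {

	public static void main(String[] args){
		long sink = 0;
		for(int frame = 0; frame < 10000; frame++){
			for(int i = 0; i < 10000; i++){
				sink += new Junk(i).value; // keeps the JIT from discarding the allocation
			}
		}
		System.out.println(sink);
	}

	static class Junk {
		final int value;
		Junk(int value){ this.value = value; }
	}

}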

No matter how you slice it… the GC must walk memory (not free), usually random memory (even less free), and when compacting must move memory (not free).

@Gudradain - I’m afraid you’re the one who’s wrong in this case. The GC actually takes fairly significant time to clean up tens of thousands of objects - often longer than an entire frame - which is no good if you need a rock-steady 60Hz update. So far the G1 GC is actually better than most when it comes to sweeping out eden, but is otherwise somewhat slower. Long story short: pooling creates no work at all for the GC, so you have no problems with unexpected pauses.

Cas :slight_smile:

@Gudradain you explained the workings of the GC quite nicely, but sadly drew the wrong conclusions. There is a (relatively) high cost to GC, which gets dramatically worse when dealing with high concurrency. But let’s not go there yet; even in single-threaded applications, both object creation and cleanup are far, far from free.

and rewriting all references to the moved objects (even less free still)

I’m afraid you have a misconception and are optimizing the wrong thing. Here is a quick example I wrote, where you can see right away that object pooling degrades performance.

It took 620 ms on average without object pooling.
It took 860 ms on average with object pooling.

public class NoPool {
	
	long count = 0;
	
	public void run(){
		for(int i=0; i<100000; i++){
			// Allocate a fresh object every iteration and let the GC reclaim it.
			SomeObject o = new SomeObject(i, "Creation : " + i);
			count += o.getI();
		}
	}
	
	public static void main(String [] args){
		NoPool np = new NoPool();
		long begin = System.nanoTime();
		for(int i=0; i<100; i++){
			np.run();
		}
		long end = System.nanoTime();
		long delta = end-begin;
		System.out.println("Time elapsed in nano : " + delta);
		System.out.println("Time elapsed in milli : " + delta/(1000*1000));
	}

}
public class ObjectPool {
	
	private SomeObject[] pool = new SomeObject[100000];
	long count = 0;
	
	public ObjectPool(){
		// Preallocate the entire pool up front.
		for(int i=0; i<100000; i++){
			pool[i] = new SomeObject(0, "");
		}
	}
	
	public void run(){
		for(int i=0; i<100000; i++){
			// Reuse a pooled object instead of allocating one.
			// (Note: the string concatenation below still allocates a new String.)
			SomeObject o = pool[i];
			o.setI(i);
			o.setS("Creation : " + i);
			count += o.getI();
		}
	}
	
	public static void main(String [] args){
		ObjectPool op = new ObjectPool();
		long begin = System.nanoTime();
		for(int i=0; i<100; i++){
			op.run();
		}
		long end = System.nanoTime();
		long delta = end-begin;
		System.out.println("Time elapsed in nano : " + delta);
		System.out.println("Time elapsed in milli : " + delta/(1000*1000));
	}

}
// Simple value holder used by both benchmarks above.
public class SomeObject {
	
	private int i;
	private String s;
	
	public SomeObject(int i, String s){
		this.i = i;
		this.s = s;
	}
	
	public int getI(){
		return i;
	}
	
	public String getS(){
		return s;
	}
	
	public void setI(int i){
		this.i = i;
	}
	
	public void setS(String s){
		this.s = s;
	}

}

You confuse throughput with latency.
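To illustrate the difference with the same workload: the benchmark above sums the time of 100 runs (throughput), whereas a game cares about the worst single run (latency), which is where a GC pause actually shows up. A sketch, reusing SomeObject from above:

// Latency view of the allocation benchmark: record the worst single
// iteration instead of the total, since that is what drops a frame.
public class WorstFrame {

	public static void main(String[] args){
		long worst = 0;
		long sink = 0;
		for(int frame = 0; frame < 100; frame++){
			long begin = System.nanoTime();
			for(int i = 0; i < 100000; i++){
				SomeObject o = new SomeObject(i, "Creation : " + i);
				sink += o.getI();
			}
			worst = Math.max(worst, System.nanoTime() - begin);
		}
		System.out.println("Worst run in milli : " + worst/(1000*1000) + " (sink " + sink + ")");
	}

}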