GC Lag Question

Hi,

In my experience the garbage collector is the most painful problem when making good games since it kills percieved performance when a full GC kicks in and adds a delay between the updating of the game world and its display on the screen. So what is the best way to handle game objects like bullets to minimise full GC’s but keep design object-oriented?

Bullets can hang around in the game world for any length of time, often so long that they are promoted out of GC-eden space and into the long-lived space. So what should be done with them to reduce full GC’s? I see a couple of alternatives, and I rank them in order of preference:

  1. Object pooling - recycle old bullets by having factory methods to ‘create’ new ones.
  2. Adjusting the eden space proportion (I don’t know how, I just read that it can be done)
  3. depart from OO design and have a static Bullet class that has arrays of primitives that represent all bullets’ primitives

So how have you handled these problems or is GC lag just accepted?

Keith

PS: I am using time-based updating and I know that GC lag is fine if it happens after showing the game world but before updating takes place, however it cannot be compensated for in the frame about to be displayed when it delays that frame’s rendering >> leads to a percieved lag.

You must be shooting zillions of bullets then. Or really ‘heavy’ bullets for that matter. Both are unlikely, and besides that, bullets are pretty short-lived (unless you keep them laying in the world). Do you really think many MBs of bullet-data are residing on the heap? (literally 100.000s of bullets)

I can’t help but think something else in your code is creating a lot of garbage. If you stop shooting bullets and the lag vanishes, you should inspect the code that is creating those bullets, not the bullets themselves, maybe it can be optimized.

If all else fails, sacrify design for performance and either pool (least bad design) or create a big array of bullets (enabled/disabled) which would be the worst design, but probably fastest.

Keep in mind that enlarging the eden-heap will also increase non-full GC times.

See Titan Attacks and Ultratron. Do you see any GC lag?

Cas :slight_smile:

You’re probably right, but sometimes I create 300+ bullets per frame (at 60fps) and they have lots of fields (21, mostly floats and other objects). hmmmm… but I suppose that not a zillion :)…

When you guys have GC problems how do you find out the cause? I’ve done a heap profile in netbeans but it told me that Java2D was creating most of the garbage, but that garbage can’t have been promoted out of the eden.

How can you tell what is causing the GC lags? I have a 1% GC-tick count as it is (unnacceptable) and it can get up to 4%.

I definitley want to learn more about java heap memory usage and the GC but it took me ages just to find out what -verbosegc output actually tells you on windows.

Have you read this http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html ?

In particular, try the concurrent low-pause collector… with incremental mode and automatic ‘pacing’

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#5.4.9.1

Thanks for that, its mitigated the problem. I tried the parallel GC before but not with those fancy add-ons.

It was funny though, the prescribed options have a typo:

-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0
-XX:+CMSIncrementalDutyCycle=10 this one doesn’t work with the ‘+’.
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:-TraceClassUnloading

Keith

no lags there, but these guys seem to use lwjgl and not jogl.

btw: great games!

I shouldn’t think LWJGL or JOGL are any different in terms of GC activity…

For comparison have a look at Alien Flux: I used extensive particle pooling trickery and pools for enemy bullets and lasers, figuring I’d be generating hundreds of these kinds of things all the time. And in truth it is fractionally smoother than Ultratron/Titan but you have to be very, very sharp to spot it, and the code complexity is huge compared to just creating new Particle()s and forgetting about them when they’re done which is what I switched to in the minigames.

Anyway: the gist of it is, GC lag is very unlikely to actually be GC lag, it’s probably something else, and even if it’s not, it can be tuned.

Cas :slight_smile:

One common cause of apparent GC errors is filling the heap with an object leak that causes it to be constantly GCing whats left.

Have you run a profiler and looked at your actual heap?

One common cause of apparent GC errors is filling the heap with an object leak that causes it to be constantly GCing whats left.

No it’s not that, the heap stays small (< 30M).

Maybe the GC tick-count is just a combination of many long-lived bullets and Java2D rendering garbage.

Just out of interest, why can’t Sun add some methods to manipulate when the GC does its work? After all, the GC is just a thread.

System.gc() exists but that suggests to the VM to do a full gc, which it seems to actually do every time it’s called whether it’s neded or not (and it takes ages). A method like System.edenGC() to prompt the VM to do a young generation GC (if necessary) would be great.

I’m with CAS on that one.
“Life fast, die young” should be the motto of your objects. The JVM is much more capable in pooling objects than you will ever be so don’t waste time even thinking about it.

[quote=“CommanderKeith,post:10,topic:27121”]
Have you profiled the GC? The Eden GC doesn’t take enough time to matter.
There is really no point in continuing the conversation if you haven’t got the actual stats to show what the GC is really doing and when. Without that info you are just shooting in the dark.
Have you used the -XX:+PrintGCDetails (or whatever it is) switch?

Start here:
http://java.sun.com/docs/hotspot/index.html

So the answer to “why aren’t there more GC controls in the API” is that a decision was made early on to leave the GC implementors maximum room for different implementations of GCs.

The eden, for instance, is a very specific part of a very specific implementation.

Had we tied the GC to an APi that exposed how it works then you’ld be stuck today with the first one we had (the mark/sweep collector.)

Frankly, even exposing System.gc(), with what VERY limited gauranatees it makes, has basically proved to be a mistake as the results of calling it vary greatly between GC implementations and programs that depend on it to do anything in particular lose portability.

It would, nonetheless, be probably a boon to programmatically tune the GC from within an application. At the very least it would be exceedingly useful to be able to set -Xmx and -Xms dynamically at runtime.

Cas :slight_smile:

[quote=“swpalmer,post:12,topic:27121”]

Maybe this will help with CK argument. I have modified ny little sprite rotation benchmark. Now it takes into account the extra time wasted by sleep and logging and discounts that time in the next cycle. I have also removed all System.out.printlnS and im using an array to log data instead. The output is like this:

Im using a frame of 0.002 milliseconds and if recovery time is bigger than this the frame will be skipped.

Im also using these arguments:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10 -verbose:gc -Dsun.java2d.opengl=true

But didn’t notice much difference.


[GC 1944K(16320K), 0.0032607 secs]
[GC 2023K(16320K), 0.0107422 secs]
0.000001601    0.000399223    0.002916319    0.001322865    0.001322865
0.000003711    0.001932507    0.000000000    0.000000000    0.001277139
...
0.000001113    0.000311164    0.001685209    0.000001868    0.000001868
0.000001075    0.000341113    0.001654918    0.000001973    0.000001973
[ParNew 4082K->1119K(16320K), 0.0142923 secs]
0.000000828    0.000317662    0.001678342    0.000001593    0.000001593
0.000000811    0.000305137    0.001691493    0.000001820    0.000001820
...
0.000001278    0.000315372    0.001680717    0.000001825    0.000001825
0.000000931    0.000308691    0.001687813    0.000001891    0.000001891
[GC 2941K(16320K), 0.0037765 secs]
0.000001058    0.000301988    0.001694364    0.000001819    0.000001819
0.000000905    0.000301778    0.001694927    0.000001945    0.000001945
...
0.000000913    0.000306197    0.001690615    0.000001895    0.000001895
0.000000983    0.000532326    0.001465397    0.000003360    0.000003360
[ParNew 5087K->1119K(16320K), 0.0014446 secs]
0.000001531    0.000507845    0.001483932    0.000002016    0.000002016
0.000001368    0.000347703    0.001647204    0.000001417    0.000001417
...
0.000001137    0.001043897    0.000952418    0.000001829    0.000001829
0.000000838    0.005141818    0.000000000    0.000000000    0.003145566
*skip*
0.000001326    0.000000675    0.000000000    0.000000000    0.001149968
0.000000638    0.001004171    0.000000000    0.000000000    0.000155542
0.000000845    0.003107058    0.000000000    0.000000000    0.001265168
0.000001378    0.001169552    0.000000000    0.000000000    0.000437526
0.000001116    0.001032476    0.000529242    0.000002045    0.000002045
0.000001196    0.007081311    0.000000000    0.000000000    0.005086058
*skip*
0.000001396    0.000000653    0.000000000    0.000000000    0.003090326
*skip*
0.000000625    0.000000648    0.000000000    0.000000000    0.001092314
0.000000638    0.002204311    0.000000000    0.000000000    0.001297968
0.000001341    0.011736162    0.000000000    0.000000000    0.011036995
*skip*
0.000002246    0.000001111    0.000000000    0.000000000    0.009042558
*skip*
0.000000633    0.000000620    0.000000000    0.000000000    0.007044587
*skip*
0.000000628    0.000000615    0.000000000    0.000000000    0.005046540
*skip*
0.000000631    0.000000613    0.000000000    0.000000000    0.003048524
*skip*
0.000000626    0.000000612    0.000000000    0.000000000    0.001050502
0.000000631    0.001758613    0.000000000    0.000000000    0.000810491
...
0.000001151    0.000354055    0.001644016    0.000002201    0.000002201
0.000001553    0.000408814    0.001587720    0.000001886    0.000001886
[ParNew 5087K->1120K(16320K), 0.0011402 secs]
0.000000956    0.000299241    0.001698633    0.000001689    0.000001689
0.000000710    0.000268572    0.001729585    0.000001676    0.000001676
...
0.000000710    0.000254531    0.001771521    0.000029520    0.000029520
0.000001108    0.000316977    0.001652275    0.000001916    0.000001916
[ParNew 5088K->1121K(16320K), 0.0015111 secs]
0.000000835    0.000261114    0.001737151    0.000001967    0.000001967
0.000000698    0.000291981    0.001705902    0.000001509    0.000001509
...

It looks like GC usualy takes a very short time in the order of 1 millisecond to do his job and i suspect that with lwjgl and jogl this would be much better.

There is however the ocasional situation where, without any reasonable explanation gc takes an absurd amount of time. Look at the second gc output above. Or when java2d is performance so well rendering frames in less than a millisecond and sudenly it starts chooking. But that only happened once in that group of skips in the example above.

Anyway i think java2d performs quite well but lwjgl/jogl probably performs a lot better and gc won’t be a problem in there. I think the ocasional hickups we will notice in very rare ocasions are more because of the way java2d works internaly than because of gc.

And whats your/the conclusion?
Sorry if I am a bit harsh, but I wonder about the sence of this discussion :wink:

lg Clemens

That’s exactly what i thought… not being able to render 1000fps flawlessly is out of reality.
Being able to have a 60fps game that does not stutter is a goal that can be achieved easily now, and imho that’s enough for most of us.
Moreover, with the generalization to come of multi-core processors, asynchronous GC will be more and more imperceptible.
Even i, who does applications that have very precise and short timings to comply to when displaying images is not bothered (that much) by GC.

Of course, having smaller/sliced full GCs would complete the set, as nothing would stop renders anymore, but i think we can already be pretty happy with what we have since 1.4.

Thanks for everybody’s interest & help.

Have you profiled the GC? The Eden GC doesn’t take enough time to matter.

Yes, Eden GC’s are quick but even on my 2GHz machine -verbosegc shows they take 0.0005 seconds (1/2 a millisecond). There is no question that this is noticeable when it delays a frame getting to the screen (zingbat’s 1 millisecond lag would be very noticeable). It would not be an issue if the eden GC could be told when to work so that its lag could be adjusted for. In my opinion this is a rock-bottom basic thing that java should have.

And Jeff, more efficient GC’s are not what is needed -> predictable GC’s are the answer and I’m sure some VM vendor will realise this soon. Also, multi-core processors won’t necessarily solve the problem since the GC thread could still be scheduled on the core running the rendering.

Noticing 0.5ms, and a 1.0ms delay would be very noticable?

Do you realize that even 10ms result in 100fps. Even if that would cause a frame-miss, you’d still be rendering at 50fps.

My only worries are about 50+ ms GCs, only those will decrease perceived performance in a game.

I admit that 1/2 a millisecond is hard to spot. This tests what lags are noticeable. It replicates GC lags every 3/4 second, delaying the current frame from being displayed and beeps when it does this:

import java.awt.;
import javax.swing.
;
import java.awt.image.;
import java.awt.geom.
;

public class LagTest extends JPanel{

float gcLagInMillis = 5;


long lastRenderNanos = 0;
float counterMillis = 0;
float speed = 0.08f;		// pixels per millisecond
float xPos = 0;

public LagTest(){
	lastRenderNanos = System.nanoTime();
}
public void render(Graphics2D g2D){
	g2D.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
	long currentNanos = System.nanoTime();
	int nanosElapsed = (int)(currentNanos - lastRenderNanos);
	float millisElapsed = (float)(nanosElapsed/1000000f);
	counterMillis += millisElapsed;
	xPos += speed*millisElapsed;
	if (xPos > getWidth()){
		System.exit(0);
	}
	g2D.setColor(Color.white);
	g2D.fillRect(0, 0, getWidth(), getHeight());
	g2D.setColor(Color.black);
	g2D.setStroke(new BasicStroke(3));
	g2D.drawLine(Math.round(xPos), Math.round(getHeight()/2),
				Math.round(xPos)+10, Math.round(getHeight()/2));
	lastRenderNanos = currentNanos;

	//replicate GC lag every 3/4 second
	if (gcLagInMillis != 0){
		if (counterMillis > 750){
			Toolkit.getDefaultToolkit().beep();
			long nanos = System.nanoTime();
			while(true){
				float millisWaited = (float)((System.nanoTime() - nanos)/1000000f);
				if (millisWaited > gcLagInMillis){
					counterMillis = 0;
					break;
				}
			}
		}
	}
}


public static void main(String[] arghhImAPirate){
	JFrame frame = new JFrame("LagTest");

	frame.setIgnoreRepaint(true);

    frame.setExtendedState(JFrame.MAXIMIZED_BOTH);
    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

	final LagTest lagTest = new LagTest();
    frame.setContentPane(lagTest);

	frame.setVisible(true);

	frame.createBufferStrategy(2);
	final BufferStrategy strategy = frame.getBufferStrategy();


	Thread thread = new Thread(new Runnable(){
		public void run(){
			while (true){
				Graphics2D g2D = (Graphics2D)strategy.getDrawGraphics();
				lagTest.render(g2D);
				strategy.show();
				g2D.dispose();
				try{
					Thread.sleep(5);
				}catch(InterruptedException e){
				}
			}
		}
	});
	thread.start();
}

}

EDIT:
change the Thread.sleep(5); call to something bigger like 40 milliseconds and see what a big difference this makes to how noticeable the lag is. This might show that gc lag is worse when the frame rate is low. That must be because the delayed frame hangs around for longer and the eye picks up on it. Unfortunately the lower the frame rate, the more Java2D rendering that is probably being done, and the more garbage that is being created!