minimizing garbage collection

[quote]As always with performance, it depends on your implementation.
[/quote]
that is generally a good approach, and you should try it.

but, in this particular case, there will be no surprise: pooling will be faster, especially for a large number of objects. Memory management works much like hard disk management, including fragmentation and the like, so even if the Java GC runs at its best, the Java memory manager is excellent, and the underlying OS memory manager is excellent, allocating a new object cannot be faster than reusing an already existing one. Reusing an object is nearly free for the CPU; it only involves reading an object reference, which is about as cheap as reading a "pointer", maybe only a couple of CPU cycles.

If you have time, make a simple test case (a simple loop allocating objects and doing some computation on them, then the same with pre-allocated objects) and print out both benchmark results. I guess you will find a huge difference.
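Such a test case might look like the following sketch (untested here, in the spirit of the thread; `PoolBench`, `sumAlloc` and `sumReuse` are made-up names). One caveat: modern JITs can sometimes remove short-lived allocations entirely via escape analysis, so the gap may be smaller than expected:

```java
public class PoolBench {
    static final class Vec {
        float x, y, z;
        Vec set(float x, float y, float z) { this.x = x; this.y = y; this.z = z; return this; }
        float lengthSq() { return x * x + y * y + z * z; }
    }

    // fresh allocation on every pass through the loop
    static float sumAlloc(int n) {
        float sum = 0;
        for (int i = 0; i < n; i++) {
            Vec v = new Vec().set(i, i + 1, i + 2);
            sum += v.lengthSq();
        }
        return sum;
    }

    // a "pool" of exactly one pre-allocated, reused instance
    static float sumReuse(int n) {
        float sum = 0;
        Vec scratch = new Vec();
        for (int i = 0; i < n; i++) {
            scratch.set(i, i + 1, i + 2);
            sum += scratch.lengthSq();
        }
        return sum;
    }

    public static void main(String[] args) {
        final int N = 10000000;
        long t0 = System.nanoTime();
        float a = sumAlloc(N);
        long allocMs = (System.nanoTime() - t0) / 1000000;
        long t1 = System.nanoTime();
        float b = sumReuse(N);
        long reuseMs = (System.nanoTime() - t1) / 1000000;
        System.out.println("alloc: " + allocMs + " ms, reuse: " + reuseMs
                + " ms, same result: " + (a == b));
    }
}
```

Both loops compute the identical sum, so any timing difference comes purely from allocation and GC pressure.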

think of the 80/20 rule of CPU use: roughly 80% of your source code doesn't benefit much from being optimised, because most CPU time is spent in the other 20%. You don't have to care about pooling outside of the 20% of your code that uses most of your CPU.

you can use the -Xprof option to identify the code that uses most of your CPU and see if that part really needs to be optimised by pooling. Low-level optimisation should be used carefully and done as late as possible in your project.

ok, thanks a lot everybody for your information.
you have convinced me to give it a try, which leads me to the next question: how to implement object pooling?

especially, how do i find out if an object isn't needed anymore? (reference count == 0)

Don’t reimplement the GC! If you want fast pooling, keep it as simple as possible. Put objects into the pool only if you are sure they are not in use any more. Anything else will run slower than regular new/GC cycles.

I use pools for the storage of Triangle objects for terrain triangulation. Without the pool I get regular full (stop-the-world) GCs every 6 seconds. With the pool enabled the GC is triggered concurrently without stops and enables smooth rendering (60+ fps). And don’t use pools for a small number of objects (my initial pool size is around 1 million triangles).

yeah, but how do i know that?

and what is a good data structure for object pools? java.util.HashSet?

thanks!

Not tested, not compiled:


public interface Supply<T>
{
   public T create();
}


import java.util.ArrayList;
import java.util.List;

public class Pool<T>
{
   private final Supply<T> supply;
   private final int max;
   private final List<T> cache;

   public Pool(Supply<T> supply, int max)
   {
      this.supply = supply;
      this.max = max;
      this.cache = new ArrayList<T>();
   }

   public final T grab()
   {
      // hand out a cached instance if available, otherwise create a new one
      if(cache.isEmpty())
         return supply.create();
      return cache.remove(cache.size() - 1);
   }

   public void dump(T t)
   {
      // only keep the object if the pool is not full yet
      if(cache.size() < max)
         cache.add(t);
   }
}


Vec3 a = new Vec3(1,2,3);
Vec3 b = new Vec3(3,2,1);
Vec3 tmp = pool.grab();

Vec3.cross(a, b, tmp);
float val = a.dot(tmp);

// we can be sure here that 'tmp' will not be used anymore
pool.dump(tmp); 

// it will only break seriously when either cross() or dot() store the reference
// of 'tmp' somewhere, but we can reasonably assume that's not happening

thank you riven!

so that means i will need to figure out myself when an object isn't needed anymore. (ok, that was obvious.)

actually, that is quite a shock, since i am doing vector math all over the place in my engine and i will need to insert pool.dump(tmp) calls everywhere…

hmmm… is it still worth the effort?

only pool objects where it matters; that doesn’t mean you should do it everywhere.

just pool the bottlenecks.

ok thanks again for all your help. i’m now off to implementing it, then testing it, then i will report here with the results.

Keep in mind that the provided code is NOT threadsafe.

Do NOT access the same Pool from more than 1 thread. Never.

Synchronizing the methods will pretty much destroy your performance gains.
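If you really need pooling from several threads, one common workaround is to give every thread its own pool via ThreadLocal, so no locking is needed at all. A minimal sketch (not code from this thread; `Vec3` and the class name are illustrative, and an ArrayDeque serves as the free-list):

```java
import java.util.ArrayDeque;

public class ThreadLocalPools {
    static final class Vec3 { float x, y, z; } // stand-in for your vector class

    // Each thread lazily gets its own free-list; no thread ever sees another's.
    private static final ThreadLocal<ArrayDeque<Vec3>> FREE =
        new ThreadLocal<ArrayDeque<Vec3>>() {
            @Override protected ArrayDeque<Vec3> initialValue() {
                return new ArrayDeque<Vec3>();
            }
        };

    public static Vec3 grab() {
        Vec3 v = FREE.get().pollLast(); // reuse if the calling thread has one spare
        return v != null ? v : new Vec3();
    }

    public static void dump(Vec3 v) {
        FREE.get().addLast(v); // only ever touched by the owning thread
    }
}
```

The price is that an object grabbed on one thread must also be dumped on that same thread, otherwise it migrates between the per-thread free-lists.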

ok, according to my profiler i got garbage collection down a lot! minor GCs now happen only every 3 seconds, as opposed to 3 times a second before. whether that has an impact on overall performance i do not know. probably yes on slower machines.

Tbh, I wouldn’t introduce object pooling at the source level at all.
You are compromising the design integrity of your source code to accommodate a performance limitation of the current breed of VMs.
What do you do when the next VM comes along and your object pooling turns out to be the new performance bottleneck?

A bytecode engineering solution to complement the capabilities of the VM’s compiler would be a much cleaner, reusable & more scalable solution.

While it isn’t a trivial problem to solve, it isn’t beyond the realms of imagination (no doubt it would borrow many aspects from the myriad of optimising compilers that already exist).

I totally agree with you, but…

I want performance now… it makes my code 2-3x faster, after a few minutes of refactoring my ‘ideal’ source code.

I don’t have the time to build that bytecode transformer. Keep in mind that such a transformer would be almost impossible to get right: the developer knows when an object is ready for reuse, but the transformer cannot infer that. Or you’d be building yet another GC…

There is object pooling and object pooling. Not every case is just about saving GC; sometimes it is about saving memory. ‘Pooling’ immutable objects has a nice side effect: you won’t end up with millions of instances of the same (same as in ‘equals returning true’) object in the JVM. After all, java.lang.Integer.valueOf(int) implements a small pool itself, so it cannot be THAT bad, can it? :wink:
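That Integer pool is easy to observe directly: the javadoc and the JLS guarantee that Integer.valueOf returns the same cached instance for values in [-128, 127]. A small demo:

```java
public class IntegerCacheDemo {
    public static void main(String[] args) {
        // Values in [-128, 127] are guaranteed to come from a shared cache,
        // so reference comparison succeeds for them.
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b); // true: same cached instance

        // Outside the cached range a fresh instance may be allocated, so
        // only equals() is reliable there.
        Integer c = Integer.valueOf(100000);
        Integer d = Integer.valueOf(100000);
        System.out.println(c.equals(d)); // true: always equal by value
    }
}
```

This is exactly the memory-saving flavour of pooling described above: one shared instance instead of millions of equal ones.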

As for claiming that the GC will solve all your problems: it is not exactly true. If you generate a LOT of short-lived garbage, you will trigger GC pauses more often. In every GC, some portion of the live objects gets copied here and there (at least before they mature enough to reach the old generation), which is a costly operation. So don’t sacrifice your app logic for the GC, but also don’t allocate things just because they are ‘free’.

I’m doing a lot of performance-sensitive code these days, and when you hit 8+ GB heaps, cannot afford more than 50ms pauses, and cannot use NewParallelGC (because it crashes 100% of the time with our app within 4 hours), one becomes a bit more careful about garbage allocation.

100% agreed. [offtopic] Immutable objects have another cool side effect: you need zero synchronisation when you work with multiple threads. Scala even uses immutable HashMaps… [/offtopic]

8GB heap and 50ms GC pauses? That’s awesome! I never thought the GC would scale so well. My engine currently uses around 1 gig of RAM full of small objects and is at a point where even parallel young GCs take >100ms (and full GCs would take >2 seconds without pooling). Now I have decided to move from dynamically resizing pools to statically pre-allocated pools, which fixed that problem (0 allocations or deallocations, yeah!).
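A statically pre-allocated pool along those lines might look like this sketch (illustrative names, not the engine’s actual code): every instance is created up front, so steady-state operation performs zero allocations:

```java
public class FixedPool {
    static final class Triangle { int a, b, c; } // stand-in payload

    private final Triangle[] slots;
    private int top; // number of free objects currently in the pool

    public FixedPool(int capacity) {
        slots = new Triangle[capacity];
        for (int i = 0; i < capacity; i++)
            slots[i] = new Triangle(); // pre-allocate everything once
        top = capacity;
    }

    public Triangle grab() {
        if (top == 0)
            throw new IllegalStateException("pool exhausted; raise capacity");
        return slots[--top];
    }

    public void dump(Triangle t) {
        slots[top++] = t;
    }
}
```

Unlike the growable Pool shown earlier, this one fails fast on exhaustion instead of allocating, which makes capacity mistakes visible immediately.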

But good to know that there is still room left ;D

(Am I the only one who noticed that every VM/GC performance white paper tries to advise against pooling?)

Of course we use CMS, with only a 48MB new generation; the rest is done periodically in the background by CMS.

may i ask, what is CMS? :slight_smile:

this is the Concurrent Mark and Sweep collector. It uses all available cores to clean the young generation and tries to do most of the work in the tenured generation concurrently (-> while your app is running).

Its primary aim is to prevent the evil full stops (also known as full GCs).

a great overview of all the garbage collectors:
http://blogs.sun.com/jonthecollector/entry/our_collectors

there is also a new concurrent GC planned for Java 7, called Garbage First. I have aggregated an interesting discussion on my blog:
http://www.michael-bien.com/roller/mbien/entry/garbage_first_the_new_concurrent