minimizing garbage collection

>> duplicates the most-significant bit (the sign bit).

>>> inserts zeroes.

-1 >>> 1 = 2147483647 (about 2 billion)

Yes, well the way to get around that (provided you want to divide) is to write “-(-b>>1)” when b is negative.
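The shift behaviour above can be checked with a quick snippet (just an illustrative sketch; the class name is made up):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        // >> (arithmetic shift) duplicates the sign bit, >>> inserts zeroes
        System.out.println(-1 >> 1);   // -1 (sign bit duplicated)
        System.out.println(-1 >>> 1);  // 2147483647 (Integer.MAX_VALUE)

        // >> rounds toward negative infinity, but / rounds toward zero
        int b = -7;
        System.out.println(b >> 1);      // -4
        System.out.println(b / 2);       // -3
        System.out.println(-(-b >> 1));  // -3, matches division for negative b
    }
}
```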

I just read through this whole thread again, but I still do not fully understand.

So will object pooling increase performance when I need thousands of new Vector3fs in each game loop?

Basically, because it is nearly costless to get an existing object, while it is CPU-expensive to allocate (and later release, via GC) memory for a new object, even for a single byte. When you create a new object you ask the OS memory manager or the Java memory manager to give you heap space, which is slow compared to getting an already allocated spot in memory. Using a pool, that is exactly what you do: you get a reference to an already allocated memory area. That is why, in any language, using pooled objects will always be faster.

Thanks for the info.
So the short answer is yes, in this situation pooling will be better?

As always with performance, it depends on your implementation.

Just implement it, then tweak it, then compare it to creating new objects, then pick the fastest.

[quote]As always with performance, it depends on your implementation.
[/quote]
That is generally a good approach, and you should try it.

But in this particular case there will be no surprise: pooling will be faster, especially for a lot of objects. Memory management works much like hard-disk management, including fragmentation and the like. So even if the Java GC runs at its best, and both the Java memory manager and the underlying OS memory manager are excellent, allocating an object can never be faster than reusing an existing one. Reusing an object is almost CPU-free: it only involves reading an object reference, which is close to reading a “pointer”, maybe only 2-4 CPU cycles.

If you have time, make a simple test case (a simple loop allocating objects and doing some computation on them, then the same with preallocated objects) and print out both benchmark results. I guess you will find a huge difference.
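Such a test case might look like the sketch below. Vec3, the method names and the iteration count are made up for illustration, and numbers from a toy loop like this are only rough: the JIT and escape analysis can skew them, so for trustworthy results use warm-up runs or a proper harness.

```java
// Naive benchmark sketch: per-iteration allocation vs. reusing one
// preallocated object, as a pool would.
public class PoolBench {
    static final class Vec3 { float x, y, z; }

    static float sink; // written to so the JIT cannot discard the loops

    // allocate a fresh Vec3 every iteration
    static long benchAlloc(int n) {
        long t0 = System.nanoTime();
        float sum = 0;
        for (int i = 0; i < n; i++) {
            Vec3 v = new Vec3();
            v.x = i;
            sum += v.x;
        }
        sink = sum;
        return System.nanoTime() - t0;
    }

    // reuse one preallocated Vec3
    static long benchPooled(int n) {
        Vec3 v = new Vec3();
        long t0 = System.nanoTime();
        float sum = 0;
        for (int i = 0; i < n; i++) {
            v.x = i;
            sum += v.x;
        }
        sink = sum;
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("alloc:  " + benchAlloc(n) / 1_000_000 + " ms");
        System.out.println("pooled: " + benchPooled(n) / 1_000_000 + " ms");
    }
}
```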

Think of the general 80/20 rule of CPU use: roughly 80% of your source code does not benefit much from being optimised, because most of the CPU time is spent in 20% of the code. You do not have to care about pooling outside of the 20% of code that uses most of your CPU.

You can use the -Xprof option to identify the code that uses most of your CPU and see whether you really need to optimise that part with pooling. Low-level optimisation should be used carefully and done as late as possible in your project.

ok, thanks a lot everybody for the information.
You have convinced me to give it a try, which leads me to the next question: how do I implement object pooling?

Especially, how do I find out if an object isn't needed anymore? (referenceCount == 0)

Don’t reimplement the GC! If you want fast pooling, keep it as simple as possible. Put objects into the pool only if you are sure they are not in use any more. Anything else will run slower than regular new/GC cycles.

I use pools for the storage of Triangle objects for terrain triangulation. Without the pool I get regular full (stop-the-world) GCs every 6 seconds. With the pool enabled the GC runs concurrently without stops and enables smooth rendering (60+ fps). And don’t use pools for a small number of objects (my initial pool size is around 1 million triangles).

Yeah, but how do I know that?

And what is a good data structure for object pools? java.util.HashSet?

thanks!

Not tested, not compiled:


public interface Supply<T>
{
   public T create();
}


import java.util.ArrayList;
import java.util.List;

public class Pool<T>
{
   private final Supply<T> supply;
   private final int max;
   private final List<T> cache;

   public Pool(Supply<T> supply, int max)
   {
      this.supply = supply;
      this.max = max;
      this.cache = new ArrayList<T>();
   }

   public final T grab()
   {
      // hand out a cached object if we have one, otherwise create a new one
      if(cache.isEmpty())
         return supply.create();
      return cache.remove(cache.size() - 1);
   }

   public void dump(T t)
   {
      // only keep the object if the pool isn't full yet
      if(cache.size() < max)
         cache.add(t);
   }
}


Vec3 a = new Vec3(1,2,3);
Vec3 b = new Vec3(3,2,1);
Vec3 tmp = pool.grab();

Vec3.cross(a, b, tmp);
float val = a.dot(tmp);

// we can be sure here that 'tmp' will not be used anymore
pool.dump(tmp); 

// it will only break seriously when either cross() or dot() store the reference
// of 'tmp' somewhere, but we can reasonably assume that's not happening

thank you riven!

So that means I will need to figure out myself when an object isn't needed anymore. (OK, that was obvious.)

Actually that is quite a shock, since I am doing vector math all over the place in my engine and I will need to insert the pool.dump(tmp) everywhere…

hmmm… is it still worth the effort?

Only pool objects where it matters; that doesn't mean you should do it everywhere.

Just pool the bottlenecks.

ok, thanks again for all your help. I'm now off to implement it, then test it, and then I will report here with the results.

Keep in mind that the provided code is NOT threadsafe.

Do NOT access the same Pool from more than 1 thread. Never.

Synchronizing the methods will pretty much destroy your performance gains.
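If you do need pooling from several threads, one common workaround (not from this thread, just a sketch; Vec3 is a stand-in for whatever you pool, and it assumes Java 8's ThreadLocal.withInitial) is to give each thread its own pool, so nothing is ever shared or synchronized:

```java
import java.util.ArrayDeque;

// Sketch: per-thread pools sidestep synchronization entirely. Each thread
// grabs from and dumps into its own deque.
public final class PerThreadPool {
    static final class Vec3 { float x, y, z; }

    private static final ThreadLocal<ArrayDeque<Vec3>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static Vec3 grab() {
        Vec3 v = POOL.get().pollLast(); // reuse if available
        return v != null ? v : new Vec3();
    }

    public static void dump(Vec3 v) {
        POOL.get().addLast(v); // return to this thread's own pool
    }
}
```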

ok, according to my profiler I got garbage collection down a lot! The minor GCs now happen only every 3 seconds, as opposed to 3 times a second before. Whether that has an impact on overall performance I do not know; on slower machines, probably yes.

Tbh, I wouldn’t introduce object pooling at the source level at all.
You are compromising the design integrity of your source code to accommodate a performance limitation of the current breed of VMs.
What do you do when the next VM comes along and your object pooling turns out to be the performance bottleneck?

A bytecode-engineering solution to complement the capabilities of the VM’s compiler would be a much cleaner, more reusable and more scalable solution.

While it isn’t a trivial problem to solve, it isn’t beyond the realms of imagination (no doubt it would borrow many aspects from the myriad of optimising compilers that already exist).

I totally agree with you, but…

I want performance now… it makes my code 2-3x faster, for a few minutes spent refactoring my ‘ideal’ source code.

I don’t have the time to build that bytecode transformer. Keep in mind that such a transformer would be almost impossible to get right: the developer knows when an object is ready for reuse, but the transformer cannot analyse that. Or you’d end up building yet another GC…

There is object pooling and there is object pooling. Not every case is just about saving GC; sometimes it is about saving memory. ‘Pooling’ immutable objects has a nice side effect: you won’t end up with millions of instances of the same object (same as in ‘equals returning true’) in the JVM. After all, java.lang.Integer.valueOf(int) implements a small pool itself, so it cannot be THAT bad, can it? :wink:
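For reference, the Integer.valueOf cache is easy to observe; the language spec only guarantees caching for values in -128..127, so identity outside that range is not guaranteed:

```java
public class ValueOfDemo {
    public static void main(String[] args) {
        // small values come from the built-in cache: same object every time
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);       // true: same cached instance

        // larger values are equal but not necessarily identical
        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c.equals(d));  // true, but c == d is not guaranteed
    }
}
```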

As for the claim that the GC will solve all your problems: it is not exactly true. If you generate a LOT of short-lived garbage, you will trigger GC pauses more often. In every GC, some portion of the live objects gets copied here and there (at least until they mature enough to reach the old generation), which is a costly operation. So don’t sacrifice your app logic for the GC, but also don’t allocate things just because they are ‘free’.

I’m doing a lot of performance-sensitive code these days, and when you hit 8+ GB heaps, cannot afford more than 50 ms pauses, and cannot use NewParallelGC (because it crashes 100% of the time with our app within 4 hours), one becomes a bit more careful about garbage allocation.