Garbage collector tuning

Right…

Still, the performance is worse with object pooling.

I would need a way to calculate the GC pause for each. Since there is a big difference in the performance of the two, you might still be surprised.

EDIT: I found it -> The command line argument -verbose:gc prints information at every collection.
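Per-collector pause totals can also be read programmatically via the management API. A minimal sketch (the bean names and the numbers reported vary by collector and JVM):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // One bean per collector (e.g. young-generation vs old-generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total pause ms=" + gc.getCollectionTime());
        }
    }
}
```

Calling this before and after a benchmark run and diffing the totals gives the accumulated pause time without parsing the -verbose:gc output.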

Very good point.

The only way object creation/deletion is ‘free’ is if escape analysis proves the object doesn’t escape, the type is statically known to be a specific concrete type (I don’t know if HotSpot attempts this), and scalar replacement can determine that all fields are written before they are read… plus probably some other conditions I’m forgetting, all of which are very unlikely to be determined to be true.

Actually, if this wasn’t an issue we’d need to ask ourselves why there are always new presentations being given on GC tuning (including from Sun and now Oracle).

[quote]In general, a particular generation sizing chooses a trade-off between these considerations. For example, a very large young generation may maximize throughput, but does so at the expense of footprint, promptness, and pause times. Young generation pauses can be minimized by using a small young generation at the expense of throughput. To a first approximation, the sizing of one generation does not affect the collection frequency and pause times for another generation.
[/quote]
From : http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html

I’m still reading :slight_smile:

And yet we have more surprises! :slight_smile:

I ran my previous test with the VM argument -verbose:gc.

Result with object pooling:

[GC 49216K->10758K(188352K), 0.0076902 secs]
[GC 59974K->12750K(237568K), 0.0070728 secs]
[GC 111182K->14302K(237568K), 0.0070655 secs]
[GC 112734K->15478K(336000K), 0.0067169 secs]
[GC 212342K->16430K(336000K), 0.0067652 secs]
[GC 213294K->17246K(535872K), 0.0063414 secs]
[GC 410974K->17214K(535872K), 0.0072485 secs]
[GC 410942K->17246K(929728K), 0.0069668 secs]
Time elapsed in nano : 918742224
Time elapse in milli : 918

Result without object pooling:

[GC 49216K->256K(188352K), 0.0007276 secs]
[GC 49472K->192K(188352K), 0.0005208 secs]
[GC 49408K->256K(188352K), 0.0004196 secs]
[GC 49472K->160K(237568K), 0.0003678 secs]
[GC 98592K->208K(237568K), 0.0004676 secs]
[GC 98640K->224K(328192K), 0.0005211 secs]
[GC 197088K->176K(328192K), 0.0008081 secs]
[GC 197040K->176K(320064K), 0.0004403 secs]
[GC 189168K->176K(312896K), 0.0002397 secs]
[GC 181680K->176K(305472K), 0.0003220 secs]
[GC 174576K->176K(299008K), 0.0002982 secs]
Time elapsed in nano : 654859105
Time elapse in milli : 654

Conclusion: the pauses with object pooling are usually 10 to 20 times longer, and the throughput is a lot worse!

How about fixing the obvious flaws in your benchmark first?

“text”+n means 4 allocations right there (StringBuilder & char[], String & char[])
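For illustration, that concatenation desugars roughly as follows (a sketch of what pre-Java-9 javac emits; exact shapes vary by compiler version):

```java
public class ConcatDemo {
    static String labelConcat(int n) {
        return "text" + n; // what the benchmark wrote
    }

    // Roughly what javac generates for the line above: a StringBuilder
    // (plus its internal char[]) and a String (plus its char[]) from
    // toString() -- fresh allocations on every call.
    static String labelDesugared(int n) {
        return new StringBuilder().append("text").append(n).toString();
    }

    public static void main(String[] args) {
        System.out.println(labelConcat(7));     // prints text7
        System.out.println(labelDesugared(7));  // prints text7
    }
}
```

So a benchmark that builds such strings in its inner loop is measuring string allocation as much as the thing it set out to measure.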

Besides, synthetic benchmarks rarely showcase the real-world problems of garbage-collector-induced latency. In your case eden is simply cleared, and as there is no graph of references, no references have to be rewritten and no memory has to be moved.

That benchmark is entirely incorrect.

Cas :slight_smile:

:frowning: How should I write a good benchmark?

What’s the point? You’re not going to disprove any of our real-world experiences anyway. :point:

Well, only test what you’re supposed to be testing, for a start - Riven beat me to the punch, though; your benchmark creates tons of garbage entirely outside what it is you are trying to test.

Cas :slight_smile:

It’s nice to realize that code is so slow :slight_smile: When you remove it, the benchmark runs about 50 times faster :slight_smile: Also, the GC doesn’t have to collect anything anymore. But object pooling is still slower.

An object pool is not free either! Which value of ‘not free’ is better comes down a lot to circumstances and evaluation of need.

With apologies for derailing a thread on the merits of bad benchmarks! :stuck_out_tongue:

Yes indeed. However, I’m not the one claiming anything is free. The only plus point I gave to pooling is that (if properly set up) it allows data-flow optimizations. In general, which is ‘better’ is: it depends.

I find it hard to believe that you could write an object pool that is slower than doing a new.
Also, the GC will eventually have to collect when you’re doing new, and it is the unpredictable nature of “eventually” and the duration of said collection that pooling solves.
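For reference, the fast path of a minimal single-threaded pool is just a deque poll; a sketch, where `Particle` and its reset logic are hypothetical stand-ins:

```java
import java.util.ArrayDeque;

// Hypothetical pooled type, for illustration only.
class Particle {
    float x, y;
    void reset() { x = 0; y = 0; }
}

class ParticlePool {
    private final ArrayDeque<Particle> free = new ArrayDeque<>();

    Particle obtain() {
        Particle p = free.poll();              // fast path: one deque poll
        return p != null ? p : new Particle(); // slow path: allocate
    }

    void release(Particle p) {
        p.reset();     // clear state before reuse
        free.push(p);
    }
}
```

Even this costs a deque operation and a reset per cycle, so whether it beats `new` depends on how expensive the object’s construction actually is.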

Cas :slight_smile:

I don’t! It depends what changes (if any) you need to make to the object to make it suitable for pooling. Some objects that might have been immutable now have to be mutable, which could add cost (and if you’re working with threads at all, then immutable objects can be a very good thing).

I personally tend to think of pooling for memory purposes rather than objects (i.e. int[] buffers for pixel data). However, in that scenario you either end up using lots of extra memory, or you need to check dimensions - at some point the size you want is probably so small it’s faster to create from scratch.
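The dimension check described above might look like this; a sketch, with `BufferPool` and the 64-element cutoff invented for illustration:

```java
class BufferPool {
    private int[] cached;

    int[] obtain(int size) {
        // Reuse only if the cached buffer is big enough; tiny requests
        // are cheaper to allocate fresh than to track in the pool.
        if (size >= 64 && cached != null && cached.length >= size) {
            int[] b = cached;
            cached = null;
            return b;
        }
        return new int[size];
    }

    void release(int[] b) {
        if (cached == null || b.length > cached.length) {
            cached = b; // keep the largest buffer seen
        }
    }
}
```

Note the trade-off the post describes: reusing an oversized buffer wastes memory, while matching sizes exactly means more fresh allocations.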

I didn’t even try :frowning: I just wrote it and got those results…

Also, the GC is collecting the objects (as you can see in the console output I posted). One thing I have to agree with is the unpredictable nature when you don’t have object pooling. With object pooling the pauses are usually between 60 and 80 ms for my example. Without object pooling, a pause can be as low as 2 ms or as high as 24 ms (12 times the lowest). But the pauses are still much shorter.

Never draw any conclusions from a flawed benchmark.

From Java 6 documentation

[quote]5. Available Collectors
The discussion to this point has been about the serial collector. The Java HotSpot VM includes three different collectors, each with different performance characteristics.

  1. The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient since there is no communication overhead between threads. It is best-suited to single processor machines, since it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100MB). The serial collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseSerialGC.

  2. The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium- to large-sized data sets that are run on multiprocessor or multi-threaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseParallelGC.

New: parallel compaction is a feature introduced in J2SE 5.0 update 6 and enhanced in Java SE 6 that allows the parallel collector to perform major collections in parallel. Without parallel compaction, major collections are performed using a single thread, which can significantly limit scalability. Parallel compaction is enabled by adding the option -XX:+UseParallelOldGC to the command line.

  3. The concurrent collector performs most of its work concurrently (i.e., while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium- to large-sized data sets for which response time is more important than overall throughput, since the techniques used to minimize pauses can reduce application performance. The concurrent collector is enabled with the option -XX:+UseConcMarkSweepGC.
[/quote]
And there is a new garbage collector in Java 7: G1.

[quote]The Garbage-First (G1) garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.
[/quote]

Does anyone know if the JVM can do that now?

[quote]The JIT compiler can perform additional optimizations that can reduce the cost of object allocation to zero. Consider the code in Listing 2, where the getPosition() method creates a temporary object to hold the coordinates of a point, and the calling method uses the Point object briefly and then discards it. The JIT will likely inline the call to getPosition() and, using a technique called escape analysis, can recognize that no reference to the Point object leaves the doSomething() method. Knowing this, the JIT can then allocate the object on the stack instead of the heap or, even better, optimize the allocation away completely and simply hoist the fields of the Point into registers. While the current Sun JVMs do not yet perform this optimization, future JVMs probably will. The fact that allocation can get even cheaper in the future, with no changes to your code, is just one more reason not to compromise the correctness or maintainability of your program for the sake of avoiding a few extra allocations.
[/quote]
EDIT: It seems it does: Java 7 Enhancements

JDK 7 has escape analysis.
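The Listing 2 referenced in the quote isn’t reproduced in the thread, but the shape being described is roughly this (a reconstruction, not the original code): with escape analysis the JIT can prove `p` never leaves `doSomething()` and scalar-replace it, so no heap allocation happens at all.

```java
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class EscapeDemo {
    static Point getPosition() {
        return new Point(3, 4); // temporary; never escapes the caller
    }

    static int doSomething() {
        Point p = getPosition(); // JIT can inline this call...
        return p.x + p.y;        // ...and hoist x/y into registers,
                                 // eliminating the allocation entirely
    }

    public static void main(String[] args) {
        System.out.println(doSomething()); // prints 7
    }
}
```

Whether the optimization actually fires depends on inlining succeeding first, which is one reason measuring allocation cost in micro-benchmarks is so slippery.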