String deduplication coming to the G1 GC in Java 8u20


Just found this link on Reddit.
Basically, it’s a new GC improvement in Java 8u20: the GC compares the contents of strings’ char arrays, and if two strings contain the same characters it updates them to point to a single shared char array, leaving the duplicate arrays to be collected. Smart stuff, I say, and while it comes with a bit of performance overhead I can see it being useful for lots of applications. The nice thing is that it is (at least for now) completely optional: to use it you have to be running the G1 GC with the [icode]-XX:+UseG1GC[/icode] switch and also turn the feature on with [icode]-XX:+UseStringDeduplication[/icode], so its performance overhead won’t be applied to your application/game unless you want it.

Any thoughts on this? :slight_smile:
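For comparison, here is a minimal sketch of the same idea done by hand at the application level: a canonicalizing pool that hands back one shared instance for equal strings. The class name `StringPool` is hypothetical, and this is not how G1 does it (G1 keeps the String objects distinct and only makes them share one backing char[], transparently, during GC); it just shows what deduplication buys you.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical application-level deduplicator (not part of the JDK). */
public final class StringPool {
    // Keys are held weakly, so a string nobody else references can still be
    // garbage collected. The value is a WeakReference to the same string so
    // that it does not keep its own key alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns a canonical instance equal to s, registering s if it is new. */
    public synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref != null) ? ref.get() : null;
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        // Two distinct String objects with identical contents.
        String a = new String("player_name");
        String b = new String("player_name");
        System.out.println(a == b);                   // false: different objects
        String ca = pool.canonicalize(a);
        String cb = pool.canonicalize(b);
        System.out.println(ca == cb);                 // true: both are now a's instance
    }
}
```

Unlike the G1 feature, this only helps where your own code calls `canonicalize`, but it costs nothing at GC time.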

Well, it’s a great idea, as a tuning option, if you identify this as an area that could do with optimisation in your application.

Cas :slight_smile:

Uggh. An aside on the linked article about interning strings: think carefully before doing something like this… PermGen memory consumption.

The Java 8 VM did away with PermGen.

Unrelated. Interned strings are off-heap and aren’t subject to GC. In fact, interned strings, or at least the language requirement of identity for strings found in a class file, hurt the JVM and should be eliminated.

Interned strings have been on the heap since Java 7, and yes, they are garbage collected.
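To illustrate the class-file identity requirement mentioned above: string literals and compile-time constant expressions are always interned, so they compare equal with `==`, while strings built at runtime are distinct objects until `intern()` is called. A minimal sketch (the class name is hypothetical):

```java
public final class InternIdentity {
    static final String PREFIX = "hel"; // a constant variable

    public static void main(String[] args) {
        String literal = "hello";              // string literals are always interned
        String folded = PREFIX + "lo";         // constant expression: folded by javac, interned
        String runtime = new String("hello");  // a fresh heap object, not interned

        System.out.println(literal == folded);           // true
        System.out.println(literal == runtime);          // false
        System.out.println(literal == runtime.intern()); // true: intern() returns the pooled instance
    }
}
```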

It seems like they have been toying with java.lang.String for years.

First, in Java 1.4, they added regex methods to String without making it obvious that the parameters they expect are regex patterns:
String.split(…), String.replaceAll(…)
which is, to this day, the cause of many bugs.
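A minimal sketch of the classic pitfalls with those two methods, and the usual workarounds (`Pattern.quote`, `Matcher.quoteReplacement`, and the literal `String.replace`). The class and the `splitLiteral` helper are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexPitfalls {
    /** Hypothetical helper: splits on a literal delimiter, not a regex. */
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String version = "1.8.0";

        // "." is a regex meaning "any character": every character is a
        // delimiter, all tokens are empty, and trailing empty tokens are dropped.
        System.out.println(version.split(".").length);                   // 0
        System.out.println(Arrays.toString(splitLiteral(version, "."))); // [1, 8, 0]

        // replaceAll() treats its first argument as a regex too...
        System.out.println(version.replaceAll(".", "_"));                // _____
        // ...while replace(CharSequence, CharSequence) is purely literal.
        System.out.println(version.replace(".", "_"));                   // 1_8_0

        // '$' and '\' are special in the replacement string; quote them.
        System.out.println("cost: X".replaceAll("X", Matcher.quoteReplacement("$5"))); // cost: $5
    }
}
```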

In a Java 7 minor update, they did away with Strings sharing backing char[]s, so String.substring(…), which was once very lightweight, became one of the slowest operations in the Java standard library, making it both a performance drain and a memory hog.
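If you only need to read a slice and want to avoid the copy that substring() now makes (the change landed in 7u6), one option is a CharSequence view such as `CharBuffer.wrap(s, start, end)`, which wraps the original string without copying its characters. A minimal sketch, assuming the consuming code accepts a CharSequence rather than a String; the class and `sliceView` names are hypothetical.

```java
import java.nio.CharBuffer;

public final class SubstringViews {
    /**
     * Returns a read-only CharSequence view of s[start, end). This does not
     * copy the characters; it keeps a reference to the whole original string.
     */
    static CharSequence sliceView(String s, int start, int end) {
        return CharBuffer.wrap(s, start, end);
    }

    public static void main(String[] args) {
        String line = "hello world";
        String copy = line.substring(6, 11);          // new String, characters copied
        CharSequence view = sliceView(line, 6, 11);   // no copy, refers to line
        System.out.println(copy);                     // world
        System.out.println(view);                     // world
        System.out.println(view.length());            // 5
    }
}
```

The trade-off is the same one the old shared-array substring() had: a small view keeps the entire original string alive for as long as the view is reachable.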

In a Java 8 minor update they added this daemon thread/service that plows through all Strings on the heap, looking for exact duplicates. Didn’t we have enough stutter already? How common (apart from pathological benchmarks) are these long-lived (!) duplicate Strings anyway? If they are short-lived, the StringDeduplicater won’t find them before they go out of scope. Which applications actually hold large numbers of long-lived, massive Strings? Seriously… I’m drawing a blank.

Yeah, I feel like if you have something that would benefit noticeably from this, you were doing it wrong anyway.

Reference? I’m not seeing any change here: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/lang/String.java

But my point is that interning strings is almost always a bad idea.

FTW. You have a reference? Hopefully it can be disabled.

So far I’ve not seen any convincing arguments that the G1 garbage collector is much better than the CMS collector for real-time games (in fact, some here have mentioned that CMS is still better).

It seems a lot of these optimisations are aimed at server-side applications rather than real-world desktop applications.

Incremental CMS seems to work best on average…and it’s going away:

http://openjdk.java.net/jeps/173

I get about half performance when I switch to G1. Grrr.

Cas :slight_smile:

Gosh, it looks like they want to kill off a number of GC combos:

http://openjdk.java.net/jeps/8044022

This is a proposal, not a JEP yet.

So you’re advising against that, and also against G1?

What VM parameters do you guys use?

I switched to 8u20 (I was still on 7) for a possible performance boost, which I probably won’t notice, but hey.

I had been using -Xincgc… when I switch to G1GC, my frame rate plummets from 60 fps to 30 fps. Awkward.

Cas :slight_smile:
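When comparing collectors like this, it can help to confirm which collector the VM is actually running and how much time it spends collecting. A minimal sketch using the standard java.lang.management API; the class and method names are hypothetical. Logging the summary periodically (say, once a second from the game loop) and comparing deltas might show whether a frame-rate drop lines up with GC time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class GcReport {
    /** One line per active collector: name, collections so far, total time. */
    static List<String> gcSummary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since VM start; -1 means "not supported".
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // The flags this VM was actually started with (e.g. -Xincgc or -XX:+UseG1GC).
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (String line : gcSummary()) {
            System.out.println(line);
        }
    }
}
```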

What kind of effort have you done at tuning?

Note that HotSpot’s compiler will be more or less the same across supported versions… only some compile-time switches will differ. Not quite true, but more or less.

www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html

[quote]Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
RFE: 6962931
[/quote]

Tons… I sat there with jvisualvm for literally days working out what’s slow and what’s not. For some reason G1GC just does really badly. Pauselessly, though, eh? :wink:

Cas :slight_smile:

If you’re not getting anything useful from jvisualvm, I recommend the Flight Recorder in JMC (Java Mission Control).