GC problem with HashMap put() and clear()

CommanderKeith · October 20, 2009, 2:00pm

Hi,

I’ve got a garbage collection problem relating to java.util.HashMap. After profiling, I’ve found that the garbage collector (GC) is using around 5% of the processor time. The problem is that I create lots of HashMap.Entry’s since I’m always adding to HashMaps then calling clear() to empty the HashMap again, and repeating.

I’m using the HashMaps in the A* path finding code I’ve been working on. I use HashMaps instead of ArrayLists to achieve constant-time get() and add()/put() operations which has sped up the algorithms, but contributed to GC time. That’s because with every put(), a new HashMap.Entry is created.

What should I do to avoid the huge amount of HashMap.Entry object creation? I’m 99% certain that the HashMap.Entry creation is the problem since NetBeans profiler says that I have 248 ‘live’ instances of HashMap.Entry lying around, but I’ve allocated 58,000 instances over the program’s life. Those 248 live instances comprise 6,000 bytes which is 6% of the ‘live’ bytes.

Any thoughts appreciated
Thanks,
Keith

bleb · October 20, 2009, 2:06pm

Write your own HashMap that pools and reuses entry objects?

CommanderKeith · October 20, 2009, 2:15pm

Good idea! Thanks. I should have thought of that

I googled ‘HashMap that pools entries’ and found this which looks perfect:
http://javolution.org/target/site/apidocs/javolution/util/FastMap.html

[quote]FastMap are reusable; they maintain an internal pool of Map.Entry objects. When an entry is removed from a map, it is automatically restored to its pool (unless the map is shared in which case the removed entry is candidate for garbage collection as it cannot be safely recycled).
[/quote]
Awesome 8) I’ll try it out and report back

CommanderKeith · October 20, 2009, 3:21pm

Thanks a lot Bleb!

javaolution.util.FastMap fixed the problem, no more excessive allocations of HashMap.Entry.

The only garbage left now is from java2D (which is surprisingly quite a lot!)

JL235 · October 20, 2009, 5:03pm

As an alternative I’d like to put forward my own CachingHashMap available as a part of my game library with the source code here (the library is still yet to be completed).

Mine is just a HashMap that caches it’s entry nodes when you clear it and remove items. As such if you fill it with a million entries and then remove them all, it still has a million entry nodes stored internally. The advantage is that if you add another million then rather then making new entry nodes it’ll re-use all the entry nodes from the last million. This avoids the cost of creating objects (which is where the performance is from) and means that internally it’s dereferencing no objects for the GC to clean up (to clarify; it does not hang on to a reference to the items you add, just to it’s internal entry nodes).

In terms of CPU performance, using my own very cheap and brief benchmarks I find the FastMap to be marginally faster then then the HashMap but marginally slower then my CachingHashMap.

Jono · October 20, 2009, 9:34pm

[quote=“CommanderKeith,post:1,topic:34406”]
My only thought is that there shouldn’t be ArrayLists in an A* implementation in the first place. The algorithm requires a priority queue (there’s a heap-based PriorityQueue class in java.util that would work), and a hash map can be used in conjunction to speed up contains() checks on the queue.

If you don’t expect duplicates in the open list to occur very often then you don’t need a contains() check (the duplicates will just order themselves in the queue) and the hash map can be left out.

Riven · October 20, 2009, 10:41pm

That’s why he switched… 8)

Demonpants · October 20, 2009, 11:11pm

Are you talking about using HashMap’s etc. for the open and closed lists in A*? If not, which lists are you talking about?

CommanderKeith · October 20, 2009, 11:45pm

JL235:

As an alternative I’d like to put forward my own CachingHashMap available as a part of my game library with the source code here (the library is still yet to be completed).

Mine is just a HashMap that caches it’s entry nodes when you clear it and remove items. As such if you fill it with a million entries and then remove them all, it still has a million entry nodes stored internally. The advantage is that if you add another million then rather then making new entry nodes it’ll re-use all the entry nodes from the last million. This avoids the cost of creating objects (which is where the performance is from) and means that internally it’s dereferencing no objects for the GC to clean up (to clarify; it does not hang on to a reference to the items you add, just to it’s internal entry nodes).

In terms of CPU performance, using my own very cheap and brief benchmarks I find the FastMap to be marginally faster then then the HashMap but marginally slower then my CachingHashMap.

Thanks JL235, that will be fantastic if i can use your CachingHashMap implementation. The Javolution FastMap requires me to use the whole 0.4MB package so your single file is much better. The only benefit that Javolution provides is that add() is constant time since the underlying array is never copied when the hashmap outgrows its capacity, but I have no use for that since my map is always re-used so it will always stay big. Thanks 8) Out of interest, what was your reason for making the CachingHashMap? Where did you need it?

Hi Jono, thanks for your comments. Yes I use a BinaryHeap priority queue for the openList, but to eliminate use of List.contains() I also use a HashMap for the open and closed lists.

Jono · October 21, 2009, 12:17am

[quote=“CommanderKeith,post:9,topic:34406”]
Ah, just a misunderstanding from your original post then. I had the impression that you had replaced an ArrayList with a Hash Map for the open list, which is obviously not the case ;D

Nate · October 21, 2009, 12:26am

I’d like to use the Javolution collections and text stuff in some of my projects. I wish the JAR wasn’t 400+ kb! The classes aren’t easily extracted either.

Nate · October 21, 2009, 4:53am

Having never done anything with A*, this may very well be a stupid suggestion, but here goes anyway. Can you get away with using IdentityHashMap? I’m seeing it about 30% faster than HashMap on gets. Obviously this doesn’t solve your GC problems, but maybe you could modify JL235’s class to use == or base a new one off IdentityHashMap.

I started a related thread with some benchmarks:
http://www.java-gaming.org/index.php/topic,21395.0.html

JL235 · October 21, 2009, 6:46am

If you can predict how big it will be, then you could set an initial capacity to avoid the resizing of the array.

[quote=“CommanderKeith,post:9,topic:34406”]
For exactly this problem that your having. I use a hashmap in my collisions grid and found if I had only 1,000 enemies in there, all moving and testing for collisions, then the JVM would be spending about 5% of it’s time running the GC in order to clean up the millions of objects passively created. Now it’s only spending about 0.5%.

Bear in mind that this is released under a BSD license, and so it does need to be properly credited if you are going to be using it. Otherwise go right ahead and take it.

jezek2 · October 21, 2009, 7:16am

There is also GNU Trove, it has faster hashmaps, supports primitive types and it doesn’t need to allocate Entry objects when traversing the map by using methods such as forEachEntry.

You can use ProGuard for eliminating unneeded classes in your whole application/libraries, also you can obfuscate them (which will shrink them by some amount too), but it can be set to not obfuscate if not wanted.

Nate · October 21, 2009, 8:32am

Looking at their source, the classes can’t be easily extracted without bringing ~80% of all the classes along for the ride.

CommanderKeith · October 21, 2009, 1:47pm

Cool idea, I’ll try that and report back. Thanks!