Cuckoo hashing code[important update**2]

Cuckoo hashing is not so new method of implementing hash tables. It has some very nice properties providing the load factor is not too high. Since objects are by reference, the load factor of the array is not a memory issue directly. As such its well suited to java.

In particular is has worst case O(1) performance as opposed to O(n) for chaining and other hash table implementations. Anyway i have been asked to make my updated versions available (I have already made previous versions available). This has been used in production code for some time now. So it should be pretty stable.

Its also always faster than a HashMap for all my test cases.

However comments are for wimps and communists :wink:

Warning
I do break the collections contract also. All methods that return a collection, are not backed by the original collection. That is they are copies.

This is no longer true. They now conform completely to the Map contract.

the attachment has all the source for the package…and perhaps a little extra just to make sure that I didn’t miss anything. This is used in this project: http://www.mabs.at/ewing/msms/index.shtml

Note that for these classes you can choose between BSD, Apache or LGPL with classpath exception for a license.

Comments and suggestions are always welcome. Note i lurk in #lwjgl on IRC a lot too.

Update
Ok so I found out that some of the hashing code was pretty brain dead. This is now fixed. Also added is a junit test for almost everything in cuckoo.*. Not that i found any bugs other than poor performance from bad hashing.

Update2
For some reason the attachment is not working. I a, re-attaching. If that does not work, try this:
www.mabs.at/ewing/cuckoo.jar

Update3
Well i fixed the brain dead hashing with brain dead bugs. BIG bugs that totally ruined all performance. Its now fixed. Sorry. Attachment now also fixed properly.

Just synced the comments with the code. :persecutioncomplex:

Hi!

Ok therefore I can comment.

Just to be clear – i meant that the code is not well commented. Probably good enough however.

Micro benchmarks updated:
http://www.java-gaming.org/index.php/topic,21395.msg191504.html#msg191504
Are the benchmarks reasonable? The source is there, can just import the project into Eclipse and run.

I use delt0r’s cuckoo hashing code in my A* path-finding code and found that it is the fastest map implementation AND produces no garbage unlike the other map implementations which make an HashMap.Entry for every object they store.

Thanks for the code :slight_smile:

CommanderKeith, good point on the garbage creation. How did you measure the maps? I guess you just measured your A* implementation? Any ideas why the benchmarks don’t mimic real usage?

No benchmarks are reasonable, but some are interesting. ::slight_smile: Seriously i had problems with my initial tests because of quirky benchmarks.

I am not really surprised with its rank in the benchmarks however. First most of my objects cache 3 fields for the objects stored in the map, this makes it quite a bit faster (the CuckooHashFields interface). And two: usage patterns really really matter. For example if you end up having to rehash a lot then i would expect it to degrade to the default hash map implementation. I would also expect a good performance gain by having a oversize initial capacity.

In my applications you get a lot of add/put and remove but mostly contains. The size stays approximately the same throught the maps life time in my app.

As awlays… YMMV.

I should also point out that i use java.util.* implementations until i see a performance problem.

Edit: PS my hashing is not very optimized. There is perhaps a good performance gain there too.

A very interesting variant from the original version is to add ‘b’ bins. So for each hash index there are ‘b’ linear slots associated with it.

(edit: since I’m sure that was as clear as mud, a survey-like paper is here: http://www.ru.is/faculty/ulfar/CuckooHash.pdf

I’ve read about cuckoo before and it’s indeed a nice algorithm. How well (or bad) it handles a degenerative case of all keys having the same hash code (as returned from the hashCode() method)? Does it have maximum of “number_of_hashes * bin_size” entries in such case? Expanding the array size and rehashing would end up with the same maximum amount of possible entries, right? Or have I overlooked something? :slight_smile:

cuckoo hashing will fail with many identical hash codes as i have implemented it. However this is a pretty easy situation to avoid generally*. It can also be avoided internally. But is not a usage case i need to consider.

For the security aware this does have implications when you don’t control the type of objects inserted into the map. Someone could mount a DNS attack by sending objects that have the same hash code.

One thing I just remembered, when I changed these lines:

int hash =System.identityHashCode(e);

to:

int hash =e.hashCode();

It went much faster. I only discovered this after profiling. Delt0r’s map was way faster than the java.util maps in my case.

Re-did the charts with server VM and way more iterations. Wish we had a good way of generically comparing hashmaps in a few different ways, but I guess people will just have to experiment in their actual apps.

[quote]…but I guess people will just have to experiment in their actual apps.
[/quote]
It really is the only way IMO.

I have updating the release. See the first post. This time i include the class files as well and a junit test. Please update if you downloaded the old version.

Thanks for the update delt0r. But the jar seems like it’s missing the actual cuchoo map code.

Cheers,
keith

WRT: hashCode values from wrapped primitives, I perform a decorrelation step directly after calling hashCode. Mine is:


   // odd scaled approximation of: (3-Sqrt[5])/2
   private static final int weyl = 0x61c88647;

    ...

    int hash  = key.hashCode();
    int w     = 0;
    int h0;

    hash ^= hash<<10; hash ^= hash>>>15; 
    hash ^= hash<<4;  hash ^= hash>>>13;
    w     += weyl;
    h0    = (hash + w) & mask;  // actual hash-0 used


Sorry. Now the jar is correct.

I did try similar things Roquen, but found that a simple hash*PRIME+PRIME worked as good as any and was a little faster. The randomness properties where also just as good.

Thanks.

Btw another thing that made the code a bit quicker was to use the == operator instead of equals.

for primitives you can use ==, but in a Map, you really must use equals or you wouldn’t be able to get a value if you didn’t have that exact reference lying around, which you rarely have.

Unless you want an identity map, which I find very useful.