Cuckoo hashing code[important update**2]

delt0r · October 10, 2010, 10:01am

Cuckoo hashing is not so new method of implementing hash tables. It has some very nice properties providing the load factor is not too high. Since objects are by reference, the load factor of the array is not a memory issue directly. As such its well suited to java.

In particular is has worst case O(1) performance as opposed to O(n) for chaining and other hash table implementations. Anyway i have been asked to make my updated versions available (I have already made previous versions available). This has been used in production code for some time now. So it should be pretty stable.

Its also always faster than a HashMap for all my test cases.

However comments are for wimps and communists

Warning
I do break the collections contract also. All methods that return a collection, are not backed by the original collection. That is they are copies.
This is no longer true. They now conform completely to the Map contract.

the attachment has all the source for the package…and perhaps a little extra just to make sure that I didn’t miss anything. This is used in this project: http://www.mabs.at/ewing/msms/index.shtml

Note that for these classes you can choose between BSD, Apache or LGPL with classpath exception for a license.

Comments and suggestions are always welcome. Note i lurk in #lwjgl on IRC a lot too.

Update
Ok so I found out that some of the hashing code was pretty brain dead. This is now fixed. Also added is a junit test for almost everything in cuckoo.*. Not that i found any bugs other than poor performance from bad hashing.

Update2
For some reason the attachment is not working. I a, re-attaching. If that does not work, try this:
www.mabs.at/ewing/cuckoo.jar

Update3
Well i fixed the brain dead hashing with brain dead bugs. BIG bugs that totally ruined all performance. Its now fixed. Sorry. Attachment now also fixed properly.

Just synced the comments with the code. :persecutioncomplex:

gouessej · October 10, 2010, 10:25am

Hi!

Ok therefore I can comment.

delt0r · October 10, 2010, 1:36pm

Just to be clear – i meant that the code is not well commented. Probably good enough however.

Nate · October 11, 2010, 1:02am

Micro benchmarks updated:
http://www.java-gaming.org/index.php/topic,21395.msg191504.html#msg191504
Are the benchmarks reasonable? The source is there, can just import the project into Eclipse and run.

CommanderKeith · October 11, 2010, 1:45am

I use delt0r’s cuckoo hashing code in my A* path-finding code and found that it is the fastest map implementation AND produces no garbage unlike the other map implementations which make an HashMap.Entry for every object they store.

Thanks for the code

Nate · October 11, 2010, 5:43am

CommanderKeith, good point on the garbage creation. How did you measure the maps? I guess you just measured your A* implementation? Any ideas why the benchmarks don’t mimic real usage?

delt0r · October 11, 2010, 6:47am

No benchmarks are reasonable, but some are interesting. : Seriously i had problems with my initial tests because of quirky benchmarks.

I am not really surprised with its rank in the benchmarks however. First most of my objects cache 3 fields for the objects stored in the map, this makes it quite a bit faster (the CuckooHashFields interface). And two: usage patterns really really matter. For example if you end up having to rehash a lot then i would expect it to degrade to the default hash map implementation. I would also expect a good performance gain by having a oversize initial capacity.

In my applications you get a lot of add/put and remove but mostly contains. The size stays approximately the same throught the maps life time in my app.

As awlays… YMMV.

I should also point out that i use java.util.* implementations until i see a performance problem.

Edit: PS my hashing is not very optimized. There is perhaps a good performance gain there too.

Roquen · October 11, 2010, 9:06am

A very interesting variant from the original version is to add ‘b’ bins. So for each hash index there are ‘b’ linear slots associated with it.

(edit: since I’m sure that was as clear as mud, a survey-like paper is here: http://www.ru.is/faculty/ulfar/CuckooHash.pdf

jezek2 · October 11, 2010, 10:25am

I’ve read about cuckoo before and it’s indeed a nice algorithm. How well (or bad) it handles a degenerative case of all keys having the same hash code (as returned from the hashCode() method)? Does it have maximum of “number_of_hashes * bin_size” entries in such case? Expanding the array size and rehashing would end up with the same maximum amount of possible entries, right? Or have I overlooked something?

delt0r · October 11, 2010, 10:41am

cuckoo hashing will fail with many identical hash codes as i have implemented it. However this is a pretty easy situation to avoid generally*. It can also be avoided internally. But is not a usage case i need to consider.

For the security aware this does have implications when you don’t control the type of objects inserted into the map. Someone could mount a DNS attack by sending objects that have the same hash code.

CommanderKeith · October 11, 2010, 10:45am

One thing I just remembered, when I changed these lines:

int hash =System.identityHashCode(e);

to:

int hash =e.hashCode();

It went much faster. I only discovered this after profiling. Delt0r’s map was way faster than the java.util maps in my case.

Nate · October 11, 2010, 11:36am

Re-did the charts with server VM and way more iterations. Wish we had a good way of generically comparing hashmaps in a few different ways, but I guess people will just have to experiment in their actual apps.

delt0r · October 11, 2010, 1:33pm

[quote]…but I guess people will just have to experiment in their actual apps.
[/quote]
It really is the only way IMO.

delt0r · October 12, 2010, 7:27pm

I have updating the release. See the first post. This time i include the class files as well and a junit test. Please update if you downloaded the old version.

CommanderKeith · October 13, 2010, 1:39am

Thanks for the update delt0r. But the jar seems like it’s missing the actual cuchoo map code.

Cheers,
keith

Roquen · October 13, 2010, 8:37am

WRT: hashCode values from wrapped primitives, I perform a decorrelation step directly after calling hashCode. Mine is:


   // odd scaled approximation of: (3-Sqrt[5])/2
   private static final int weyl = 0x61c88647;

    ...

    int hash  = key.hashCode();
    int w     = 0;
    int h0;

    hash ^= hash<<10; hash ^= hash>>>15; 
    hash ^= hash<<4;  hash ^= hash>>>13;
    w     += weyl;
    h0    = (hash + w) & mask;  // actual hash-0 used

delt0r · October 13, 2010, 9:00am

Sorry. Now the jar is correct.

I did try similar things Roquen, but found that a simple hash*PRIME+PRIME worked as good as any and was a little faster. The randomness properties where also just as good.

CommanderKeith · October 13, 2010, 9:56am

Thanks.

Btw another thing that made the code a bit quicker was to use the == operator instead of equals.

Riven · October 13, 2010, 10:00am

for primitives you can use ==, but in a Map, you really must use equals or you wouldn’t be able to get a value if you didn’t have that exact reference lying around, which you rarely have.

lhkbob · October 13, 2010, 2:50pm

Unless you want an identity map, which I find very useful.