Cuckoo hashing code[important update**2]

Hmm, good point. I’ll fix it up, thanks!

Ok, cool. What do you think about my hash for the int map? The first hash I just use the int, the 2nd and 3rd I use your hashes. Does this 1st hash have any significant drawbacks?

Finished my cuckoo classes. Updated benchmarks:

MapGet2
MapPut2
MapIterate2
IntMapGet2
IntMapPut2
IntMapIterate2

I’m annoyed my gets are not as fast as delt0r’s. :slight_smile: Here is my code:

public V get (int key) {
	if (key == 0) return zeroValue;
	int index = key & mask;
	if (keyTable[index] != key) {
		index = hash2(key);
		if (keyTable[index] != key) {
			index = hash3(key);
			if (keyTable[index] != key) return getStash(key);
		}
	}
	return valueTable[index];
}

private V getStash (int key) {
	int[] keyTable = this.keyTable;
	for (int i = capacity, n = i + stashSize; i < n; i++)
		if (keyTable[i] == key) return valueTable[i];
	return null;
}

Here is delt0r’s code:

public V get(int key) {
	int hash =hash(key);
	hash &=mask;
	if (key == keys[hash])
		return entrys[hash];
	int hash2 =hash2(key);
	hash2 &=mask;
	if (key == keys[hash2])
		return entrys[hash2];
	int hash3 =hash3(key);
	hash3 &=mask;
	if (key == keys[hash3])
		return entrys[hash3];
	return null;
}

My code takes almost twice as long, even when getStash is not actually called! I can replace “return getStash(key);” with “throw new RuntimeException(“blah!”);” and it doesn’t change the speed. If I change “return getStash(key);” to “return null;” then it runs the same speed as delt0r’s code. I guess it can’t be optimized the same way? Seems crazy.

I’m using the CIdentityHashSet and to reduce System.identityHashCode() in my use-case I made every class of object that I was adding to the set over-ride the hashCode() method to the following:

int hash = System.identityHashCode(this);
public int hashCode(){
	return hash;
}

So the System.identityHashCode() method is called only once and is cached when the object is created. I then replaced all CIdentityHashSet calls to System.identityHashCode(object) with object.hashCode() and this gained some speed.

Btw I use CIdentityHashSet mostly for its fast contains(object) method and noticed that it doesn’t do lazy calculation which gains some speed, so here it is for what it’s worth:

public boolean contains(Object e) {
	int code=e.hashCode();//System.identityHashCode(e);
	int hash = hash(code);
	hash =hash & mask;
	if (table[hash] == e) {
		return true;
	}
	int hash2 = hash2(code);
	hash2 =hash2 & mask;
	if (table[hash2] == e) {
		return true;
	}
	int hash3 = hash3(code);
	hash3 =hash3 & mask;
	if (table[hash3] == e) {
		return true;
	}
	return false;
}

I know a few posts here got deleted due to the JGO outage but I read most of them I think. Nate raised the point that my program shouldn’t depend so much on map/set performance. For some reason it does, A* path finding does lots of contains() checks in the open and closed lists which I use the CIdentityHashSet for.

Maybe try java 7 and see if it makes any difference since it looks like a weird hotspot quirk that might be flattened out with of java 7’s more aggressive optimisations.

Thanks for the updates guys. Love those fancy graphs 8)

[quote]Nate raised the point that my program shouldn’t depend so much on map/set performance. For some reason it does, A* path finding does lots of contains() checks in the open and closed lists which I use the CIdentityHashSet for.
[/quote]
I recently was using A* for a small map and used an int[] for the closed list and a Node[] and binary heap for the open list. To keep from allocating or clearing the int[] each time, I increment an int “id” and check “closed[y * width + x] != id”. Each node also has an ID, and a similar check tells if the node is being encountered for the first time this run.

[quote]Maybe try java 7 and see if it makes any difference since it looks like a weird hotspot quirk that might be flattened out with of java 7’s more aggressive optimisations.
[/quote]
Good idea, I’ll give it a shot.

[quote]I recently was using A* for a small map and used an int[] for the closed list and a Node[] and binary heap for the open list. To keep from allocating or clearing the int[] each time, I increment an int “id” and check “closed[y * width + x] != id”. Each node also has an ID, and a similar check tells if the node is being encountered for the first time this run.
[/quote]
Ah wow, that is a good idea! No need for sets/maps in the A* algorithm by letting the node track its own open/closed/unprocessed status. I tried it out and it sped up the algorithm lots. Clever thinking. Thanks Nate 8)

I’m still using the maps/sets for other things so I’ll keep track of your great work guys.
Cheers,
keith