Included in the jar are both set and map implementations, each with identity versions as well.
Bumping so that folks realize there has been an important bug fix; in particular, performance is now restored to what it should be. I have also added an int-keyed map. It's quite a bit faster than using Integers.
Perhaps the last big fix? I have now changed the implementation to return properly backed versions of the collections. This results in much faster iteration and conforms to the Map interface contract. Many of the Collections.* methods use these views, which gives expected behavior and better performance.
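To illustrate what "backed" means here, a minimal sketch of the contract behavior, shown with java.util.HashMap since the views must behave the same way:
import java.util.HashMap;
import java.util.Map;

public class BackedViewDemo {
    public static void main (String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        // keySet() is a view backed by the map: removing through it removes from the map.
        map.keySet().remove("a");
        System.out.println(map.containsKey("a")); // false
        // The view also reflects later changes to the map without being re-fetched.
        map.put("c", 3);
        System.out.println(map.values().size()); // 2
    }
}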
I’m messing around with cuckoo hashing. I think I might have found an issue with your implementation. Does the following correctly describe what happens with your implementation?
[quote]put key1 in a, b, or c
key1 goes in a
put key2 in a, x, or z
key2 goes in x
remove key1
put key2 in a, x, or z
key2 goes in a
key2 is in the table twice!
[/quote]
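To make the scenario concrete, here's a hedged sketch of a regression test. CHashMap and its put/remove/size signatures are assumptions based on the posted jar, and the comments assume key1 hashes to slots {a, b, c} and key2 to {a, x, z}:
CHashMap<Object, String> map = new CHashMap<Object, String>();
Object key1 = new Object();
Object key2 = new Object();
map.put(key1, "v1"); // key1 goes in a
map.put(key2, "v2"); // key2 goes in x (a is occupied)
map.remove(key1);    // a is free again
map.put(key2, "v2"); // a buggy put places key2 in a without noticing it in x
// A correct put must find and update the existing entry, so:
assert map.size() == 1 : "key2 is in the table twice!";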
Edit:
It seems you increment size in some places without checking >= threshold. Is this ok?
I think the capacity specified in the constructor should be used as “initialCapacity * (1 / loadFactor)”. This avoids rehashing when the user knows how many items they want to insert (especially for small capacities).
Hi Greg,
Thanks for the updates. I tried the new CIdentityHashSet you posted in the new jar file attached to the first post.
I found it’s a bit slow because it uses System.identityHashCode(e) rather than e.hashCode().
After fixing that, I found that it still ran slower than the old version and I looked into the difference and found that you didn’t do any maths on the hashCode in the first version:
int hash = e.hashCode();
int hash2 = 1 + ((hash * PRIME) + (hash >>> 16));
int hash3 = 2 + (hash ^ (hash >>> 16)) * PRIME;
But in the new version:
int code = e.hashCode();
int hash = hash(code);
int hash2 = hash2(code);
int hash3 = hash3(code);
So I replaced the ‘int hash = hash(code);’ with ‘int hash = e.hashCode();’ and it was back to normal again. Are these acceptable performance adjustments?
Cheers
keith
I’ve had better luck using multiple bins instead of single-bin implementations. The “sweet” spot seems to be either 2 or 4 bins per masked hash. My goodness metric was density, not performance. YMMV.
There are advantages and disadvantages to using identityHashCode. Advantage: it is (or at least used to be) the raw address (maybe salted?) of the object, so objects that are allocated together tend to land nearby in the hash table. This may or may not match a given use case. Disadvantage: it has to cross the native boundary and perform memory-management fix-up, which can end up requiring more memory: this happens for objects that just fit within the allocator's bucket, where the extra storage (the original memory address) pushes the size into a larger chunk. Additionally, it doesn't have the notion of equivalent objects returning the same hash value. Finally, as noted, it tends to require more cycles.
Rehashing the initial hash returned by the object is another use-case-dependent decision.
identityHashCode: rehashing the initial value in effect negates any reason to use this call. So the first hash used should be something like hash0 = (hash >>> 4) | (hash << 28), if anything.
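That expression is just a 4-bit right rotation, so the identity hash information is preserved rather than mixed away:
int hash0 = (hash >>> 4) | (hash << 28); // rotate right by 4; no bits lost
int same = Integer.rotateRight(hash, 4); // equivalent library form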
hashCode: Notably, the integer wrapper types return their represented value as the hash. In the general case this is of little use, but it seems rather common that sequences are used, in which case not rehashing the initial value is reasonable. The map implementations shipped with Java do bit-mix the result of hashCode; this matters more there than for cuckoo, since only a single hash value is used and it must be 'good' for the general case. The additional hash code(s) cover this case for cuckoo.
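For reference, here is the supplemental bit mixing the JDK 6/7 HashMap applies to every hashCode() result (quoted from memory, so treat it as illustrative):
// Supplemental hash as in JDK 6/7's java.util.HashMap: spreads the influence
// of the high bits downward, since the table index is taken from the low bits.
static int hash (int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}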
CommanderKeith, you must use System.identityHashCode() because objects can and should override hashCode() to be consistent with .equals(), e.g. Integer. Also, (new Integer(1)).hashCode() == 1, which creates pathological cases very easily. So you really must hash the returned Object.hashCode().
Yes, it's a little slower, but it should be correct in all cases. Getting to the wrong place faster is not really faster.
Nate: I will double-check. This should be caught in the unit tests, and I did have it correct at one point. Perhaps the "lazy hashing" borked it. Shows the importance of code review.
There are lots of different ways to implement this. I found that the cost of hashing is small compared to cache issues and Object.equals()/Object.hashCode() in the general case, so I went with 3 hash codes so load factors can go higher than 0.9. The default is 0.75; setting it to 0.9 means that puts which trigger a rehash are slower, but once resized, iteration performance is faster. In my tests on 64-bit and 32-bit JVMs on Linux, having a single table was a little faster, I suspect due to cache effects. But "fast" on a JVM is a moving target with its own optimizations.
In practice I have a custom CHashMap with custom hash fields. This is much faster than any of the above. But fitting the right map to the job has always been part of optimization.
Nate was right. Added something to the unit tests and fixed the code. This is getting silly. A good reason to use java.util.* unless you really need something faster.
This also illustrates why unit tests are not the panacea of good code: they only test for what you think of.
I should note that for my IntHashMap the first hash is highly biased, so that linear usage of ints results in fast access. This hurts the load factor only very slightly, since the next 2 hash codes are close to random.
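Roughly what I mean, as a sketch (the names are illustrative, not my actual code):
// The first probe uses the (nearly) raw int key, so a run of sequential
// keys lands in sequential buckets and iterates cache-friendly.
int index1 = key & mask;        // deliberately biased first hash
int index2 = hash2(key) & mask; // close to random
int index3 = hash3(key) & mask; // close to random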
It’s true that I’m ignoring “black box” situations such as placing mixed types in the map. I’m assuming “this is not a general-purpose map that can be used as a black box.” For instance, in:
Byte b1 = new Byte((byte)1);
Integer i1 = new Integer(1);
both b1 and i1 return the same hash while equals returns false, which will cause the map to explode if you don't use identityHashCode. The same will be true for any pair of objects that return identical hash values and return false for equals.
However, this isn’t much of a limitation for an embedded method for a wide class of problems.
I like to write test cases that cover anything I can think of, and then have one or two that use a seeded random number generator to create a couple of thousand reproducible test cases that might cover things I didn't think of. For example:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.util.HashMap;
import java.util.Random;

import org.junit.Test;

@Test
public void randomTests () {
    Random rnd = new Random(0x12345678);
    for (int i = 0; i < 1000; i++) {
        // Mirror every operation against java.util.HashMap as the trusted reference.
        HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
        CHashMap<Integer, Integer> observed = new CHashMap<Integer, Integer>();
        int numOps = 20 + rnd.nextInt(100);
        for (int j = 0; j < numOps; j++) {
            int key = rnd.nextInt(32);
            int op = rnd.nextInt(3);
            switch (op) {
            case 0:
            case 1: // Insert
                int val = rnd.nextInt();
                expected.put(key, val);
                observed.put(key, val);
                break;
            case 2: // Remove
                expected.remove(key);
                observed.remove(key);
                break;
            default: // Bodged the code somewhere
                fail("Switch and nextInt mismatch");
                break;
            }
            assertEquals(expected, observed);
        }
    }
}
Using an identity hash map for wrapper classes together with autoboxing is a guarantee of failure. As long as your keys are within the cached range it looks like it works. But once you use keys outside the cached range (the spec guarantees only −128…127) you get different objects for the same integer value.
So if you use the identity hash code you must use ==, and then you can't use autoboxing.
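A quick demonstration of the cache boundary (the JLS guarantees caching for −128 to 127; a VM may cache more):
Integer a = 127, b = 127; // within the cache: same object
System.out.println(a == b);      // true
Integer c = 128, d = 128; // outside the cache: distinct objects
System.out.println(c == d);      // false
System.out.println(c.equals(d)); // true; equals is what autoboxed code expects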
The problem with code that's not a black box is that someone will use it as a black box, especially if it implements java.util.Map. I generally don't extend from that when I want special behavior. Note I randomize my tests as well.
This code is intended to work as a drop-in replacement for HashMap or IdentityHashMap. As such, you must make consistent use of Object.equals(), Object.hashCode(), System.identityHashCode() and ==. There is no way to have correct code otherwise.
Cuckoo hashing does have a limitation that means it's not a true drop-in for any chaining HashMap. In particular, you cannot add 3 different objects that return the same hash code: it will rehash until you run out of RAM.
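A hedged sketch of that failure mode (CHashMap stands in for any stashless cuckoo map here, and BadKey is a deliberately pathological class):
// Deliberately pathological: every instance hashes identically but is
// distinct under equals() (inherited from Object).
class BadKey {
    public int hashCode () { return 42; }
}

CHashMap<BadKey, String> map = new CHashMap<BadKey, String>();
map.put(new BadKey(), "a");
map.put(new BadKey(), "b");
map.put(new BadKey(), "c");
// All candidate slots are derived from the same hash code, so once they are
// exhausted no placement exists; rehashing reproduces the same collisions,
// and the table doubles until memory runs out.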
I’m just throwing out general comments about various possible implementations. Take them for what they’re worth. Personally I have no problem with creating codebases with strict contracts that, if violated, go “boom”.
CommanderKeith said
No. The problem is that == and equals mean different things, just as Object.hashCode() and identityHashCode() do.
Example:
Integer one = new Integer(1);
Integer moreOne = new Integer(1);
assert one.hashCode() == moreOne.hashCode();
assert one.equals(moreOne);
assert System.identityHashCode(one) != System.identityHashCode(moreOne);
assert one != moreOne;
The identityHashCode inequality will be false perhaps once every 2^32 runs. Generally you want Object.equals(), but on the odd occasion you need "identity", for example when using reflection for a deep clone, or with serialization.
Falling back from one method to the other is even worse: it makes the semantics (the meaning of the map) random.
The infinite allocation problem is not solved by using a different hash. I have a single number that must deterministically produce 3 pseudo-random numbers; if the original number is the same, then all three hash functions will always produce the same 3 values. This is compounded by the fact that the Java Number wrapper classes use poor hash values. They should at least fold the type into the hash (i.e. Short.hashCode() should not give the same result as Long.hashCode() for the same value).
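To see how bad the wrapper hashes are across types:
// All three wrappers hash the value 7 to the same int, with no type salt:
System.out.println(Short.valueOf((short)7).hashCode()); // 7
System.out.println(Integer.valueOf(7).hashCode());      // 7
System.out.println(Long.valueOf(7L).hashCode());        // 7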
Nate said:
[quote]To avoid the infinite allocation, a stash could be used. This has a small performance hit for gets. I have an implementation I’ll post in the “map performance” thread (soon) so we don’t hijack d3ltor’s thread.
If you control the objects, you could implement hashCode and cache the identityHashCode. But, why is your application so sensitive to map performance!?
[/quote]
Yes, this would work, while giving a worst-case performance of O(n). In which case perhaps chaining or linear probing is the better choice.
btw you are not hijacking anything.
Did you mean this in response to using a stash? The required stash size is only log(n) (see the paper linked below); even then it is empty almost all the time, has only 1 to ~3 items when it isn't empty, and is only checked when the key was not found by the 3 hashes first. I'm pretty convinced a stash is the way to go: it has little overhead and handles pathological cases, making cuckoo robust.
Okie, I’ll post my cuckoo stuff here then. This can be the cuckoo thread and that other thread can be the chart thread. Actually, the chart thread is long, messy, has a lot of OT, and is in the Android section… maybe I’ll make a new fresh one.
I guess there’s no fixing the hash functions, n-way collisions will always be an issue. I tried 4 hashes and it happened less often, but still bothers me. Stash to the rescue!
I have implemented cuckoo maps (3 hash, random walk) with and without a stash. Implementing the stash was very little code. Benchmarks:
These benchmarks reuse the map instance so put can be more accurately measured, which means the benchmarks don’t reflect the time it takes to rehash (and of course memory usage is not shown). I’ve run a few standalone tests with millions and millions of puts, and the stash prevents the map from ever rehashing due to 3-way hash collisions or loops. The required stash size is only log(n), according to this paper.
To answer my earlier question, I found that max(16, sqrt(n) / 4) for the max number of push iterations for a single put is a reasonable value to minimize the number of iterations while also minimizing the stash usage. This seems to scale well. Without the stash, it seems max(32, sqrt(n) / 2) is a good value to not cause a rehash most of the time.
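To put concrete numbers on those formulas (using the expressions from my Cuckoo3Stash class posted below), for a table of 2^20 buckets:
int capacity = 1 << 20; // ~1M buckets
int pushIterations = Math.max(16, (int)Math.sqrt(capacity) / 4);         // 1024 / 4 = 256
int stashCapacity = Math.max(3, (int)Math.ceil(Math.log(capacity)) + 1); // ceil(13.86) + 1 = 15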
I think I’ll go with the stash version as it is more robust without much penalty. Next I’ll implement the rest of the map methods (delete, etc) and an object key version, and then I’ll run new benchmarks with all the maps.
Oops, forgot the link to my source:
http://n4te.com/temp/fast.7z
I am suspicious of claims of O(ln(n)) stash size; my guess is that the proof requires strong assumptions about the hash functions and all the hash codes being "truly" random, in which case chaining and linear probing are also O(ln(n)). At any rate that is now a treeMap, not a hash map. All my tests are based on 1 million entries or more.
For tables of size 16 or so, I doubt hash maps would be quicker than just iterating.
Well, the paper is there. There is some stuff about results from Braverman that prove "polylog(n)-wise independent hash functions are sufficient for" cuckoo hash maps using a stash/queue. I'm not really sure what that means, though.
Possibly better, play with my Cuckoo3Stash class. I’m using a log(n) stash size and I am not seeing it exceeded in my tests. The stash is often not even used, and when it is only has a few items in it. Here is my class for ease of browsing or copy pasting into an IDE (yay JGO syntax highlighting!):
import java.util.Iterator;
import java.util.Random;

public class Cuckoo3Stash<V> {
static private final int EMPTY = Integer.MIN_VALUE;
static private final int PRIME2 = 0xbe1f14b1;
static private final int PRIME3 = 0xced1c241;
public int size;
int[] keyTable;
V[] valueTable;
int capacity, stashSize;
private float loadFactor;
private int hashShift, mask, threshold;
private int stashCapacity;
private int pushIterations;
private Entries entries;
private Values values;
private Keys keys;
public Cuckoo3Stash (int initialCapacity, float loadFactor) {
if (initialCapacity > 1 << 30) throw new IllegalArgumentException("initialCapacity is too large.");
if (initialCapacity < 0) throw new IllegalArgumentException("initialCapacity must not be negative.");
if (loadFactor <= 0) throw new IllegalArgumentException("loadFactor must be greater than zero.");
this.loadFactor = loadFactor;
capacity = nextPowerOfTwo((int)(initialCapacity * (1 / loadFactor)));
threshold = (int)(capacity * loadFactor);
mask = capacity - 1;
hashShift = 31 - Integer.numberOfTrailingZeros(capacity);
stashCapacity = Math.max(3, (int)Math.ceil(Math.log(capacity)) + 1);
pushIterations = Math.max(16, (int)Math.sqrt(capacity) / 4);
keyTable = new int[capacity + stashCapacity];
for (int i = capacity; i-- > 0;)
keyTable[i] = EMPTY;
valueTable = (V[])new Object[capacity + stashCapacity];
}
public static void main (String[] args) {
Random r = new Random();
for (int i = 0; i < 1500000; i++) {
int c = 10;
Cuckoo3Stash<Object> m = new Cuckoo3Stash(c, 0.9f);
for (int ii = 0; ii < c; ii++)
m.put(ii, "moo" + ii);
Entries<Object> entries = m.entries();
for (Entry<Object> e : entries) {
System.out.println(e);
if (e.value.equals("moo7")) entries.remove();
}
System.out.println();
for (Entry<Object> e : m.entries()) {
System.out.println(e);
}
System.out.println();
m.remove(2);
for (Entry<Object> e : m.entries()) {
System.out.println(e);
}
m.clear();
break;
}
}
public V put (int key, V value) {
int[] keyTable = this.keyTable;
int mask = this.mask;
// Check for existing keys.
int index1 = key & mask;
int existingKey1 = keyTable[index1];
if (existingKey1 == key) {
V oldValue = valueTable[index1];
valueTable[index1] = value;
return oldValue;
}
int index2 = hash2(key) & mask;
int existingKey2 = keyTable[index2];
if (existingKey2 == key) {
V oldValue = valueTable[index2];
valueTable[index2] = value;
return oldValue;
}
int index3 = hash3(key) & mask;
int existingKey3 = keyTable[index3];
if (existingKey3 == key) {
V oldValue = valueTable[index3];
valueTable[index3] = value;
return oldValue;
}
// Check for empty buckets.
if (existingKey1 == EMPTY) {
keyTable[index1] = key;
valueTable[index1] = value;
if (size++ >= threshold) resize(capacity << 1);
return null;
}
if (existingKey2 == EMPTY) {
keyTable[index2] = key;
valueTable[index2] = value;
if (size++ >= threshold) resize(capacity << 1);
return null;
}
if (existingKey3 == EMPTY) {
keyTable[index3] = key;
valueTable[index3] = value;
if (size++ >= threshold) resize(capacity << 1);
return null;
}
push(key, value, index1, existingKey1, index2, existingKey2, index3, existingKey3);
return null;
}
// Skips checks for existing keys.
private void putResize (int key, V value) {
int[] keyTable = this.keyTable;
int mask = this.mask;
// Check for empty buckets.
int index1 = key & mask;
int existingKey1 = keyTable[index1];
if (existingKey1 == EMPTY) {
keyTable[index1] = key;
valueTable[index1] = value;
if (size++ >= threshold) resize(capacity << 1);
return;
}
int index2 = hash2(key) & mask;
int existingKey2 = keyTable[index2];
if (existingKey2 == EMPTY) {
keyTable[index2] = key;
valueTable[index2] = value;
if (size++ >= threshold) resize(capacity << 1);
return;
}
int index3 = hash3(key) & mask;
int existingKey3 = keyTable[index3];
if (existingKey3 == EMPTY) {
keyTable[index3] = key;
valueTable[index3] = value;
if (size++ >= threshold) resize(capacity << 1);
return;
}
push(key, value, index1, existingKey1, index2, existingKey2, index3, existingKey3);
}
private void push (int key, V value, int index1, int existingKey1, int index2, int existingKey2, int index3, int existingKey3) {
int[] keyTable = this.keyTable;
V[] valueTable = this.valueTable;
int mask = this.mask;
// Push keys until an empty bucket is found.
int insertKey = key, replacedKey;
V insertValue = value, replacedValue;
for (int i = 0, n = pushIterations; i < n; i++) {
// Replace the key and value for one of the hashes.
switch (random(3)) {
case 0:
replacedKey = existingKey1;
replacedValue = valueTable[index1];
keyTable[index1] = insertKey;
valueTable[index1] = insertValue;
break;
case 1:
replacedKey = existingKey2;
replacedValue = valueTable[index2];
keyTable[index2] = insertKey;
valueTable[index2] = insertValue;
break;
default:
replacedKey = existingKey3;
replacedValue = valueTable[index3];
keyTable[index3] = insertKey;
valueTable[index3] = insertValue;
break;
}
// If the replacedKey has an empty bucket, put it there and stop.
index1 = replacedKey & mask;
existingKey1 = keyTable[index1];
if (existingKey1 == EMPTY) {
keyTable[index1] = replacedKey;
valueTable[index1] = replacedValue;
if (size++ >= threshold) resize(capacity << 1);
return;
}
index2 = hash2(replacedKey) & mask;
existingKey2 = keyTable[index2];
if (existingKey2 == EMPTY) {
keyTable[index2] = replacedKey;
valueTable[index2] = replacedValue;
if (size++ >= threshold) resize(capacity << 1);
return;
}
index3 = hash3(replacedKey) & mask;
existingKey3 = keyTable[index3];
if (existingKey3 == EMPTY) {
keyTable[index3] = replacedKey;
valueTable[index3] = replacedValue;
if (size++ >= threshold) resize(capacity << 1);
return;
}
insertKey = replacedKey;
insertValue = replacedValue;
}
stash(insertKey, insertValue);
}
private void stash (int key, V value) {
if (stashSize == stashCapacity) {
// Too many pushes occurred and the stash is full, increase the table size.
resize(capacity << 1);
put(key, value);
return;
}
// Store problem key in the stash.
int index = capacity + stashSize;
keyTable[index] = key;
valueTable[index] = value;
stashSize++;
}
public V get (int key) {
int index = key & mask;
if (keyTable[index] == key) return valueTable[index];
index = hash2(key) & mask;
if (keyTable[index] == key) return valueTable[index];
index = hash3(key) & mask;
if (keyTable[index] == key) return valueTable[index];
index = capacity;
for (int n = index + stashSize; index < n; index++)
if (keyTable[index] == key) return valueTable[index];
return null;
}
public V remove (int key) {
int[] keyTable = this.keyTable;
V[] valueTable = this.valueTable;
int index = key & mask;
if (keyTable[index] == key) {
keyTable[index] = EMPTY;
V oldValue = valueTable[index];
valueTable[index] = null;
size--;
return oldValue;
}
index = hash2(key) & mask;
if (keyTable[index] == key) {
keyTable[index] = EMPTY;
V oldValue = valueTable[index];
valueTable[index] = null;
size--;
return oldValue;
}
index = hash3(key) & mask;
if (keyTable[index] == key) {
keyTable[index] = EMPTY;
V oldValue = valueTable[index];
valueTable[index] = null;
size--;
return oldValue;
}
index = capacity;
for (int n = index + stashSize; index < n; index++) {
if (keyTable[index] == key) {
V oldValue = valueTable[index];
stashRemove(index);
size--;
return oldValue;
}
}
return null;
}
void stashRemove (int index) {
// If the removed location was not last, move the last tuple to the removed location.
int lastIndex = capacity + stashSize - 1;
if (index < lastIndex) {
keyTable[index] = keyTable[lastIndex];
valueTable[index] = valueTable[lastIndex];
valueTable[lastIndex] = null;
} else
valueTable[index] = null;
stashSize--;
}
public void clear () {
int[] keyTable = this.keyTable;
V[] valueTable = this.valueTable;
// Also clear the stash region so stale values can be garbage collected.
for (int i = capacity + stashSize; i-- > 0;) {
keyTable[i] = EMPTY;
valueTable[i] = null;
}
size = 0;
stashSize = 0;
}
/**
* Returns true if the specified value is in the map. Note this traverses the entire map and compares every value, which may be
* an expensive operation.
*/
public boolean containsValue (Object value, boolean identity) {
V[] valueTable = this.valueTable;
if (value == null) {
int[] keyTable = this.keyTable;
for (int i = capacity + stashSize; i-- > 0;)
if (keyTable[i] != EMPTY && valueTable[i] == null) return true;
} else if (identity) {
for (int i = capacity + stashSize; i-- > 0;)
if (valueTable[i] == value) return true;
} else {
for (int i = capacity + stashSize; i-- > 0;)
if (value.equals(valueTable[i])) return true;
}
return false;
}
public boolean containsKey (int key) {
int index = key & mask;
if (keyTable[index] == key) return true;
index = hash2(key) & mask;
if (keyTable[index] == key) return true;
index = hash3(key) & mask;
if (keyTable[index] == key) return true;
index = capacity;
for (int n = index + stashSize; index < n; index++)
if (keyTable[index] == key) return true;
return false;
}
/**
* Increases the size of the backing array to acommodate the specified number of additional items. Useful before adding many
* items to avoid multiple backing array resizes.
*/
public void ensureCapacity (int additionalCapacity) {
int sizeNeeded = size + additionalCapacity;
if (sizeNeeded >= threshold) resize(nextPowerOfTwo((int)(sizeNeeded * (1 / loadFactor))));
}
private void resize (int newSize) {
// Capture the old occupied extent before the fields are overwritten; slots past
// capacity + stashSize were never set to EMPTY (they are 0) and must not be rehashed.
int oldEndIndex = capacity + stashSize;
capacity = newSize;
threshold = (int)(newSize * loadFactor);
mask = newSize - 1;
hashShift = 31 - Integer.numberOfTrailingZeros(newSize);
stashCapacity = Math.max(3, (int)Math.ceil(Math.log(newSize)));
pushIterations = Math.max(16, (int)Math.sqrt(newSize) / 4);
int[] oldKeyTable = keyTable;
V[] oldValueTable = valueTable;
keyTable = new int[newSize + stashCapacity];
int[] keyTable = this.keyTable;
for (int i = newSize; i-- > 0;)
keyTable[i] = EMPTY;
valueTable = (V[])new Object[newSize + stashCapacity];
size = 0;
stashSize = 0;
for (int i = 0; i < oldEndIndex; i++) {
int key = oldKeyTable[i];
if (key != EMPTY) putResize(key, oldValueTable[i]);
}
}
private int hash2 (int h) {
// Multiply by a large prime to diffuse the low bits upward, then xor the
// high bits back down so the masked index depends on the whole key.
h *= PRIME2;
return h ^ (h >>> hashShift);
}
private int hash3 (int h) {
h *= PRIME3;
return h ^ (h >>> hashShift);
}
public String toString () {
if (size == 0) return "[]";
StringBuilder buffer = new StringBuilder(32);
buffer.append('[');
int[] keyTable = this.keyTable;
int i = keyTable.length;
while (i-- > 0) {
int key = keyTable[i];
if (key == EMPTY) continue;
buffer.append(key);
buffer.append('=');
buffer.append(valueTable[i]);
break;
}
while (i-- > 0) {
int key = keyTable[i];
if (key == EMPTY) continue;
buffer.append(", ");
buffer.append(key);
buffer.append('=');
buffer.append(valueTable[i]);
}
buffer.append(']');
return buffer.toString();
}
/**
* Returns an iterator for the entries in the map. Remove is supported. Note that the same iterator instance is reused each
* time this method is called.
*/
public Entries<V> entries () {
if (entries == null)
entries = new Entries(this);
else
entries.reset();
return entries;
}
/**
* Returns an iterator for the values in the map. Remove is supported. Note that the same iterator instance is reused each time
* this method is called.
*/
public Values<V> values () {
if (values == null)
values = new Values(this);
else
values.reset();
return values;
}
/**
* Returns an iterator for the keys in the map. Remove is supported. Note that the same iterator instance is reused each time
* this method is called.
*/
public Keys keys () {
if (keys == null)
keys = new Keys(this);
else
keys.reset();
return keys;
}
static public class Entry<V> {
public int key;
public V value;
public String toString () {
return key + "=" + value;
}
}
static public class Entries<V> implements Iterable<Entry<V>>, Iterator<Entry<V>> {
Entry<V> entry = new Entry();
int index = -1;
private final Cuckoo3Stash map;
public Entries (Cuckoo3Stash map) {
this.map = map;
}
public boolean hasNext () {
int[] keyTable = map.keyTable;
for (int n = map.capacity + map.stashSize; ++index < n;)
if (keyTable[index] != EMPTY) return true;
return false;
}
public Entry<V> next () {
entry.key = map.keyTable[index];
entry.value = (V)map.valueTable[index];
return entry;
}
public void remove () {
if (index >= map.capacity)
map.stashRemove(index);
else {
map.keyTable[index] = EMPTY;
map.valueTable[index] = null;
}
map.size--;
index--;
}
public void reset () {
index = -1;
}
public Iterator<Entry<V>> iterator () {
return this;
}
}
static public class Values<V> implements Iterable<V>, Iterator<V> {
int index = -1;
private final Cuckoo3Stash map;
public Values (Cuckoo3Stash map) {
this.map = map;
}
public boolean hasNext () {
int[] keyTable = map.keyTable;
for (int n = map.capacity + map.stashSize; ++index < n;)
if (keyTable[index] != EMPTY) return true;
return false;
}
public V next () {
return (V)map.valueTable[index];
}
public void remove () {
if (index >= map.capacity)
map.stashRemove(index);
else {
map.keyTable[index] = EMPTY;
map.valueTable[index] = null;
}
map.size--;
index--;
}
public void reset () {
index = -1;
}
public Iterator<V> iterator () {
return this;
}
}
static public class Keys<V> {
int index = -1;
private final Cuckoo3Stash map;
public Keys (Cuckoo3Stash map) {
this.map = map;
}
public boolean hasNext () {
int[] keyTable = map.keyTable;
for (int n = map.capacity + map.stashSize; ++index < n;)
if (keyTable[index] != EMPTY) return true;
return false;
}
public int next () {
return map.keyTable[index];
}
public void remove () {
if (index >= map.capacity)
map.stashRemove(index);
else {
map.keyTable[index] = EMPTY;
map.valueTable[index] = null;
}
map.size--;
index--;
}
public void reset () {
index = -1;
}
}
static private int randomSeed = (int)System.currentTimeMillis();
/**
* Returns a random integer between 0 (inclusive) and range (exclusive).
*/
static private final int random (int range) {
// Advance a linear congruential generator, then scale 17 of its upper bits
// into [0, range) with a fixed-point multiply.
int seed = randomSeed * 1103515245 + 12345;
randomSeed = seed;
return (seed >>> 15) * range >>> 17;
}
static private int nextPowerOfTwo (int value) {
if (value == 0) return 1;
if ((value & value - 1) == 0) return value; // already a power of two
// Smear the highest set bit into every lower position, then add one.
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
return value + 1;
}
}
[quote]At any rate that is now a treeMap, not a hash map. All my tests are based on 1 million entries or more.
[/quote]
For what definition of "treeMap"? E.g., Java's TreeMap is a red-black tree, which my Cuckoo3Stash certainly isn't. AFAIK it is still a cuckoo hash map. There are lots of papers about using a stash with cuckoo.
Probably. Not sure why you mention this, though?
For your iterators, I don't think repeatedly calling next() will in fact advance the returned item, i.e. you would always get the same item. You are not required to call hasNext() in order to get the next item.
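A sketch of the failure mode with your posted Cuckoo3Stash (assuming two keys are present):
Cuckoo3Stash<String> map = new Cuckoo3Stash<String>(16, 0.9f);
map.put(1, "one");
map.put(2, "two");
Cuckoo3Stash.Keys keys = map.keys();
keys.hasNext();           // hasNext() is what advances the index
int first = keys.next();
int second = keys.next(); // no hasNext() in between: the index never moves
System.out.println(first == second); // true, despite two keys in the map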
Recovering a lost comment from the other thread:
What this says, basically, is that hashes of the form h = (m * x + c) % M aren't good for cuckoo hashing. It doesn't give any suggestions for what good hashes are.
Just read a little bit of that paper. It should be noted that the extra xor makes the hash function I use nonlinear, i.e. it doesn't suffer from their warning. So it is at least better than a pure linear ax + c mod d. What makes it nonlinear is the combination of * and ^; just one or the other is equivalent to a Galois field (i.e. GF(2^n)).
You could add extra nonlinear structure, but from my tests I highly doubt it would make any real difference. The weak point is that Java doesn't use good hash values for its hashCode() methods. Shame really.
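For example (a hypothetical variant, not what I actually ship; constants and hashShift borrowed from the Cuckoo3Stash class above), a second multiply/xor round:
// Hypothetical two-round mix: each multiply diffuses low bits upward and
// each xor-shift folds high bits back down; the combination stays nonlinear.
private int hash2b (int h) {
    h *= PRIME2;
    h ^= h >>> 16;
    h *= PRIME3;
    return h ^ (h >>> hashShift);
}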
For my objects I often use longHashCode(). This works even when you have more than 1e9 entries.