Improving Map performance by knowing all keys up front?

actual · December 16, 2012, 2:33am

I have run across a number of cases where I have a Map where once I set it up, the keys never change (although the values against them do). I don’t necessarily know what the keys will be until I create the map but once I do they are fixed.

I was wondering, is there a way to take advantage of this fact to design a Map like data structure that retrieves values more efficiently than the standard implementations?

The class signature would roughly follow the Map interface and might look something like:


public class FixedKeyMap<K,V> {
	
	// Takes in a list of keys for the map.
	// These keys cannot be added to or removed from.
	// All values are initially set to null.
	public FixedKeyMap(List<K> keys) {}

	// If the Key is valid, replace its value with this one.
	// Otherwise do nothing or throw an exception.
	public void put(K key,V value) {}

	// If key is valid then get the value associated with it
	// Otherwise return null or throw an exception.
	public V get(K key) { }

	// If key is valid then sets its value to null. Does not remove it.
	// Otherwise, do nothing or throw an exception.
	public void remove(K key) { }

	// Sets all values to null but does not remove keys.
	public void clear() {}

	// Returns a set containing all of the keys.
	public Set<K> keySet() { }
}

I looked at the HashMap source and it looks like you would be able to pull out a lot of code around load factors and capacities and simplify some of the other code because the size would be known at creation time.

Anyone have any thoughts on how such a thing might be designed to take advantage of the fact that the keys are all known up front? Could you get rid of the hashing function altogether?

I do recognize that this is a micro optimisation and may or may not be worth the effort for a given application but the question seemed interesting and so I thought I would ask if others had any ideas.

Best_Username_Ever · December 16, 2012, 2:55am

I have wanted such a class for a while now. I can think of three ways to do this. For small maps, just use an array. For large maps I only can think of two. 1) Construct a minimal perfect hash function on the keyset at runtime. ??? 2) Put them in a sorted array and use a binary search for key look ups.

Nate · December 16, 2012, 4:25am

If you create the map and then only do gets, a cuckoo map would be an option. It essentially spends effort during puts to make gets efficient. libgdx has an implementation and also an identity map and a few for primitive keys/values (all based on cuckoo).

Otherwise you can give each of your keys an ordinal and then use an array for the values. This isn’t general purpose of course.

sproingie · December 16, 2012, 5:29am

http://www.anarres.org/projects/jperf/

If you use any alternative Map implementation, you should profile to make sure it’s saving you anything. And yes, if all your use sites are static you can get rid of the hashing altogether and just use the values themselves (or references to them, basically the same thing in java), but that’s kind of a trivial use case. Best you might otherwise do is use Enums and let the JIT optimize the switch table.