Any way to remove the costly remove calls here?

I’m trying to do a most recently used map, but i’d like the remove calls to be gone - O(N) - but the sets have no linear view, or their linear view is based on comparators, and i can’t think of a comparator that implements access ordering - and especially not a efficient one.

I was thinking about skip-lists too…

package util;

import java.util.AbstractCollection;
import java.util.AbstractSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * A simple (ie broken for multipurpose) MRU map.
 * The iteration order is the opposite of the 
 * JDK LinkedHashMap LRU configuration.
 * (from most to least recently accessed).
 * 
 * In this class putting or getting a value
 * will modify iteration order, but setting
 * or getting a value from the iterators, will
 * NOT change the next or current order.
 * 
 * The iterators of this class are fail fast. 
 * So no structural modification
 * outside of the iterators when using them,
 * or unsynchonized access to them if working 
 * with threads. Ok?
 */
public class MRUMap<K, V> extends HashMap<K, V> {

    private final LinkedList<K> set = new LinkedList<K>();
    private final int MAX_ENTRIES;
    private static final long serialVersionUID = 3221271453622736157L;
    public MRUMap() {
        super();
        MAX_ENTRIES = 100;

    }

    public MRUMap(final int maxEntries) {
        super();
        MAX_ENTRIES = maxEntries;
    }

    @Override
    public V put(final K key, final V value) {
        //to create access order instead of insertion order
        if (super.containsKey(key)) {
            set.remove(key);
        }
        set.addFirst(key);

        if (removeEldestEntries()) {
            super.remove(set.removeLast());
        }
        return super.put(key, value);
    }

    @Override
    @SuppressWarnings(value = "unchecked")
    public V get(final Object key) {
        if (super.containsKey(key)) {
            set.remove(key);
            set.addFirst((K) key);
            return super.get(key);
        }
        return null;
    }

    @Override
    public V remove(final Object key) {
        if (super.containsKey(key)) {
            set.remove(key);
            return super.remove(key);
        }
        return null;
    }

    @Override
    public void clear() {
        set.clear();
        super.clear();
    }

    /**
     * Used to tell if the eldest entry 
     * should be removed when adding to
     * the map.
     */
    protected boolean removeEldestEntries() {
        return size() > MAX_ENTRIES;
    }
    
    /**
     * Used to tell the number of entries
     * to remove on the putAll method.
     * This method shoud be consistent with
     * removeEldestEntries, ie when it returns true,
     * this method should return a value > 0
     */
    protected int eldestEntries() {
        return size() - MAX_ENTRIES;
    }    

    @Override
    public void putAll(Map<? extends K, ? extends V> t) {
        if (set.isEmpty()) {
            for (Map.Entry<? extends K, ? extends V> e : t.entrySet()) {
                set.addFirst(e.getKey());
                super.put(e.getKey(), e.getValue());
            }
        } else {
            for (Map.Entry<? extends K, ? extends V> e : t.entrySet()) {
                if (super.containsKey(e)) {
                    set.remove(e);
                }
                set.addFirst(e.getKey());
                super.put(e.getKey(), e.getValue());
            }
        }
        if(removeEldestEntries()){
            ListIterator<K> i = set.listIterator(set.size() - eldestEntries());
            while(i.hasNext()){
                super.remove(i.next());
                i.remove();
            }
        }
    }

    @Override
    public Set<K> keySet() {
        return new AbstractSet<K>() {

            @SuppressWarnings("unchecked")
            public Iterator<K> iterator() {
                return new KeyIt();
            }

            public int size() {
                return set.size();
            }
        };
    }

    @Override
    public Collection<V> values() {
        return new AbstractCollection<V>() {

            @SuppressWarnings("unchecked")
            public Iterator<V> iterator() {
                return new ValueIt();
            }

            public int size() {
                return set.size();
            }
        };
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {

        return new AbstractSet<Entry<K, V>>() {

            @SuppressWarnings("unchecked")
            public Iterator<Entry<K, V>> iterator() {
                return new EntryIt();
            }

            public int size() {
                return set.size();
            }
        };

    }

    private final class EntryIt extends LRUIterator {

        public Object next() {
            return nextEntry();
        }
    }

    private final class KeyIt extends LRUIterator {

        public Object next() {
            return nextKey();
        }
    }

    private final class ValueIt extends LRUIterator {

        public Object next() {
            return nextValue();
        }
    }

    private abstract class LRUIterator implements Iterator {

        private Iterator<K> it = set.iterator();
        public K current;

        public boolean hasNext() {
            return it.hasNext();
        }

        Map.Entry<K, V> nextEntry() {
            current = it.next();
            return new EntryImpl();
        }

        V nextValue() {
            current = it.next();
            return MRUMap.super.get(current);
        }

        K nextKey() {
            return it.next();
        }

        public void remove() {
            it.remove();
            MRUMap.super.remove(current);
        }

        private final class EntryImpl implements Entry<K, V> {

            private EntryImpl() {
            }

            public K getKey() {
                return current;
            }

            public V getValue() {
                return MRUMap.super.get(current);
            }

            public V setValue(V value) {
                return MRUMap.super.put(current, value);
            }
        }
    }
}

The main problem I see is that you’re storing the data twice, once in a HashMap and once in a Set.

Does the Set only exist so that you know which key-value pairs are the oldest? Is it only so you know which ones to remove?

If so, there may be a solution. (If you need more functionality than that, this might not work.)

Either for the key or the value (let’s use the key here), define a new inner class and a static variable in the outer class:

private class MRUMappableKey<K> {
   private MRUMappableKey(final K key) {
      this.key = key;
      this.index = nextIndex;
      nextIndex++;
   }

   public int hashCode() {return key.hashCode();}

   private final K key;
   private final long index;
}

private static long nextIndex = Long.MIN_VALUE;

It’s basically a wrapper around the key that stores an index representing the order the keys were used in, as well as the key. The HashMap will then be HashMap<MRUMappableKey, V> (though there may be a syntax problem there). You will probably have to make the HashMap a variable within the class rather than making the class extend HashMap, which I think would be a good idea anyways.

Then you only need to call remove on one collection, not two. And HashMap removes are only O(1), where Set removes are O(n).

And when you need to remove the oldest X entries, you need to iterate over the key set. As you iterate, you can compare the times and store the oldest X entries in an array sorted by time used. Then you need to remove each of those entries, which should be O(1) for each entry (O(X) for all of them together) because it’s a HashMap. So it will be a total of O(N + X) to remove the X oldest entries.

Edit: Originally, I used the time, and then I changed it to just a plain number. This is to avoid confusion when two key-value pairs are used at approximately the same time. This is an especially big deal when the granularity of the system timer is high.

Ah, sorting. It makes me unhappy, since that is a very common use case (iterating over the entries). It’s the main reason why i’m looking at another collection. If i can do a "partial sort, maybe it wouldn’t be so bad.
Edit : but i can’t. Foolish me.

[quote]Then you only need to call remove on one collection, not two. And HashMap removes are only O(1), where Set removes are O(n).
[/quote]
That’s rather funny, as HashSet uses HashMap under the hood.

The “set” is actually a LinkedList. He’s not using it as a real set, so perhaps the name is poorly chosen.

I’m not sure if this problem has a solution as it’s set up. The issue here is that you’re trying to use a group of elements in two different orders: one ordered by the order the elements were used in (the LinkedList) and one with a sort of random access (the HashMap).

Perhaps, you should use just the LinkedList (or a similar data structure). Then you can have a HashMap somewhere else. It doesn’t seem like random access getting/putting should be part of the class at all.

When you call the add/remove methods, you’re already using O(N) algorithms anyways, so the HashMap isn’t helping you at all. You might as well make the class just be a LinkedList and drop the HashMap altogether.

I know you don’t want it to be O(n), but including the HashMap just makes it worse. You’re storing and manipulating data that’s completely barely used. The HashMap only avoids checking the LinkedList when the data isn’t in the list. If you’re only using the MRUMap for iteration (which is what it should be for), then you don’t need the HashMap.

Why are there keys and values? It seems like you should just have a list of values with no keys.

There are keys and values because, the data i want to store deals itself to that.

Specifically (simplifying) it is a ULR -> Float mapping, where i get() values by its url , more some restrictions encapsulated in another class (for ex, null key -> 0F).

If i could manipulate the list structure directly, i could get() the element (node) from the map, and just join previous.next = next.previous, and return the node element.

You could implement your own two-way linked list:


class Node<T> {
  Node<T> prev;
  Node<T> next;
  T value;
}

Then inside your MRUMap<K,V> you would have fields:


  Node<K> mruList;
  Map<K,Node<K>> positionInMruList;

mruList contains the keys in the most-recently-used order and positionInMruList is a map to the nodes of mruList. When an existing key is used, you get its node through positionInMruList and move that node to the beginning of mruList. This way all operations of the MRUMap will be O(1) when the operation of the underlying Map implementation is O(1).

P.S. Extending HashMap creates the problem of a fragile base class. It would be better to implement Map and use delegation.


http://www.javaworld.com/javaworld/jw-08-2003/jw-0801-toolbox.html

Yep. That’s what i said.
:wink:

Really the only thing that bothers me about this solution is the need to create my own double linked list, iterators etc.
Just a limitation of information hiding I guess (of java LinkedList).

And i think i would loose the marvelous concurrent modification exception ::).

I’m not sure that is an advantage or a disadvantage. Probably a disadvantage, a iterator can become royally f**ked up if someone removes a value and is using an iterator at the same time…

Can’t it?
If some removed a the current iterator value (by map.remove) then tried to use the iterator, things would get f**ked up, but at the same time, if i adopt the same strategy that the java Collections use (comparing a list modification counter with a iterator copy), then ridiculous situations arise, (like when i created an iterator, didn’t use it, modified the map, tried to use the iterator, and got a concurrent modification exception). So ideally the modification counter should only be copied when the iterator is used the first time, but that is kind slow isn’t it? Probably best to just follow slavishly the collections strategy here?

BTW to create a guardian node (so that remove operations in nodes never have to check for null) in double linked lists i need to do something like this:

Note, i don’t need a complete list interface, only add(), (remove is not needed because i can get the reference from the Map) and possibly addLast (if i want to reverse the map order, but i don’t think so. It makes little sense for the last used value to get chucked into the end - where it might be removed soon).


//never use head or end in list operations.
head = new Node<T>();
end = new Node<T>();
head.next = end;
end.previous = head;

    private final class Node<T> {
        Node<T> previous = head;
        Node<T> next = end;
        T value;

        boolean hasNext(){
            return next != end;
        }
        
        boolean hasPrevious(){
            return previous != head;
        }
    }

Right? Then if anyone is dumb enough to use a iterator without the hasNext() the next() would only return null (the value of the end node), same for previous() with the head node.

Edit: one reference only won’t work, i need the reference to the last node, to remove it if the map gets too big.
Is there some how to make it only use one reference?

Edit 2: hilariously if i want to make the entrySet and keySet have the LRU order i want i need to save the key AND the value in the map.
GRRRRRRRRRRR.

Anybody got a complete map junit test? Is it on the open jdk?

Ended up with this:


package util;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.AbstractCollection;
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * A simple MRU map.
 * The iteration order is the opposite of the 
 * JDK LinkedHashMap LRU configuration.
 * (from most to least recently accessed).
 * 
 * In this class putting or getting a value
 * will modify iteration order, but setting
 * or getting a value from the iterators, will
 * NOT change the next or current order.
 * 
 * The iterators of this class are fail fast. 
 * So no structural modification
 * outside of the iterators when using them,
 * or unsynchonized access to them if working 
 * with threads. Ok?
 */
@SuppressWarnings(value = "unchecked")
public final class MRUMap<K, V> extends AbstractMap<K, V> implements Externalizable {

    private Map<K, Node> delegate = new HashMap<K, Node>();
    private Node head = new Node(null, null);
    private int MAX_ENTRIES;
    private static final long serialVersionUID = 3221271453222736153L;

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true;
        }
        //dont use get (alters order)
        if (o instanceof MRUMap) {
            MRUMap<K, V> m = (MRUMap<K, V>) o;
            return delegate.equals(m.delegate);
        } else {
            return super.equals(o);
        }
    }

    /**
     * A double linked list node.
     * by default, initialized to behave
     * as a guardian. Never remove head
     */
    private final class Node extends SimpleEntry<K, V> {

        Node previous;
        Node next;

        public Node(K key, V value) {
            super(key, value);
        }

        boolean hasNext() {
            return next != head;
        }

        boolean hasPrevious() {
            return previous != head;
        }
    }

    private void addBefore(Node toAdd, Node node) {
        toAdd.next = node;
        toAdd.previous = node.previous;
        node.previous.next = toAdd;
        node.previous = toAdd;
    }

    private void remove(Node toRemove) {
        toRemove.previous.next = toRemove.next;
        toRemove.next.previous = toRemove.previous;
    }

    public MRUMap() {
        this(Integer.MAX_VALUE);
    }

    public MRUMap(final int maxEntries) {
        super();
        if (maxEntries < 0) {
            throw new IllegalArgumentException("maxEntries must not be negative");
        }
        head.previous = head;
        head.next = head;
        MAX_ENTRIES = maxEntries;
    }

    @Override
    public V put(final K key, final V value) {
        Node node = new Node(key, value);
        addBefore(node, head.next);

        node = delegate.put(key, node);

        if (removeEldestEntries()) {
            delegate.remove(head.previous.getKey());
            remove(head.previous);
        }

        if (node == null) {
            return null;
        } else {
            remove(node);
            return node.getValue();
        }
    }

    @Override
    public V get(final Object key) {
        Node node = delegate.get(key);
        if (node == null) {
            return null;
        } else {
            remove(node);
            addBefore(node, head.next);
            return node.getValue();
        }
    }

    @Override
    public V remove(final Object key) {
        Node node = delegate.remove(key);
        if (node == null) {
            return null;
        } else {
            remove(node);
            return node.getValue();
        }
    }

    @Override
    public void clear() {
        head.next = head;
        head.previous = head;
        delegate.clear();
    }

    /**
     * Used to tell when no more entries are allowed
     */
    private boolean removeEldestEntries() {
        return delegate.size() > MAX_ENTRIES;
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> t) {
        Node node;
        if (delegate.isEmpty()) {
            for (Map.Entry<? extends K, ? extends V> e : t.entrySet()) {
                node = new Node(e.getKey(), e.getValue());
                addBefore(node, head.next);
                delegate.put(node.getKey(), node);
                if (removeEldestEntries()) {
                    delegate.remove(head.previous.getKey());
                    remove(head.previous);
                }
            }
        } else {
            for (Map.Entry<? extends K, ? extends V> e : t.entrySet()) {
                node = new Node(e.getKey(), e.getValue());
                addBefore(node, head.next);
                node = delegate.put(node.getKey(), node);
                if (node != null) {
                    remove(node);
                }
                if (removeEldestEntries()) {
                    delegate.remove(head.previous.getKey());
                    remove(head.previous);
                }
            }
        }
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public boolean isEmpty() {
        return delegate.isEmpty();
    }

    @Override
    public boolean containsValue(Object value) {
        boolean contains = false;
        
        Node current = head;
        while(current.hasNext() && !contains){
            current = current.next;
            V mapValue = current.getValue();
            contains = mapValue == null ? value == null : mapValue.equals(value);
        }
        return contains;
    }

    @Override
    public boolean containsKey(Object key) {
        return delegate.containsKey(key);
    }

    @Override
    public Set<K> keySet() {
        return new AbstractSet<K>() {

            public Iterator<K> iterator() {
                return new KeyIt();
            }

            public int size() {
                return delegate.size();
            }
        };
    }

    @Override
    public Collection<V> values() {
        return new AbstractCollection<V>() {

            public Iterator<V> iterator() {
                return new ValueIt();
            }

            public int size() {
                return delegate.size();
            }
        };
    }

    @Override
    public Set<Entry<K, V>> entrySet() {

        return new AbstractSet<Entry<K, V>>() {

            public Iterator<Entry<K, V>> iterator() {
                return new EntryIt();
            }

            public int size() {
                return delegate.size();
            }
        };
    }

    private final class EntryIt extends LRUIterator {

        public Object next() {
            return nextEntry();
        }
    }

    private final class KeyIt extends LRUIterator {

        public Object next() {
            return nextKey();
        }
    }

    private final class ValueIt extends LRUIterator {

        public Object next() {
            return nextValue();
        }
    }

    private abstract class LRUIterator implements Iterator {

        public Node current = head;

        public boolean hasNext() {
            return current.hasNext();
        }

        public void remove() {
            Node old = current;
            current = current.next;
            MRUMap.this.remove(old);
            delegate.remove(old.getKey());
        }

        Map.Entry<K, V> nextEntry() {
            current = current.next;
            return current;
        }

        V nextValue() {
            current = current.next;
            return current.getValue();
        }

        K nextKey() {
            current = current.next;
            return current.getKey();
        }
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(MAX_ENTRIES);
        out.writeInt(delegate.size());

        for (Node current = head; current.hasNext();) {
            current = current.next;
            out.writeObject(current.getKey());
            out.writeObject(current.getValue());
        }
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        MAX_ENTRIES = in.readInt();
        int nodesNumber = in.readInt();
        delegate = new HashMap<K, Node>(nodesNumber);
        Node current,old  = head;
        for (int index = 0; index < nodesNumber; index++) {
            current = new Node((K) in.readObject(), (V) in.readObject());
            addBefore(current, old.next);
            delegate.put(current.getKey(), current);
            old = current;
        }
    }
}


Anything obviously wrong? (The concurrent modification exception is not there yet i know).

Looks OK.

What is the reason for having MAX_ENTRIES? Won’t there be any situation where no limit is needed? I think that a safer default value (in default constructor) for MAX_ENTRIES would be Integer.MAX_VALUE, because it’s not common that collections would silently remove some entries by themselves.

Looked ok, but wasn’t. The java default serialization support is really, really bad for collections. StackOverFlowBad.

You’re right about the default value, and i need to think of a way to reproduce the concurrent modification exception…

Does anyone else hates that the Externalizable interface calls the default constructor? I mean WTF, is externalizable suposted to make deserialization faster or not? Because calling the constructor and then having to replace fields is not faster.

Edited version above.

Java’s collections use an int which is incremented every time that the collection is modified. The iterators use that int then to find out if the collection was modified since the iterator was created. Something similar might work for you.

Externalizable makes serialization faster because it does not use reflection. The default serialization uses reflection to access the object’s fields, which is slower than accessing them in normal Java code. But making classes Externalizable is much more manual work than making them Serializable.

How were you able to get a stack overflow during deserialization? I don’t think that that is possible with default serialization. Maybe it’s possible if you write a buggy writeObject method…

There’s a bug with the nextEntry method, (it doesn’t increase). That will teach me to run the tests after all alterations!
Edit: fixed another bug with serialization.

Modified above.

Edit: And another bug. Reading was back to front!

Edit: Yet another. The netbeans coverage tool found various bugs. N^2 equals and barfed containsValue().

The problem with the serialization stuff is that i was trying something huge (and strange). I assume that serialization tries to follow links, and the circular 3000 elements buffer with key, values (that hold other things) was confusing it). Its much faster to save the map as a linear view.

I also have two other (strange) classes that i use mostly for performance reasons. They are two Readers , a StringBuilderReader (to avoid toString() ) and the array wastage it creates - it’s just a copy of the java.utils StringReader, but for StringBuilders (I’d love that the CharSequence interface had the getChars(int, int, char [], int) method - would really allow copies without garbage) and a LazyStringBuilderReader, that hold that and a StringBuilder to allow a client to inject chars into the StringBuilder lazily, so no HUGE char arrays floating around.