Specialized Data Structure

IronclawsBt · March 3, 2011, 3:36pm

I want to use a data structure with a very specific behavior. The general idea is that it acts as an array, but the objects have a permanent location. For example, if three objects were added, they may be assigned the indexes 0, 1, and 2. If object 1 is then removed, the other two objects can still be found at 0 and 2. Once an index is empty, a new object can be reassigned to that index (this is not required though).

The performance requirements are O(1) amortized insert, O(1) delete, O(1) find (by index), and iteration should be O(n) in the number of elements (with a constant factor at least as good as a linked list, preferably closer to array iteration). Most of these requirements would be taken care of by a hash map with carefully chosen keys, but the iteration cost would still be dependent on the capacity, not the actual number of elements.

I am going to use this to keep track of game entities with unique ID’s. I believe I have come up with a solution, and I am currently working out the details in the implementation (that is where the devil is after all). Before I invest to much more time in this, I wanted to see if anyone else has already solved this problem.

Captain_Awesome · March 3, 2011, 3:53pm

Just use a normal array. When you wish to delete an object, just do array[index] = null;
I don’t think it goes much faster than that

IronclawsBt · March 3, 2011, 4:02pm

There are two problems with that, one easily fixed and the second one is what I am working on. The first problem is that leaving null elements will eventually lead to a lot of wasted space. The fix for that is pretty simple though. When an element is removed add that index to a list of empty elements. When a new element is added grab one of the empty elements first.

The hard part is the iteration problem. If there are a lot of empty elements, you are wasting time iterating over those. I am probably over engineering this, but I love clever algorithms so working on this is fun for me, if not completely necessary.

lhkbob · March 3, 2011, 4:13pm

To solve the space issue, you need a map. This is what it was designed for. If you want better iteration performance than what the map gives you, consider wrapping two data structures. You could manage a map for the random access operations and a list or bag for the iteration.

bleb · March 3, 2011, 4:25pm

Sounds like what you want can be solved by a doubly-linked list whose nodes are stored in an array, plus a stack to keep track of empty array locations.

IronclawsBt · March 3, 2011, 4:29pm

That would probably be a workable solution. To remove items from the bag quickly, you need to know what index they are at. The map would essentially translate the permanent index to the local index in the bag. Find, add, and remove would need to go through one layer of indirection and iteration would be at array speeds. Because iteration is generally more performance critical, it would make sense to sacrifice the small speed hit to the other operations to maximize that.

IronclawsBt · March 3, 2011, 6:12pm

A wrote up a rough implementation of the map + bag design. It has no error checking which is critical to maintaining a consistent state, but it does work for the initial tests I ran. I am tentatively calling it a shelf as elements you place in it stay where they are even if you remove other elements (a better name is welcome). Once I get this a bit more polished I will post it in the shared code section.


import java.util.Arrays;
import java.util.Iterator;

/**
 *
 * @author Eric
 */
public class Shelf<E> implements Iterable<E>
{
    private E[] items;
    private int[] map;
    private int size;
    private Node next;

    /**
     * Create a shelf with the default capacity.
     */
    public Shelf()
    {
        this(10);
    }

    /**
     * Create a shelf with the provided initial capacity.
     * @param capacity initial capacity.
     */
    public Shelf(int capacity)
    {
        if(capacity < 0)
            throw new IllegalArgumentException(
                    "Initial capacity must be positive");
        items = (E[])new Object[capacity];
        map = new int[capacity];
    }

    /**
     * Add an element in the first open slot.
     * @param e item to be added.
     * @return the index where the item can be found.
     */
    public int add(E e)
    {
        int index;
        if(next == null) // no empty slots in map
        {
            index = size;
            ensureCapacity(index + 1);
        }
        else // at least one empty slot in map
        {
            index = next.index;
            next = next.next;
        }
        map[index] = size;
        items[size] = e;
        size++;
        return index;
    }

    /**
     * Removes the item with the given index.
     * @param index
     * @return
     */
    public E remove(int index)
    {
        next = new Node(index, next);
        index = map[index];
        E item = items[index];
        size--;
        items[index] = items[size];
        items[size] = null;
        return item;
    }

    /**
     * Finds the item with the given index.
     * @param index
     * @return
     */
    public E find(int index)
    {
        return items[map[index]];
    }

    /**
     * Returns an iterator over the stored elements.
     * @return
     */
    @Override
    public Iterator<E> iterator()
    {
        return new ShelfIterator(items, size);
    }

    /**
     * Ensures that the shelf has a given capacity.
     * @param capacity
     */
    public void ensureCapacity(int capacity)
    {
        if(items.length < capacity)
        {
            int newCapacity = Math.max(capacity, (items.length * 3) / 2);
            items = Arrays.copyOf(items, newCapacity);
            map = Arrays.copyOf(map, newCapacity);
        }
    }
    
    private class Node
    {
        private int index;
        private Node next;

        private Node(int index, Node next)
        {
            this.index = index;
            this.next = next;
        }
    }

    private class ShelfIterator implements Iterator<E>
    {
        private int i = 0;
        private int size;
        private E[] items;

        private ShelfIterator(E[] items, int size)
        {
            this.items = items;
            this.size = size;
        }

        @Override
        public boolean hasNext()
        {
            return i < size;
        }

        @Override
        public E next()
        {
            return items[i++];
        }

        @Override
        public void remove()
        {
            throw new UnsupportedOperationException(
                    "Items cannot be removed during iteration.");
        }
    }
}

krasse · March 3, 2011, 8:28pm

How about the LinkedHashMap?

It is very good for iteration. It can also keep track of the latest object inserted if you want that as well.

pjt33 · March 3, 2011, 10:31pm

Is there a reason that you need to use indices to identify the objects rather than just their references?

IronclawsBt · March 3, 2011, 10:51pm

At the moment no, but it does have its uses. If you want to save then load your game state, using the references would get a bit tricky.

pjt33 · March 4, 2011, 7:07am

Not if you make your classes serialisable. And if you use the references then you can do what you want with a linked list of weak references, and avoid having to reimplement GC.

tberthel · March 4, 2011, 7:44am

According to your time complexity requirements ryanm gave you the correct answer. In general linked list is the answer to the time complexity needs. While insertions and deletions are very fast with a linked list you will often realize that an array is fine.

adon_y_coya · March 4, 2011, 2:18pm

IronclawsBt:

I want to use a data structure with a very specific behavior. The general idea is that it acts as an array, but the objects have a permanent location. For example, if three objects were added, they may be assigned the indexes 0, 1, and 2. If object 1 is then removed, the other two objects can still be found at 0 and 2. Once an index is empty, a new object can be reassigned to that index (this is not required though).

The performance requirements are O(1) amortized insert, O(1) delete, O(1) find (by index), and iteration should be O(n) in the number of elements (with a constant factor at least as good as a linked list, preferably closer to array iteration). Most of these requirements would be taken care of by a hash map with carefully chosen keys, but the iteration cost would still be dependent on the capacity, not the actual number of elements.

A double linked list, with two forward and two backward links: one for any next/previous element and one for next/previous non-null element if the element is non-null (else it could refer to the null previous/next).
If you need to fast refer to any elements by index, then have also an array pointing to them as well.
The tricky part on those links would be to update them when null becomes non-null and vice versa (although that’s when the array can come handy as well, by sequentially scanning forward and backward from updated location/index).

IronclawsBt · March 4, 2011, 2:46pm

The LinkedHashSet achieves what I needed. I forgot that that was included in the collections framework. The performance may not have the best constants, but unless that ends up being a problem, no reason not to use it.