Java Data structures

Wiki_Duke · July 24, 2012, 7:08am

[h2]General data structure types and interfaces[h2]

Collection General bag of stuff. Lets you add, remove and iterate when you don’t really care what the underlying structure is.
List Ordered list of stuff. Use when you want objects kept in order, but you don’t care what the underlying structure is.
Set. A collection that does not allow duplicates. Use when having the same object twice in your collection is an error.
Map. Map of key-value pairs. Use when you want fast lookups from some kind of identifying key to another object.

[h2]Lists & queues[/h2]

ArrayList Growable list, implemented with an array that is resized under the hood.
LinkedList Doubly linked list. Faster insertion and removal than an ArrayList, but slower to iterate over.
PriorityQueue Ordered list where objects are sorted by a priority value.
Unrolled linked list

[h2]Lookup tables[/h2]

HashMap (map)

[h2]Ordered collections[/h2]

TreeSet (node based, binary tree)
PriorityQueue (array based, binary heap)

[h2]Thread-safe collections[/h2]
Most containers can be wrapped to create thread safe versions using the methods in Collections. Eg. Collections.synchronizedList(new ArrayList()) More specialised containers can be found in java.util.concurrent.

ConcurrentHashMap. Map with non-blocking get(), useful for caches which may be populated from a loader thread.
ConcurrentLinkedQueue. Non-blocking queue, useful for multithreaded pipelines where producers and consumers may share a common queue.

[h2]General Advice[/h2]

[li]Always declare using the most general type you need. This isn’t so important for locals, but it makes a huge difference for fields, as this allows you to swap the implementation of a datastructure without breaking code, that may otherwise already have dependencies on the explicitly defined implementation. For example:


// Don't do this!
ArrayList<Foo> foos = new ArrayList<Foo>(); 
HashMap<Foo,Bar> foobar = new HashMap<Foo,Bar>(); 

// Do this instead
List<Foo> foos = new ArrayList<Foo>(); 
Map<Foo,Bar> foobar = new HashMap<Foo,Bar>();

ArrayList vs LinkedList

[h2]References[h2]

Interactive applet. Visualization tool that walks through the steps of: BST, AVL, B-tree, AA Tree, Red Black, Skiplist, Max & Min Heap, Treap, Scapegoat and Splay.

Mads · July 25, 2012, 2:32pm

I think we should define each list. What exactly is an “ordered collection”’? Doesn’t that apply to most of the collections?

Riven · July 25, 2012, 2:37pm

Click on the links in the article for more information :persecutioncomplex:

matheus23 · July 25, 2012, 6:35pm

Just like that?

Mads · July 25, 2012, 6:44pm

I added HashMap because I use it a lot, and it’s an awesome collection. Does anyone know what it’s based on though? I’m not sure there’s an actual array involved, even though that’s what it’s mimicking.

theagentd · July 25, 2012, 6:49pm

I think it’s based on an array of linked lists. I wouldn’t say it’s an awesome collection though. It simply allows you to attach a field to an object without making a field. There’s a better way of doing that: with fields… >_>

Riven · July 25, 2012, 7:10pm

HashMaps are indeed an array of LinkedLists, with an important detail: every time you access a value, it becomes the head of the linked list.

Luckily that’s not what it’s typically used for.

Mads · July 25, 2012, 7:16pm

[quote=“Riven,post:7,topic:39351”]
Then it’ll be slower the more bindings, right? :-\

[quote=“theagentd,post:6,topic:39351”]
Sometimes it’s nice to index a list by something other than integers.

Riven · July 25, 2012, 7:18pm

I advice you to read the links you posted yourself. And no, it wouldn’t become slower, the array of the hashmap grows to keep the number of elements in the linkedlists low. And indexing a list by anything else than an integer is a horrible abuse of a (hash)map.

matheus23 · July 25, 2012, 7:20pm

Hey riven!
What about adding something like a commit message to the wiki thingie.
So you could add some message like “removed strange [/i] in line 2”

Riven · July 25, 2012, 7:23pm

For a nice example of what to use a (hash)map for:

http://pastebin.java-gaming.org/5f59f487d1b

It makes a deep-copy of a (circular!) graph.

theagentd · July 25, 2012, 7:29pm

I just took a two-second glance at it, but why not just have a field called “old” in Node?

Riven · July 25, 2012, 7:31pm

Why would I litter my classes with unused fields, only for some situation that I might want to make a deep copy?

Besides that doesn’t solve anything, because I still have to create all the connections (edges) between the nodes and end up with a properly connected graph.

Would you suggest I’d also add an ‘old’ field to the Edge class? And how would I connect it all again? Iterating all possible edges everything I need to find a match? I can see the required code explosion and severely degraded performance already. This is exactly what mappings are supposed to do for you!

Roquen · July 25, 2012, 7:36pm

Hashing is unordered. Trees are ordered.

Mads · July 25, 2012, 7:41pm

[quote=“Riven,post:9,topic:39351”]
I believe I have. The documentation page for the class doesn’t offer a lot of insight into it’s implementation other than “Hash table implementation of Map”, which is just as good as “array-voodoo” to me. Wikipedia doesn’t seem to have a much more understandable version of this. I know its speed for basic operations is described, but “constant-time performance” and “LinkedList” don’t fit too well with eachother in my head.

[quote=“Riven,post:9,topic:39351”]
I didn’t mean a literal list. I agree that lists work best with integers for positions, but what if my list goes [1, 5, 12, 2, 66, 21, 14078]. You can sort that how you want, but there will always be that big hole of nulls. I would think that pairing up the integers to their respective objects would be easier than having a few thousand nulls in a huge array.

How about this: I like to load all my resources once, and then just refer to them with an Enum. For instance, 10k spaceships don’t all need the same little icon in the instance. They all just have the enum-reference, and then I can statically grab the image if it needs to be drawn. The same goes for animations that’re all the same.

In this case it’s just easier to have an Image-enum representing all available images, than having to look up on a huge list what index in the array each Image is located at.

theagentd · July 25, 2012, 7:44pm

@Riven
Oh! The nodes and the edges may be shared! Yeah, I guess that is a perfect example of using a HashTable! Will use that example in the future (unless you’ve patented it xD)!

Riven · July 25, 2012, 7:52pm

This usage of mapping is way too complex to patent! My ‘nullify references to aid the garbage collector’ patent is pending though, although these days it’s rather counter productive to nullify anything, as all data must be retained for 18 months.

sproingie · July 25, 2012, 8:13pm

The linked lists for HashMap only come into play when there’s a collision. Most of the time, the number of collisions is very low, so you’re never chasing down the chain.

I’m actually kind of amazed that HashMap doesn’t use open addressing. Buckets? Really? That’s like Baby’s First Data Structure.

erikd · July 26, 2012, 8:26am

I think only a LinkedHashMap does this, no?

UprightPath · July 26, 2012, 8:39am

No. The LinkedHashMap allows you to have an ordered Iterator (FIFO) method.

Normally, the iterator for a HashMap will be based on bucket order (I think). Meaning that whatever the internal data-structure holding the values stores them. Heh.