Choosing the right collection

I’m storing my server events in Frame objects. Each frame is assigned a number and is stored in a hashtable where the number is the key and the Frame is the object. My Scheduler keeps track of current frame and gets it from the hashtable if the key exists. Basically I would only have to look at the first object in the collection if it were sorted. I’m not sure what collection I should be using. Is hashtable a good choice?

Look at the “Java Optimisations” book available at http://java.sun.com

There is a review of Collections, which are benchmarked according to their usage.

A quick reference :

Class Add Iterate Random Remove

ArrayList 0 ms 660 ms 0 ms 2,360 ms
LinkedList 50 ms 1,100 ms 26,800 ms 0 ms
Vector 0 ms 880 ms 0 ms 2,580 ms
TreeSet 330 ms 1,430 ms N/A 60 ms
HashSet 110 ms 1,430 ms N/A 50 ms

A quote from the book :

"These results illustrate how important the selection of algorithms and data structures can be. Look at the results from the different List implementations. ArrayList and Vector, two similar classes that are both array-based structures, show similar performance characteristics. Since Vector provides synchronization by default, it’s slightly slower. If you compare the ArrayList and LinkedList results, however, you’ll see that they have very different performance characteristics. While these two structures are capable of performing the same operations, operations that are practically free on the ArrayList are very costly with the LinkedList and vice-versa. The results also show how the set-based structures have very different performance characteristics from the list-based structures. "

I hope it could help you !

  • Rabbit -

[ Note for french people : i’ve made a translation of this book. I can not yet distribute it, but i’m working on it ]

Beware of these timings. They do not take in account the extra work that can happen sometimes and take some time (ArrayList that needs to alocate a new array, hash tables that need rehashing, …)
Keep in mind that these classes don’t have a predictible timing. There are also some tweaking of creation parameters that can slow down or accelerate depending on the use that is done of the classes. A method said to be fast can be very slow if used in wrong way. (for example setting increment size too low for an ArrayList)

While its true that any direct emasurementsa re VM and implementatiion specidifc, there are some genralities you can make that hold true for the fundemental datastures.

For instance, scanning through an ArrayList shoudl always be faster then scanning through a linked list but insertions (and liekly deletions) will be faster ina linked list.

Trees are a good all around compromise structure and are good for searches, though both arrays and linked lists beat them out in their own best-performance areas.

Okay this got me thinking. Is it faster to Iterate through an ArrayList of objects and examine a variable (key) to find the object I’m looking for, than use a hashtable where the key is stored in the collection.

Probably only if your list is pretty small, or if you’ve got a shzit-load of collisions in your hash… Remember that removal from random places in an arrayList is pretty slow if you want to maintain the ordering…

What is the range of numbers?
Is insertion order related to retrieve order in any way (like first in first out or whatever)?
Keeping a structure sorted will usually be more expensive than inserting in/deleting from a Hastable. Or do you have to test several keys to retrieve the correct key?

Everyone else is giving you pretty good advice. My only addition is this…

If performacne is your issue you almost never want to iterate through an array looking for something unelss it is evry very small. If you keep it sorted 9)which does cost) thre are much faster search algorithms then linear search. A binary chop is, AIR, on the average twice as fast as a linear scan. A proportional chop is even faster.