Good sort algorithms

To keep Markus’ Wurm thread on topic, I’ll try to move the “which algorithm is best for sorting lists” sub-thread here. To start with, I’m copying my last two posts.
I’d also like to add that which algorithm is right of course always depends on what one’s task has to sort, so the topic’s title isn’t very accurate.

[quote]Radix sort is the way to go, it’s an order of magnitude faster than Collections.sort() and ultimately it sorts in 5n speed instead of the n log n which I believe Collections manages.
Cas :slight_smile:
[/quote]
Radix sort… sounds interesting. Thanks for the hint. I’ve read a little bit in an algorithms book; basically it says:

Heap sort (used by the C++ STL, for example) has a worst-case time of O(n*log(n)), with n being the number of elements in the list. It’s not stable, and pre-sortedness of the list doesn’t help it.

Java’s Collections.sort() is a kind of merge sort, which also has a worst-case time of O(n*log(n)). It is stable, however, and it runs substantially faster on nearly sorted lists.

Radix sort runs in O(n*s), with s being the size of the key. It is stable, and pre-sortedness of the list doesn’t help it.

Is that correct so far? Hope so. :slight_smile:
Well, it depends on whether you’ve got some kind of pre-sorting in the list, whether you need a stable sort, and how large the list (and the key size) is. :slight_smile: So I’ll have to figure out which fits my alpha-blending sorting best.

Some other posters mentioned that memory consumption of a sort algorithm matters, too.

Still, I don’t see why the Collections.sort() algorithm should be slower than other good sorting algorithms, for example when sorting surfaces or particles that have alpha transparency.
I always thought O(n*log(n)) was pretty OK. Isn’t it?
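One property worth stressing for the alpha-blending case: a stable sort keeps particles of equal depth in their original submission order, so ties don’t flicker from frame to frame. A tiny demo of the (documented) stability of Collections.sort - the Particle class here is just made up for illustration:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Particle {
    final float depth;
    final String name;
    Particle(float depth, String name) { this.depth = depth; this.name = name; }
}

public class StableSortDemo {
    public static void main(String[] args) {
        List particles = new ArrayList();
        particles.add(new Particle(1.0f, "first at depth 1"));
        particles.add(new Particle(2.0f, "at depth 2"));
        particles.add(new Particle(1.0f, "second at depth 1"));

        // Collections.sort is documented to be stable, so the two
        // depth-1 particles keep their relative order after sorting.
        Collections.sort(particles, new Comparator() {
            public int compare(Object a, Object b) {
                float da = ((Particle) a).depth, db = ((Particle) b).depth;
                return da < db ? -1 : (da > db ? 1 : 0);
            }
        });

        for (int i = 0; i < particles.size(); i++)
            System.out.println(((Particle) particles.get(i)).name);
    }
}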

O(blah) means “the growth rate once you remove any multiplicative constants”.

So, one O(n*log(n)) algorithm might take 500*n*log(n) steps, and another might take 2*n*log(n). One is then 250 times faster than the other, but both are O(n*log(n)).

The point about merge sort is that the variance of its execution time across different lists is relatively low (although IIRC some variants have even lower variance) - you can guarantee it will always execute about as fast for any list of a given size. Other sorts are much faster for some lists, and much slower for others.

Merge sort is nearly always used as the “generic” sort, because you can guarantee good performance. If you know the patterns your data is typically arranged in (you can discover this by simple trial and error), you can pick a much better sort algo for your app. But remember that the API has to be good for all cases.

That’s an interesting point you wrote:

Even on the same data?

For example: ideally you have a class which holds the data (a single 3D particle in Markus’ case) and which implements the right Comparable interface (or has a Comparator class, if that fits better). Assuming the Collections class held several sort algorithms, you could then pick the one that fits best by doing tests:


List thelist; // holds <n> particle objects
// ...
Collections.quicksort(thelist); // hypothetical method - measure time
Collections.heapsort(thelist);  // hypothetical method - measure time

Do I understand you correctly that for the quicksort method call this could mean 500*n*log(n), whilst for the heapsort method call it could mean 100*n*log(n)?

[quote]Merge sort is nearly always used as the “generic” sort, because you can guarantee good performance. If you know the patterns your data is typically arranged in (you can discover this by simple trial and error), you can pick a much better sort algo for your app. But remember that the API has to be good for all cases.
[/quote]
I see. That’s the point then.

[quote]Assuming the Collections class held several sort algorithms, you could then pick the one that fits best by doing tests:


List thelist; // holds <n> particle objects
// ...
Collections.quicksort(thelist); // hypothetical method - measure time
Collections.heapsort(thelist);  // hypothetical method - measure time

Do I understand you correctly that for the quicksort method call this could mean 500*n*log(n), whilst for the heapsort method call it could mean 100*n*log(n)?
[/quote]
Yes. Google for “big O notation” - there should be thousands of good pages about it. You’ll find explanations of why O() is a useful metric, and why it deliberately does NOT differentiate between two algorithms that are hundreds of times faster or slower than each other, like in this example. It’s a simple concept, so it should be easy to grasp.
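If you want to see those constant factors (and merge sort’s adaptivity) for yourself, a rough timing sketch like this is enough - Collections.sort is the real method, but the list size is arbitrary and the warm-up run matters on a JIT VM:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SortTiming {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        List data = new ArrayList();
        for (int i = 0; i < 200000; i++)
            data.add(new Integer(rnd.nextInt()));

        // warm up the JIT first, otherwise the first timed run looks far slower
        Collections.sort(new ArrayList(data));

        List copy = new ArrayList(data);
        long t0 = System.currentTimeMillis();
        Collections.sort(copy);              // random input
        long t1 = System.currentTimeMillis();
        Collections.sort(copy);              // same list, now already sorted
        long t2 = System.currentTimeMillis();

        System.out.println("random: " + (t1 - t0) + " ms, presorted: " + (t2 - t1) + " ms");
    }
}

Both calls have the same O(n*log(n)) worst case, but the presorted run finishes much sooner - differences like these are exactly what the O() notation deliberately ignores.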

And back to radix sorting: radix sort is massively useful in 3D rendering, as everything you need to sort on is numerical. The radix sorts I use work on both ints and floats, and are ideally suited to sorting 10k+ triangles a frame on multiple keys.

Cas :slight_smile:
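For the curious, the core of such a sort looks roughly like this - a minimal LSD (least-significant-byte-first) radix sort for non-negative ints. This is only a sketch of the technique, not Cas’s implementation (which also handles floats and multiple keys):

public class SimpleRadixSort {

    // Sorts non-negative ints: 4 stable counting passes, one byte per pass.
    public static void sort(int[] a) {
        int[] tmp = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int i = 0; i < a.length; i++)
                count[((a[i] >> shift) & 0xFF) + 1]++;       // histogram this byte
            for (int b = 1; b < 257; b++)
                count[b] += count[b - 1];                    // prefix sums = bucket offsets
            for (int i = 0; i < a.length; i++)
                tmp[count[(a[i] >> shift) & 0xFF]++] = a[i]; // stable scatter into buckets
            System.arraycopy(tmp, 0, a, 0, a.length);
        }
    }
}

Four stable O(n) counting passes instead of ~n*log(n) comparisons is where the speed comes from. Floats can be sorted the same way once their bit patterns are remapped to correctly ordered ints, and negative ints would need the top byte treated specially.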

Got any code? =) I found a SourceForge site, but there were no releases.

http://cvs.sourceforge.net/viewcvs.py/spgl/spgl/src/com/shavenpuppy/jglib/algorithms/RadixSort.java

I did a quick test, and princec’s RadixSort is roughly 3 times faster than my advanced non-recursive quicksort :slight_smile:

Ah, thanks for the link. I’m really bad at SourceForge/CVS. :wink:

It’s been proven that if you don’t have any pre-established knowledge about what you are sorting (the way radix sort knows it’s working with numbers, or knowing how partially sorted your data already is, etc.), then O(n*log(n)) is as good as it gets.
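For reference, the usual proof sketch: a comparison-only sort has to distinguish between all n! possible orderings of its input, and each comparison yields one of two outcomes, so in the worst case it needs at least log2(n!) comparisons. By Stirling’s approximation, log2(n!) ≈ n*log2(n) - 1.44*n, which is Omega(n*log(n)). Radix sort gets around this bound precisely because it reads the key digits directly instead of only comparing elements.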

Correct me if I am wrong, but accessing or inserting an element in a balanced (red-black) tree is O(log n). So, using a SortedMap should still be faster than using a radix sort. The only advantage of radix sort (or others) that I can see is when primitive (non-object) types are managed (then again, it is possible to write a class that would manage such types).
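A sketch of that idea with java.util.TreeMap (a red-black tree), keyed on depth. Note the caveats: n inserts at O(log n) each still add up to O(n*log(n)), and a map keeps only one value per key, so equal depths would need a tie-breaking key in practice:

import java.util.Iterator;
import java.util.TreeMap;

public class DepthTreeSketch {
    public static void main(String[] args) {
        TreeMap byDepth = new TreeMap();            // red-black tree, sorted by key

        byDepth.put(new Float(4.2f), "particle A"); // each put is O(log n)
        byDepth.put(new Float(1.7f), "particle B");
        byDepth.put(new Float(9.9f), "particle C");

        // iterate in ascending depth order
        for (Iterator it = byDepth.values().iterator(); it.hasNext();)
            System.out.println(it.next());
    }
}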

Trees can be faster if you only add/remove a few elements per iteration. If you have to recreate the tree every time, you are back to n*log(n) in the best case. Frame coherence does not help much here - even the slightest change in viewer position changes the distances of all elements - and even if your tree were correctly sorted from the start, you would still need to remove and re-add every element to be sure (it’s probably faster to just recreate it from scratch).

It depends on what you sort. For example, if you sort a user list and you have a maximum length for a username, then radix sort is a nice algorithm.

If you use too many buckets in a radix sort, it may be slower than other sorting algorithms - I learned that at university (you can go to university and attend some lectures on sorting).

So make some notes about what you sort:

1.) Does it have a fixed length?
2.) How many different characters are there? (numbers have 10)
–> for radix sort this would mean 10 buckets (see the sketch at the end of this post)

3.) Google around and look up what to do in which case.
4.) Experiment with different algorithms and measure them.

Sometimes you spend 40 or 80 hours on research and experimental programming, but then you will have the right sorting algorithm.

Radix sort will perform relatively badly on presorted lists (meaning lists where, say, 90% of the elements are already in order): unlike an adaptive sort such as merge sort, it cannot take advantage of the existing order and always does the same number of passes.
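To make the bucket counting concrete, here is a sketch of such a radix sort for usernames - it assumes, purely for illustration, that every name has exactly the given fixed length and uses only the lowercase letters a-z, i.e. 26 buckets, with one stable counting pass per character position from last to first:

public class UsernameRadixSort {

    // LSD radix sort: one stable counting pass per character position.
    // Assumes every name is exactly 'length' chars of lowercase a-z.
    public static void sort(String[] names, int length) {
        String[] tmp = new String[names.length];
        for (int pos = length - 1; pos >= 0; pos--) {
            int[] count = new int[27];                      // 26 buckets
            for (int i = 0; i < names.length; i++)
                count[(names[i].charAt(pos) - 'a') + 1]++;  // histogram
            for (int b = 1; b < 27; b++)
                count[b] += count[b - 1];                   // bucket start offsets
            for (int i = 0; i < names.length; i++)
                tmp[count[names[i].charAt(pos) - 'a']++] = names[i];
            System.arraycopy(tmp, 0, names, 0, names.length);
        }
    }
}

Each pass is O(n), so the whole thing is O(length*n) - the O(n*s) from earlier in the thread, and the reason a fixed maximum length matters.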