Best algorithm to sort semi-sorted arrays?

I am trying my hand at the hutter prize (http://prize.hutter1.net/)

My current compression algorithm sorts an array of Comparable Objects every main cycle.

-This array of objects can potentically increase by one object per cycle.
-Each object’s comparision value can change per cycle, however its order is mostly kept.

This means that the over all order of the objects are similar between cycles.

Is there an sorting algorithm which can take advantage of the semi-sorted state of the array?

I was using the default Arrays.sort() however it does not seem to make much use of the semi-sorted input array and is too slow for my use.

In order to speed up the sorting i have relaxed the need to be in strict order. I.e there can be some objects out of order.

To this end I have developed a BucketSort which i thought should speed up the sorting however it seems to have the opposite effect!

Is there any reason why this implementation is quite slow? (three times as slow as Array.sort() )


   private static int MAX_BUCKET=65535;
   private static Word[] buckets=new Word[MAX_BUCKET];

   public static void bucketSortReverse (Word[] a, int n)
   {
	   int i;
	   int count=1;
	   int index;
	   Word curr;

	   // fill up the buckets with the input words
	   for (i=0;i<n;i++)
	   {
		   curr=a[i];
		   index=(int) (curr.probablility*MAX_BUCKET);

		   curr.next=buckets[index];
		   buckets[index]=curr;
	   }
	   
	   // iterate over the buckets and construct the sorted words
	   for (i=0;i<MAX_BUCKET;i++)
	   {
		   curr=buckets[i];
		   
		   if (curr!=null)
		   {
			   while (curr!=null)
			   {
				   a[n-count++]=curr;
				   curr=curr.next;
			   }
			   buckets[i]=null;
		   }
	   }
   }

NOTE Currently my input Word array contains a maximum of 6000 words.

I have found some Open Source code which by default performs very similarly to Arrays.sort in terms of speed and have modified it to be non-strict in its ordering.

This modified sorting algothim is approximately 15% faster but is still really not fast enough.

By reducing the strict ordering of the sorting algorithm i have also reduced the level of compression. Ultimately i would like to have a fast sorting algothrim which takes into acount semi-sorted objects with out reducing strictness of the sort.

Any ideas?

Generally, when you use less memory, an algorithm runs faster. Clearly, there are counter cases (i.e. using more memory to store computations for reuse), but, in general, using less resources is faster.

In the code you posted, you create a really big array that you don’t use very much of (65535 buckets for 6000 words). An in-place sort might run faster.

I would also make sure that the words aren’t all being stored in the same bucket. I don’t know what “curr.probability” is about, so I don’t really know what’s going on here.

For bucket sort for something like text, I might do the following. Have one bucket for each character that can potentially appear in the input. Put each word in the bucket of its first character.

If you aren’t sure what characters appear, you can check the first character of each word when inputting them and store them in a HashSet. Then just create buckets for the characters used.

Since your data isn’t all text, that won’t work for you. But in any case, you probably want to have fewer buckets.

Does the class “Word” happen to store 16-bit pieces of data? If so, you’re not really doing a bucket sort (so far as I can tell). You’re storing each Word directly where it goes. If the Word class doesn’t contain other data, you could just use a boolean array or a bitset. If the Word class does contain other data, I would only have 255 buckets and store the Words by the first byte. Then I would sort the contents of each bucket. That would be an actual bucket sort instead of just putting the data where it belongs.

It’s kind of hard to tell without knowing what the Word class is.

qwiksort is so easy and so close to optimal, I never use anything else.

[quote]qwiksort is so easy and so close to optimal, I never use anything else.
[/quote]
Well your missing out, quicksort is rubbish on semi-ordered data. Shellsort is much better.

That is very interesting, here i thought reducing the number of comparisions would out weight the loss due to using the large array.

The Array.sort and the modified Open source version of Array.sort are in-place sorting algorithms which do seem to confirm your first statement about less resources=faster.

The probability is a double value which a predictor has generated for each word which is the probability that this word will be the next word in a sequence.
There are cases where the probability will be words with identical probabilities. In this case a linked list is populated at the bucket.

I have seperated the character-level context from the word-level contex to hopefully achieve better compresstion ratio. I do intend to have at least one predictor for the character-level perform similarly to what you have decribed.

If you aren’t sure what characters appear, you can check the first character of each word when inputting them and store them in a HashSet. Then just create buckets for the characters used.

I have separated the XML portions from the input text and have also sperated the punctuation leaving only words delimited by a space.

The Word class represents a string of characters. True, the bucket sort at the moment is not really a bucket… I thought to remove the entire iterative comparision step by putting the words order directly in the sparsely populated array. In my mind i thought it would be quicker… but i am incorrect.

Thanks for you input.

I have used two different versions of shell sort, however they did not perform as well as i had hoped. perhaps they were not optmised as much as Array.sort.

For almost sorted stuff, Selection Sort isn’t that bad either.
it has imho its best-case O(n) for sorted stuff :slight_smile:

thanks for the suggestion. i will look into seletion sort.

Am i not understanding how the selection sort works? I cannot seem to see how a semi-sorted array of objects would make the sort any more efficient.

here is the implementation of selection sort i have derived from online sources:


  public static void selectionSort(Comparable a[], int size)
   {
     int i, j, min;
     Comparable tmp; 

     for (i = 0; i < size - 1; i++)
     {
       min = i;
       for (j = i+1; j < size; j++)
         if (a[j].compareTo(a[min])<0)
           min = j;

       tmp=a[i];
       a[i]=a[min];
       a[min]=tmp;
     }
   }

I have the same problem but with a different data organization so my solution may not fit your situation.

To sort my objects, I use a red-black tree ; when the sorting key changes, items are removed and re-added to the tree.
This makes the sorting dependent on the number of changes in the tree and not (directly) on the number of items in the tree (it is not fulylm true since the cost of adding/removing is linked to the number of items, and this cost will be fairly higher than in a standard list).

In my situation (state sorting in a game engine) this is very well suited since sorting key almost never change, and node adding/removal is fairly sparse (just a few per frame). On a side note, it happens that optimizing the state sorting algorithm was useless since it was not, and by far, a bottleneck.

Vincent

Thanks for your suggestion, i might have to see whether it is suitable for my application… :slight_smile:

I thought i would try my hand at creating my own sorting algorithm when i had an flash of inspiration:

I do not know if it has been done before, however in my research of programs i have not been able to find anything similar so who knows?

I dub it “Run Sort”


import java.util.Comparator;

public class RunSort implements Comparator 
{
	final static private RunSort defaultComparator=new RunSort();
	
	private RunSort()
	{
		
	}

   public static void sort(Comparable a[])
   {
	   sort(a,0,a.length,defaultComparator);
   }		

   public static void sort(Comparable a[], int start,int end)
   {
	   sort(a,start,end,defaultComparator);
   }

   public static void sort(Comparable a[], int start,int end,Comparator comparator)
   {
	   Comparable[] tempArray=new Comparable[end-start];
	   sort(a,start,end,comparator,tempArray);
   }

   public static void sort(Comparable a[], int start,int end,Comparator comparator,Comparable[] tempArray)
   {
	   // calculate the dimenstion of objects to sort in the given array 
	   int size=end-start;
	   
	   // copy the objects to sort into the working temporary array
	   System.arraycopy(a,start,tempArray,0,size);
	   
	   int prevIndex=0;
	   int i;
	   Comparable tmp=null;
	   Comparable prev;
	   
	   Run root=null;
	   Run currRun;
	   Run prevRun=null;
	   Run selectedRun=null;
	   int runCounts=0;
	   
	   //
	   // construct the runs of already sorted objects
	   //
	   
	   // iterate over the array creating runs of objects already in order...
	   prev=tempArray[0];
	   for (i=1;i<size;i++)
	   {
		   tmp=tempArray[i];
		   if (comparator.compare(tmp,prev)<0)
		   {
			   currRun=new Run();
			   runCounts++;
			   currRun.parentRun=prevRun;
			   if (prevRun!=null) prevRun.nextRun=currRun;
			   else root=currRun;
			   currRun.startIndex=prevIndex;
			   currRun.endIndex=i;

			   prevRun=currRun;
			   prevIndex=i;
		   }
		   prev=tmp;
	   }

	   // construct the final run
	   runCounts++;
	   currRun=new Run();
	   currRun.parentRun=prevRun;
	   if (prevRun!=null) prevRun.nextRun=currRun;
	   else root=currRun;
	   currRun.startIndex=prevIndex;
	   currRun.endIndex=i;

	   //
	   // construct the sorted objects
	   //
	   
	   i=0;
	   
	   // while there are still multiple runs...
	   while (runCounts>1)
	   {
		   // select the "first" object from the first run
		   prev=tempArray[root.startIndex+root.offset];
		   selectedRun=root;
		   
		   // compare this object against the "first" objects from the other existing runs.
		   currRun=root.nextRun;
		   while (currRun!=null)
		   {
			   tmp=tempArray[currRun.startIndex+currRun.offset];
			   
			   // if the current run's "first" object is higher than the current best "first" object then this run now has the
			   // best "first" object and so update the selected run and best object.
			   if (comparator.compare(tmp,prev)<0)
			   {
				   selectedRun=currRun;
				   prev=tmp;
			   }
			   currRun=currRun.nextRun;
		   }
		   
		   // update the offset into the selected run. This offset determines the "first" object of the run.
		   selectedRun.offset++;
		   
		   // if we have reached the end of a run. i.e. a run with only one object...
		   if (selectedRun.startIndex+selectedRun.offset==selectedRun.endIndex)
		   {
			   // update the list of runs by removing the selected run from the list making sure that the root run 
			   // and the tail run are updated as necessary.
			   
			   if (selectedRun==root)
			   {
				   root=selectedRun.nextRun;
			   }
			   else if (selectedRun.nextRun==null)
			   {
				   selectedRun.parentRun.nextRun=null;
			   }
			   else 
			   {
				   selectedRun.parentRun.nextRun=selectedRun.nextRun;
				   selectedRun.nextRun.parentRun=selectedRun.parentRun;
			   }
			   runCounts--;   
		   }

		  // insert the best ( the next highest object) into the input array.
		  a[start+i++]=prev;
	   }
	   
	   // we now only have one run and so no comparisions are needed and just copy the run into the input array as is.
	   System.arraycopy(tempArray,root.startIndex+root.offset,a,start+i,size-i);
   }

	public int compare(Object arg0, Object arg1) {
		return ((Comparable) arg0).compareTo(arg1);
	}

}


public class Run {

	public Run nextRun;
	public Run parentRun;
	public int startIndex;
	public int endIndex;
	public int offset;
}

I am able to achieve 25% increase in speed over Arrays.sort() for my application.

It is tailored for semi-sorted data. The more in-order the data is, the faster the sort.

I will probably make a bench mark to give repeatable and definite results.

If any of you can think of optimistions to this code please do tell! i would love to make the sort even faster.

I do have an even more specialised version which uses a pool of Run objects, this gives about 1% increase.

I have made some modifications to my sorting algorithm which has has increased the speed some what.

The high-level sorting algorithm:

  1. iterate over the input array identifying the runs of in-order objects
  2. while we have more than 1 run goto 3 else goto 11
  3. iterate over the runs comparing the “first” object of each run to identify the run which has the next sorted object making note of the previously selected run.
  4. if the selected run’s “last” object is greater than or equal to the previously selected run’s “first” object then we can assume that the entire selected run can be added to the sorted output array. goto 5 else goto 6
  5. copy all the objects in the selected run into the sorted output array. goto 7
  6. copy only the “first” object in the selected run into the sorted output array.
  7. remove the “first” object from the selected run.
  8. if the selected run has no more objects then goto 9 else 10
  9. remove the selected run from the list of runs
  10. goto 2
  11. there is only one run and so directly copy the run into sorted output array.
  12. return the sorted array.

I have made a little benchmark on typical data sets generated from my actual compressor (well converted to integers) and it is pretty pretty good.

The jar file and data sets can be found here: http://javaunlimited.net/hosted/moogie/benchmark.zip (i hope Woogley is ok with that)

The test was on a P4 3Ghz dual core running windows XP with
java version “1.5.0_02”
Java™ 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09)
Java HotSpot™ Client VM (build 1.5.0_02-b09, mixed mode)

  • The wikicompress.RunSort.sort1() resuses the same temporary array each sort.
  • each sort algorithm was performed 1000 times
  • the time is in milliseconds
  • source code is included in the jar file.
  • the gnu.util.arrays.sort is apparently what is used in the GNU java compiler.

Results:

FILE: data_0.txt
java.util.arrays.sort() 359
gnu.java.util.arrays.sort() 547
wikicompress.RunSort.sort() 188
wikicompress.RunSort.sort1() 141
FILE: data_1.txt
java.util.arrays.sort() 391
gnu.java.util.arrays.sort() 532
wikicompress.RunSort.sort() 203
wikicompress.RunSort.sort1() 219
FILE: data_2.txt
java.util.arrays.sort() 359
gnu.java.util.arrays.sort() 485
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 156
FILE: data_3.txt
java.util.arrays.sort() 453
gnu.java.util.arrays.sort() 516
wikicompress.RunSort.sort() 297
wikicompress.RunSort.sort1() 266
FILE: data_4.txt
java.util.arrays.sort() 375
gnu.java.util.arrays.sort() 531
wikicompress.RunSort.sort() 219
wikicompress.RunSort.sort1() 187
FILE: data_5.txt
java.util.arrays.sort() 375
gnu.java.util.arrays.sort() 500
wikicompress.RunSort.sort() 188
wikicompress.RunSort.sort1() 156
FILE: data_6.txt
java.util.arrays.sort() 391
gnu.java.util.arrays.sort() 516
wikicompress.RunSort.sort() 203
wikicompress.RunSort.sort1() 156
FILE: data_7.txt
java.util.arrays.sort() 579
gnu.java.util.arrays.sort() 625
wikicompress.RunSort.sort() 187
wikicompress.RunSort.sort1() 157
FILE: data_8.txt
java.util.arrays.sort() 344
gnu.java.util.arrays.sort() 469
wikicompress.RunSort.sort() 203
wikicompress.RunSort.sort1() 141
FILE: data_9.txt
java.util.arrays.sort() 563
gnu.java.util.arrays.sort() 594
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 156
FILE: data_10.txt
java.util.arrays.sort() 532
gnu.java.util.arrays.sort() 563
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 140
FILE: data_11.txt
java.util.arrays.sort() 453
gnu.java.util.arrays.sort() 579
wikicompress.RunSort.sort() 297
wikicompress.RunSort.sort1() 250
FILE: data_12.txt
java.util.arrays.sort() 343
gnu.java.util.arrays.sort() 485
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 141
FILE: data_13.txt
java.util.arrays.sort() 359
gnu.java.util.arrays.sort() 469
wikicompress.RunSort.sort() 156
wikicompress.RunSort.sort1() 125
FILE: data_14.txt
java.util.arrays.sort() 406
gnu.java.util.arrays.sort() 532
wikicompress.RunSort.sort() 187
wikicompress.RunSort.sort1() 141
FILE: data_15.txt
java.util.arrays.sort() 438
gnu.java.util.arrays.sort() 531
wikicompress.RunSort.sort() 313
wikicompress.RunSort.sort1() 281
FILE: data_16.txt
java.util.arrays.sort() 375
gnu.java.util.arrays.sort() 485
wikicompress.RunSort.sort() 187
wikicompress.RunSort.sort1() 141
FILE: data_17.txt
java.util.arrays.sort() 344
gnu.java.util.arrays.sort() 469
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 156
FILE: data_18.txt
java.util.arrays.sort() 328
gnu.java.util.arrays.sort() 469
wikicompress.RunSort.sort() 188
wikicompress.RunSort.sort1() 140
FILE: data_19.txt
java.util.arrays.sort() 469
gnu.java.util.arrays.sort() 501
wikicompress.RunSort.sort() 265
wikicompress.RunSort.sort1() 203
FILE: data_20.txt
java.util.arrays.sort() 422
gnu.java.util.arrays.sort() 516
wikicompress.RunSort.sort() 235
wikicompress.RunSort.sort1() 203
FILE: data_21.txt
java.util.arrays.sort() 343
gnu.java.util.arrays.sort() 485
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 125
FILE: data_22.txt
java.util.arrays.sort() 344
gnu.java.util.arrays.sort() 453
wikicompress.RunSort.sort() 204
wikicompress.RunSort.sort1() 156
FILE: data_23.txt
java.util.arrays.sort() 328
gnu.java.util.arrays.sort() 453
wikicompress.RunSort.sort() 157
wikicompress.RunSort.sort1() 140
FILE: data_24.txt
java.util.arrays.sort() 516
gnu.java.util.arrays.sort() 610
wikicompress.RunSort.sort() 187
wikicompress.RunSort.sort1() 141

Going to give it a try :-*


FILE: E:\sort contest\data_0.txt
java.util.arrays.sort() 387
gnu.java.util.arrays.sort() 370
wikicompress.RunSort.sort() 201
wikicompress.RunSort.sort1() 146
riven.offending.indices 117

FILE: E:\sort contest\data_1.txt
java.util.arrays.sort() 370
gnu.java.util.arrays.sort() 366
wikicompress.RunSort.sort() 228
wikicompress.RunSort.sort1() 181
riven.offending.indices 83

FILE: E:\sort contest\data_2.txt
java.util.arrays.sort() 317
gnu.java.util.arrays.sort() 326
wikicompress.RunSort.sort() 152
wikicompress.RunSort.sort1() 119
riven.offending.indices 82

FILE: E:\sort contest\data_3.txt
java.util.arrays.sort() 406
gnu.java.util.arrays.sort() 384
wikicompress.RunSort.sort() 285
wikicompress.RunSort.sort1() 268
riven.offending.indices 82

FILE: E:\sort contest\data_4.txt
java.util.arrays.sort() 321
gnu.java.util.arrays.sort() 351
wikicompress.RunSort.sort() 189
wikicompress.RunSort.sort1() 153
riven.offending.indices 81

FILE: E:\sort contest\data_5.txt
java.util.arrays.sort() 302
gnu.java.util.arrays.sort() 324
wikicompress.RunSort.sort() 151
wikicompress.RunSort.sort1() 115
riven.offending.indices 79

FILE: E:\sort contest\data_6.txt
java.util.arrays.sort() 298
gnu.java.util.arrays.sort() 324
wikicompress.RunSort.sort() 159
wikicompress.RunSort.sort1() 118
riven.offending.indices 79

FILE: E:\sort contest\data_7.txt
java.util.arrays.sort() 501
gnu.java.util.arrays.sort() 429
wikicompress.RunSort.sort() 153
wikicompress.RunSort.sort1() 117
riven.offending.indices 78

FILE: E:\sort contest\data_8.txt
java.util.arrays.sort() 299
gnu.java.util.arrays.sort() 322
wikicompress.RunSort.sort() 151
wikicompress.RunSort.sort1() 115
riven.offending.indices 82

FILE: E:\sort contest\data_9.txt
java.util.arrays.sort() 501
gnu.java.util.arrays.sort() 438
wikicompress.RunSort.sort() 153
wikicompress.RunSort.sort1() 115
riven.offending.indices 83

FILE: E:\sort contest\data_10.txt
java.util.arrays.sort() 498
gnu.java.util.arrays.sort() 429
wikicompress.RunSort.sort() 153
wikicompress.RunSort.sort1() 117
riven.offending.indices 84

FILE: E:\sort contest\data_11.txt
java.util.arrays.sort() 410
gnu.java.util.arrays.sort() 399
wikicompress.RunSort.sort() 293
wikicompress.RunSort.sort1() 264
riven.offending.indices 81

FILE: E:\sort contest\data_12.txt
java.util.arrays.sort() 295
gnu.java.util.arrays.sort() 324
wikicompress.RunSort.sort() 153
wikicompress.RunSort.sort1() 115
riven.offending.indices 81

FILE: E:\sort contest\data_13.txt
java.util.arrays.sort() 312
gnu.java.util.arrays.sort() 331
wikicompress.RunSort.sort() 172
wikicompress.RunSort.sort1() 146
riven.offending.indices 81

FILE: E:\sort contest\data_14.txt
java.util.arrays.sort() 498
gnu.java.util.arrays.sort() 431
wikicompress.RunSort.sort() 156
wikicompress.RunSort.sort1() 116
riven.offending.indices 82

FILE: E:\sort contest\data_15.txt
java.util.arrays.sort() 387
gnu.java.util.arrays.sort() 384
wikicompress.RunSort.sort() 292
wikicompress.RunSort.sort1() 258
riven.offending.indices 84

FILE: E:\sort contest\data_16.txt
java.util.arrays.sort() 294
gnu.java.util.arrays.sort() 332
wikicompress.RunSort.sort() 162
wikicompress.RunSort.sort1() 121
riven.offending.indices 81

FILE: E:\sort contest\data_17.txt
java.util.arrays.sort() 294
gnu.java.util.arrays.sort() 330
wikicompress.RunSort.sort() 151
wikicompress.RunSort.sort1() 115
riven.offending.indices 81

FILE: E:\sort contest\data_18.txt
java.util.arrays.sort() 296
gnu.java.util.arrays.sort() 324
wikicompress.RunSort.sort() 154
wikicompress.RunSort.sort1() 119
riven.offending.indices 80

FILE: E:\sort contest\data_19.txt
java.util.arrays.sort() 372
gnu.java.util.arrays.sort() 386
wikicompress.RunSort.sort() 238
wikicompress.RunSort.sort1() 204
riven.offending.indices 82

FILE: E:\sort contest\data_20.txt
java.util.arrays.sort() 367
gnu.java.util.arrays.sort() 366
wikicompress.RunSort.sort() 238
wikicompress.RunSort.sort1() 205
riven.offending.indices 82

FILE: E:\sort contest\data_21.txt
java.util.arrays.sort() 294
gnu.java.util.arrays.sort() 328
wikicompress.RunSort.sort() 160
wikicompress.RunSort.sort1() 120
riven.offending.indices 81

FILE: E:\sort contest\data_22.txt
java.util.arrays.sort() 308
gnu.java.util.arrays.sort() 342
wikicompress.RunSort.sort() 177
wikicompress.RunSort.sort1() 137
riven.offending.indices 82

FILE: E:\sort contest\data_23.txt
java.util.arrays.sort() 293
gnu.java.util.arrays.sort() 326
wikicompress.RunSort.sort() 157
wikicompress.RunSort.sort1() 118
riven.offending.indices 80

FILE: E:\sort contest\data_24.txt
java.util.arrays.sort() 525
gnu.java.util.arrays.sort() 463
wikicompress.RunSort.sort() 157
wikicompress.RunSort.sort1() 124
riven.offending.indices 113

To remove the array-copy overhead from the timing-results, this is the modified benchmark code:


         long total;

         total = 0;
         for (int i = 0; i < 1000; i++)
         {
            System.arraycopy(array1, 0, array2, 0, array1.length);
            long start = System.nanoTime();
            Arrays.sort(array2, 0, array1.length, Collections.reverseOrder());
            total += System.nanoTime() - start;
         }
         System.out.println("java.util.arrays.sort() " + total / 1000000);

         total = 0;
         for (int i = 0; i < 1000; i++)
         {
            System.arraycopy(array1, 0, array2, 0, array1.length);
            long start = System.nanoTime();
            gnu.java.util.Arrays.sort(array2, 0, array1.length, gnu.java.util.Collections.reverseOrder());
            total += System.nanoTime() - start;
         }
         System.out.println("gnu.java.util.arrays.sort() " + total / 1000000);

         total = 0;
         for (int i = 0; i < 1000; i++)
         {
            System.arraycopy(array1, 0, array2, 0, array1.length);
            long start = System.nanoTime();
            RunSort.sort(array2, 0, array1.length, Collections.reverseOrder());
            total += System.nanoTime() - start;
         }
         System.out.println("wikicompress.RunSort.sort() " + total / 1000000);

         total = 0;
         for (int i = 0; i < 1000; i++)
         {
            System.arraycopy(array1, 0, array2, 0, array1.length);
            long start = System.nanoTime();
            RunSort.sort(array2, 0, array1.length, Collections.reverseOrder(), array3);
            total += System.nanoTime() - start;
         }
         System.out.println("wikicompress.RunSort.sort1() " + total / 1000000);

         total = 0;
         for (int i = 0; i < 1000; i++)
         {
            System.arraycopy(array1, 0, array2, 0, array1.length);
            long start = System.nanoTime();
            SortContest.sort(array2, array2);
            total += System.nanoTime() - start;
         }
         System.out.println("riven.offending.indices " + total / 1000000);

The algorithm simply does this:

  1. find the objects that seem out-of-order
  2. null the array-indices of the out-of-order objects
  3. put those in another array
  4. sort that array (typically 0-2 elements)
  5. refill the array with the original indices
    5.1 for every null encountered, goto next
    5.2 else, if the next sorted out-of-order object is bigger than the next to-be-added-object, add it
    5.3 else, add next-to-be-added-object

it takes advangate of the fact that the array is 99.9% sorted, so the out-of-order-array is tiny

Nice!

Could you provide the source code? (it would save me implementing it myself… damn i am lazy!)

I would love to see the performance in my actual application!

I guess i didnt really mind including the array copying in the benchmark as i assumed that the copy will have constant time for all iterations. Though you know what they say about assumptions :slight_smile:

   public static void sort(Object[] src, Object[] dst)
   {
      if (src.length > aux.length)
      {
         aux = new Wrapper[src.length * 2];
         for (int i = 0; i < aux.length; i++)
            aux[i] = new Wrapper();
      }

      // prevent field-access in loops
      Wrapper[] aux = SortContest.aux;
      Wrapper[] holder = SortContest.holder;
      int[] indices = SortContest.indices;

      int len = src.length;

      // link objects to wrappers, then calc index
      for (int i = 0; i < len; i++)
      {
         aux[i].real = src[i];
         aux[i].calcSortIndex();
      }

      // find offensing indices
      int x = aux[0].getSortIndex();
      int y = aux[1].getSortIndex();

      int count = 0;

      for (int i = 2; i < len; i++)
      {
         int z = aux[i].getSortIndex();

         if ((x > y && y < z) || (x < y && y > z))
         {
            holder[count] = aux[i - 1];
            indices[count] = i - 1;
            count++;
         }

         x = y;
         y = z;
      }

      // sort offending indices
      for (int a = 0; a < count; a++)
      {
         for (int b = a; b < count; b++)
         {
            int ai = holder[a].getSortIndex();
            int bi = holder[b].getSortIndex();

            if (ai < bi)
            {
               Wrapper tmp = holder[b];
               holder[b] = holder[a];
               holder[a] = tmp;

               int tmp___ = indices[b];
               indices[b] = indices[a];
               indices[a] = tmp___;
            }
         }
      }

      // zero out offending indices
      for (int i = 0; i < count; i++)
         aux[indices[i]] = null;

      // fill dst with src + offending indices at proper position
      int step = 0;
      int search = 0;
      int filled = 0;

      while (step != count)
      {
         if (aux[search] == null)
            search++;
         else
            dst[filled++] = (aux[search].getSortIndex() < holder[step].getSortIndex()) ? holder[step++].real : aux[search++].real;
      }

      while (filled < len)
      {
         if (aux[search] == null)
            search++;
         else
            dst[filled++] = aux[search++].real;
      }

      // restore offending indices
      for (int i = 0; i < count; i++)
         aux[indices[i]] = holder[i];
   }
public class Wrapper
{
   public Object real;

   public void calcSortIndex()
   {
      this.sortIndex = ((Integer) real).intValue();
   }

   private int sortIndex = 0;

   public int getSortIndex()
   {
      return sortIndex;
   }
}

It is however stunning to see that with the server VM (6.0) your code is beating the crap out of mine (80ms vs 40ms!)

Cheers!