Best algorithm to sort semi-sorted arrays?

! that is a little unexpected… i would have thought mine would have much more over head.

When I have to deal with algorithms I normally take a look at this page at wikipedia:

The Big 0 notation gives a good overview of the quality of the algorithms and the descriptions are detailed enough to pick the right one.

yep the wiki site is very informative! i had researched there before asking here. all of the sorting algs there are aimed at highly random data. this not the case for my situation unfortunetly…

Riven, while i was unable to get the code you provided to compile, i think i was able to follow your logic :slight_smile: and have implemented a sort based on that code.

However it does not seem to improve upon the Runsort.

Perhaps i have implemented your ideas incorrectly? would you be able to take a look?

here is my code:


   public static void InOrderSort(Comparable a[], int start,int end,Comparator comparator,Comparable temp1[], Comparable temp2[])
   {
	   
	   Comparable prev=temp1[0]=a[start];
	   int LIMIT=Math.max(10,(end-start)/5);
	   
	   Comparable curr;
	   int inOrderArraySize=1;
	   int outOfOrderArraySize=0;
	   int runOneStartIndex=start+1;
	   int runOneSize=0;
	   
	   //////////
	   // Populate the in order and out of order arrays
	   //
	   
	   for (int i=start+1;i<end;i++)
	   {
		   curr=a[i];
		   
		   // if out of order...
		   if (comparator.compare(curr,prev)<0)
		   {
			   // add the previous run of in order elements into the in order array
			   System.arraycopy(a,runOneStartIndex,temp1,inOrderArraySize,runOneSize);
			   
			   // update the related variables
			   inOrderArraySize+=runOneSize;
			   runOneSize=0;
			   runOneStartIndex=i+1;
			   
			   
			   // add the the out of order element into the out of order array
			   temp2[outOfOrderArraySize++]=curr;
			   
			   // check to see if the out of order array is too large. This is indictive of a pathological
			   // condition where the first element is much larger than the rest of the elements and so
			   // use the Runsort.sort instead.
			   if (outOfOrderArraySize>LIMIT) 
			   {
				   RunSort.sort(a,start,end,comparator,temp1);
				   return;
			   }
		   }
		   // else we are in order...
		   else
		   {
			   // update the in order run length
			   runOneSize++;
			   
			   // update the previous in order element
			   prev=curr;
		   }
	   }
	   
	   //////////
	   // Sort the out of order array
	   //

	   // check to see if the input array is already sorted...
	   if (outOfOrderArraySize==0) return;
	   
	   // if the last element was not added to the out of order array then we need to finish populating the
	   // in order array
	   if (runOneSize>0) 
	   {
		   System.arraycopy(a,runOneStartIndex,temp1,inOrderArraySize,runOneSize);
		   inOrderArraySize+=runOneSize;
	   }
	   
	   // we need to sort the outOfOrderArray if there are more than one element in the array. 
	   if (outOfOrderArraySize>1)
	   {
		   Arrays.sort(temp2,0,outOfOrderArraySize,comparator);
	   }
	   
	   
	   //////////
	   // Re-populate the input array with the sorted elements
	   //
	   	   
	   // the current index into the input array to place the next sorted element
	   int index=start;
	   
	   // the current indecies into the in order and sorted out of order arrays
	   int j=0,k=0;
	   
	   curr=temp1[j++];
	   prev=temp2[k++];
	   
	   runOneStartIndex=0;
	   runOneSize=0;
	   
	   // re-populate the input array with sorted elements
	   while (index<end && outOfOrderArraySize>0 && inOrderArraySize>0)
	   {
		   // if "first" element in the in order array is smallest... 
		   if (comparator.compare(curr,prev)<0)
		   {
			   // update the next "first" element in the in order array
			   curr=temp1[j++];
			   
			   // update the coutn of in order elements left
			   inOrderArraySize--;
			   
			   // update the current in order run length
			   runOneSize++;
			   
		   }
		   // else the "first" element in the out of order array is smallest... 
		   else
		   {
			   // copy the previous run of in order elements into the input array
			   System.arraycopy(temp1,runOneStartIndex,a,index,runOneSize);
			   
			   // update the current index into the input array
			   index+=runOneSize;
			   
			   // update the run's variables
			   runOneSize=0;
			   runOneStartIndex=j;
			   
			   // copy the element out of the out of order array into the input array
			   a[index++]=prev;
			   
			   // update the next "first" element in the out of order array
			   prev=temp2[k++];
			   
			   // update the count of out of order elements left
			   outOfOrderArraySize--;
		   }
		  
	   }
	   
	   // if there is a run of in order elements still to be copied into the input array ...
	   if (runOneSize>0)
	   {
		   // copy the previous run of in order elements into the input array
		   System.arraycopy(temp1,runOneStartIndex,a,index,runOneSize);
		   index+=runOneSize;
	   }
	   
	   // we have no left over elements and so return
	   if (index==end) return;
	   
	   // if we have left over sorted out of order elements and we directly copy the elements into the 
	   // input array.
	   if (inOrderArraySize==0)
	   {
		   System.arraycopy(temp2,k-1,a,index,outOfOrderArraySize);
	   }
	   // else we have left over in order elements and so we directly copy the elements into the 
	   // input array.
	   else
	   { 
		   System.arraycopy(temp1,j-1,a,index,inOrderArraySize);
	   }
   }

Actually i do see an improvement when i compile the test into a native windows executable using gcj.

Interestingly enough the time taken for the InOrderSort is very similar between data sets:

C:\TEMP\oldprojects\WikiCompress>testsort1
FILE: data_0.txt
java.util.arrays.sort() 2235
gnu.java.util.arrays.sort() 2218
wikicompress.RunSort.sort() 610
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 203
FILE: data_1.txt
java.util.arrays.sort() 2235
gnu.java.util.arrays.sort() 2234
wikicompress.RunSort.sort() 625
wikicompress.RunSort.sort1() 281
Utilities.InOrderSort() 203
FILE: data_2.txt
java.util.arrays.sort() 2203
gnu.java.util.arrays.sort() 2203
wikicompress.RunSort.sort() 610
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 187
FILE: data_3.txt
java.util.arrays.sort() 2313
gnu.java.util.arrays.sort() 2296
wikicompress.RunSort.sort() 657
wikicompress.RunSort.sort1() 375
Utilities.InOrderSort() 203
FILE: data_4.txt
java.util.arrays.sort() 2234
gnu.java.util.arrays.sort() 2219
wikicompress.RunSort.sort() 625
wikicompress.RunSort.sort1() 234
Utilities.InOrderSort() 203
FILE: data_5.txt
java.util.arrays.sort() 2297
gnu.java.util.arrays.sort() 2812
wikicompress.RunSort.sort() 782
wikicompress.RunSort.sort1() 296
Utilities.InOrderSort() 204
FILE: data_6.txt
java.util.arrays.sort() 2203
gnu.java.util.arrays.sort() 2328
wikicompress.RunSort.sort() 657
wikicompress.RunSort.sort1() 218
Utilities.InOrderSort() 204
FILE: data_7.txt
java.util.arrays.sort() 2250
gnu.java.util.arrays.sort() 2515
wikicompress.RunSort.sort() 719
wikicompress.RunSort.sort1() 219
Utilities.InOrderSort() 265
FILE: data_8.txt
java.util.arrays.sort() 2672
gnu.java.util.arrays.sort() 2813
wikicompress.RunSort.sort() 875
wikicompress.RunSort.sort1() 234
Utilities.InOrderSort() 250
FILE: data_9.txt
java.util.arrays.sort() 2594
gnu.java.util.arrays.sort() 2640
wikicompress.RunSort.sort() 625
wikicompress.RunSort.sort1() 204
Utilities.InOrderSort() 203
FILE: data_10.txt
java.util.arrays.sort() 2187
gnu.java.util.arrays.sort() 2219
wikicompress.RunSort.sort() 609
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 203
FILE: data_11.txt
java.util.arrays.sort() 2375
gnu.java.util.arrays.sort() 2469
wikicompress.RunSort.sort() 656
wikicompress.RunSort.sort1() 391
Utilities.InOrderSort() 203
FILE: data_12.txt
java.util.arrays.sort() 2297
gnu.java.util.arrays.sort() 2218
wikicompress.RunSort.sort() 610
wikicompress.RunSort.sort1() 187
Utilities.InOrderSort() 313
FILE: data_13.txt
java.util.arrays.sort() 2360
gnu.java.util.arrays.sort() 2390
wikicompress.RunSort.sort() 625
wikicompress.RunSort.sort1() 219
Utilities.InOrderSort() 203
FILE: data_14.txt
java.util.arrays.sort() 2172
gnu.java.util.arrays.sort() 2344
wikicompress.RunSort.sort() 656
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 203
FILE: data_15.txt
java.util.arrays.sort() 2375
gnu.java.util.arrays.sort() 2360
wikicompress.RunSort.sort() 671
wikicompress.RunSort.sort1() 391
Utilities.InOrderSort() 203
FILE: data_16.txt
java.util.arrays.sort() 2234
gnu.java.util.arrays.sort() 2235
wikicompress.RunSort.sort() 640
wikicompress.RunSort.sort1() 219
Utilities.InOrderSort() 203
FILE: data_17.txt
java.util.arrays.sort() 2250
gnu.java.util.arrays.sort() 2219
wikicompress.RunSort.sort() 609
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 203
FILE: data_18.txt
java.util.arrays.sort() 2234
gnu.java.util.arrays.sort() 2219
wikicompress.RunSort.sort() 610
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 187
FILE: data_19.txt
java.util.arrays.sort() 2266
gnu.java.util.arrays.sort() 2265
wikicompress.RunSort.sort() 641
wikicompress.RunSort.sort1() 328
Utilities.InOrderSort() 187
FILE: data_20.txt
java.util.arrays.sort() 2250
gnu.java.util.arrays.sort() 2250
wikicompress.RunSort.sort() 641
wikicompress.RunSort.sort1() 328
Utilities.InOrderSort() 187
FILE: data_21.txt
java.util.arrays.sort() 2235
gnu.java.util.arrays.sort() 2250
wikicompress.RunSort.sort() 640
wikicompress.RunSort.sort1() 203
Utilities.InOrderSort() 203
FILE: data_22.txt
java.util.arrays.sort() 2250
gnu.java.util.arrays.sort() 2266
wikicompress.RunSort.sort() 609
wikicompress.RunSort.sort1() 234
Utilities.InOrderSort() 203
FILE: data_23.txt
java.util.arrays.sort() 2219
gnu.java.util.arrays.sort() 2250
wikicompress.RunSort.sort() 609
wikicompress.RunSort.sort1() 204
Utilities.InOrderSort() 203
FILE: data_24.txt
java.util.arrays.sort() 2172
gnu.java.util.arrays.sort() 2156
wikicompress.RunSort.sort() 625
wikicompress.RunSort.sort1() 219
Utilities.InOrderSort() 265

well, it must be more efficient as when i use this InOrderSort in my actual compression program I am able to save about 1 second in total run time when compressing a 256k sample as compared to using RunSort.

Hm, I’ll try to improve it a bit, and provide compilable sourcecode. Sorry…