Iterate each element of a collection randomly once only

moogie · June 14, 2012, 2:50am

Does anyone know of an algorithm that will visit each element of a collection, lets say an array, once only but in a pseudo random manner?

My array has potentially many many elements (i.e. 2^28) so storing which elements that have been visited nor generating a random index sequence are viable options.

I am hoping that there is an algorithm, given a random seed will and the current loop count will give the next element index with the constraint that index returned is unique and thus will achieve coverage over all elements when the loop count == the number of elements in the array.

i.e.


long seed = new Random().nextLong();
for (i=0;array.length;i++)
{
int index = nextIndex(long,i);
Object element = array.get(index);
// use element
}

UprightPath · June 14, 2012, 3:24am

Well, this method isn’t completely random, but it might work for what you need.

Ensure that your list’s length is a multiple of X (Some value), padding it if you need to.
Randomly select an index point in your array. This is your start index.
Randomly select an iteration direct (boolean up/down) and a move direction (boolean up/down).
Iterate through your list from your start index, jumping X elements in your iteration direction each time, wrapping around start/end when you get there.
When you get your start index, move once in your move direction unless you’ve moved X times, then go back to 3.

So, let’s say you pick ‘3’ as your X. And you have an int array {5, 2, 3, 4, 0, 1, 9, 6, 7, 8, 10, 11}.
Start at index 3. Iterate up, move down. currIndex = 3

array[3] = 4 (Start)
array[6] = 9
array[9] = 8
array[0] = 5
array[2] = 3 (Move one down)
array[5] = 1
array[8] = 7
array[11] = 11
array[1] = 2 (Move one down)
array[4] = 0
array[7] = 6
array[10] = 10 (Over)

If you want to make it more fancy, you can layer this…

I’ll update with a pastebin in a moment. ^^;

moogie · June 14, 2012, 4:06am

Inspired by your solution, i have come up with a solution that seems to work for odd numbered elements…

pick a random positive integer R1
create a step interval I by multiplying the next whole integer after dividing the number of elements by 2 and then multiplying that by R1 and then add 1. i.e. R1 = ((int) (Math.max(((double)array.length)/2))*R1 + 1
pick a random index as the start point R2
set current index to R2. i.e. index = R2
the next index will be: index = (index +1 + I ) % array.length;
repeat step 5 until all elements visited.

now to see whether i can find one for even numbers.

davedes · June 14, 2012, 4:25am

I suppose you can’t shuffle the array and then simply walk through it?

Or you could use a “linear congruential generator” -

http://www1.i2r.a-star.edu.sg/~knandakumar/nrg/Tms/Probability/Probgenerator.htm
http://static.usenix.org/event/usenix99/full_papers/deraadt/deraadt_html/node17.html

Riven · June 14, 2012, 5:00am

If you really don’t want to make a copy of the collection, you can add a layer of indirection.

Create an int[collection.size()] filled with the values: {0,1,2,…,collection.size()-1}. Then shuffle that. Then iterate over your int[] and use the values (indices) to lookup elements in the original collection.

Making a copy of the original collection and shuffling that is a lot easier though.

sproingie · June 14, 2012, 5:03am

Impossible. It would require a RNG with perfect distribution equal to its period, which would mean it would be the opposite of random, as each output would depend on all previous ones.

Quickest solution I can think of off the top of my head is walk down the array once and swap each item with a randomly selected one (this may in fact be the algorithm for shuffle, but you can write it by hand for arrays)

UprightPath · June 14, 2012, 5:30am

Even looking at this method, I don’t believe that it would work. By that, besides the fact that at least one of your method calls listed is incorrect (Math.max(double value)?) Anyway, about 10% of the time when I try this algorithm (If I have it written correctly, hah!) I end up with it repeating the same element each time.


	public SemiRandomIteratorOdd(List<?> collection) {
		this.collection = collection;
		r1 = Math.abs(random.nextInt());
		interval = Math.abs((int) Math.ceil((double)collection.size() / 2.0) * r1 + 1);
		r2 = random.nextInt(collection.size());
		currIndex = r2;
		System.out.println("Index: " + currIndex + " interval: " + interval);
	}
	public Object getNext() {
		Object o = collection.get(currIndex);
		currIndex = (currIndex + 1 + interval) % collection.size();
		totalIterations++;
		return o;
	}

Shallow Copy + Shuffling might be the best way you can do it.

I have some code which might work assuming you can figure out two common factors for your list size (IE, it’s not a prime). It’s here: http://pastebin.java-gaming.org/54a6b9d041c

moogie · June 14, 2012, 7:22am

@sproingie: Yep, I kind of thought that a true random sequence is not possible, but fortunately it is not required, only to be “random-looking”

@UprightPath: i think i have found the issue… the interval needs to be a positive prime number. (i have to check this out first). Will have a look at your code shortly.

@Riven: i would like to avoid shuffling… due to memory constraints (two arrays of 2^28 elements is quite alot ) but may have to just shuffle.

@davedes: will investigate those links and see if i can use them. thanks

Orangy_Tang · June 14, 2012, 7:40am

[s]Knuth Shuffle is in-place, so I’d go with that (don’t know if that’s what Collections.shuffle does or if you’ll have to implement it yourself though).

http://en.wikipedia.org/wiki/Fisher–Yates_shuffle[/s]

Edit: didn’t realise you need the original array unchanged afterwards. In that case, then perhaps find a reversible random number generator so you can ‘unshuffle’ by doing it in reverse.

Riven · June 14, 2012, 8:01am

Can’t you simply chop your collection into interleaved chunks:

0,1,2,3,…,n,0,1,2,3,…,n,0,1,2,3,…,n,0,1,2,3,…,n,0,1,2,3,…,n,etc

Make a temporary list of chunk 0, shuffle it, iterate over them.
Make a temporary list of chunk 1, shuffle it, iterate over them.
Make a temporary list of chunk 2, shuffle it, iterate over them.

It will certainly look random, and returns every element once.

Roquen · June 14, 2012, 8:25am

What do you really what to do here? This sounds likes you have a broken problem statement to me.

To answer you question as posed…use a quasi-random number generator. The simplest would be to rearrange the bits of the index in fixed pattern. This is easiest when the size in insured to be power-of-two…otherwise it gets trickier.

Riven · June 14, 2012, 8:27am

public class Main {
	public static void main(String[] args) {

		List<String> values = new ArrayList<String>();
		for (int i = 0; i < 1 << 20; i++) {
			values.add(String.valueOf(i));
		}

		final boolean[] found = new boolean[values.size()];
		iterateRandomly(values, new Callback<String>() {
			public void callback(String item) {
				found[Integer.parseInt(item)] = true;
			}
		}, 32);

		for (boolean flag : found) {
			if (!flag) {
				throw new IllegalStateException("missed value");
			}
		}

		System.out.println("OK");
	}

	public static <T> void iterateRandomly(List<T> list, Callback<T> callback, int chunkCount) {
		if (list.size() % chunkCount != 0) {
			throw new IllegalArgumentException("nasty precondition");
		}
		
		int chunkSize = list.size() / chunkCount;

		Random r = new Random();

		T[] tmp = (T[]) new Object[chunkSize];
		for (int k = 0; k < chunkCount; k++) {
			// grab selection with offset and interval
			for (int i = 0; i < tmp.length; i++) {
				tmp[i] = list.get(k + i * chunkCount);
			}

			// destructive shuffle
			for (int i = 0; i < tmp.length; i++) {
				int p = i + r.nextInt(tmp.length - i);

				// iterate
				callback.callback(tmp[p]);

				tmp[p] = tmp[i];
			}
		}
	}
}

Riven · June 14, 2012, 8:27am

I was thinking among similar lines: just XOR-ing your POT index. But there will be noticable patterns.

Roquen · June 14, 2012, 8:40am

Yes…if you need need more random than that…you have to shuffle bits around.

Along the same lines of your suggestion…assuming that the ‘in-order’ version has no meaning, you could do it two passes. First walk the data in-order. Second pass shuffles elements within a bucket size. Every other set of passes, you offset the bucket boundaries by 1/2 the bucket size, so elements can eventually move anywhere in the array.

(EDIT: shuffling in buckets is to take advantage of caching BTW)

EDIT 2: For large arrays, buckets makes much more sense…collapse the two passes into one, working inside a bucket, and for slightly added randomness the starting point could be chosen by a pseudo-RNG. But this is all speculation since we don’t know the real problem statement.

Riven · June 14, 2012, 9:02am

No more “nasty precondition”:


public class Main {
	public interface Callback<T> {
		public void callback(T item);
	}

	public static void main(String[] args) {

		List<String> values = new ArrayList<String>();
		for (int i = 0; i < 1 << 20; i++) {
			values.add(String.valueOf(i));
		}

		final int[] histogram = new int[values.size()];

		// test performance for various bucket counts
		for (int bucketCount = 333; bucketCount >= 3; bucketCount--) {
			System.out.println("bucketCount=" + bucketCount);

			// iterate
			iterateRandomly(values, new Callback<String>() {
				public void callback(String item) {
					histogram[Integer.parseInt(item)]++;
				}
			}, bucketCount);

			// verify results
			for (int i = 0; i < histogram.length; i++) {
				if (histogram[i] != 1) {
					throw new IllegalStateException("incorrect number of matches (" + histogram[i] + ") for value " + i);
				}
			}
			Arrays.fill(histogram, 0);
		}

		System.out.println("OK");
	}

	public static <T> void iterateRandomly(List<T> list, Callback<T> callback, int bucketCount) {

		int bucketSize = list.size() / bucketCount;

		Random r = new Random();

		T[] bucket = (T[]) new Object[bucketSize];
		for (int k = 0; k < bucketCount; k++) {
			for (int i = 0, p = k; i < bucket.length; i++, p += bucketCount) {
				bucket[i] = list.get(p);
			}

			iterateRandomlyDestructively(bucket, bucket.length, callback, r);
		}

		if (list.size() % bucketCount != 0) {
			List<T> remainder = new ArrayList<T>();

			if (bucket.length == 0) {
				remainder.addAll(list);
			} else {
				for (int k = 0; k < bucketCount; k++) {
					for (int i = k + bucket.length * bucketCount; i < list.size(); i += bucketCount) {
						remainder.add(list.get(i));
					}
				}
			}

			Collections.shuffle(remainder);
			for (T item : remainder) {
				callback.callback(item);
			}
		}
	}

	public static <T> void iterateRandomlyDestructively(T[] bucket, int len, Callback<T> callback, Random r) {
		for (int i = 0; i < len; i++) {
			int p = i + r.nextInt(len - i);
			callback.callback(bucket[p]);
			bucket[p] = bucket[i];
		}
	}
}

davedes · June 14, 2012, 11:11am

How often are you changing your arrays? What are you doing? Can you read/write to file and “chunk”/stream data so you aren’t working with everything in memory?

moogie · June 14, 2012, 11:18am

wow, thanks for all the interest. I will have to take time to digest it all

I shall elaborate why I need to perform random iteration

I have a fixed length input bit stream over which i run an algorithm to find the smallest unique symbol length S (in bits) that when the stream is split in to S bit symbols no symbol is repeated. This S value is used later on in other algorithms.

However these algorithms need S to be 64 bits or less. Normally the bit stream is a compressed data stream so S is < 64 bit but it can occur that uncompressed streams, or streams with uncompressed portions are input. In that case S is likely to be > 64. So to help reduce the impact of such streams I want to pre-process the input stream to reduce the apparent large string of repeated bits.

This pre-processing is envisioned to be a simple XOR for each S bit blocks of the input stream with a random iteration of an imaginary array of sequential long integers. Imaginary as the index into the array is the same as the value in the array. This process is then reversed at a later stage.

The sequential imaginary array is to be used so that the original input stream bit probabilities are not affected by the pre-processing.

moogie · June 14, 2012, 1:38pm

My hunch seems correct, if the interval is a prime number then a modified version of the algorithm I stated seems to work for all the testing I have performed. I will still look at the other alternatives suggested to see which gives the best speed / memory usage / randomness compromise.



abstract class RandomIterator {

	public int getStartIndex() {
		return startIndex;
	}

	public int getInterval() {
		return interval;
	}

	public int getCurrIndex() {
		return currIndex;
	}

	private final int startIndex;
	private final int interval;
	private final List<?> collection;
	private final int collectionSize;
	private int currIndex;

	public RandomIterator(List<?> collection, int interval,Random random) {
		this.collection = collection;
		collectionSize = collection.size();
		this.interval=interval;
		startIndex = random.nextInt(collection.size());
		currIndex = startIndex;
	}

	public Object getNext() {
		Object o = collection.get(currIndex);

		currIndex = (interval - (collectionSize - currIndex)) % collectionSize;

		return o;
	}

}

class SemiRandomIteratorPrimeStep extends RandomIterator {


	public SemiRandomIteratorPrimeStep(List<?> collection, Random random) {
		super(collection,createInterval(collection.size(),random),random);

	}
	
	private static int createInterval(int collectionSize, Random random)
	{

		final int bitLength = 24;

		return BigInteger.probablePrime(bitLength, random).intValue();

	}
}

public class RandomIteratorTest {
	/**
	 * @param args
	 */
	public static void main(String[] args) {
		final int listSize = 86886;
		final int iterations = 100000;
		ArrayList<Object> list = new ArrayList<Object>(listSize);
		for (int i = 0; i < listSize; i++) {
			list.add(new Object());
		}

		HashSet<Object> seenSet = new HashSet<Object>(list.size());
		

		for (int j = 0; j < iterations; j++) {
			seenSet.clear();
			RandomIterator semiRandomIterator = new SemiRandomIteratorPrimeStep(
					list, new Random(j));

			for (int i = 0; i < list.size(); i++) {
				Object obj = semiRandomIterator.getNext();
				if (seenSet.contains(obj)) {
					System.out.println("Collision. Index: "
							+ semiRandomIterator.getCurrIndex() + " interval: "
							+ semiRandomIterator.getInterval()
							+ " startIndex: "
							+ semiRandomIterator.getStartIndex() + " seed: "
							+ j);
					System.exit(1);
				}
				seenSet.add(obj);
			}
		}
		System.out.println("no collisions.");
	}
}

Roquen · June 14, 2012, 3:49pm

Well if randomness is really important, then you can use a linear congruent generator…the trick is that you have to compute the constants each time the length changes, which boils down to performing an extended GCD. Generating the sequence of values would require a modulo…but walking random memory should hide the cost of the divide.

sproingie · June 14, 2012, 4:31pm

The period of an RNG (which includes a LCG) is the interval for which it doesn’t repeat its entire sequence of pseudorandom numbers. There’s no way it keeps from repeating the same individual number in its codomain, because that would be the complete opposite of random. It’d be like a coin toss that always alternated heads and tails.

Yeesh, just walk the list and swap randomly. If you need to keep the original list, swap the indexes. Or do what the database kids do and sort by random() (but that’ll hurt doing it with 260+ million elements unless you go mapreduce)