Generating unique random numbers.

sws26 · August 1, 2011, 4:59am

My current project involves a lot of procedural generation, and I can’t come up with a good way of generating a set of unique random numbers. Say I had to pick N random numbers between A and B with a guarantee of no duplicates: how would you go about it? Keep in mind that N and B could be arbitrarily large, and that I’m discounting the inelegant, obvious solution.

Scarzzurs · August 1, 2011, 7:18am

You could try generating a list with numbers A to B and drawing from it randomly (removing the element).

Or, you can pre-generate and shuffle the list in order to avoid the cost of generating randomness later.
The shuffle method of the Collections class runs in linear time, so it should work even for large lists.

I suppose an ArrayList with predefined capacity would be a good choice for the storage in either case.
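A minimal Java sketch of the shuffle approach, assuming the whole range A..B fits comfortably in memory and in an int. The class and method names (ShuffleSample, pick) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSample {
    /** Returns n distinct values from [a, b] in random order. */
    static List<Integer> pick(int n, int a, int b, Random rnd) {
        int size = b - a + 1; // assumption: the range size fits in an int
        if (n < 0 || n > size) {
            throw new IllegalArgumentException("need 0 <= n <= b - a + 1");
        }
        // Capacity is known up front, so the list never has to grow.
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(a + i);
        }
        Collections.shuffle(values, rnd); // linear-time shuffle
        return new ArrayList<>(values.subList(0, n)); // first n of the shuffled range
    }
}
```

Calling ShuffleSample.pick(5, 10, 20, new Random()) would give five distinct values between 10 and 20. Memory and time grow with B-A rather than with N, so this only suits small ranges.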

Hope these methods are adequate.

Scarzzurs

philfrei · August 1, 2011, 7:58am

Generate random number, put it into a HashSet (an unordered collection that rejects duplicates). If it is allowed into the HashSet, it is unique, and thus can be used.

If it DOESN’T go in, you could just generate another and try again, but there’s no guarantee you will ever succeed in creating your set!

So, instead of that, try incrementing the number that collided (wrapping around at the end of the range) and keep trying until you succeed. It seems to me a simple increment by 1 is OK, but it might be better to pick a large prime number that is not a factor of the size of the sample set, so that it jumps around more. The main thing is that whatever your increment, it should guarantee eventually touching each possible value, which means it should share no common factor with the size of the sample set. Then, as long as N is no larger than the number of possible values (B-A+1 if both ends are included), you are guaranteed to complete the task.

For example, if your sample set has 121 values, you wouldn’t want to use an increment of 11 (since 121 = 11 × 11), but 7 or 13 would be fine.
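Roughly, the HashSet-plus-increment idea could look like this in Java. This is only a sketch: the names ProbingSample, pick and step are invented here, and it assumes the range size fits in an int. It rejects any increment that shares a factor with the size of the sample set, since such an increment could not reach every value.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class ProbingSample {
    /** Picks n distinct values from [a, b]; on a collision, steps forward by `step` (with wrap-around). */
    static Set<Integer> pick(int n, int a, int b, int step, Random rnd) {
        int size = b - a + 1; // assumption: the range size fits in an int
        if (n < 0 || n > size) {
            throw new IllegalArgumentException("need 0 <= n <= b - a + 1");
        }
        if (step <= 0 || gcd(step, size) != 1) {
            throw new IllegalArgumentException("step must be positive and share no factor with the range size");
        }
        Set<Integer> chosen = new HashSet<>();
        while (chosen.size() < n) {
            int offset = rnd.nextInt(size);
            // add() returns false when the value is already present, i.e. a collision.
            while (!chosen.add(a + offset)) {
                offset = (int) (((long) offset + step) % size);
            }
        }
        return chosen;
    }

    /** Greatest common divisor (Euclid's algorithm). */
    static int gcd(int x, int y) {
        while (y != 0) {
            int t = x % y;
            x = y;
            y = t;
        }
        return x;
    }
}
```

With a sample set of 121 values, pick(n, 0, 120, 7, rnd) is accepted while a step of 11 is rejected. One caveat: stepping on a collision makes values that sit just after already-taken ones a bit more likely, so the result is not perfectly uniform.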

Jono · August 1, 2011, 9:30am

It might be easier to first generate an ordered list of N unique numbers that are between A and B, and then take randomly from that (or shuffle it after it is generated).

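The ordered list can then be thought of as N+1 “gaps”: one before the first number, one between each pair of consecutive numbers, and one after the last. Each gap between two consecutive numbers is >= 1 (the two end gaps may be 0), and the sum of the gaps is B-A. If you can work out what the probability of each gap size is, then you could create the list recursively: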


List gaps(int n, int a, int b){
  if(n == 1)
      return {random(b-a)};
  int nextGap = randomGap(n,a,b);
  return {nextGap} + gaps(n-1,a+nextGap,b);
}

I don’t know what the probability distribution of “randomGap” should be, but I bet a statistician or Wikipedia will.

nsigma · August 1, 2011, 9:52am

What inelegant obvious solution do you actually have in mind? Just wondering whether that was a HashSet from the JDK (because of object overhead), the brute-force approach, or something else. If it’s HashSet, you could consider philfrei’s post but use one of the various hash set implementations that work directly with primitives.

sws26 · August 1, 2011, 5:22pm

Inelegant would be anything where you can’t predict how long the algorithm will run (such as re-picking on collisions). Generating the complete set from A to B, shuffling, and selecting the first N would be fine for a small set, but B can be arbitrarily large.

Riven · August 1, 2011, 5:40pm

You can easily implement two strategies:

If there are few possible values, use a list and shuffle it; if there are many, use a set of the values already returned.
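One way the two strategies might be combined, as a hedged Java sketch. The cutoff used here (shuffle when at least half the range is wanted) is an assumption made for illustration, not something from the posts above, and the names HybridSample and pick are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class HybridSample {
    /** Returns n distinct values from [a, b], choosing a strategy from n and the range size. */
    static List<Integer> pick(int n, int a, int b, Random rnd) {
        int size = b - a + 1; // assumption: the range size fits in an int
        if (n < 0 || n > size) {
            throw new IllegalArgumentException("need 0 <= n <= b - a + 1");
        }
        if (2L * n >= size) {
            // Dense case (assumed cutoff): build the whole range, shuffle, keep the first n.
            List<Integer> all = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                all.add(a + i);
            }
            Collections.shuffle(all, rnd);
            return new ArrayList<>(all.subList(0, n));
        }
        // Sparse case: remember the values already returned and re-draw on duplicates.
        Set<Integer> seen = new LinkedHashSet<>(); // LinkedHashSet keeps the draw order
        while (seen.size() < n) {
            seen.add(a + rnd.nextInt(size));
        }
        return new ArrayList<>(seen);
    }
}
```

In the sparse branch fewer than half the values are ever taken, so each draw succeeds with probability above 1/2 and the expected number of draws stays below 2N, though there is still no hard upper bound on the running time.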

pjt33 · August 1, 2011, 10:38pm

On the assumption that B >> N:

Take a self-balancing tree which gives a count of the size of each subtree. Store the numbers already picked in the tree. To generate a new number:

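// rnd is presumably a java.util.Random; tree holds the numbers picked so far.
// Pick a rank among the values that are still free.
int x = A + rnd.nextInt(B - A + 1 - tree.size());
int skipped = 0, skip = 0;
// countLE(x) = how many picked numbers are <= x.
// Shift x past them until the count stops growing.
while ((skip = tree.countLE(x)) > skipped) {
    x += skip - skipped;
    skipped = skip;
}
tree.add(x);
return x;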

If you’re worried about long runs of taken numbers, you could complicate things even more…
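For anyone who wants to try the idea without writing a counting tree, here is a runnable Java sketch that swaps the self-balancing tree for a sorted ArrayList. That is a simplification, not the structure described above: countLE becomes a binary search, but each insert shifts elements, so a draw costs O(N) in the worst case instead of O(log N). The class name SkipSample is invented, and the range size is assumed to fit in an int.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SkipSample {
    private final List<Integer> taken = new ArrayList<>(); // picked values, kept sorted ascending
    private final int a;
    private final int b;
    private final Random rnd;

    SkipSample(int a, int b, Random rnd) {
        this.a = a;
        this.b = b;
        this.rnd = rnd;
    }

    /** Number of already-picked values that are <= x. */
    private int countLE(int x) {
        int i = Collections.binarySearch(taken, x);
        return i >= 0 ? i + 1 : -(i + 1);
    }

    /** Returns a value from [a, b] that has not been returned before. */
    int next() {
        int free = b - a + 1 - taken.size(); // assumption: fits in an int
        if (free <= 0) {
            throw new IllegalStateException("range exhausted");
        }
        int x = a + rnd.nextInt(free); // rank among the free values
        int skipped = 0;
        int skip;
        while ((skip = countLE(x)) > skipped) {
            x += skip - skipped; // step over picked values at or below x
            skipped = skip;
        }
        // Exactly `skipped` picked values are smaller than x, so that is its sorted position.
        taken.add(skipped, x);
        return x;
    }
}
```

Each call to next() uses exactly one random draw, so the amount of randomness consumed is predictable even though the list insert is not cheap.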