Iterate each element of a collection randomly once only

[quote]My hunch seems correct, if the interval is a prime number then a modified version of the algorithm I stated seems to work for all the testing I have performed. I will still look at the other alternatives suggested to see which gives the best speed / memory usage / randomness compromise.
[/quote]
The interval and length of array have to be relatively prime.

So if the interval is prime, then it also has to not be a factor of the array length. An easy way would be to make the array length a power of two then use any prime number >= 3 as the interval.

Don’t worry, I will be performing some benchmarking to see whether the iterate and swap is acceptable :slight_smile: and that applies to all suggestions :wink:

Perhaps a little more clarification can help understand the problem… In my pervious post I defined S as symbol size in bits. Since the bit steam split into S sized chunks. The count of chunks is going to be less than 2^S. the preprocessing array (imaginary or not) will still need to be of 2^S length even if most elements are not used. So having that array is a bit wasteful and potentially slower due to random memory access than a mor computationally expensive algorithm that generates a non repeating random sequence of indecies into an imaginary array.

@sproingie: A full-period LCG with period P must visit each element of set S = [0,P) exactly once per period…otherwise it could not be full period.

@Davedes: Thanks for the links. From reading those articles, I think my algorithm is a form of Linear Congruential Generator.

I have created a simple benchmark using two methods: Riven’s Bucket method and my Prime Interval Stepping (PIS) Method. Source at http://www.java-gaming.org/?action=pastebin&id=120

From this simple benchmark, for an array size of 2^20, the PIS method seems to be about 5x faster than the Bucket method when utilising the random indecies sequence only as i will ultimately be doing for my situation.

@UprightPath: can you give a little more info about the constructor parameters of your implementation? I would love to add it to the benchmark but am unsure what values to put in for the “range” and “rangeStart” parameters.

Edit: forgot a class in the paste bin dump

Yes, comments would be helpful, wouldn’t they?


public SemiRandomIteratorObject(List<?> collection, int range, int rangeStart, int step) {
      this.collection = collection;
      this.range = range;
      this.rangeStart = rangeStart;
      this.rangeEnd = this.rangeStart + range;
      this.startIndex = this.rangeStart + random.nextInt(range);
      this.currIndex = startIndex;
      this.currLoopIndex = this.currIndex;
      this.step = step;
      this.up = random.nextBoolean();
      this.move = random.nextBoolean();
   }

collection: The List in question, instead of writing it as a purely array based, I decided to use lists. Anyway, it’d be fairly easy to change it out.
rangeStart: The index at which you’d like to start the random iteration.
range: The number of elements, from rangeStart to rangeStart + range, you’d like to iterate through.
step: This is just some number that is a factor of range.

Just for fun, for power-of-two only:


/**
 * Computes permuted (random) indices on [0,2<sup>n</sup>) visiting each element exactly once.
 */
public class PermutedSequencePOT
{ 
  private static java.util.Random rng = new java.util.Random();
  
  private int m;
  private int a;
  private int mask;
  private int v;

  public PermutedSequencePOT(int length)
  {
    setLength(length);
  }

  /** Set the length of the sequence.  Must be a power-of-two.  */
  public void setLength(int length)
  {
    mask = length-1;
    reset();
  }

  /** Create a new permutation of the current length. */
  public void reset()
  {
    // this is all overkill
    m = (rng.nextInt()<<3)|5;   // anything such that (m & 7)==5 
    a = (rng.nextInt()<<1)|1;   // any odd value
    v = (rng.nextInt());        // basically sets the offset that zero will appear
  }


  /** Returns the next index. */
  public int next()
  {
    v = (m*v + a) & mask;

    return v;
  }
}

It seems we have a new winner with Roquen’s POT Permutaed Sequence implementation in the speed category :slight_smile:

On my rig with 100 iterations with a collection size of 2^20 the time taken (in milliseconds:)

Prime Step time taken = 1755
Bucket Riven time taken = 10351
Power of Two Perturbed Sequence by Roquen time taken = 393
Directional by UprightPath time taken = 974

even in the worst case scenario (collection size of 2^20 + 1 ) the POT method still wins:

Prime Step time taken = 1751
Bucket Riven time taken = 10196
Power of Two Perturbed Sequence by Roquen time taken = 1228
Directional by UprightPath time taken = 977

Directional implementation is now the speediest! in this worst case for POT.

Code for the testing can be found at: http://www.java-gaming.org/?action=pastebin&id=125

@UprightPath: thanks for the definitions :slight_smile: Unfortunately I think I might have stuffed up integrating your code (i get infinite recursion) … Can you have a look over my extensions to see what i did incorrectly? My main changes was always use the collection size /2 as the step. in cases where the collection size is not even, i assume that the collection size is +1 and perform a check to see whether the next “random” index is >= the collection size, if so then ignore it and get the next “random” index. The range is always the size of the Collection (or +1 if odd). The code is included in the above link.

Edit: Updated table to include Directional by UprightPath times

I found a handly online randomness testing website: http://www.cacert.at/cgi-bin/rngresults

here are the results of the three methods:


Generator         VendorSpeed PriceEntropy (->Birthday SpacinMatrix Rank6x8 Matrix RankMinimum Distance Test  Random Spheres TestThe Sqeeze TestOverlapping Sums Tes
Prime Step 23bit          B/s  USD    7.932087              0      0.686              0                      0           0.022763              0                   0
bucket shuffle 23bit      B/s  USD    7.999994       0.021025      0.992          0.407               0.787081           0.400642       0.098716            0.000724
ROT 23bit                 B/s  USD    7.936804              0      0.224          0.004                      0           0.472286              0                   0
java random 23bit         B/s  USD    7.999993       0.483173       0.63          0.012               0.674221            0.83216       0.881283             0.00028

The java random 23bit was used to get a reference… it is simply calling Random.nextInt(1<<23) 2^23 times.

These results are not readily understandable other than the order of increasing “randomness” is Prime Step, Power of Twp Permutation, Bucket Shuffle.

This is quite apparent when converting the output each implementation into a 4096x4096 image when given an array of a 24bit sequential integers.

I think Roquen’s implementation suites my needs the best. Thanks :slight_smile:

Edit: updated to include UprightPath’s Directional Implementation

I looked at what you had and solved the issues that you were having (My original code was written with the intent to be able to pick a subset of a list to perform the function on, rather than always performing it on the whole collection, fixing that…)

Here you go: http://www.java-gaming.org/?action=pastebin&id=126

Hmm, never mind. I just messed around with it using the whole 4096x4096 and the method, as I defined it, doesn’t really scramble the list very well.

Anyway, glad you found you answer.

Thanks for that :slight_smile:

I have updated the table of results above to include your implementation. The speed is very nice! However I think you are correct… it’s apparent randomness is probably not it’s strong suit :frowning:

I had another version that I was toying with that put in an extra layer there that would increase the randomness at least slightly, however I’m not certain how much more it would do or how much more memory it would take it up. I could mess around and see, but… Eh.

If you want to play with it then have some fun with it :slight_smile:

What is interesting is your solution is the fastest in the worst case scenario for POT (see modified table in previous post) well done!