Calculating probabilities - is this the right way?

Hi All,

I’m trying to fill up a grid of cells with random objects, but certain objects need to have a higher chance of generating than others.

So, imagining you have a 5x5 grid of cells, I could say that I want 50% of the cells to be empty, and the other 50% to contain something (potentially).

I made a little diagram showing roughly how the distribution of values works.

Here’s where I get confused.

When I write the code to determine whether or not a cell is empty, I can do something like this:


int chanceForSomething = 50;

Random rnd = new Random();
if (rnd.nextInt(100) < chanceForSomething)
{
    // cell isn't empty, generate something here!
}


But what confuses me is how I then further determine what to generate, based on their individual chances of generating? Looking at the percentage graph above, I can see how the lower 50% could be broken up into their own percentages (20% becomes a 40% chance, 5% becomes 10% etc.). However, if I were to do a calculation like this:


int chanceForHuge = 40;
int chanceForLarge = 30;
int chanceForMedium = 10;
int chanceForSmall = 10;
int chanceForTiny = 10;

Random rnd = new Random();

int rndVal = rnd.nextInt(100);  // how does this distribute against the chance probabilities?


How do I work out what chance probability ‘rndVal’ matches? Am I thinking about this all wrong?

I want larger objects to spawn more frequently than smaller ones, so they should have a higher chance of being created. But I also need potential room to add more objects to this list of probabilities in future, without having to resort to ‘baking’ the percentage chance check with code like ‘if (chance >= 5 && chance < 10) …’. That seems like an unnecessarily ‘dirty’ way of handling this.

Any helpful insight into this would be greatly appreciated. Thanks!

Maybe some sort of table approach?


Map<Integer, ThingGenerator> table = new LinkedHashMap<>();
table.put( 10, new ThingGenerator( 1 ) );       // 0..10% for tiny
table.put( 20, new ThingGenerator( 3 ) );       // 10..20% for small
...   // etc

ThingGenerator getGenerator() {
    int rndValue = ...   // 0..99
    for( Entry<Integer, ThingGenerator> entry : table.entrySet() ) {
        if( rndValue < entry.getKey() ) return entry.getValue();
    }
    return null;   // Empty thing
}

...

foreach cell {
    ThingGenerator g = getGenerator();
    if( g != null ) {
        cell = g.create();
    }
}


The percentile values would have to be ascending for this to work; you might want to create your own class that enforces that rule.

I don’t know how you were going to handle each type of thing, the above uses a sort of factory that takes the size of the cell, but you might have a different approach in mind.
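
A variant of the same idea, if you keep a running cumulative total as the key, is to let a TreeMap do the lookup for you; then the insertion order no longer matters. This is just a rough sketch (ThingGenerator here stands in for whatever factory you end up using):

import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

interface ThingGenerator { Object create(); }       // stand-in for whatever factory you use

class WeightedTable {
    // Keys are cumulative upper bounds out of 100 (10, 20, ...), values are the generators.
    private final TreeMap<Integer, ThingGenerator> table = new TreeMap<>();
    private final Random rnd = new Random();
    private int total = 0;

    void add(int weight, ThingGenerator generator) {
        total += weight;
        table.put(total, generator);                // cumulative key, so add() order doesn't matter
    }

    ThingGenerator getGenerator() {
        int rndValue = rnd.nextInt(100);            // 0..99
        Map.Entry<Integer, ThingGenerator> e = table.higherEntry(rndValue);
        return e == null ? null : e.getValue();     // null = empty cell (weights sum to less than 100)
    }
}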

Is this just an example? Not enough information to give a real response.

The various AliasMethod classes handle arbitrary discrete distributions.

This is my generic approach to do those things: https://github.com/tomvangreen/generation/blob/master/ch.digitalmeat.generation/src/ch/digitalmeat/generation/data/WeightedList.java

Create a weighted list object of the type you want to create, add all possible values with their respective weight. When you want to create an item just call list.get(random).
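
For reference, the core of such a weighted list is only a handful of lines. This is just a minimal sketch of the idea (a linear scan over the weights), not the actual class behind the link:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of a weighted list: add items with weights, then pick one at random.
class SimpleWeightedList<T> {
    private final List<T> items = new ArrayList<>();
    private final List<Double> weights = new ArrayList<>();
    private double totalWeight = 0;

    void add(T item, double weight) {
        items.add(item);
        weights.add(weight);
        totalWeight += weight;
    }

    T get(Random random) {
        double r = random.nextDouble() * totalWeight;   // point in [0, totalWeight)
        for (int i = 0; i < items.size(); i++) {
            r -= weights.get(i);
            if (r < 0) return items.get(i);             // linear scan over the weights
        }
        return items.get(items.size() - 1);             // guard against rounding error
    }
}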

atombrot: You can improve on the linear search; see RandomSelect (binary search) in the same directory as my link above. The alias method does the same thing without searching: it generates two random numbers and performs 1 or 2 table lookups (side by side in memory in the flat and fixed-point versions). Vose’s method is optimal for fixed arbitrary distributions.

Here’s an example:


import roquen.math.rng.*;
import roquen.math.rng.discrete.AliasMethodFixedPoint;

public class ExampleHack {
  
  public static void main(String[] args)
  {
    PRNG rng = new XorStar64();  // whatever RNG
    AliasMethodFixedPoint dist;
    
    // integer table is simply a list of weights in order
    // for the desired result..so a weight of 50 for 0, 40 for 1,
    // etc.  The probability of each will be the weight/sum(weights)
    dist = AliasMethodFixedPoint.make(new int[] {50,40,30,10,5,5});
    
    for(int y=0; y<5; y++) {
      for(int x=0; x<5; x++) {
        int r = dist.nextInt(rng);
        if (r == 0)
          System.out.print(".");
        else
          System.out.print(r);
      }
      System.out.println();
    }
  }
}

There’s a reasonable chance you won’t be happy with this method.

All of these methods are actually very good - I can already see, based on the examples, how they would solve my problem.

This would also remove the need for me to do any initial probability testing on the empty vs. something check, since that’s accounted for in the probability weighting itself (where the largest probability is nothing and all subsequent probabilities represent discrete entities).

Thanks! This was really helpful!

Thanks for the hint. I know my method is in no way optimized :wink: I’ll have a look at your link this weekend. I always wondered how to improve that, but as I only use it in non performance critical places I never bothered to change it…

Okay, so this method is probably a bit ugly, but it works great:

I start by defining the probability of each cell type. For simplicity’s sake, everything is a percentage, with the sum of all values adding up to 100:


int chanceForEmpty = 70;     // type 0
int chanceForBig = 10;       // type 1
int chanceForMedium = 15;    // type 2
int chanceForSmall = 5;      // type 3

Then I run a loop for each chance variable, adding its type value into a values array once per percentage point. This array ends up containing every possible outcome, weighted by its chance:


// get the sum of all values
int valueSum = chanceForEmpty + chanceForBig + chanceForMedium + chanceForSmall;

// set a values look-up array with the same dimension
int[] values = new int[valueSum];

// add each chance value
int iteration = 0;
for (int i = 0; i < chanceForEmpty; i++) { values[iteration] = 0; iteration++; }    // 0 = EMPTY
for (int i = 0; i < chanceForBig; i++) { values[iteration] = 1; iteration++; }      // 1 = BIG
for (int i = 0; i < chanceForMedium; i++) { values[iteration] = 2; iteration++; }   // 2 = MEDIUM
for (int i = 0; i < chanceForSmall; i++) { values[iteration] = 3; iteration++; }    // 3 = SMALL


Now, when I want to get a random, weighted type for a particular cell, I can pick a random index into the look-up array, which gives me exactly the weighting / probabilities I need:


Random rnd = new Random();
int index = rnd.nextInt(values.length);
int randomType = values[index];    // 0 = Empty, 1 = Big, 2 = Medium, 3 = Small

I know the more talented guys and gals out there will probably think this is ugly. I accept that this is only really useful in situations where the resulting pre-computed look-up array is fairly small. Mine is only 100 elements in size, but much larger sets could have memory footprint issues. But go gentle, this is my first attempt at handling weighted random selection :slight_smile:

There’s also the fact that this is precomputed, so obviously there would be additional issues on massive sets whilst everything is being preloaded into the look-up array. But again, my array size is small enough to be fairly inconsequential.

Here’s my summary of what I’ve learned from this experiment:

PROS:

  • Probably the easiest-to-understand method of accomplishing this that I’ve reviewed thus far.
  • Quick and easy to code from the ground up.
  • Since everything is precomputed, on-the-fly random selection requires no more work than fetching a value from memory.

CONS:

  • Probably an ugly way to solve this!
  • Adding a new weighted value to the set requires a little bit of coding (a new for loop; see the builder sketch after this list)
  • Requires some re-calculation of how it affects the probabilities of other items in the set
  • Not suitable for massive datasets (potentially memory-hungry, requires pre-computation).
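
As a small follow-up to the first two cons: the look-up array could be built from a map of type to weight in a single pass, so adding a new type is just one more map entry. A rough sketch of that idea (the type codes are the same 0-3 values as above, and the map name is made up):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch: build the look-up array from a map of type -> weight,
// so adding a new type is one map entry instead of a new for loop.
public class LookupTableBuilder {
    static int[] build(Map<Integer, Integer> weightsByType) {
        int sum = weightsByType.values().stream().mapToInt(Integer::intValue).sum();
        int[] values = new int[sum];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : weightsByType.entrySet()) {
            for (int n = 0; n < e.getValue(); n++) {
                values[i++] = e.getKey();      // repeat each type 'weight' times
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> weights = new LinkedHashMap<>();
        weights.put(0, 70);   // empty
        weights.put(1, 10);   // big
        weights.put(2, 15);   // medium
        weights.put(3, 5);    // small
        int[] values = build(weights);
        Random rnd = new Random();
        System.out.println(values[rnd.nextInt(values.length)]);
    }
}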

I hope someone else finds this useful.

Thanks everybody for your help on this today!

There is a subtle, but serious flaw with your method. :emo: While you are guaranteed that your initial values array is filled with the desired item distribution, the way you select your items means that the output isn’t guaranteed to share that same trait.

Assume we have an array consisting of 2 elements which can represent one of two desired values, 1 or 0 (just to keep it simple). Also assume that we have a desired distribution of 50% (basically one element is equal to 1 and the other is equal to 0). For example:


int[] values = new int[2];
values[0] = 0;
values[1] = 1;
Random which = new Random();
for(int x = 0; x < values.length; ++x) {
    System.out.println("Value is " + values[which.nextInt(2)]);
}

If you run the code a few times you’ll notice that you don’t get a consistent output distribution of 50%. You are just as likely to get 0,0 or 1,1 as you are to get the desired output of 0,1 or 1,0.

To make this approach work, you have to remove the values from the list as they are chosen. Something like the following.


int sel;
Integer val;
ArrayList<Integer> values = new ArrayList<Integer>(2);
values.add(0);
values.add(1);
Random which = new Random();
while(!values.isEmpty()) {
    sel = which.nextInt(values.size());
    val = values.get(sel);
    values.remove(sel);
    System.out.println("Value is " + val);
}

While I can’t say the above code is the most efficient (I threw it together without much thought), I can say that it will always provide the desired distribution of outputs. 8)
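
For what it’s worth, the same guarantee can be had by building the full list of desired outcomes up front and shuffling it, which is essentially pixelprime’s look-up array consumed without replacement. A quick sketch of the same two-element example:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffledValues {
    public static void main(String[] args) {
        // The list holds each value exactly as often as desired, so the
        // output distribution is guaranteed over this fixed number of picks.
        List<Integer> values = new ArrayList<>();
        values.add(0);
        values.add(1);
        Collections.shuffle(values);        // random order, but always exactly one 0 and one 1
        for (Integer val : values) {
            System.out.println("Value is " + val);
        }
    }
}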

ORLY? Where and what is the bug?


// values[] is the look-up array pixelprime built above; rng is just a random number generator
int[] histogram = new int[4];
int len = 0xFFFFFFF;

for (int i = 0; i < len; i++)
    histogram[values[rng.nextInt(values.length)]]++;

for (int i = 0; i < 4; i++)
    System.out.println(1.0 * histogram[i] * (1.0 / len));

I get: 0.7000058878213387 0.10000795908275231 0.1499699583275987 0.05001619476831032

Here is a more flexible solution. It has two advantages over many of the other proposals: you can list the elements in any arbitrary order, and you can easily change the probabilities while the program is running (for example from some slider controls), because you don’t put the numbers directly into the list; instead you put a function in there that can reference a JSlider or some other source of numbers.


Supplier[] probabilities = {
    () -> 50, () -> new A(),
    () -> 10, () -> new B(),
    () -> 35, () -> new C(),
    () ->  5, () -> new D()
};
RandomGenerator rg = new RandomGenerator(probabilities);
HashMap<Class, Integer> map = new HashMap<>();

for(int i = 0; i < 100000; i++) {
    Object o = rg.next();
    map.merge(o.getClass(), 1, (oldNum, x) -> oldNum + 1);
}

System.out.println(map);

Here is the rest of the code
http://pastebin.com/xP1EMLBR
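
For anyone who doesn’t want to click through, here is a minimal sketch of how such a generator could look. It is not the pastebin code, just something consistent with the usage above; the weight suppliers are re-read on every call, which is what makes live sliders possible:

import java.util.Random;
import java.util.function.Supplier;

// Sketch only: alternating entries supply a weight and then the object to create,
// matching the usage above. Not the actual class from the pastebin link.
class RandomGenerator {
    private final Supplier[] entries;
    private final Random rnd = new Random();

    RandomGenerator(Supplier[] entries) {
        this.entries = entries;
    }

    Object next() {
        // Re-read the weights on every call so they can come from sliders etc.
        int total = 0;
        for (int i = 0; i < entries.length; i += 2) {
            total += ((Number) entries[i].get()).intValue();
        }
        int r = rnd.nextInt(total);
        for (int i = 0; i < entries.length; i += 2) {
            r -= ((Number) entries[i].get()).intValue();
            if (r < 0) {
                return entries[i + 1].get();        // create the object paired with this weight
            }
        }
        return entries[entries.length - 1].get();   // unreachable when all weights are positive ints
    }
}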

If this was a reply to my post, then the bug is in the code pixelprime posted (not really sure why you’re posting a totally different piece of code and asking where the bug is ???). If it wasn’t a response to my post, then I apologize and blame it on a lack of caffeine in my system. :point:

Which is exactly the behaviour we want. So no bug here. If you always got 0,1 or 1,0 it wouldn’t be very random.

You’re suffering from the Gambler’s Fallacy.

What Roquen did was run a test using the values[] array from Pixelprime’s example, drawing 268,435,455 random values and recording the frequency of each result.

Compare that with the probabilities of each value that Pixelprime set: 70%, 10%, 15% and 5%. The long-run frequencies match almost exactly.

Also, I think most RPGs have an even simpler system than this. They simply keep a list of N items that an enemy can drop when it dies, along with the probability of each one, and compute one random number for each possible drop; otherwise you couldn’t get more than one item per killed enemy.
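
In code that is just an independent roll for every entry in the drop table; a rough sketch (the item names and chances here are made up for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DropRolls {
    public static void main(String[] args) {
        // One independent roll per possible drop, so several items can drop at once.
        String[] items = { "gold", "potion", "rare sword" };
        int[] chancePercent = { 75, 30, 2 };            // hypothetical example chances
        Random rnd = new Random();
        List<String> drops = new ArrayList<>();
        for (int i = 0; i < items.length; i++) {
            if (rnd.nextInt(100) < chancePercent[i]) {  // independent roll per item
                drops.add(items[i]);
            }
        }
        System.out.println("Dropped: " + drops);
    }
}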

Not exactly. The gambler’s fallacy comes down to not being able to predict future outcomes based on current outcomes. The method I proposed is actually the opposite: it absolutely guarantees future outcomes based on current and past outcomes.

I think it may be that Roquen and I could be looking at the requirements a different way. His approach does indeed average out over the long haul (268,435,455 iterations in his example). When I read this part of the original post…

…I assumed he wanted a guaranteed distribution over a fixed number of steps, which Roquen’s method does not accomplish (especially with a low number of iterations). I’m perfectly willing to accept that it was a misunderstanding on my part though. :wink:

Ah, indeed. Seems like me and Roquen simply interpreted the requirements differently.

There are simply two ways. I don’t know the terms in English; conceptually it is picking (colored) marbles from a vase, with or without putting them back after each pick (nPr and nCr respectively). OP seems to be after not putting them back, which as CodeHead says would change the odds after each pick.

Looking at “all” of OP’s original post, what he’s asking for is a discrete probability distribution function. I made a nod to that perhaps not being what he’ll ultimately want in my original code example post. For things like this (filling space) you probably want some procedural generation method that is coherent in space, rather than independent random events, regardless of technique.

On discrete prob. functions I tossed together some example code for linear & binary search for learning purposes. A stand-alone class that (once I get motivated) is intended to supplement a plain English description that I’ll stick on the wiki. The plain language description will be much easier to understand.

PDF = probability density function (directly used for linear search)
CDF = cumulative distribution function (directly used for binary search, effectively computed for linear search)
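
As a rough stand-alone illustration of the two searches (not the linked class, just the idea), selecting an index from a set of integer weights might look like this:

import java.util.Random;

// Sketch: selecting an index from a discrete distribution given integer weights.
// Linear search walks the PDF; binary search works on the precomputed CDF.
public class DiscreteSelect {
    static int linearSelect(int[] weights, int total, Random rnd) {
        int r = rnd.nextInt(total);
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];                 // effectively accumulating the CDF as we go
            if (r < 0) return i;
        }
        return weights.length - 1;           // not reached when the weights sum to 'total'
    }

    static int binarySelect(int[] cdf, Random rnd) {
        int r = rnd.nextInt(cdf[cdf.length - 1]);
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {                    // find the first CDF entry greater than r
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] <= r) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] weights = { 50, 40, 30, 10, 5, 5 };
        int[] cdf = new int[weights.length];
        int total = 0;
        for (int i = 0; i < weights.length; i++) { total += weights[i]; cdf[i] = total; }
        Random rnd = new Random();
        System.out.println(linearSelect(weights, total, rnd) + " " + binarySelect(cdf, rnd));
    }
}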


Again, in practice if the distribution is fixed (or varies slowly) Vose’s method is optimal…don’t bother re-inventing broken wheels.
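
For completeness, here is a compact stand-alone sketch of Vose’s construction; it is a simplified version for illustration, not the library code linked earlier:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Sketch of Vose's alias method: O(n) setup, O(1) per sample.
public class AliasSampler {
    private final double[] prob;   // probability of keeping column i
    private final int[] alias;     // fallback index for column i
    private final Random rnd = new Random();

    AliasSampler(double[] weights) {
        int n = weights.length;
        prob = new double[n];
        alias = new int[n];
        double sum = 0;
        for (double w : weights) sum += w;

        // Scale the weights so their average is exactly 1.
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / sum;
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }

        // Pair each under-full column with an over-full one.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop(), l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = (scaled[l] + scaled[s]) - 1.0;
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0;   // numerical leftovers

    }

    int next() {
        int column = rnd.nextInt(prob.length);              // pick a column uniformly
        return rnd.nextDouble() < prob[column] ? column : alias[column];
    }

    public static void main(String[] args) {
        AliasSampler dist = new AliasSampler(new double[] { 50, 40, 30, 10, 5, 5 });
        for (int i = 0; i < 25; i++) System.out.print(dist.next() + " ");
    }
}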