Bag and ArrayList Testing... Memory Error?

Hey, everybody!

I discovered Kappa’s Bag code and decided to put it some use against an ArrayList. So far, I’m very happy with the results! The code for his data structure, if you haven’t seen it yet, is here:

I wrote a little test program to test just simple functionality out of it here (not a realistic example, by any means, but just something for some quick benchmarks, I suppose):


package test;

import java.util.ArrayList;
import java.util.Random;
import util.Bag;

public class BagTest 
{
    public static void main(String args[])
    {
        Bag stringBag = new Bag();
        ArrayList stringList = new ArrayList();
        Random r = new Random();
        
        long time = System.currentTimeMillis();
        
        for (int i = 0; i < 1000000; i++)
        {
            stringBag.add(r.nextInt(2) == 1 ? "hello" : "goodbye");
        }
        
        System.out.println("Adding to the Bag: " + ((System.currentTimeMillis() - time) / 1000.0) + "s");
        
        time = System.currentTimeMillis();
        
        for (int i = 0; i < 1000000; i++)
        {
            stringList.add(r.nextInt(2) == 1 ? "hello" : "goodbye");
        }
        
        System.out.println("Adding to the ArrayList: " + ((System.currentTimeMillis() - time) / 1000.0) + "s");
        
        time = System.currentTimeMillis();
        
        while (!stringBag.isEmpty())
        {
            int remover = r.nextInt(stringBag.size());
            String s = (String)stringBag.get(remover);
            
            if (s.equals("goodbye"))
            {
                stringBag.remove(remover);
            }
            else
            {
                stringBag.remove(remover);
                stringBag.add("goodbye");
            }
        }
        
        System.out.println("Removing from the Bag: " + ((System.currentTimeMillis() - time) / 1000.0) + "s");
        
        time = System.currentTimeMillis();
        
        while (!stringList.isEmpty())
        {
            int remover = r.nextInt(stringList.size());
            String s = (String)stringList.get(remover);
            
            if (s.equals("goodbye"))
            {
                stringList.remove(remover);
            }
            else
            {
                stringList.remove(remover);
                stringList.add("goodbye");
            }
        }
        
        System.out.println("Removing from the ArrayList: " + ((System.currentTimeMillis() - time) / 1000.0) + "s");
    }
}

It runs fine with test cases up to 100000 entries (during which, by the way, Bag seems to always take the cake with this test), but when I try to add on another factor of 10 by raising the count to 1000000 (I haven’t tested it in between yet), the last bit where it counts ArrayList removing entries never appears, as if the app has gone into some sort of infinite loop or something without any error messages. I’m not entirely sure why this is, so I thought I’d ask to see if you guys did!

Btw, I did make sure to set -Xss256m and -Xms256m in case this was the issue, but this didn’t help either.

Thanks a bunch!

Best regards,
Colton

You’re doing random remove()s, and they scale very badly with longer lists. It’s probably just really slow.

Well first of all you should debug your stuff before jumping into conclusion that it is an inifinite loop or a memory error (?).

Have you debugged to see if the size of the list is decreasing?

Secondly, you’re putting a random factor in your benchmark for adding two types of elements (“hello” and “goodbye”) which will directly affect the outcome of the remove phase. In the ArrayList run you might have 2 times more removes than the Bag removes, so you cannot conclude anything with that.

If you want to benchmark something against something else, they should operate on the exact same data, not on completely different sets!
With this random set benchmarking you might (and probably will) get different results even if you declared both lists as a Bag!

And finally, if you’re measuring the add and remove, your loop should contain ONLY that, not a call to a random function which probably is more expensive than an add().
Let’s assume your whole loop spends 1000ms in the Random function, and 200ms in the add() for ArrayList and 100ms for the add in the Bag.
You’ll conclude that ArrayList performs 1100/1200=91% of Bag, when you should be concluding that it performs 100/200=50% comparing to Bag.

Microbenching is hard. No offense intended, but everything is wrong. :slight_smile:

You’re right; I probably should have taken all of those other factors into consideration! However, I wanted to also check randomness as part of their performance, because I would be using them in scenarios involving randomization.

EDIT: Another thing is that both the Bag and the ArrayList are given the same set of instructions, and yet the Bag will always finish all of its removes in less than a tenth of a second. I did some outputs on the ArrayList removes and it takes it an astronomical amount of time for it to do essentially the same thing, consistently; in fact, I haven’t had the patience to let it complete a full run yet, but it takes many minutes.

No offense taken; I have much to learn! :smiley:

Colton

Awesome.