Optimizing Conway's Game of Life

Nah, the profiling results look almost identical…

Okay, I hate to admit this, but I derped HARD.

This algorithm is a LOT faster than I originally thought.

This is going to make me look like a total idiot, but I wrote a program to test the speed without rendering… and it gets something like 900 cycles a second.

Now, I was able to improve the rendered one by quite a bit by changing the x and y.
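(To show what I mean by that, assuming a row-major grid: making x the inner loop walks memory contiguously. Rough sketch with made-up names, not the actual render loop:)

// sketch with made-up names: row-major grid (y * width + x), x as the
// inner loop so consecutive reads hit consecutive bytes
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        drawCell(x, y, grid[y * width + x]);   // hypothetical render call
    }
}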

But yeah… I’m dumb. Now what I need to do is 1. figure out how to speed up my rendering, and 2. figure out how to implement hashlife.

Lol I was writing my findings: 7x faster, 190x less memory usage :point:
I knew there was no way the buffer swap was slower…

Note I didn’t ever render anything or check validity, but here’s the code I wound up with if you’re curious: http://pastebin.java-gaming.org/d79cd49133917

See that’s why I was rendering: to check validity! :stuck_out_tongue:

Dumb me. Anyway, checking out your code…

WHAT THE FREAK? How is it so fast?!?

EDIT: well, I see why it’s so fast: it doesn’t work! I tried rendering it… and… yeah, it doesn’t work. :stuck_out_tongue:

Mmm? Why doesn’t it work? Is the interchange in set() not valid?
Let me investigate…

Eh, I can’t see what’s wrong…

but…

everything just fades out after a few cycles…

I’m actually really confused… I can’t see why your code isn’t working…

I figured it out.

you can’t do this:


newGrid[idx] = 0;

in the loop. I put an Arrays.fill() before the loop instead and it works.
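For the record, the shape of the fix as a simplified sketch with made-up names (flat byte[] grid, hypothetical liveNeighbours() helper), not the actual code:

// clear the whole target buffer once, up front...
java.util.Arrays.fill(newGrid, (byte) 0);

// ...then the update loop only writes the cells that come out alive,
// instead of doing newGrid[idx] = 0; inside the loop (which is what broke it)
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        final int idx = y * width + x;
        final int n = liveNeighbours(grid, x, y);   // hypothetical helper
        final boolean alive = grid[idx] != 0;
        if (n == 3 || (alive && n == 2))
            newGrid[idx] = 1;
    }
}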

BUT: 512x512? pfft. Try 1600x900!

At that size, this doesn’t seem much different than my original code…

EDIT: with a little testing, comparing your code to mine, with a grid size of 2000x2000, it seems that they perform NEARLY the same…

I figured it out too, but fill() slows it down a good bit. Trying to find a better way.

What about that memory usage though? Can’t go around churning >1 GB heaps for no reason. :point:

This is true…

So we’ve made little progress overall really… I mean, better memory use is good but it’s basically the same speed still :stuck_out_tongue:

Progress is progress I suppose.

Does this help?


private static void fill(final byte[] array, final byte value)
{
    final int len = array.length;

    // seed the first element...
    if (len > 0)
        array[0] = value;

    // ...then repeatedly copy the already-filled prefix onto the rest of the
    // array, doubling the length of the filled region on each pass
    for (int i = 1; i < len; i += i)
    {
        System.arraycopy(array, 0, array, i, ((len - i) < i) ? (len - i) : i);
    }
}

Doesn’t seem to make much difference for me… er… maybe it’s a bit slower actually… probably due to the JNI calls from arraycopy…

Anyway, I found that online and thought I would give it a shot…

Best I have: it seems removing the if(value != 0) allows removal of zeroing as well.
Reaches the same fixpoint in my random board test at least, so it appears valid, and is about 1.75x faster than with fill().
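In byte-per-cell terms the idea is just an unconditional store, so nothing relies on a pre-zeroed buffer any more (sketch with the same made-up names as above, not the actual bitset code):

// every cell is written each pass, so the Arrays.fill()/zeroing step
// can be dropped entirely
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        final int idx = y * width + x;
        final int n = liveNeighbours(grid, x, y);   // hypothetical helper
        final boolean alive = grid[idx] != 0;
        newGrid[idx] = (byte) ((n == 3 || (alive && n == 2)) ? 1 : 0);
    }
}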

I’m done for the night, but I’ll leave this: http://cell-auto.com/optimisation/
EDIT: also this, specific to Conway’s: http://stackoverflow.com/questions/40485/optimizing-conways-game-of-life

Cool cool. Thanks a lot for the help! This has been fun.

Use a shader :persecutioncomplex:

-ClaasJG

Ew, all that unnecessary branching (e.g. in setBit).
Clear the bit, then OR it with the desired value. No need for a branch!

Also, while it’s probably not your performance bottleneck, it’s good practice to only treat your bitsets as bytes when you’re putting them into your byte[]; in all other cases handle them as ints.
Otherwise the compiler ends up inserting implicit casts around the operators, since Java only has operator bytecodes for integer types, not byte types.
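Something along these lines, covering both points, with made-up names (cells packed 8 per byte in a byte[]; not the code from the pastebin):

// clear the target bit, then OR in the new value (assumed 0 or 1): branchless
static void setBit(final byte[] bits, final int idx, final int value)
{
    final int b = idx >> 3;      // which byte
    final int bit = idx & 7;     // which bit inside it
    bits[b] = (byte) ((bits[b] & ~(1 << bit)) | (value << bit));
}

// widen to int straight away and keep working in ints; the only narrowing
// cast back to byte is the store inside setBit()
static int getBit(final byte[] bits, final int idx)
{
    return (bits[idx >> 3] >>> (idx & 7)) & 1;
}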

My new version: http://pastebin.java-gaming.org/9cd9369173b16 EDIT: corrected a slice bounds bug in parallelUpdate

At the bottom you can change the update method used in cycle() to see speeds, and in main() you can select either benchmark mode or display.

For the serial and parallel updates it uses a sliding-window shift-register technique (adapted from here) so that only a minimal number of reads is done per cell. It also uses plenty of loop peeling to eliminate the wrapping computation that otherwise has to be done everywhere, as in naiveUpdate().
Even so, serialUpdate() is not much faster than the improved version of yours; fortunately this version is very amenable to parallelization.
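In case the pastebin link ever dies, the core of the sliding-window part looks roughly like this (simplified to the interior of the grid and byte-per-cell rather than the packed representation, so not the exact code): the three column sums of the 3x3 window get shifted along as x advances, so each step only reads the three newly exposed cells.

// interior cells only: the wrapping rows/columns are what the loop peeling
// handles separately in the real thing
for (int y = 1; y < height - 1; y++)
{
    final int up = (y - 1) * width, row = y * width, dn = (y + 1) * width;

    // vertical sums of the left and centre columns of the 3x3 window
    int colL = grid[up]     + grid[row]     + grid[dn];
    int colC = grid[up + 1] + grid[row + 1] + grid[dn + 1];

    for (int x = 1; x < width - 1; x++)
    {
        // only three reads per step: the newly exposed right-hand column
        final int colR = grid[up + x + 1] + grid[row + x + 1] + grid[dn + x + 1];
        final int n = colL + colC + colR - grid[row + x];   // the 8 neighbours

        final boolean alive = grid[row + x] != 0;
        newGrid[row + x] = (byte) ((n == 3 || (alive && n == 2)) ? 1 : 0);

        colL = colC;   // slide the window one cell to the right
        colC = colR;
    }
}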

The next step from here, without moving to the GPU, would be only storing and updating cells that have changed, and then on to hashing spatial and temporal redundancies à la hashlife.