Nah, the profiling results look almost identical…
Okay, I hate to admit this, but I derped HARD.
This algorithm is a LOT faster than I originally thought.
This is going to make me look like a total idiot, but I wrote a program to test the speed without rendering… and it gets like 900 cycles a second.
Now, I was able to improve the rendered one by quite a bit by changing the x and y.
But yeah… I’m dumb. Now what I need to do is 1. figure out how to speed up my rendering, and 2. figure out how to implement hashlife.
Lol I was writing my findings: 7x faster, 190x less memory usage :point:
I knew there was no way the buffer swap was slower…
Note I didn’t ever render anything or check validity, but here’s the code I wound up with if you’re curious: http://pastebin.java-gaming.org/d79cd49133917
See that’s why I was rendering: to check validity!
Dumb me. Anyway, checking out ur code
WHAT THE FREAK? How is it so fast?!?
EDIT: well, I see why it’s so fast: it doesn’t work! I tried rendering it… and… yeah, it doesn’t work.
Mmm? Why doesn’t it work? Is the interchange in set() not valid?
Let me investigate…
Eh, I can’t see what’s wrong…
but…
everything just fades out after a few cycles…
I’m actually really confused… I can’t see why your code isn’t working…
I figured it out.
you can’t do this:
newGrid[idx] = 0;
in the loop. I used Arrays.fill instead and it works.
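To illustrate the fix (my own simplified sketch with made-up names, not the pastebin code): clear the *whole* next-generation buffer once with Arrays.fill before the update loop, then write only the cells that come out alive.

```java
import java.util.Arrays;

public class ClearDemo {
    // Double-buffered update sketch: zero the entire next buffer up front,
    // then only live cells need a write inside the loop.
    static byte[] step(byte[] cur, byte[] next, int w, int h) {
        Arrays.fill(next, (byte) 0);          // clear once, before the loop
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = neighbors(cur, w, h, x, y);
                byte alive = cur[y * w + x];
                if (n == 3 || (alive == 1 && n == 2))
                    next[y * w + x] = 1;      // survivors and births only
            }
        }
        return next;
    }

    // Naive 8-neighbor count with toroidal (wrapping) edges.
    static int neighbors(byte[] g, int w, int h, int x, int y) {
        int sum = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                int nx = (x + dx + w) % w, ny = (y + dy + h) % h;
                sum += g[ny * w + nx];
            }
        return sum;
    }
}
```
Zeroing a cell of newGrid *inside* the loop is what broke the original version: a cell can be cleared after a neighbor’s update already wrote it.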
BUT: 512x512? pfft. Try 1600x900!
At that size, this doesn’t seem much different than my original code…
EDIT: with a little testing, comparing your code to mine, with a grid size of 2000x2000, it seems that they perform NEARLY the same…
I figured it out too, but fill() slows it down a good bit, trying to find a better way.
What about that memory usage though? Can’t go around churning >1GB heaps for no reason. :point:
This is true…
So we’ve made little progress overall really… I mean, better memory use is good but it’s basically the same speed still
Progress is progress I suppose.
Does this help?
private static void fill(final byte[] array, final byte value)
{
    final int len = array.length;
    if (len > 0)
        array[0] = value;
    // doubling copy: each pass copies the already-filled prefix onto the
    // next stretch, so the fill completes in O(log n) arraycopy calls
    for (int i = 1; i < len; i += i)
    {
        System.arraycopy(array, 0, array, i, ((len - i) < i) ? (len - i) : i);
    }
}
Doesn’t seem to make much difference for me… er… maybe it’s a bit slower actually… probably due to the JNI calls from arraycopy…
Anyway, I found that online and thought I would give it a shot…
Best I have: it seems removing the if (value != 0) check allows removing the zeroing as well.
It reaches the same fixed point in my random-board test at least, so it appears valid, and is about 1.75x faster than with fill().
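Roughly what that variant looks like (an illustrative sketch, not the actual pastebin code): once every cell of the next buffer is written unconditionally, the fill() pre-pass becomes dead work and can be dropped.

```java
public class NoFillDemo {
    // Every cell of 'next' is assigned each generation (0 or 1), so no
    // zeroing pre-pass is needed -- the conditional write is gone too.
    static void step(byte[] cur, byte[] next, int w, int h) {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = neighbors(cur, w, h, x, y);
                next[y * w + x] = (byte)
                    ((n == 3 || (n == 2 && cur[y * w + x] == 1)) ? 1 : 0);
            }
        }
    }

    // Naive 8-neighbor count with toroidal (wrapping) edges.
    static int neighbors(byte[] g, int w, int h, int x, int y) {
        int sum = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                sum += g[((y + dy + h) % h) * w + ((x + dx + w) % w)];
            }
        return sum;
    }
}
```
The unconditional store also tends to be kinder to the branch predictor than an if per cell.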
I’m done for the night, but I’ll leave this: http://cell-auto.com/optimisation/
EDIT: also this, specific to Conway’s: http://stackoverflow.com/questions/40485/optimizing-conways-game-of-life
Cool cool. Thanks a lot for the help! This has been fun.
Use a shader :persecutioncomplex:
-ClaasJG
Ew, all that unnecessary branching (e.g. in setBit).
Clear the bit, then OR it with the desired value. No need for a branch!
Also, while it’s probably not your performance bottleneck, it’s good practice to only treat your bitsets as bytes when you’re putting them into your byte[], in all other cases handle them as ints.
Otherwise the compiler will end up adding implicit casts around operators. (as Java only has operator bytecodes for integer types, not byte types)
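Putting both pieces of advice together, a branchless setBit might look like this (a sketch with my own names, not the actual code being reviewed):

```java
public class BitOps {
    // Branchless bit set: clear the slot, then OR in the value.
    // The byte is widened to int once and only narrowed at the store,
    // avoiding implicit casts around every operator.
    static void setBit(byte[] grid, int byteIdx, int bit, int value) {
        int b = grid[byteIdx] & 0xFF;            // widen once to int
        b = (b & ~(1 << bit)) | (value << bit);  // clear, then OR -- no branch
        grid[byteIdx] = (byte) b;                // narrow only at the store
    }

    static int getBit(byte[] grid, int byteIdx, int bit) {
        return (grid[byteIdx] >> bit) & 1;
    }
}
```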
My new version: http://pastebin.java-gaming.org/9cd9369173b16 EDIT: corrected a slice bounds bug in parallelUpdate
At the bottom you can change the update method used in cycle() to see speeds, and in main() you can select either benchmark mode or display.
For serial and parallel update, it uses a sliding-window shift register technique (adapted from here) so as to only do a minimal number of reads. It also uses plenty of loop peeling so as to eliminate the wrapping computation which otherwise has to be done everywhere, as in naiveUpdate().
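The sliding-window idea can be sketched roughly like this (my own simplified illustration, not the pastebin code — it skips the loop peeling and just wraps with modulo): keep three running column sums and, per cell, read only the one new column of 3 cells instead of all 8 neighbors.

```java
public class SlidingWindow {
    // Sliding-window neighbor counting: for each row, maintain column sums
    // for x-1, x, x+1; advancing x reads only one fresh column (3 cells).
    static void step(byte[] cur, byte[] next, int w, int h) {
        for (int y = 0; y < h; y++) {
            int up = (y - 1 + h) % h, dn = (y + 1) % h; // wrapped row indices
            int left = col(cur, w, up, y, dn, (w - 1) % w); // seed for x = 0
            int mid  = col(cur, w, up, y, dn, 0);
            for (int x = 0; x < w; x++) {
                int right = col(cur, w, up, y, dn, (x + 1) % w); // the one new read
                int n = left + mid + right - cur[y * w + x];    // 8 neighbors
                next[y * w + x] = (byte)
                    ((n == 3 || (n == 2 && cur[y * w + x] == 1)) ? 1 : 0);
                left = mid; mid = right;                        // slide the window
            }
        }
    }

    // Sum of the 3 cells in column x across rows up, y, dn.
    static int col(byte[] g, int w, int up, int y, int dn, int x) {
        return g[up * w + x] + g[y * w + x] + g[dn * w + x];
    }
}
```
The real version replaces the modulo wrapping with peeled edge cases, which is where most of the remaining cost goes.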
Even so, serialUpdate() is not much faster than the improved version of yours; fortunately, this version is very amenable to parallelization.
The next step from here, without moving to the GPU, would be only storing and updating cells that have changed, etc., and then on to hashing spatial and temporal redundancies à la hashlife.