Sharing data between threads

Hiya. Just for fun I was wanting to see what speed difference would come from doing Conway’s game of Life multi threaded vs single threaded. I’m using a naive implementation where I keep a front and back buffer and swap between these as I process. I didn’t want to implement any clever hashing or dead space detection since I’m interested in seeing the difference threading makes, not in implementing the fastest algorithm.

The results I got were these:

Single threaded: 28 million cells/sec
Multithreaded (4 cores): 40 million cells/sec

So that’s about a 30% increase. Now, I’ve never seen huge speed increases by multithreading and I think my CPU is pretty shit for multithreading (intel Q6700) so I wasn’t too surprised but there was one thing I was wondering about which I couldn’t find any definitive information about. The approach I’m taking is to get each thread to render 1/4 of the data. What I was wondering is whether there is any penalty getting each thread to write into the same array? I had assumed that as each thread writes to a different area in the array then that should not suffer any kind of slow down due to the threaded access. Is that correct?


If thread A writes into [icode]arr[i][/icode] and thread B reads from or writes into [icode]arr[i+1][/icode], odds are these values are in the same cache line (which are 64 bytes wide). At this point we get in the realm of false sharing, which you can google if you please. Simply put: try to have your threads not write into memory adjacent to memory that other threads access (reads or writes).

Sooo… does that mean the best thing to do is to have each thread having a local buffer for their own portion of the work and copy that to the render thread when complete?

Memory copies are exceptionally expensive.

As for your case, simply try to have each thread work on data in a set of rows in the grid, not interleaved columns, which would be worst case, obviously). Each row should have its data in a (segment of a) primitive array, say a boolean[] or even a BitSet (which is 8x as memory efficient, although it introduces some computational overhead)

As for whether a 30% performance increase is OK: multi threading introduces some overhead, so it won’t help much in small data sets (or few calculations). Try to increase (and later decrease) the grid with factor 4, and see whether that affects the performance gains of multi threading.

I am doing the thread’s work in a contiguous hunk of rows already. I tried increasing the image size and as you said, that increased the improvement from threading.

Thanks for your advice!

Indeed, writing to the same cache line from two different threads causes a cache flush on pretty much each write. I’ve had code that were reduced to zero scaling with 8 threads due to them all incrementing the same integer counter (was an old preformance measurement variable that wasn’t even synchronized). I removed that single x++ and scaling went up to 3.5x on 4 Hyperthreaded cores with 8 threads.