Multithreaded collision detection

theagentd · November 28, 2012, 2:11pm

Hello.

In an attempt to promote my threading library I made a small project which simulates a 2D ball park, Ball Ocean. The program simulates thousands of small balls and does collision detection between each ball and with the screen edges. The collision detection and movement updating is fully threaded and scales very well with the number of cores. The updating can be done both with my library and an optimized single-threaded version without any synchronization for a fair comparison.

First I planned out my game.

I need to find potential collision pairs quickly, so I decided to use a grid to be able to quickly find nearby balls.
I need to update my balls position based on their current velocity, and also update their location in the grid if they move over a tile border.
I need to check for collisions between balls, and add a force to them when they collide.
It has to have exactly the same result as a single-threaded implementation, e.g. be fully deterministic.

I then split it up into 2 main tasks:

Task 1:
Update position based on velocity.
Check collision against screen edges.
Check if the tile is still in the correct grid tile and move it to the correct tile if it has moved out of its old tile.

Task 2:
For each circle, query the grid for nearby circles and check for collisions against them.
When a collision is found, add some velocity to it to bounce it away from what it collided with.

All of these are fully threadable except for one thing: Updating the grid. When a ball has moved out of its current tile and we need to update the grid, we need to remove the ball from its old tile and add it to it’s new tile. This is not thread-safe, and even if we use synchronization we won’t get any guarantees on which order the balls will end up in inside each tile. We therefore break out that part of the task and run that single-threaded to guarantee that they have the correct order and avoid synchronization. This led to the following implementation of two tasks: UpdateTask and CollisionTask.

UpdateTask first loops over each ball and updates its position using its current velocity. It then checks for collisions against the screen edges and bounces them off of them. It also checks if a ball has moved outside its current tile. If it has, it adds the ball to a thread local list (there’s one list per thread). When the multithreaded updating is done, it updates the grid location of all balls which had moved outside their tiles in the finish() method of the task which is called from a single thread after the task is done.

CollisionTask is run after UpdateTask has finished and queries the grid for nearby circles. It then checks for collisions against each circle that the grid callback returns and adds a force (= a speed) away from the other ball.

There are also some additional features: You can push away circles by holding down the mouse button (singlethreaded though). How many balls there are in a grid tile is visualized by a red color. Multithreading and rendering (to benchark raw CPU performance) can be toggled.

The grid-size and velocities are optimized for balls with a radius of 5 pixels right now. On a 1080p screen I can fit 45 000 balls which stack up and fill around 95% of the screen. The following is the FPS I measured (with Fraps) after the balls had stabilized after a few seconds. My CPU is an i7 860 quad core with hyperthreading.

Rendering ON:
Single-threaded: 26 FPS
8 threads: 110 FPS
Scaling: 4,23x faster.

Rendering OFF (benchmarking pure CPU performance):
Single-treaded: 27 FPS
8 threads: 131 FPS
Scaling: 4.85x faster.

No, that’s not a typo. It’s actually almost 5x as fast on my quad core thanks to hyperthreading. ;D

You can download the source code here: http://www.mediafire.com/?gx8tk0ca887kkf7 The latest version of Small Java Threading Library is also included as a .jar which you’ll have to add to your classpath. You’ll also need LWJGL which is used for rendering (not included).

StumpyStrust · November 28, 2012, 5:39pm

After I looked at your geometry shader program one thing interested me more than anything else and that was how relatively easy it was to get multithreading to work. I got it to do the updating of particles multi threaded but do to the way my batcher was made I could not multi task it. Sadly, I saw no boost as that was not what was slowing the cpu down.

How exactly and were exactly could we incorporate this into a game? Wouldn’t it be best to have one spot that is multi threaded? Right now all game objects have an interface that allows me to add them to a huge list and update everything at once. Could I multi thread that? I am such a noob. Idk what I am talking about but I really like this. :

theagentd · November 28, 2012, 6:30pm

I adapted this code from a 3D RTS game I was making before but got tired of. It’s meant to handle the collision between units (and terrain) by treating them as circles on a 2D plane since that’s basically what they are. The code was fully ready to be multithreaded, but I never bothered to implement it. Then I thought that it was kind of sad that it never got used anywhere and saw a chance to show people what my library could do.

Not all parts of a game can be multithreaded. Even BallOcean is single threaded at times. You should only multithread things that benefit from it. For example multithreading input handling or other things that are really fast in the first place is just stupid. There are two times where it’s good to multithread for performance: When doing simple loops performing the same operation on lots of different objects or when you have several single-threaded tasks that are completely independent of each other.

The first one is to split up a single task into X number of tasks. This is used when we have a loop, but each iteration only writes to its own object. Examples include particle updating, game object movement, collision detection. Basically any task that actually is just a whole lot of smaller independent tasks. Since we’re making games here, we often have hundreds of objects, thousands of particles, thousands of objects we want to frustum cull, etc. The point of multithreading here is scalability. Since these tasks can be split up to a very large number of sub-tasks we can use as many cores as we want. Since performance scales almost linearly with the number of cores, we can linearly increase the number of particles, objects or whatever as the number of cores increases. Threaded games perform better on both current PC hardware, but the real aim is for the future. Performance per core isn’t following Moore’s Law anymore. We’ll be getting more and more cores from now on, but each core isn’t getting much faster.

The second one is kind of a “compatibility fallback”. Let’s say you’re using an algorithm that’s recursive (the result of a loop iteration depends on the previous iteration) or for any reason cannot be multithreaded into sub-tasks like above. In that case you may still have multiple tasks that can run in parallel. Your game object updating may be impossible to multithread for some reason, but they should still not have anything to do with particle updating, so by just by running those two in parallel we might get a lot better performance. It obviously won’t scale with more than two cores, and the work-load might be very unbalanced (millions of particles, but only 10 game objects or vice versa, so it’s practically single threaded anyway). This isn’t future-proof and won’t scale well, but it can still give a tangible performance improvement. Its biggest use is probably when using OpenGL, since OpenGL commands must done from a single thread.

Concurrent programming is the future. We wouldn’t be having games unless some smart people developed scalable algorithms that can run on thousands of cores to process vertices and pixels, AKA GPUs. CPUs are going the same way now. It’s time to adapt and innovate. The future holds a paradise of extreme amount of FLOPS that anyone stuck on a single thread will miss. The scale and detail must continue to get better!

EDIT: Multithreading a batcher sounds like the wrong approach. A batcher should simply store data about sprites/particles in a buffer and then draw all of them at the same time later. It’s completely RAM limited while your filling it, and won’t benefit much from multithreading. You want to multithread your sprites in the first place, and use one batcher per thread, similar to what I did for my particles. This obviously requires all your game object logic to be threadable, and that’s where the challenge lies.