Threading games?

Hello. I’m about to start making a 3D RTS game after having tested out many of the parts that I will need for it (fog of war, occlusion testing, etc etc). I’ve realized that many of the CPU-bound tasks are highly threadable. I have seen the performance benefits of going from 1 to 2 threads, for example splitting updating (CPU-heavy 2D physics) and rendering. Therefore I would like to make this game scalable to any number of processors to some extent.

The problem is that I have tasks with different priorities. Examples:

  • In a small test program I did frustum culling in a different thread in parallel to the game updating. There is no need to pause the game loop while the test is running, as one frame old culling results are perfectly fine. This gave a huge performance boost when CPU-bound, sometimes +100%. Code example:
level.processCullingResults(); //Wait for culling thread(s) to finish, if they are not done yet. Then apply the result to the level.
level.startFrustumCulling(); //Start the thread(s) again.
level.renderLevel(); //Render the level
Display.update(); //LWJGL buffer swap
  • Pathfinding is most likely extremely threadable. Considering the fact that multiple units will most likely receive move orders at the same time (a simple group move), pathfinding can be threaded per unit as long as the paths don’t affect each other (more advanced algorithms). Due to the tight client-server synchronization required, pathfinding will have to pause the game loop. This is a very nice thing to thread, as if a player orders 100+ units to move a long distancee to attack the enemy base, it could give a warning to the defending player as the game could freeze for a second or two due to pathfinding calculation. I’ve had that happen VS 7 CPU players in Warhammer 40k. xD

  • Some of my fog of war code is also threadable in some heavy preprocessing steps that have to be redone when the map changes (which happens quite a lot).

All of these steps could in theory use any number of threads to do their calculations. My current idea is to have this thread layout:

  • One high priority master/rendering thread. Helps as LWJGL only allows OpenGL calls from one thread. This thread controls all other worker threads.
  • X high priority worker threads, where X is the number of processors available. These threads will do calculations that are threadable but requires the master thread to be paused (pathfinding, fog of war, and more), and will therefore never compete with the master thread for resources.
  • 1 or more low priority frustum culling threads. These threads are set to low priority to reduce their effect on the master thread and the worker threads which are more time sensitive. They will however be given time to complete if necessary.
  • (Network threads and sound threads are easily implemented and won’t affect performance noticeably.)

Am I doing things right? I haven’t found many tutorials/documents on threading games in specific… I may be overdoing things with this intense multithreading, and I probably will cut corners for simplicity, but I believe this will make the game more smooth and professional looking. I’m not entirely sure if it’s worth it, but it’ll be a nice learning experience anyway. The target computer is a computer with a dual-core processor or better with an OpenGL 3.0 ( = DirectX 10) compatible graphics card, if that helps.

Any comment is appreciated! =) Surely someone has to have experience with threads in games! xD

I’m having some fun with Threads myself right now, working with audio objects that overlap playback.

Must reading:
http://download.oracle.com/javase/tutorial/essential/concurrency/index.html

I assume you’ve already seen that. It doesn’t say much about games in specific, but it does explain how to set up objects so they can interact efficiently in a multi-threaded environment.

I’d think it would be better to force a thread to sleep periodically than to lower its priority. I’m reading the Threads chapter in Sierra/Bates SCJP book, and they say you can’t count on two JVMs having similar thread switching algorithms. What is evenly balanced on one machine might starve certain threads on another implementation.

Thanks for your reply, Philfrei! I have indeed read bits and pieces of that tutorial, and it’s really good!
Concerning thread priority: Unreliable as hell. >_< I read some article that said that priorities were basically useless if all threads were competing for CPU allocation. What the hell?! How can a thread NOT compete for CPU allocation?! If it’s not competing, then the priority wouldn’t have any effect in the first place!
The problem with my use of thread priorities is that I WANT starvation of the culling threads, as I don’t want them to slow down the worker threads. I want them to run when the master thread is doing single-threaded stuff, and be completely paused when the worker threads kick in.
I have also slightly revised my thread layout. I will likely only use a single culling thread, as the culling work is pretty small anyways and doesn’t really need multiple threads. This will also reduce their impact on the worker threads.

Using ExecutorService to make a thread pool is also pretty cool. Making new threads repeatedly supposedly has some overhead which can be avoided if you instead take a premade one from a pool. I’m finding ExecutorService is pretty easy to use. (I learned about it via Core Java vol 1–Horstmann).

So, instead of lowering a priority, try putting it into a loop with a sleep value. Have it only wake up to do the culling action every 10 or 100 or whatever makes sense gameloops or msecs. Seems like one could use separate threads for opponent AI as well, as long as you are just reading gamestate and making “loosely coupled” updates to opponent state to direct their behavior. Opponents don’t have to be awake for every cycle, either. (Don’t want them to be too smart!)

Is there a way to tell from inspection or reflection (within the Java program) on the host computer, how many processors there are?

I’m looking in my few game books. “Killer Game Programming in Java” has a bit on multithreading in the beginning of Chapter 2. Hmmm. Not much, but my books are mostly from Half Price (e.g., dated! ::)). There’s a bit on time-slicing a path search algorithm in Buckland’s “Programming Game AI by Example” but the book is C++ mostly, and again, pretty dated (2002–but it has a nice physics/vector math refresher among other plusses).

Allowing culling to be a frame or more behind (because of thread starvation) seems like a bad idea. If I start turning the camera really fast, I could spin to an area that was culled off the screen the last time the frustum culling was computed. Then nothing would show up for a frame or two and then things would pop into being in the middle of the screen.

Of course it may not be so obvious, it just pop in at the edges of the screen, or you’re never actually starving the CPU enough that frustum culling gets far enough behind. It just seems like a recipe for headaches later on.

Overall I think threading is a great thing to try for a game. It can be difficult because of the need for a ‘frame’. Just carefully consider the different pieces so you satisfied with your games behavior in the event that one gets starved for some period of time.

IMO, though, a better approach is to use threads to maximize the number of cores used but still require that everything (or all game logic tasks) complete for a frame. You can then have much lower priority tasks, such as texture IO running in the background across multiple frames. If you allow threads to be starved and not have every piece of logic execute at regular times, it seems likely that your games behavior will be non-deterministic and highly dependent on the quality of the player’s machine.

[quote=“philfrei,post:4,topic:36789”]

int processors = Runtime.getRuntime().availableProcessors();

Don’t you just love Java for these kind of functions? xD Note that this gives you the number of LOGICAL processors; e.g. my i7 760 gives me 8 logical processors, (4 physical cores) * 2 because of Hyperthreading. That doesn’t really change anything though, as I see no reason NOT to utilize Hyperthreading. xD

I’ve been looking at the standard Executors and ThreadPools, but I believe I need some functionality that is not very well supported in these classes. For example, I’d like all threads to do different parts of the same task. These classes wouldn’t be optimal, and I don’t think it’ll be very hard to write my own thread pool.
Starting new threads every frame would be a very bad idea, as the VM gives no guarantee on WHEN the thread is actually started, only that it will be in the future. In my earlier example, both the worker threads and the culling threads would be either be wait()-ing or locked by a CyclicBarrier when they are not doing anything. They would only be started once.

If you look at my first post I had this in my example code:

level.processCullingResults(); //Wait for culling thread(s) to finish, if they are not done yet. Then apply the result to the level.

The culling results can be either completely correct or one frame old, but never more than a single frame old. Also, I dare you to notice the difference. For RTS-games the problem should be blatantly obvious. If you click on the minimap and move the camera to a whole different section of the map, the old culling results will be completely invalid. Have you ever noticed the screen being completely black for a single frame? I don’t think anyone can notice these problems at 60 FPS, though at a really low frame rate it could be a noticeable but then the real problem would be the low frame rate… Speeding up the frame rate in all cases at the cost of quality in worst case scenarios only seems like a good idea to me.
I believe using one frame old results are standard for both frustum culling and occlusion testing, especially for occlusion testing as it requires two-way communication between the CPU and the GPU. Rendering the occlusion geometry and then immediately asking for the results would starve the CPU while it waits for the GPU to do the actual z-test, and then starve the GPU due to an empty command queue. Waiting a frame before getting the result solves both problems.

Yes, I seem to have forgotten an important part of the game: the game loop.

As I need my game to be deterministic to keep the server and all clients in sync to minimize network traffic I can’t use a dynamic delta-time game loop. (I’m practically using the same idea as Warcraft 3 and its infamous 250 ms sync time. xd) I need a fixed frame rate for updating objects to ensure that all clients gets the same result as the server. Currently I am thinking of targeting 30 UPDATES per second. Please refer to the last technique in this EXCELLENT article:
http://www.koonsolo.com/news/dewitters-gameloop/
This means that some things will run at 30 FPS (e.g. game state updating), while rendering can run at any speed. I think I’ll need to make a list of what has to be done for each update and for each frame to find the best way the utilize threads… I’ll be back!

Again, thanks for the replys!!! ;D

Double posting but what the hell.
I’ve continued my threading experiments, and made a threaded particle system test. It’s a pretty simple idea; let each thread animate its own set of particles, but draw them all using OpenGL single threaded. Each thread also loads the vertex data (position and color) of all the vertices into a Buffer, and each set of particles is drawn using glDrawArrays (which means: number of draw calls = number of threads). My best results was about 3.5x scaling on 4 CPUs with hyperthreading (8 threads) with particle updating ONLY (no graphics). The scaling is worse with graphics as the drawing is single threaded and sometimes GPU-limited. HOWEVER:

  1. The scaling gets better with higher load. More calculations per particle or just more particles is the way to go. With only gravity I can only get the above 3.5x scaling with 100 000+ particles.
  2. The scaling drops when I actually draw the particles. It’s still around 2-2.5x scaling, but the processors not used to draw the particles could be used for something else in a real game.
  3. I’m getting very inconsistent performance. After hours of frustration I realized the problem, or actually problems: Turbo boost and processor “parking”. xD Turbo boost SOMETIMES gave a 10-15% boost in single threaded programs. Processor parking (as it says in the Windows resource manager) means that one of the hyperthreaded logical cores in each physical core is disabled temporarily. This is on 99% of the time, but all are utilized when 8 threads actually are competing for CPU allocation. Or at least it should be but that wasn’t the case. I could see a 30% performance fluctuation with 8 threads and 4 threads being way slower than 3 threads (!), due to logical processors being disabled. I could increase the likelihood for all cores to be enabled by using 16 threads instead of 8, but it didn’t guarantee anything. >___<