Concurrency in a 3D engine.

From an engineering standpoint, it would be a nice exercise: a rather complex one, but still nice.
There are papers on the internet that cover the various methods and what can be done. Google will help you.

However, if your goal is to produce something workable as soon as possible, then separating render from update will introduce you to a world of hurt and pain. If you just want to make an engine for others, or an engine for yourself to make a game, then you should rather focus on those goals. You can make your engine better if and when your performance or other metrics need it.

I used a similar strategy in one of my experiments. For every loop there is a thread for the render and a thread for the logic, both running concurrently. Once the logic is complete, you buffer all the needed data as “read only” for the render thread to access. This way the logic thread moves on to the next loop while the render thread grabs the current frame data and renders it. In case one is running ahead, i.e. the render thread is still rendering the previous frame’s data when this frame is complete, the logic thread will stall and wait for the render thread to finish. Using this method means the total time taken for a single frame is the time of whichever thread takes longer to compute.
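A minimal sketch of that lock-step scheme, assuming a single snapshot slot and two semaphores for the hand-off. The names (`run_pipeline`, `ready`, `consumed`) and the dict standing in for frame data are made up for illustration, not taken from any engine.

```python
import threading

def run_pipeline(num_frames):
    """Run logic and render on separate threads; return what was rendered."""
    snapshot = {"frame": None}          # read-only data handed to the renderer
    ready = threading.Semaphore(0)      # logic -> render: a snapshot is published
    consumed = threading.Semaphore(1)   # render -> logic: the snapshot slot is free
    rendered = []

    def logic():
        for frame in range(num_frames):
            state = {"frame": frame, "x": frame * 2}   # stand-in for the game update
            consumed.acquire()          # stall while render is still on the previous frame
            snapshot["frame"] = state   # publish; logic never touches this dict again
            ready.release()

    def render():
        for _ in range(num_frames):
            ready.acquire()             # wait for the next snapshot
            data = snapshot["frame"]
            rendered.append((data["frame"], data["x"]))   # stand-in for draw calls
            consumed.release()

    threads = [threading.Thread(target=logic), threading.Thread(target=render)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered
```

The logic thread computes frame N+1 while the renderer is busy with frame N, and only blocks at the publish step, so each frame costs roughly as much as the slower of the two threads.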

Put the logic in its own thread, not the rendering. An OpenGL context can only be current on one thread at a time, so if you put rendering in its own thread you’ll also have to initialize everything that uses OpenGL in that thread as well, or mess with multiple OpenGL contexts. Logic has no such limitation, so just put the logic in its own thread. It makes almost no difference in how the game works in the end, but it solves so many problems with loading.

To be able to render while updating for the next frame, you’ll need to double buffer your data to avoid situations where an update is done while rendering is running causing only some objects to be moved. At best this can cause graphical glitches, at worst it can easily crash your code (remove an object from a list in the logic thread, rendering thread goes out of bounds) and those errors are extremely hard to debug since they happen randomly and infrequently. To solve this, you need to have some kind of synchronization point where you copy all the data needed to render a frame so the rendering thread can safely use this data to render the game without having to worry about the logic thread modifying its data.
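Roughly what that synchronization point could look like, as a single-threaded sketch that only shows the copy itself. `take_snapshot` and the object layout are hypothetical; the assumption is that logic is paused for the duration of the copy.

```python
def take_snapshot(objects):
    """Copy only what the renderer needs (here: id and position)."""
    return [{"id": o["id"], "pos": tuple(o["pos"])} for o in objects]

def demo():
    world = [
        {"id": 0, "pos": [0.0, 0.0], "hp": 10},
        {"id": 1, "pos": [5.0, 5.0], "hp": 3},
    ]
    frame = take_snapshot(world)   # sync point: logic must not run during this copy

    # Logic moves on to the next tick and mutates the live world.
    world[0]["pos"][0] += 1.0
    del world[1]

    # The renderer's copy is untouched: both objects, old positions.
    return world, frame
```

Because positions are copied into tuples, removing an object or moving it in the live list cannot affect the frame the renderer is working on.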

I wouldn’t say only using two threads is optimal though. The load will most likely be very unbalanced between the two threads, so it won’t actually be able to use two cores to their full potential. A more advanced approach would be to divide the engine’s internal subroutines into independent tasks that can be run in parallel. For example, particle updating, bone animation updating, physics updating and terrain rendering can all be run at the exact same time on different cores since they do not need any synchronization between each other. Going further, you can even do particle updating on multiple threads, assuming particles cannot interact with each other. Such a system could use any number of cores assuming there are enough independent tasks, and since some tasks can be split up between any number of cores (100 000 particles can be updated on 1-16 cores or whatever) the engine can scale very well with the number of cores in the system.
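To illustrate splitting one task across cores, here is a sketch that updates independent particles in chunks on a thread pool. The function names and the 1D position/velocity lists are assumptions for the example. Note that in standard CPython the GIL keeps pure-Python loops like this from actually running in parallel, so treat it as a picture of the structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def update_chunk(positions, velocities, start, end, dt):
    # Each chunk touches a disjoint index range, so no synchronization is needed.
    for i in range(start, end):
        positions[i] += velocities[i] * dt

def update_particles(positions, velocities, dt, workers=4):
    n = len(positions)
    chunk = max(1, (n + workers - 1) // workers)   # ceil(n / workers), at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(update_chunk, positions, velocities, s, min(s + chunk, n), dt)
            for s in range(0, n, chunk)
        ]
        for f in futures:
            f.result()   # re-raise any exception from a worker
```

The same call works with 1 or 16 workers; only the chunk size changes, which is what lets this kind of task scale with the core count.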

As a shameless plug, I’d like to mention that I’ve developed a very small library for accomplishing this. It allows you to construct a tree of tasks with dependencies between them. The task tree can then be run using a multithreaded executor which identifies which tasks can be run in parallel and attempts to spread out the available tasks across all available threads. You can find it here: If you have any questions about it, feel free to contact me either here or on Skype (same username).
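For readers who just want the general idea, here is a generic sketch of a dependency-aware executor. This is not the library mentioned above and does not reflect its API; `run_task_graph` and its arguments are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_task_graph(tasks, deps, workers=4):
    """tasks: name -> callable. deps: name -> names that must finish first.
    Returns task names in the order they completed."""
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}      # future -> task name
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies have all completed.
            ready = [name for name, d in remaining.items() if not d]
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()             # re-raise task errors
                finished.append(name)
                for d in remaining.values():
                    d.discard(name)      # unblock dependents
    return finished
```

With `physics`, `animation` and `particles` as independent tasks and `render` depending on all three, the first three may run in any order or in parallel, and `render` always runs last.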

More cores = more power. It’s better to have ‘n’ cores at speed ‘x’ than one core at speed ‘n*x’. And it’s even better if the cores have (an equivalent to) HyperThreading.

Just the bare minimum brush strokes, which will be very inexact. Say you have two machines with identical CPU/memory specs, except one has eight 2 GHz cores and the other has one core @ 16 GHz, neither with HyperThreading to keep it simple. The 8-core machine can at every CPU cycle be processing 8 threads while the single-core can process 1. Each core has its own set of registers and L1 cache (among other things). When a core stalls…nothing effectively happens until the stall is resolved (or the OS swaps threads). The cost of a core stalling for some external-to-core communication will be 8x greater on a single-core machine. When a stall occurs on the multi-core, on average the other cores will continue to run. The single-core machine has to service all threads of all running processes, including the OS. The number of context switches is effectively 8x higher, and no forward progress is made during the switch. The multi-core machine, by its nature, will require fewer context switches (on average).

Not to mention the insane heat issues of a 16 GHz CPU. Heat = X * (voltage)^2 * (clock speed), and increasing the clock speed requires a higher voltage to maintain stability, meaning that the scaling is probably closer to (clock speed)^3. 8x clock speed = probably over 100x as much heat generated (and power consumed of course).
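As a rough check of that estimate, here is the arithmetic using the common approximation that dynamic power scales with voltage squared times frequency. The assumption that voltage has to rise in direct proportion to the clock is purely illustrative; real chips do not scale that cleanly.

```python
def relative_power(clock_scale, voltage_scale):
    """Dynamic power relative to baseline, using P ~ C * V^2 * f."""
    return voltage_scale ** 2 * clock_scale

fixed_voltage = relative_power(8, 1)    # clock alone, voltage unchanged
scaled_voltage = relative_power(8, 8)   # voltage assumed to rise 8x as well
```

Under those assumptions the 8x clock costs 8x the power at fixed voltage and 512x if voltage scales with it, so "over 100x" is easily within reach.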

I’d think that in practice I’d probably prefer a theoretical 16 GHz CPU over a 2 GHz 8-core CPU, considering how horribly badly threaded most game engines are. It’s true that without a similar memory overclock (and CAS reduction etc.) the clock speed itself wouldn’t help much, but the only engine I’ve seen that currently runs well on, for example, AMD’s newer 8-core CPUs is the Frostbite engine (= Battlefield games since Bad Company). Those games get a huge boost even from hyperthreading on a dual core, and the engine seems to be able to use any number of cores you throw at it. As CPUs keep getting more and more cores, more engines need to implement proper threading!

Off topic Planetside 2 bash
[spoiler]On the other end of the spectrum we have Planetside 2, which was, and still is, a CPU performance clusterf*ck. A few months ago certain places in the game (huge bases) used to bring my FPS down to sub-15. When there were huge fights at those places, it’d go down to under 10 FPS. Oh, and this was on the continent without trees, since vegetation easily cut your FPS in half, and with everything set to minimum. I now have a better CPU (i7 4770K @ 3.5GHz) and they’ve gone a long way toward making the game run better, so I can usually maintain 50+ FPS now (still on minimum settings), but it still doesn’t use more than 2 cores. Hopefully the promised upcoming optimization patches will change that.[/spoiler]

I have a beast of a CPU and GPU, but the whirlpool room in Borderlands 2 becomes a single-digit-FPS affair once the firefight starts to heat up. In fact when the game crashes, it’s usually there. The only thing that ever slows down to a similar degree is the end-game loot shower from the Warrior. I suspect memory management is the culprit in both cases, not CPU efficiency.

How about a game state buffer? The logic threads read the current game state (which does not change) and write to a future game state. The rendering threads read from the current game state and do their OpenGL stuff. When the logic threads have finished updating the future game state, it is swapped in to become the current game state.
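A small sketch of that idea, assuming the update is written as a pure function from the current state to a new one. `StateBuffer` and `advance` are hypothetical names, and the swap is assumed to happen at a point where no renderer is mid-read.

```python
class StateBuffer:
    def __init__(self, initial):
        self.current = initial   # read by logic and render; never mutated
        self.future = None       # written only by logic

    def step(self, update):
        # Logic reads the current state and builds the future state from it.
        self.future = update(self.current)

    def swap(self):
        # Called once logic is done; the future state becomes the current one.
        self.current, self.future = self.future, None

def advance(state, dt):
    """Pure update: builds a new state without touching the old one."""
    return {
        "tick": state["tick"] + 1,
        "positions": [
            (x + vx * dt, y + vy * dt)
            for (x, y), (vx, vy) in zip(state["positions"], state["velocities"])
        ],
        "velocities": state["velocities"],
    }
```

Until `swap()` is called, anything reading `current` still sees the complete previous tick, which is the whole point of the scheme.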

Really? I don’t have a beast and I didn’t notice a thing. Granted, I never have a game on high settings D=

I’m not sure if this is what you’re looking for, but I stumbled across this article a couple of months ago. It doesn’t seem too difficult to implement:

Actually, you might reconsider your opinion on copying render states, because not having a frame completely updated before drawing results in annoying glitches. Your argument that it is only milliseconds does not hold, because the glitch lasts at least a full frame time. Especially in more demanding scenes you’ll get the 3D equivalent of screen tearing, but extended to all kinds of attributes: from color to lighting to textures to geometry. And since you can get completely different glitches in subsequent frames, you’ll end up with very poor visual quality. You need at least something like double buffering for your data. To reduce data copying, you could journal your changes to one buffer and apply the same changes to the second buffer before applying the changes for the next frame.
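The journaling idea could look something like this sketch. The class name and the flat key/value state are assumptions for the example, deletions are left out for brevity, and `swap()` is assumed to be called only when the renderer has finished with the old front buffer.

```python
class JournaledDoubleBuffer:
    def __init__(self, initial):
        self.front = dict(initial)   # read by the render thread
        self.back = dict(initial)    # written by the logic thread
        self.journal = []            # changes made to back during this frame

    def set(self, key, value):
        self.back[key] = value
        self.journal.append((key, value))

    def swap(self):
        # The finished back buffer becomes the front buffer. The old front is
        # one frame stale, so replay this frame's changes on it before reuse.
        self.front, self.back = self.back, self.front
        for key, value in self.journal:
            self.back[key] = value
        self.journal.clear()
```

Only the values that actually changed are written twice, instead of copying the whole state every frame.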

And locks et al. hate you and will make you hate yourself. My two cents: avoid them like the plague.

And how do you handle for example rotation matrices? It’s not all about simple translations. If the render thread uses a matrix that the update thread is working on, the result will be an invalid rotation matrix and depending on the change in the matrix, this can look really strange.
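To make the problem concrete, here is a simulated torn read of a 2D rotation matrix, done without threads by mixing one row of the new matrix with one row of the old. The helper names are made up for the example.

```python
import math

def rotation(theta):
    """2x2 rotation matrix as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

old = rotation(0.0)            # identity
new = rotation(math.pi / 2)    # quarter turn
torn = [new[0], old[1]]        # first row already updated, second row not yet
```

A proper rotation always has determinant 1. Here `det(old)` and `det(new)` are both 1 (up to rounding), but `det(torn)` is about 0, so the half-updated matrix squashes the object flat instead of rotating it.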

Locks are a PITA and usually the worst solution. Search for lock-free, wait-free, etc.

I see such things and am really annoyed by them. Granted, it is difficult to spot if it only happens once in a while, but as soon as the update thread and the render thread each take more than 50 percent of the frame time, it’s probable that it happens every frame!

Also, if you use locks to solve latency problems, you’ll effectively synchronize your engine down to single-threaded performance, and you’ll hate debugging it. But as you said, time will tell :P

Rendering using undefined matrices will be horrible. What if a matrix scales an object to a really huge size and you suddenly get extreme overdraw? Then GPU time goes through the roof and everything is a PITA.

So I’ve been hearing about all these issues, such as single-core CPUs etc. I have a nice idea: you include it in your game but have it off by default. When someone turns it on, they get a warning popup and then it switches to using two threads.

Or you fix your code instead of putting down a big “Use at your own risk” sign…