Render/Update Threads

Hi everyone,
I’m wondering if anyone has ever coded a game using two separate threads for updating and rendering, in order to make sure that the updating happens at fixed intervals no matter how slow the rendering is (that way players in a multiplayer game stay in sync).
I’ve been using the wait/notify mechanism in Java, but I got very poor performance, so any suggestion/comment/idea would be highly appreciated.
Thx ;)!

Perhaps you could just skip rendering some frames in a single-threaded update/render loop when it would cause too much of a delay.

This of course wouldn’t look too nice, as skipping frames might sometimes be noticeable to the user, but it’s an idea.
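Something like this minimal sketch, assuming a fixed 20ms update interval (the Game interface is a hypothetical stand-in for your own code):

[code]
// Single-threaded loop: update at a fixed rate, skip rendering
// when the loop falls behind, but never skip too many in a row.
interface Game {
    boolean isRunning();
    void update();   // fixed-interval game logic
    void render();   // drawing only
}

public class FixedStepLoop {
    static final long UPDATE_INTERVAL_MS = 20; // 50 updates/sec
    static final int MAX_FRAME_SKIP = 5;       // safety cap

    public static void run(Game game) {
        long nextUpdate = System.currentTimeMillis();
        while (game.isRunning()) {
            int updates = 0;
            // Catch up on logic; rendering waits while we're behind.
            while (System.currentTimeMillis() >= nextUpdate
                    && updates < MAX_FRAME_SKIP) {
                game.update();
                nextUpdate += UPDATE_INTERVAL_MS;
                updates++;
            }
            game.render(); // drawn once per loop, however late we are
        }
    }
}
[/code]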

Seb

Do NOT use two separate threads updating and rendering the same game state. You cannot render while the state is updating, and vice versa. If you update at fixed intervals as you suggest, you will get synchronization bugs. That is, if they share data. The rendering thread can get a copy of the game state before rendering. Then you only have to synchronize the cloning of the state. I hope this is what you’re doing.
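For example, a minimal sketch of that cloning idea (GameState and its contents are hypothetical stand-ins, not anyone’s real engine):

[code]
// Only the clone step is synchronized: the render thread grabs a
// snapshot, releases the lock, and draws the copy in peace.
class GameState {
    double x, y;                            // stand-in for real game data
    void step(long dt) { x += dt * 0.01; }  // stand-in update
    GameState copy() {                      // deep copy for the renderer
        GameState c = new GameState();
        c.x = x;
        c.y = y;
        return c;
    }
}

class SharedState {
    private final GameState state = new GameState();

    // Update thread: called at fixed intervals.
    synchronized void update(long dt) {
        state.step(dt);
    }

    // Render thread: holds the lock only while cloning.
    synchronized GameState snapshot() {
        return state.copy();
    }
}
[/code]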

In a multiplayer game it might be a good idea to have a separate thread for the server logic. Let the local player get its state from the server the same way as the other clients, through a socket.

I’d have to know more about what you’re doing to figure out why you’re getting poor performance. It’s probably a design issue.

[quote]Do NOT use two separate threads updating and rendering the same game state. You cannot render while the state is updating, and vice versa. If you update at fixed intervals as you suggest, you will get synchronization bugs. That is, if they share data. The rendering thread can get a copy of the game state before rendering. Then you only have to synchronize the cloning of the state. I hope this is what you’re doing.
[/quote]
Umm…I have to disagree with this. This is the best way to do things if you can manage it properly.

Rendering should be just that, rendering. A clean separation of game and renderer is the best approach to performance.

Though you are right that it does mean you need to do some synchronization, it’s not difficult or error prone as long as you have the separation between renderer and game state as a goal of your engine.

Most people think thread synchronization means wasted cycles…and it does…but with multiple processors (CPU/GPU) you end up with very little, because you gain more performance from keeping them both busy than you lose in the odd moment when one has to wait for the other.

If done correctly you will have your engine running at a constant (or at least nearly constant) speed and your renderer also running as fast as it can.

The general rule of performance in games is pretty simple: keep both the GPU and the CPU busy at all times, and this can only be done by separating game state from renderer.

Basically: while the GPU is rendering the last frame, your other thread is updating the game objects/state in preparation for the next frame. This approach does use more memory of course, because you have doubled your geometry for non-static entities (static geometry and attributes should be shared between engine and renderer for things like preconditioned collision detection…a collision calc on a wall can be done while the wall geometry is being read by the renderer).
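A rough sketch of what that doubling might look like for one non-static entity (names and structure are illustrative, not from any particular engine):

[code]
// Double-buffered geometry: the engine writes transformed vertices
// into the back buffer while the renderer reads the front one, and
// a synchronized swap is the only hand-off point.
class DoubleBufferedMesh {
    private float[] front;  // renderer reads this
    private float[] back;   // engine writes this

    DoubleBufferedMesh(int floats) {
        front = new float[floats];
        back = new float[floats];
    }

    float[] writeBuffer() { return back; }              // engine side

    synchronized float[] readBuffer() { return front; } // renderer side;
                                                        // re-fetch each frame

    // Engine calls this once per tick when the back buffer is complete.
    synchronized void swap() {
        float[] tmp = front;
        front = back;
        back = tmp;
    }
}
[/code]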

Networking a game is also MUCH easier if the renderer and the state are in separate threads.

You end up with two FPS counts:
Rendering FPS and Engine FPS.

Renderer FPS fluctuates with pipeline saturation.
Engine FPS fluctuates very little (depending on complexity, or on things like a poorly implemented networking side).

Actually we have to disagree with you here Vorax, because your reasoning is based on a misunderstanding of how the GPU works.

It is absolutely, strongly advised to do rendering and “ticking” in one single thread, mainly because synchronization cannot be accurately or reliably scheduled and it’s hard to do.

Secondly, there is already a separate rendering thread going on, except it exists in the GPU and AGP bus, not on the CPU, and Java doesn’t know anything about it.

Your ticking thread may be thinking about a bunch of stuff, but the reality is that while it’s thinking, the GPU is sucking data down the AGP bus and drawing it in the background. When you’re done thinking, swap buffers and start rendering immediately. Rendering should consist solely of blatting data down to OpenGL (or whatever).

This way you are pretty much guaranteed to get maximal performance, minimal complexity, and great results.
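In other words, something like this sketch (the Game interface here is a hypothetical stand-in for your own code):

[code]
// Single-threaded loop: think, then do nothing but GL calls, then
// swap. The GPU drains the previous frame's commands while we tick.
interface Game {
    boolean isRunning();
    void tick();         // game logic only, no GL calls
    void render();       // GL calls only, no thinking
    void swapBuffers();  // however your GL binding swaps
}

public class SingleThreadedLoop {
    public static void run(Game game) {
        while (game.isRunning()) {
            game.tick();        // the GPU is still drawing the last
                                // frame in the background while we think
            game.render();      // then blat the new frame down to GL
            game.swapBuffers(); // and start again immediately
        }
    }
}
[/code]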

Cas :slight_smile:

What type of updating and what type of rendering thread do you mean? If you mean a game engine interacting with a graphics engine, then the game engine, or better said the world engine, should be asymmetrical, and thus the update of rendered data should be asymmetrical and lazy as well.

Of course everyone knows how a graphics card works. Half of the image data is pushed into memory, then transferred over AGP to the GFX. The GFX stores it in its fast memory (hopefully) and uses it for various transformations. In the meantime a second GFX card uses the second half of the image data. Then the rasterizer comes and everything is encoded into some TV-like channel that is decoded by the monitor and shown in 18-bit colors. If you don’t have the right resolution, or the right monitor, you are screwed.
Letting the rasterizer work faster than 75 FPS is pointless; forcing input polling at 200 UPS isn’t.
And if you think this is horrible, then just wait and see what your graphics engine does to reflections.

My current GFX doesn’t know much about threads, and I hope NVIDIA will not be stupid and will rather increase the number of pipelines and FP consistency. It’s however pretty usable as a secondary processor.

A little warning: I heard that at least some ATI cards are pretty bad at sending data into main memory. Writing to AGP was 3x faster than reading from AGP. Nasty.

[quote] Writing to AGP was 3x faster than reading from AGP. Nasty.
[/quote]
Isn’t this generally the case with all cards, on purpose? AGP is designed to send data one way only, pretty much. glReadPixels sucks for this reason. PCI-X, on the other hand…

[quote]Actually we have to disagree with you here Vorax, and your reasoning is actually based on a misunderstanding of how the GPU works.
[/quote]
No, I don’t think I misunderstand, I just think I understand threading very well. :wink:

For things like collision detection, particle engines, AI, in-memory texture generation outside of OGL, on-demand loading, complex geometric deformation, etc., you can get much improved performance if you thread the effort and separate it from the rendering loop.

But for further evidence:

Multi-threading in Games (Game Developers Conference 2004)
"Perimeter

Perimeter is a real-time strategy game being developed by the Russian studio KD Labs. RTS games are necessarily CPU-heavy. Perimeter implements constant terrain reforming as a gameplay element, and also incorporates highly detailed graphics for an RTS game. The dev team broke up the main loop into two threads, one for the user interface and graphics rendering, the other for logic and AI.

The net result was a substantial increase in performance. In a demo of the game with multithreading disabled, the game exhibited a frame rate hovering in the 12-18 fps range, while the multithreaded version ran at 22-30 fps on debug code on a 3.2GHz Pentium 4."

See for more info:

http://www.extremetech.com/article2/0,1558,1554199,00.asp

[quote]It is absolutely, strongly advised to do rendering and “ticking” in one single thread, mainly because synchronization cannot be accurately or reliably scheduled and it’s hard to do.
[/quote]
Synchronization can be very easily predicted if you can profile your code well enough. Generally, if you have complex geometry, the GL calls and the buffer swap will be your code’s biggest lag. If you can do all the other stuff in another thread before that completes (without starving the GL calls for more time than your refresh rate…usually not hard to do if your engine implements a good PVS or octree, frustum culling, occlusion, etc.; if not, the user would notice the FPS drop anyway), then synchronization is just for cases where your CPU got lagged, which would have slowed down your main loop anyway. Even if you can’t, you might have got a good chunk of it done in the other thread.

But what about the overhead of the actual render calls? What about the buffer swap itself? Those are all lengthy operations involving a lot of bus waste. What about the waste in your engine itself?…

John Sweeney of Raven Software (technical lead of SOF2):
“Obviously waiting for Vsync before window swapping can cause a slow down. If you take 1.1 frames to draw a scene, then wait for Vsync before swapping frame buffers that means that .9 of that frame is spent doing nothing on the card. The OpenGL context can accept commands and buffer them up, but it’s not going to be doing any rendering until the buffers are swapped and the back buffer is unlocked for rendering again. You can see why this would slow the game down.”

You can disable Vsync, but then you are overdrawing when you can render faster than the monitor will display it.

A better solution is to waste neither the GPU’s time nor the CPU’s time. With threads you can balance your processing across both parts. Let’s say the refresh rate is 85Hz; the optimal solution is to have your rendering loop doing no more than that when issuing to the front buffer. Once you have achieved that, you have spare cycles for the engine to use for things like AI…etc. You can let buffer waits handle that for you, but not accurately, unless you can predetermine how long the next frame will take to actually be rendered. With a second thread you don’t need to concern yourself with vsync on or off. The goal is to have your engine function at its own rate.

Behaviors in Java 3D are a good example of this effect and its usefulness. You can start off using nothing but frame notification to do things like prediction/collision, but as the game gets more complex you will see a drop in your FPS. Behaviors can alleviate this to a degree because they are a separate thread and can give you some independence from the graphics rendering entirely. Switch from frame notification to timed notification and you will get better overall CPU/GPU utilization (though I don’t condone the excessive synchronization in J3D, a little too generic, but it is effective for many uses where the CPU is getting a workout along with the GPU).

The final advantage is what I think the original poster was going for: your game operates consistently despite the capabilities of the bus and GPU (let’s also not forget that a big hurdle for Java is the JNI overhead of the JOGL calls). Varying system capabilities are big issues for games like Doom 3 and HL2, where they are trying to support a very broad range of hardware capabilities but keep the game play fair for all. In a single-threaded engine you are very limited in what you can do. For example you can adjust the player movement based on lag calculations incurred by network or FPS…standard stuff…but that is actually about as much as most people do, because the complexity increases greatly when dealing with other parts of the game: what about lost time for AI? What about particle accuracy? What about physics accuracy? Those things tend to be very difficult to base off a lag timer.

It’s much better to keep the game engine at a constant frame rate (I mean frame rate like Carmack means engine frame rates, not graphics frame rates) and do all work based on that thread. Let the graphics/network lag come as it will, but keep the player in the same world as everyone else. That way graphics hardware and network speed become much less of an issue (though network threading to a much smaller degree…whole other can of worms…best solved with threads :wink: ).

Of course, I may not know what I am talking about (my knowledge is largely based on others’)…but it is working for me.

In the game I am working on, I track two FPS rates, engine and renderer. Renderer varies from 52 FPS at worst (12 live entities at 1200-1500 tris each, 24 active particle systems with around 200 quad particles per system, 12 animation trackers (run-time generated key-frame splines), and collision detection; AI is really just random at this stage) up to 410 FPS when staring at nothing.

With 410 FPS at nothing, but still calculating everything in the engine (this will be greatly optimized with entity activity bounds later), the wait incurred by the calculations/synchronizations is costing me around 1-2ms (measured), while hard geometry rendering costs around 16ms…so I have a surplus of 3-4ms for additional calculation before the target worst case of 50 FPS rendering is reached (unoptimized), which means I can use all that power for the unimplemented AI. Once I attack the geometry with a stripifier, I should be able to almost double the current renderer FPS (all triangles/quads atm), which will mean a final net of 90-100 FPS or so…an additional 8-10ms of per-frame processing. Though, since I will be adding more geometry to the game, I am only anticipating around another 4-5ms beyond my current budget.

The engine FPS ranges from 55-60 no matter what is going on; to maintain this I must not exceed my anticipated excess of 4-5ms. The computer this is tested on is a modest 1.8 GHz machine with a GeForce 4 MX (better to test on crap, as they say).

I have been working on this game for about two months now, and I spent 3 months researching the design of the engine before starting it…I guess you’ll have to wait for the results before you can determine whether I am smoking something or not :wink:

PS: Anyone know of a good Java stripifier?

Java3D is known for being slower than other solutions. Again from the knowledge of others, I’ve been informed this has a lot to do with its use of synchronisation. Hence projects like Xith didn’t bother and got much better performance.

This being an incredibly trivial engine test, I don’t think it’s really relevant.

Like you I can’t talk for industry experience but I have tried lots of different ways of writing games in Java over the last 3 years. I’ve certainly come down on the side that one thread is better (at least where logic/rendering are concerned - networking is still up for debate in my mind).

Kev

I talk from experience now :slight_smile:

Don’t use threads! And I can say no more on the subject because it has been said many times before.

Except for networking! Where they are absolutely 100% ideal.

Cas :slight_smile:

Re: Networking, I’ve been back and forward on the issue. I think there’s a difference between accepting network traffic and applying it to the data model. A separate thread for accepting and storing the network updates makes sense. Actually applying the changes seems nicer in the main thread (i.e. as part of game logic).

Just the feeling for me so far…
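For what it’s worth, a minimal sketch of that split (NetMessage and GameState are hypothetical stand-ins):

[code]
import java.util.concurrent.ConcurrentLinkedQueue;

// The network thread only accepts and stores messages; the main
// thread applies them to the model as part of normal game logic.
interface NetMessage {
    void applyTo(GameState state);
}

class GameState { /* game data elided */ }

class NetworkReceiver implements Runnable {
    private final ConcurrentLinkedQueue<NetMessage> inbox =
            new ConcurrentLinkedQueue<NetMessage>();

    // Network thread: accept and store, never touch the model.
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            NetMessage m = readMessage();  // blocking socket read, elided
            if (m != null) inbox.add(m);
        }
    }

    // Main thread, once per game-logic tick: apply what has arrived.
    void applyPending(GameState state) {
        NetMessage m;
        while ((m = inbox.poll()) != null) {
            m.applyTo(state);
        }
    }

    private NetMessage readMessage() { return null; /* elided */ }
}
[/code]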

Kev

That’s not “evidence”. It merely says their coders are poor, or alternatively says a lot about how self-fellating their promotional speakers are. GDC talks aren’t selected for quality and accuracy so much as for the “fame”, “contentiousness” and “interest” of what is going to be said.

This isn’t really single versus multi-threading; it’s instead a discussion of “how to work around bugs in an API design where you are only provided with blocking calls”.

It is a base assumption of writing a single-threaded game that all the calls which can be asynchronous are asynchronous and predictable. If you have that, there’s no advantage to using threads. In theory, you ought to have that available. In practice, lots of things annoyingly aren’t feature-rich enough.

e.g. from a developer perspective I have no need to block waiting for a vsync; I should instead be able to check how long I’ve got until the next vsync happens. Much, much more useful.

[quote]Re: Networking, I’ve been back and forward on the issue. I think there’s a difference between accepting network traffic and applying it to the data model. A separate thread for accepting and storing the network updates makes sense. Actually applying the changes seems nicer in the main thread (i.e. as part of game logic).

Kev
[/quote]

  • when you get to the “check network stuff” part of the main loop, you just query your non-blocking channels
  • …process it…
  • re-loop

What’s the problem? Why bother with multiple threads? Your game loop is running at least 60 times a second, so…the additional latency until the networking is processed is noticeably less than the ping time to the server.
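In a game loop that might look something like this sketch (channel/selector setup elided; the names are mine):

[code]
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// The "check network stuff" step of a single-threaded main loop,
// using java.nio non-blocking channels. Never blocks the loop.
class NetPoll {
    static void pollNetwork(Selector selector, ByteBuffer buf)
            throws IOException {
        selector.selectNow();                       // non-blocking query
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isReadable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                buf.clear();
                int n = ch.read(buf);               // whatever has arrived
                if (n > 0) {
                    buf.flip();
                    // ...process the bytes as game messages...
                }
            }
        }
    }
}
[/code]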

Ah sorry, I was thinking serverside when I mentioned threads & networking together in the same sentence.

Cas :slight_smile:

Again, this is only past experience…but network data quite often arrives in big batches, then very little, big batches, then very little. That’s made me hiccup rendering updates in the past. Not so much processing the updates (because this has to be done anyway) but rather just reading the data in. Running the data read and processing in another thread alleviated this…

It may have been because of other factors in the couple of cases I’ve tried, but since we’re relying on entirely anecdotal evidence here anyway…what the hey. :slight_smile:

Kev

[quote]Java3D is known for being slower than other solutions. Again from the knowledge of others, I’ve been informed this has a lot to do with its use of synchronisation. Hence projects like Xith didn’t bother and got much better performance.
[/quote]
I agree it is slower…a lot slower, and I agree a lot of it has to do with synchronization, but it also has to do with the number of layers you pass through before actually doing work that is valuable for your game. J3D seems focused on making the programmer’s life easier at the expense of performance.

I developed my own scene graph as part of this engine and I do comparisons (I have a duplicate of the capabilities in Java 3D…which was the first attempt at this 7 months ago). In my worst-case scenario, the Java 3D version runs at 21 FPS; my engine runs at 54 FPS. The Java 3D version has only rendering logic and animations though. The scene is identical and the graph is almost identical (my SG can have transforms assigned directly to geometry objects). Like J3D I do all matrix calculations…but a bit differently, and I only support float for performance (no doubles).

I have also implemented a branch-level matrix stack that exactly duplicates what OpenGL does internally with the modelview stack (push, pop, mul, translate, rotate, dot products); this avoids a lot of JOGL calls and keeps all transformation data available at all times. By basically writing parts of OpenGL (not the interface, the specification) in Java you can improve performance a lot. Beyond vertex pumping, I have a grand total of one push, one pop and one matrix multiply in terms of JOGL calls.
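Not the real engine code, but a sketch of the kind of thing I mean (column-major 4x4 float arrays, as OpenGL uses):

[code]
// Java-side matrix stack mirroring the GL modelview stack, so
// transform data stays available without crossing JNI for every
// push/pop/multiply.
class MatrixStack {
    private final float[][] stack;
    private int top = 0;

    MatrixStack(int depth) {
        stack = new float[depth][16];
        loadIdentity();
    }

    void loadIdentity() {
        float[] m = stack[top];
        for (int i = 0; i < 16; i++) m[i] = (i % 5 == 0) ? 1f : 0f;
    }

    void push() {  // duplicate the current matrix, like glPushMatrix
        System.arraycopy(stack[top], 0, stack[top + 1], 0, 16);
        top++;
    }

    void pop() { top--; }

    float[] current() { return stack[top]; }

    // current = current * m (column-major, like glMultMatrixf)
    void mul(float[] m) {
        float[] c = stack[top], r = new float[16];
        for (int col = 0; col < 4; col++)
            for (int row = 0; row < 4; row++) {
                float s = 0f;
                for (int k = 0; k < 4; k++)
                    s += c[k * 4 + row] * m[col * 4 + k];
                r[col * 4 + row] = s;
            }
        System.arraycopy(r, 0, c, 0, 16);
    }
}
[/code]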

If this is a trivial test of the engine, I am not sure what you would consider a real test :expressionless: I have seen a lot of Java games, and none (except the Agent 9 stuff) seem to be pushing things even close to as hard as I am so far. Maybe I just haven’t seen as much as I think I have.

Well, I do have a little game-industry experience (not in Java though). I have done some work for Activision and I am in the credits of at least one off-the-shelf AAA game from them (Star Trek: Elite Force II)…not lead developer though :wink: I know enough people at UBI Soft, Activision and Raven that I am sure I could get a job in the industry, but my real job pays a lot more than the gaming industry would and has a lot better hours :wink: So the only way for me to go is to start a company and either buy an engine or build my own…building is cheap, and it can be thrown out if necessary.

True, these things need to be taken with a grain of salt (or a bag full sometimes :wink: )

[quote]This isn’t really single versus multi-threading; it’s instead a discussion of “how to work around bugs in an API design where you are only provided with blocking calls”.
[/quote]

I agree, but it illustrates where a thread can help you. The .9 wasted in that discussion affects both the GPU and the CPU, because the GPU is idle and the CPU will be writing to the front buffer instead of doing something useful.

I don’t think I am explaining what my engine is doing very well.

One thread is doing everything except the actual GL calls (the engine thread). It does this at a target rate of 62 FPS. The other does nothing except GL calls (the render thread).

When the renderer is ready for more geometry, it uses the reference to the primitive data last sent to it by the engine. All logic and calculations have been pre-done on the geometry. This reference passing is the ONLY place where synchronization can actually occur. It is memory-expensive though, because the engine maintains a copy of the original geometry, the transformed geometry, and the last transformed geometry that is actually being rendered. Only for non-static geometry though; static geometry is shared by both the engine and renderer threads.

If the engine thread finishes before the renderer thread (which means renderer FPS has dropped below 62 FPS), the game state is updated. This means that the engine thread will continue determining the next frame of game state, but the renderer may not actually render it.

The renderer is not blocked though. It can render the previous set of data if the engine has moved on to yet another frame. In the worst case, the renderer will be rendering frame 1 while the engine is actually creating frame 3. As geometry lessens, the renderer will get back into sync and be rendering one frame behind the engine.

If the renderer finishes before the engine (which means renderer FPS has exceeded engine FPS), then the renderer will simply reproduce the last frame sent to it by the engine. If the renderer is operating 2 or more frames above its target FPS, it can also yield (yes, my engine has yield code INSIDE the main loop :wink: ) to give the engine some more power. If the engine doesn’t need the power, it gives it back with its own yield, and thus FPS can rise well above the target FPS.

The result is a consistent game state regardless of the amount of geometry rendered (as long as the engine FPS does not drop below 62…I can’t imagine the complexity or bad code it would take to cause this).

The rendering is an extremely tight loop that does NO calculations, isn’t blocked by the game state, and serves the sole purpose of pushing primitive-level GL calls and swapping the buffer as fast as it can.
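A rough sketch of that hand-off (Frame here is a hypothetical container for the pre-transformed geometry, not the real engine class):

[code]
// The engine publishes a reference to the last complete frame; the
// renderer grabs whatever is newest and re-renders the old frame if
// nothing new has arrived. The synchronized blocks are the only
// contention point.
class Frame {
    float[] geometry;                   // pre-transformed vertex data
}

class FrameHandoff {
    private Frame latest;               // last complete frame from engine

    // Engine thread: publish a finished frame and move on.
    synchronized void publish(Frame f) {
        latest = f;
    }

    // Render thread: take the newest frame, or null if unchanged.
    synchronized Frame takeNewest(Frame current) {
        return (latest != current) ? latest : null;
    }
}
[/code]

The renderer keeps its current Frame and asks for a newer one each pass; if it gets null, it simply re-renders what it already has.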

So, you mean you failed to do your offline RT scheduling adequately for the particular problem :). i.e. you didn’t put in a limiter on how much network data you would process per frame, and ended up processing “as much as happened to be there”. It’s no big leap to put in a limiter, e.g. “I will only process a max of 3 network messages per frame, and the rest I shall delay until the next frame”. Compared to network latency, shunting messages back a couple of frames isn’t usually a problem.
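e.g. a minimal sketch of such a limiter (NetMessage and GameState are hypothetical stand-ins):

[code]
import java.util.Queue;

// Drain at most MAX_PER_FRAME messages per tick; the rest wait a
// frame or two, which is tiny next to ping time.
interface NetMessage {
    void applyTo(GameState state);
}

class GameState { /* game data elided */ }

class NetLimiter {
    static final int MAX_PER_FRAME = 3;  // budget chosen in advance

    // Called once per frame from the main loop.
    static void processSome(Queue<NetMessage> pending, GameState state) {
        for (int i = 0; i < MAX_PER_FRAME; i++) {
            NetMessage m = pending.poll();
            if (m == null) break;        // queue drained this frame
            m.applyTo(state);
        }
    }
}
[/code]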

So long as you bear in mind that a single-threaded game loop is an RT scheduler (i.e. you have to work out in advance how long every method call will take, personally allocate how much time it’s going to be given at runtime, and schedule those slots so that you meet your requirements), you don’t generally have problems using it.

The fact that doing this is boring, and that any modern OS should have a decent enough thread scheduler, is why I tried to write Java games multi-threaded, until the AWT/Swing stuff bit me on the ass. After that I just never bothered going to multiple threads :). Except where e.g. I already had a module with its own selector-plus-private-thread and re-used it as-is…

Yeah, you’re probably right. Makes sense when you put it like that…

Kev

[quote]Isn’t this generally the case with all cards, on purpose? AGP is designed to send data one way only, pretty much. glReadPixels sucks for this reason. PCI-X, on the other hand…
[/quote]
Actually it might be PCI Express. I remember a comparison of two cards from NVIDIA and ATI on the OpenGL board, and that 3x speed difference towards the CPU was just on the ATI card.

This theory would fail when met with windoze. I still don’t know why multithreaded programs work better with older versions than with XP.
This however doesn’t mean you should limit yourself to fewer than 3 threads. I once said a game could use more than 10 threads:
2 - 4 for AI
1 for graphic engine
1 to 3 for game engine
1 for sound engine
1 to 2 for world engine
(1) possible for network
(1) for lazy decompression of textures and background saving.
and a scheduler
Sometimes the best idea would be to call something like Thread.yieldToThread(Thread t), if such a method existed.
Of course some of them could be daemon threads, but my point is that AI needs at least one thread of its own, for thinking in free CPU time, or it will be dumb.
Another advantage of this is that when all the work is done, a thread can nicely sleep and let the CPU cool a little, a very nice feature for the latest processors.
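A minimal sketch of that kind of AI thread (a low-priority daemon that thinks when woken and sleeps otherwise; all names are mine):

[code]
// Low-priority daemon thread: plans in spare CPU cycles, sleeps
// (via wait) when there is nothing to do.
class AiThinker extends Thread {
    private boolean hasWork = false;

    AiThinker() {
        setDaemon(true);                  // dies with the game
        setPriority(Thread.MIN_PRIORITY); // only runs in free CPU time
    }

    // Called by the game when the AI should plan something.
    synchronized void wake() {
        hasWork = true;
        notify();
    }

    public void run() {
        while (true) {
            synchronized (this) {
                while (!hasWork) {
                    try { wait(); }       // sleep and let the CPU cool
                    catch (InterruptedException e) { return; }
                }
                hasWork = false;
            }
            think();                      // hypothetical planning step
        }
    }

    private void think() { /* AI planning elided */ }
}
[/code]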