Networking a Dungeon Crawl

Yeah, FWIU I think I would call AOE’s solution “turn based” rather than lock-step myself.

Your scheme is kind of a hybrid. It’s actually closest in some ways to lock-step plus latency buffering, but you’re sort of future-buffering rather than input buffering.

I need to think a bit about what the advantages/disadvantages are.

One question I have is, what do you do if someone gets a really bad latency spike and misses getting a scheduled move until after that time? Do you stall their game? If so, what “time” are future moves made at when they hit the server?

Seems like getting this right could get kinda complex…

Isn’t it impossible to receive commands from other clients that are too far in the past or future?

I thought that a client couldn’t move past the next clock tick until it had received the command set from every other client for that clock tick. Clients cannot therefore fall behind or get ahead of other clients.

i.e.
Tick = 0
Client sends command set for interval 0->1 to every other player
Client waits until it has received command set from every other player
Client updates game state
Tick = 1

On top of that you build an interpolator for intermediate states. If this is fully deterministic all clients will do the same thing.
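Roughly, per tick, something like this (all the types and method names here are just made up to show the shape of it, not any particular API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pure lockstep tick described above: send your command set,
// block until you have everyone else's, then do one deterministic update.
class LockstepLoop {
    interface Peer {
        void send(CommandSet commands);
        CommandSet receiveBlocking(int tick); // blocks until that peer's set for 'tick' arrives
    }
    static class CommandSet { /* one player's inputs for the interval tick -> tick+1 */ }
    static class GameState { void step(List<CommandSet> all) { /* deterministic update */ } }

    List<Peer> peers = new ArrayList<>();
    GameState gameState = new GameState();

    void run() {
        int tick = 0;
        while (true) {
            CommandSet mine = collectLocalCommands(tick);
            for (Peer p : peers) p.send(mine);                      // send to every other player

            List<CommandSet> all = new ArrayList<>();
            all.add(mine);
            for (Peer p : peers) all.add(p.receiveBlocking(tick));  // wait for a complete set

            gameState.step(all);                                    // same inputs everywhere -> same state
            tick++;
        }
    }

    CommandSet collectLocalCommands(int tick) { return new CommandSet(); }
}
```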

The drawback I believe is that the player inputs can’t be processed until you have a full interval’s worth, which is sent to the server. Surely this means that the game still lags by one lockstep interval, i.e. I don’t think it solves the problem which caused Kev to consider lock-step in the first place. I think it does solve the issue of how to synchronise AI players without using a huge amount of network bandwidth.

I think there’s some articles about lockstep in one of the GEMs series. #3 I think (not checked)

Caution: :slight_smile: Alan is pontificating on something he knows nothing about.

You’d have to. But then as they’re no longer sending current commands everyone else eventually reaches the end of their buffer and ends up waiting themselves for the spiked player to catch up.

It messes with your head, but lockstep is surprisingly robust with just a few simple rules (assuming you can keep it deterministic). ;D

What Alan described is what I understood to be pure lockstep, you’ve basically got to have confirmation that everyone has reached a particular frame before continuing. I don’t really want to do this.

If one player is lagging behind I’d like it to lag their commands getting distributed, but not the game. If you’ve got a terrible ping then your command response will be terrible (up to a point when you actually run too slowly and start receiving commands for the past and hence have lagged out).

As to huge latency spikes this is what I meant by this:

If a player’s connection gets so bad that their game progresses past where commands they haven’t received yet are meant to be scheduled, they’re completely scuppered. Either the whole game has to be stalled while this player gets synched up or they get disconnected and thrown away :slight_smile:

I’m not hugely keen on not having full lock-step the more I think about it, but I’m hoping I can deal with it as it goes.

Kev

Now you’re getting tricky. :slight_smile:

Typically you’d just lag everyone’s commands by a suitable amount to cover everyone’s worst-case latency. Now if you assume everyone’s got a good connection then 200ms ping is about the worst you’ll get. That’s only a tenth of a second before commands get executed, which is probably unnoticeable (more so if you do something to hide it, like verbal acknowledgement of orders straight away). But one bad connection does turn into command lag for everyone.

I suppose you could make each person work on individual ‘turns’ of different lengths. People with good connections get nice small turns and those with bad get delayed more. The problem is that the person with a bad connection still needs to receive and ack the commands from the good connection people, so I don’t think that’s going to be too effective.

I’d suggest trying with a single command lag for everyone and seeing how it behaves. I suspect it’ll be more responsive than you think as long as you stay below about 8 players (how many players were you aiming for?)

I’m still trying to get my head round this. Are you suggesting:

  • The lockstep interval is small (say 20 ms)
  • keyboard commands are scheduled for a future step about 10 to 100 ms in the future depending on the worst client latency.
  • keyboard commands are sent on key down/up rather than every frame.
  • clients assume that all commands for a given step will have reached them before that step is due to be processed. This is necessary to remove the requirement for each client to transmit state every lockstep interval, which would clog up the network.

Problem - If a client lags so badly (lag spike) such that his/her transmitted commands are not received by one or more of the other players before the applicable time step is processed, then the game must reset to the last commonly agreed state. There must be a protocol to achieve this tricky resynchronisation.

To make this work all players must have forced lag. Once you have this and send key up/down events as they arise, rather than at 5Hz or 10Hz intervals as in the non-lockstep technique, then the overshoot problem should largely go away (except during a lag spike) even without lockstep. Of course without lockstep, the clients tend to drift, so a periodic game state must be sent in addition to key states, to ensure everything stays in step. Thus in my view, the main benefit of lockstep is still reduced network bandwidth when there are a lot of non-player objects flying about. This may well be a key benefit if you are intending to use the Game Garden API without swamping their server with position updates.

Alan

The spec is for 4 players currently.

No keyboard controls - all point and click. The commands would be scheduled at minimum 100 ms in the future (and that’s after it’s reached the server). Clients assume that all commands reach them before they need to be actioned; should they receive a command in the past, then that’s the interesting part:

a) Kill the client - they’ve got a laggy connection so let’s get rid of them :slight_smile:
b) Attempt to resync the client to the others by pausing everyone (tricky business)
c) Possibly rewind the game state (undo on each command passed around) to get back to the correct state then replay.
d) Maybe one copy of the game state could be maintained on the client which is updated when a received command is actioned. If they ever get ahead of the game and receive an old command you simply jump back to the stored state, replay the game up until the point you want at top speed, then apply the new command.
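A rough sketch of what I mean by (d) - the GameState/Command bits here are just placeholders to show the idea, not real code:

```java
import java.util.ArrayList;
import java.util.List;

// Option (d): keep a snapshot of confirmed state plus the commands applied since it.
// If a command turns up for a time we've already simulated past, rewind to the
// snapshot, replay everything in schedule order, and fast-forward back to "now".
class RewindableState {
    static class Command { long scheduledTime; }
    static class GameState {
        long simTime() { return 0; }
        void queue(Command c) {}
        void advanceTo(long simTime) {}                // run the simulation at top speed up to simTime
        GameState copy() { return new GameState(); }
        void replaceWith(GameState other) {}
    }

    GameState confirmed = new GameState();             // state as of the last confirmed point
    List<Command> applied = new ArrayList<>();         // commands applied since, in schedule order

    void onCommand(Command cmd, GameState live) {
        applied.add(insertionIndexFor(cmd), cmd);
        if (cmd.scheduledTime >= live.simTime()) {
            live.queue(cmd);                           // normal case: the command is still in our future
            return;
        }
        // We've run ahead of this command: rebuild from the snapshot and replay.
        GameState rebuilt = confirmed.copy();
        for (Command c : applied) rebuilt.queue(c);
        rebuilt.advanceTo(live.simTime());             // replay up to where we were
        live.replaceWith(rebuilt);
    }

    int insertionIndexFor(Command cmd) {
        int i = 0;
        while (i < applied.size() && applied.get(i).scheduledTime <= cmd.scheduledTime) i++;
        return i;
    }
}
```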

Kev

It’s really not that tricky - in fact it happens near-automatically. As I said earlier you just make sure you only advance when you’ve got everyone’s input. This may mean one person stalls waiting for a ‘complete set’. The other peers are still sending this by default, so it eventually gets fixed up. Meanwhile the other players can advance only as far as they themselves have a complete set of commands. As one person is delayed this automatically makes everyone else wait for the simulations to match up again, and commands are again arriving and leaving in sync.

Bonus points for freezing the logic but keeping the graphics/audio still running so small stalls are less noticeable and ugly.

[quote]c) Possibly rewind the game state (undo on each command passed around) to get back to the correct state then replay.
d) Maybe one copy of the game state could be maintained on the client which is updated when a received command is actioned. If they ever get ahead of the game and receive an old command you simply jump back to the stored state, replay the game up until the point you want at top speed, then apply the new command.
[/quote]
Trailing State Synchronisation. Bleeding edge stuff, but basically an extrapolation of what you’ve just described. Snags are of course that you effectively double your CPU and memory use of your simulation (which may be acceptable depending on how big and complex it is), and you can have nasty time warps.

Now this is getting pretty close to the latency buffering technique I know. If so, it’s actually much simpler to implement.

Look at it this way:

FIRST imagine an unbuffered lock-step game. The logic is:
(1) Send my input
(2) WAIT for everyone else’s input
(3) Calculate, render and repeat

The only thing latency buffering adds is a queue for all the input. Each slot in the queue is one complete set of everyone’s input. The buffer conceptually starts with a full set of null entries, so you don’t start the game until the first non-null entry has hit the head of the queue. Input and queue writing is done on a separate thread from queue reading, so you are never sitting waiting for input unless the queue fully empties.
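In Java terms it’s roughly this (InputSet and the buffer length are just placeholders):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the latency buffer: a queue of complete input sets, pre-filled with
// empty entries so the game doesn't start until real input reaches the head.
// The network/input thread writes, the game loop thread reads.
class LatencyBuffer {
    static class InputSet { /* one complete set of everyone's input for a step */ }
    static final InputSet EMPTY = new InputSet();   // stands in for the conceptual null entries

    private final BlockingQueue<InputSet> queue;

    LatencyBuffer(int bufferedSteps) {
        queue = new ArrayBlockingQueue<>(bufferedSteps * 4);
        for (int i = 0; i < bufferedSteps; i++) queue.add(EMPTY);
    }

    // Called from the input/network thread as each complete set arrives.
    void push(InputSet set) throws InterruptedException { queue.put(set); }

    // Called from the game loop thread; only blocks if the queue fully empties.
    InputSet next() throws InterruptedException { return queue.take(); }
}
```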

btw.

DukeNukem3D on TEN used about 200ms latency buffering at 7 packets per second communications and played pretty darn well :slight_smile:

In general, it’s actually preferable to have a predictable latency pause, even a longer one, than unpredictable shorter ones. This has been proved in psych tests. Humans quickly learn to factor in predictable latencies, but unpredictable ones drive them nuts.

At the moment the point for me is I don’t want clients waiting.

I got a little further last night. I’m not using conventional turns so this might get a bit confusing and might just be a load of old rubbish, but I’m going to try it anyway.

  1. All the clients connect to the server

  2. One client chooses to “start the game” - for now after this point other clients can’t join in. It seems there are ways round this but for now I don’t care about it.

  3. When the game start message reaches the server it remembers the real time that it started the game, which I’ll call “Server Start Time”. It forwards the message on to the clients.

  4. When the clients receive the start game message they record the time and call this “Client Start Time”.

---- Now the game is running ----

  1. The client sends a command “I’d like to move player to position X”. This sends off a command object to the server.

  2. The server receives the command and schedules it for 200ms in the future based on its “Server Start Time” and the current system time. The scheduling time is placed in the command and the command sent out to the clients (see the sketch after this list).

  3. The clients receive the message and add it to their queue.
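A rough sketch of step 2, assuming the server just uses System.currentTimeMillis() and some made-up Command/Client types:

```java
// Server side: stamp an incoming command with a simulation time 200ms ahead of
// "now" (measured from the Server Start Time) and forward it to every client.
class CommandScheduler {
    static class Command { long scheduledTime; /* plus whatever the command actually does */ }
    interface Client { void send(Command cmd); }

    static final long LAG_BUFFER_MS = 200;
    final long serverStartTime = System.currentTimeMillis();     // "Server Start Time"

    void onCommandReceived(Command cmd, Iterable<Client> clients) {
        long serverSimTime = System.currentTimeMillis() - serverStartTime;
        cmd.scheduledTime = serverSimTime + LAG_BUFFER_MS;       // schedule 200ms into the future
        for (Client c : clients) c.send(cmd);                    // broadcast, including to the sender
    }
}
```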

CLIENT LOOP:

The clients are sat rendering the game. I get the delta for the frame in milliseconds (accurate system timing required here, LWJGL Sys.getTicks()) and pass that into the game state progressor. This takes the delta and splits it into steps of 10 milliseconds (remembering the remainder for next frame). The game state is progressed each “turn” which lasts 10 milliseconds - this moves actors around in the game world.

Just before starting a 10 millisecond “turn” the queue is evaluated for commands scheduled in that timeframe. Any found are actioned and removed from the queue.

Should the game time start getting close to the time of the scheduled commands being received, I’ll lag the client’s loop a bit to compensate and adjust the client’s “start time”.
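Something like this, in sketch form (World and CommandQueue are simplified stand-ins for the real thing):

```java
// Sketch of the loop above: split the frame delta into fixed 10ms turns, carry
// the remainder to the next frame, and action any commands scheduled inside each
// turn before stepping.
class ClientLoop {
    interface CommandQueue { Iterable<Command> takeScheduledBefore(long simTimeMs); }
    interface World { void apply(Command c); void step(int ms); }
    static class Command { long scheduledTime; }

    static final int TURN_MS = 10;
    long remainderMs = 0;
    long simTimeMs = 0;                              // simulation time progressed so far

    void update(long frameDeltaMs, CommandQueue queue, World world) {
        long total = frameDeltaMs + remainderMs;
        long turns = total / TURN_MS;
        remainderMs = total % TURN_MS;               // leftover time carried to the next frame

        for (long i = 0; i < turns; i++) {
            for (Command c : queue.takeScheduledBefore(simTimeMs + TURN_MS)) {
                world.apply(c);                      // action commands due in this 10ms window
            }
            world.step(TURN_MS);                     // one deterministic 10ms turn
            simTimeMs += TURN_MS;
        }
    }
}
```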

Any obvious leaps of faith?

Kev

It would work if the probability of a late command packet was really low.

However, what if you assemble a packet for transmission (including the timestamp) and then a context switch occurs (another thread or even another process)? Maybe a garbage collection occurs. The packet then goes (late) with the original timestamp. Anyway, if that delay reaches 200ms, then there’s going to be difficulties. It makes the code kinda fragile.

The basic problem is that timestamping a packet and sending it is not an atomic operation. Similar with reading the packet. This all adds into unpredictable network latency. It really mucks with my ping measurements, which is why I’m averaging them.

SharpShooter Arena throws away time-expired packets and prints a message on the java console. You get some on start-up when the latency measuring code hasn’t stabilised and also if you change from full-screen to windowed (or vice versa) due to the long time this takes (it also messes with my time sync code, which makes it worse - I might modify the sync code to ignore ludicrously long pings to reduce that effect). Other than that, they don’t seem to happen (unless one player has massive lag spike problems).

Alan

I don’t see how that’d prevent having to make a client wait. For deterministic lockstep everyone has to use the exact same inputs at exactly the same time. Otherwise you get divergence. Enforced delays can make the gameplay smoother but worst case your buffer will empty and you’ll have to wait.

[quote]Should the game time start getting close to the time of the scheduled commands being received, I’ll lag the client’s loop a bit to compensate and adjust the client’s “start time”.
[/quote]
I don’t see how you’ll do this without breaking your determinism? If you adjust the lag/buffer you need to do that on all machines at the same time (which would be possible).

It’ll work, honest :slight_smile: Ok… maybe I don’t get this so much then

Re: Timestamping - but the timestamping isn’t to do with real time but rather simulation time. So if it does get a bit delayed, well that’s fine, the latency buffer should account for that. And if the simulation at the client end starts getting a bit close to the time of the commands they’re receiving (because the client machine has gotten a little bit further ahead than it should have, due to maybe server processing lag or whatever) I’ll slow the running of the simulation down a bit so it goes back to where it’s meant to be.

My understanding isn’t that the commands must be actioned at the same real time, but rather they must always be actioned at the same point in the simulation - to keep the whole thing deterministic.

The simulation will still run the same on all machines, it’ll just run a bit behind on some?

And I agree, this algo could be fragile.

Kev

That’s the issue I’m worried about. When a latency-buffered lock-step game falls behind by longer than the pre-determined max latency, it pauses.

It seems to me that if any of your clients ever fall behind to the point where they are generating user events at a time later than another player’s current simulation, you have to either roll that other player back in time or abort the entire simulation.

See?

I think you’re just thinking about the output and not considering input that’s happening in those “past” time frames. The only other thing you can do is to always consider input “current time”, but then those players whose simulations are displaying the past are in the position of having to guess the future on their screens and react to it before they see it.

Control lag, particularly a predictable and constant lag, can easily be adjusted for. But control into the future? The human animal isn’t all that reliable at predicting future events… if we were we’d all be playing the stock market for a living :wink:

No, that’s not the case, I’ve had time to think about it finally! :slight_smile:

Lock-step seems to have this caveat - “if one player lags, they all lag - or at least pause”.

What the system I describe above gives me - “if one player lags, it takes longer for his/her commands to get actioned”.

This is because when a command is sent from a player its scheduled simulation time is set by the server. So, even if the client has managed to keep very close to the server simulation progression, if it takes a long time for the command to reach the server the command will be scheduled in the future. If the client is lagging behind simulation time then the command will be perceived as taking a long time to action - even though other clients might be further along in the deterministic simulation and have already seen it happen.

The only protection I need to provide is that clients don’t manage to catch up to server simulation time and hence receive commands scheduled for times in the past. I don’t see this as possible since clients will always incur >0 lag on the “start game” message and hence we’ll always be running the simulation behind server simulation time (assuming millisecond-accurate timing on client and server).

However, to protect against simulations running into the future I’m considering only allowing simulations to proceed while there is at least one command scheduled in the future, and having the server send out keep-alive commands.

I’m going to try to find time at lunch to draw a diagram of how I think this works,

Kev

PS. Noticed my forum tone of recent times has become absolute, this isn’t intentional, as always I’m hoping someone can find the flaws.

I think you’re missing something critical here… but maybe I’m the one who is missing something.

Let’s take a gedanken experiment.

We have two points in time:

Server time
Client 1 time.

The server is actually the “real” time base in the sense that any command arriving is considered to have “just occurred” in server time and be time-stamped accordingly, right?

Now let’s imagine Client 1 falls far behind the server. 10 seconds for argument’s sake. What do I see as the user of client 1 at this point in time? I see a time 10 seconds in “the past” as far as the server is concerned. Seeing that situation, I react and send a command.

That command goes to the server. The server sees it however as a command occurring at ITS time. So while the user was reacting to something ten seconds ago, their reaction is effectively delayed 10 seconds in how it affects the world situation. It’s sent back to the client timestamped 10 seconds ahead of the client.

For 10 seconds I sit there banging the key while the state progresses and my input is apparently ignored. FINALLY 10 seconds later my input “arrives” and affects the game. By now I’ve probably over-corrected, banging at the keyboard…

All sounds very frustrating to me.

I should mention that it’s worse than a 10 second delay. A fixed lag is annoying but can be accounted for by the user. In your case though the lag is both variable and unbounded. This WILL drive a user nuts, even ignoring the other issues.

Agreed, and I’ve just experienced it (by pausing my local game simulation and letting it get behind intentionally). However, remember that my objective is to make only the laggy player experience lag. So, for me, that’s exactly what should happen: a player is so laggy that they get well behind simulation time and hence get a laggy response. However, what I hadn’t really thought about is how lag spikes get compounded in this system.

If the player gets one lag spike they get a bit behind time. If they get another they get further behind time. Hence that 10 seconds is getting closer and closer :slight_smile: What I hadn’t thought about was the compound nature. So, yes there is a problem.

Time for a coping strategy.

  1. For another reason I’ve added a keep-alive message. Every so often (lagBuffer / 2) the server sends out a keep-alive command. The clients aren’t allowed to progress unless they’ve got a command in their local scheduled list to work towards. The keep-alive keeps something in their lists. This prevents clients running off into the future (and hence screwing up the whole distributed simulation).

  2. Here comes the science… :slight_smile: Or something. I haven’t implemented this yet (some point today) so I don’t know if it works. A client is sat there receiving its commands (one at minimum every lagBuffer / 2 (*)) and it knows what the lag buffer is. Let’s say the buffer is set at 200ms. It knows that when it receives a command it should be 200ms in the future of its current simulation time. So, now it knows how far ahead or behind its simulation is running (well, plus the transmission time from server to client).

So let’s say it’s detected it’s got 100ms behind where it should be, i.e. it’s received a command that’s scheduled 300ms in the future (instead of the 200 expected).

Each frame render my client goes to the game world and tells it to update by how much time has passed. Say 25ms has passed since the last frame rendered. My game world currently moves forward in increments of 10ms, so normally I divide the amount of time passed by 10ms and work out the number of cycles to update through the data. Dividing by 10ms and working it out in terms of a set number of cycles is what allows me to keep the simulations synchronised.

So, normally this would mean I need to run 2 simulation turns and remember that I’ve got 5ms of time outstanding for the next update. However, this time I know I’m an additional 100ms behind, so naively I might run an extra 10 turns this update. This naive approach could of course mean that the rendering of the client’s world would suddenly speed up - 10 additional moves. I’m going to implement it this way initially; I expect I’ll need to temper it down a bit to smooth things out. I think the neat thing about this is going to be that if my player gets too far ahead in the simulation I can detect this, miss a few turns next update and the sim comes back in line. So hopefully I’ll be tending towards having a consistent lag at all times while coping with lag spikes.
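The naive version, as a sketch (names are illustrative, and as I say it’ll need tempering):

```java
// Naive catch-up: work out how far behind (or ahead of) the expected 200ms lag
// buffer we are from the last received command, and fold that straight into the
// number of 10ms turns run this frame.
class CatchUpLoop {
    static final int TURN_MS = 10;
    static final long LAG_BUFFER_MS = 200;

    long remainderMs = 0;
    long simTimeMs = 0;
    long driftMs = 0;          // positive: we're behind where we should be; negative: ahead

    void onCommandReceived(long scheduledTime) {
        // Commands should arrive ~LAG_BUFFER_MS ahead of our simulation time;
        // anything more than that means we've fallen behind by the difference.
        driftMs = (scheduledTime - simTimeMs) - LAG_BUFFER_MS;
    }

    void update(long frameDeltaMs, Runnable turn) {
        long total = frameDeltaMs + remainderMs + driftMs;   // extra time if behind, less if ahead
        driftMs = 0;
        if (total < 0) {            // ahead by more than this frame covers: carry the skip over
            driftMs = total;
            total = 0;
        }
        long turns = total / TURN_MS;
        remainderMs = total % TURN_MS;
        for (long i = 0; i < turns; i++) {
            turn.run();             // one deterministic 10ms turn
            simTimeMs += TURN_MS;
        }
    }
}
```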

Moreover, if I was to add a player mid-game I could send them all the commands that have occurred in the game (maybe cache them at the server or something). They’d get a command scheduled for a long time in the future. The above algorithm would detect this and on the next game update would progress their simulation all the way through to the point other players are at. Since it’s all deterministic, it should place them in the current game state.

Apologies for the long post, hope it makes enough sense to be able to be considered.

Kev

PS. On advice from Elias I’ve added checksums for the game world to ensure they’re staying consistent. It turns out “CHECKSUMS ARE YOUR FRIENDS”! :slight_smile:

(*) Note that this message is very, very small (6 bytes) so the impact isn’t going to be too scary.
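For what it’s worth, the checksum itself can be as simple as a CRC32 over the deterministic bits of the world each turn (the Actor fields here are just an example):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hash the deterministic, simulation-owned parts of the world (no render-side
// values that could differ between machines) and compare the result across clients.
class WorldChecksum {
    static class Actor { int id, gridX, gridY; }

    static long checksum(Iterable<Actor> actors, long simTimeMs) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(8).putLong(0, simTimeMs).array());
        for (Actor a : actors) {
            crc.update(ByteBuffer.allocate(12)
                    .putInt(0, a.id)
                    .putInt(4, a.gridX)
                    .putInt(8, a.gridY)
                    .array());
        }
        return crc.getValue();
    }
}
```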

Sounds like you’re on your way to an interesting solution :slight_smile:

I had wondered if you’d have additive problems, but wasn’t sure…