back to listing index

### Gaffer on Games | Deterministic Lockstep

[web search]
Original source (gafferongames.com)
Tags: game-programming
Clipped on: 2016-02-28

# Deterministic Lockstep

Hi, I’m Glenn Fiedler and welcome to Networked Physics, my article series on how to network a physics simulation.

In the previous article, we discussed the properties of the physics simulation we’re going to network. In this article we’ll network that physics simulation using the deterministic lockstep technique.

Deterministic lockstep is a method of synchronizing a system from one computer to another by sending only the inputs that control that simulation, rather than networking the state of the objects in the simulation itself. The idea is that given initial state S(n) we run the simulation using input I(n) to get S(n+1). We then take S(n+1) and input I(n+1) to get S(n+2), repeating this process for n+3, n+4 and so on. It’s sort of like a mathematical induction where we step forward with only the input and the previous simulation state – keeping the state perfectly in sync without ever actually sending it.

The main benefit of this network model is that the bandwidth required to transmit the input is independent of the number of objects in the simulation. You can network a physics simulation of one million objects with exactly the same amount of bandwidth as a simulation with just one. It’s easy to see that with the state of physics objects typically consisting of a position, orientation, linear and angular velocity (52 bytes uncompressed, assuming a quaternion for orientation and vec3 for everything else) that this can be an attractive option when you have a large number of physics objects.

To network your physics simulation using deterministic lockstep you first need to ensure that your simulation is deterministic. Determinism in this context has very little to do with free will. It simply means that given the same initial condition and the same set of inputs your simulation gives exactly the same result. And I do mean exactly the same result. Not near enough within floating point tolerance. Exact down to the bit-level. So exact you could take a checksum of your entire physics state at the end of each frame and it would be identical.

Your browser does not support the video tag.

Above you can see a simulation that is almost deterministic but not quite. The simulation on the left is controlled by the player. The simulation on the right has exactly the same inputs applied with a two second delay starting from the same initial condition. Both simulations step forward with the same delta time (a necessary precondition to ensure exactly the same result) and apply the same inputs before each frame. Notice how after the smallest divergence the simulation gets further and further out of sync. This simulation is non-deterministic.

What’s going on above is that the physics engine I’m using (ODE) uses a random number generator inside its solver to randomize the order of constraint processing to improve stability. It’s open source. Take a look and see! Unfortunately this breaks determinism because the simulation on the left processes constraints in a different order to the simulation on the right, leading to slightly different results.

Luckily all that is required to make ODE deterministic on the same machine, with the same complied binary and on the same OS (is that enough qualifications?) is to set its internal random seed to the current frame number before running the simulation via dSetRandomSeed. Once this is done ODE gives exactly the same result and the left and right simulations stay in sync.

Your browser does not support the video tag.

And now a word of warning. Even though the ODE simulation above is deterministic on the same machine, that does not necessarily mean it would also be deterministic across different compilers, a different OS or different machine architectures (eg. PowerPC vs. Intel). In fact, it’s probably not even deterministic between your debug and release build due to floating point optimizations. Floating point determinism is a complicated subject and there is no silver bullet. For more information please refer to this article.

You may wonder what the input in our example simulation is and how we should network it. Well, our example physics simulation is driven by keyboard input: arrow keys apply forces to make the player cube move, holding space lifts the cube up and blows other cubes around, and holding ‘z’ enables katamari mode.

But how can we network these inputs? Must we send the entire state of the keyboard? Do we send events when these keys are pressed and released? No. It’s not necessary to send the entire keyboard state, only the state of the keys that affect the simulation. What about key press and release events then? No. This is also not a good strategy. We need to ensure that exactly the same input is applied on the right side when simulating frame n, so we can’t just send ‘key pressed’, and ‘key released events’ across using TCP, as they may arrive earlier or later than frame n causing the simulations to diverge.

What we do instead is represent the input with a struct and at the beginning of each simulation frame on the left side, sample this struct from the keyboard and stash it away in a sliding window so we can access the input later on indexed by frame number.

```struct Input
{
bool left;
bool right;
bool up;
bool down;
bool space;
bool z;
};```

Now we send that input from the left simulation to the right simulation in a way such that that simulation on the right side knows that the input belongs to frame n. For example, if you were sending across using TCP you could simply send the inputs and nothing else, and the order of the inputs implies n. On the other side you could read the packets coming in, and process the inputs and apply them to the simulation. I don’t recommend this approach but lets start here and I’ll show you how it can be made better.

So lets say you’re using TCP, you’ve disabled Nagle’s algorithm and you’re sending inputs from the left to the right simulation once per-frame (60 times per-second).

Here it gets a little complicated. It’s not enough to just take whatever inputs arrive over the network and then run the simulation on inputs as they arrive because the result would be very jittery. You can’t just send data across the network at a certain rate and expect it to arrive nicely spaced out at at exactly the same rate on the other side (eg. 1/60th of a second apart). The internet doesn’t work like that. It makes no such guarantee.

If you want this you have to implement something called a playout delay buffer. Unfortunately, the subject of playout delay buffers is a patent minefield. I would not advise searching for “playout delay buffer” or “adaptive playout delay” while at work. But in short, what you want to do is buffer packets for a short amount of time so they appear to be arriving at a steady rate even though in reality they arrive somewhat jittered.

What you’re doing here is similar to what Netflix does when you stream a video. You pause a little bit initially so you have a buffer in case some packets arrive late and then once the delay has elapsed video frames are presented spaced the correct time apart. Of course if your buffer isn’t large enough then the video playback will be hitchy. With deterministic lockstep your simulation will behave exactly the same way. I recommend 100-250ms playout delay. In the examples below I use 100ms because I want to minimize latency added for responsiveness.

My playout delay buffer implementation is really simple. You add inputs to it indexed by frame, and when the very first input is received, it stores the current local time on the receiver machine and from that point on delivers all packets assuming that frame 0 starts at that time + 100ms. You’ll likely need to something more complex for a real world situation, perhaps something that handles clock drift, and detecting when the simulation should slightly speed up or slow down to maintain a nice amount of buffering safety (being “adaptive”) while minimizing overall latency, but this is reasonably complicated and probably worth an article in itself, and as mentioned a bit of a patent minefield so I’ll leave this up to you.

In average conditions the playout delay buffer provides a steady stream of inputs for frame n, n+1, n+2 and so on, nicely spaced 1/60th of a second apart with no drama. In the worst case the time arrives for frame n and the input hasn’t arrived yet it returns null and the simulation is forced to wait. If packets get bunched up and delivered late, it’s possibly to have multiple inputs ready to dequeue per-frame. In this case I limit to 4 simulated frames per-render frame to give the simulation a chance to catch up. If you set it much higher you may induce further hitching as you take longer than 1/60th of a second to run those frames (this can create an unfortunate feedback effect). In general, it’s important to make sure that you are not CPU bound while using deterministic lockstep technique otherwise you’ll have trouble running extra simulation frames to catch up.

Using this playout buffer strategy and sending inputs across TCP we can easily ensure that all inputs arrive reliably and in-order. This is what TCP is designed to do after all. In fact, it’s a common thing out there on the Internet for pundits to say stuff like:

I’m here to tell you this kind of thinking is dead wrong.

Your browser does not support the video tag.

Above you can see the simulation networked using deterministic lockstep over TCP at 100ms latency and 1% packet loss. If you look closely on the right side you can see infrequent hitching every few seconds. I apologize if you have hitching on both sides that means your computer is struggling to play the video. Maybe download it and watch offline if that is the case. Anyway, what is happening here is that when a packet is lost, TCP has to wait RTT*2 before resending it (actually it can be much worse, but I’m being generous…). The hitches happen because with deterministic lockstep the right simulation can’t simulate frame n without input n, so it has to pause to wait for input n to be resent!

That’s not all. It gets significantly worse as the amount of latency and packet loss increases. Here is the same simulation networked using deterministic lockstep over TCP at 250ms latency and 5% packet loss:

Your browser does not support the video tag.

Now I will concede that if you have no packet loss and/or a very small amount of latency then you very well may get acceptable results with TCP. But please be aware that if you use TCP to send time critical data it degrades terribly as packet loss and latency increases.

Can we do better? Can we beat TCP at its own game. Reliable-ordered delivery?

The answer is an emphatic YES. But only if we change the rules of the game.

Here’s the trick. We need to ensure that all inputs arrive reliably and in order. But if we just send inputs in UDP packets, some of those packets will be lost. What if, instead of detecting packet loss after the fact and resending lost packets, we just redundantly send all inputs we have stored until we know for sure that the other side has received them?

Inputs are very small (6 bits). Lets say we’re sending 60 inputs per-second (60fps simulation) and round trip time we know is going the be somewhere in 30-250ms range. Lets say just for fun that it could be up to 2 seconds worst case and at this point we’ll time out the connection (screw that guy). This means that on average we only need to include between 2-15 frames of input and worst case we’ll need 120 inputs. Worst case is 120*6 = 720 bits. That’s only 90 bytes of input! That’s totally reasonable.

We can do even better. It’s not common for inputs to change every frame. What if when we send our packet instead we start with the sequence number of the most recent input, and the 6 bits of the first (oldest) input, and the number of un-acked inputs. Then as we iterate across these inputs to write them to the packet we can write a single bit (1) if the next input is different to the previous, and (0) if the input is the same. So if the input is different from the previous frame we write 7 bits (rare). If the input is identical we write just one (common). Where inputs change infrequently this is a big win and in the worst case this really isn’t that bad. 120 bits of extra data sent. Just 15 bytes overhead worst case.

Of course another packet is required from the right simulation to the left so the left side knows which inputs have been received. Each frame the right simulation reads input packets from the network before adding them to the playout delay buffer and keeps track of the most recent input it has received by frame number, or if you want to get fancy, a sequence number of only 16 bits that handles wrapping. Then after all input packets are processed, if any input was received that frame the right simulation replies back to the left simulation telling it the most recent input sequence number it has received, e.g. an “ack” or acknowledgment.

When the left simulation receives this ack it takes its sliding window of inputs and discards any inputs older than the acked sequence number. There is no need to send these inputs to the right simulation anymore because we know it has already received them. This way we typically have only a small number of inputs in flight proportional to the round trip time between the two simulations.

We have beaten TCP by changing the rules of the game. Instead of “implementing 95% of TCP on top of UDP” we have implemented something quite different and better suited to our requirements: time critical data. We developed a custom protocol that redundantly sends all un-acked inputs that can handle large amounts of latency and packet loss without degrading the quality of the synchronization or hitching.

So exactly how much better is this approach than sending the data over TCP?

Take a look.

Your browser does not support the video tag.

The video above is deterministic lockstep synchronized over UDP using this technique with 2 seconds of latency and 25% packet loss. In fact, if I just increase the playout delay buffer from 100ms to 250ms I can get the code running smoothly at 50% packet loss. Imagine how awful TCP would look in these conditions!

So in conclusion: even where TCP should have the most advantage, in the only networking model I’ll present to you in this article series that relies on reliable-ordered data, we can easily whip its ass with a simple custom protocol sent over UDP.

Up next: Snapshots and Interpolation

If you enjoyed this article please consider making a small donation. Donations encourage me to write more articles!

1. Some pretty inspiring stuff. Thanks for taking the time to share it.

2. Hello Mr. Fiedler.

Supposing you are stuck to Websockets in order to build some MMORPG on the Browser. What network model or techniques would you pick? Could you talk a little bit about this?

• Glenn Fiedler (Author)
.

I haven’t implemented this myself, but websockets are TCP right? in which case you must design a protocol based around events, with the ability to soak up 1-2 seconds of latency, basically you should try to do exactly what WoW does.

3. Thanks a lot for your blog, it’s really a great source of information and inspiration to me! Also your talk, I watched the other day at gdcvault.com/play/1022195/Physics-for-Game-Programmers-Networking

However, one thing you said (and I’ve read elsewhere also), that using the lockstep technique shouldn’t be used with more than 4 players because everyone has to wait for the inputs of the other players, makes me think.

So this is only true in a peer-to-peer setup I think. For our game Asteroid Fight (www.asteroidfight.com) we also use the lockstep technique with a server-client architecture and I don’t wait for the inputs of every player. The server runs at constant speed and when it gets the input from a player it just queues it for the next frame.

This means that players with big network delays won’t slow down the game experience for everyone else. This way you can handle a lot more than 4 players and it works quite well. We already tested it with 40 players. (AI players all running their own process + 3 human players on different machines)

• Glenn Fiedler (Author)
.

But what happens to the players who deliver their input late?

• Then their actions are queued for the next frame update and no input will be processed from this player at the current frame. The player’s actions will essentially be skipped, but taken into account at the next update.

It could mean that this player will experience a little more lag, but with the advantage that the game is not dependent on every player’s connection.

Maybe some background information is of interest:
I run the simulation at 60 Hz and the server sends out frame update messages every 3 frames (every 50 ms) that contain the information about the actions that have been taken.

All the server does is receive messages from clients, distribute these messages to the other clients (they don’t immediately process these messages, but queue them) and send out frame update messages, which tell the clients which messages they have to process at frame X.

I hope it was understandable, and I’m not stating the obvious

4. Yixin Xie
.

Hello Glenn,
After implemented the lockstep mechanism for my project, I’ve been trying this optimization technique to reduce occasional lag. My project is an action game with 2-4 players giving inputs to their units. All player inputs are sent to the server and relayed to other clients.
My lockstep relay server(for relaying input packets among the clients) has a frame count to keep track of how many frames of commands the server has broadcast. at a particular frame, the server would try to find if there’s any missing input packet for any player. It would make up input packets for them with a constructed ‘noop’ packet. This way, the player with low latency can play without much lag. At the same time the lagging players might go dark for several frames, after they come back, when their input packets are received by the server, the server would discard their input packets because ‘noop’ packets were already sent and executed by the faster clients.
I think Starcraft might have used this strategy. My question is, how does the server know when to do ‘noop’ padding for other players

• Glenn Fiedler (Author)
.

I think the general idea is that the server sets some time limit on inputs arriving and if the inputs arrive late from that player the server replaces them with nops. Perhaps some threshold for lateness that is considered maximum (maybe 100ms) before relaying that frame.

In short, I think you have to add a bit of extra delay before relaying inputs for frame n, but not too much.

cheers

5. Confusing typo – the link titled “Your game doesn’t need TCP (yet)” should say “UDP” I believe.

Anyway, thanks for these excellent articles. I don’t work on games (or anything with real-time networking requirements) but it’s fascinating to learn about these techniques all the same.

• Glenn Fiedler (Author)
.

Fixed. Thanks!

6. Cleroth
.

One should note that using your UDP way this generates a lot more work for the server because it needs to process a lot more packets (size doesn’t really matter so much).
Generally TCP is able to handle many more clients than UDP.

• Glenn Fiedler (Author)
.

Citation required.

7. Thank you for the article. You wrote

“This is also not a good strategy. We need to ensure that exactly the same input is applied on the right side when simulating frame n, so we can’t just send ‘key pressed’, and ‘key released events’ across using TCP, as they may arrive earlier or later than frame n causing the simulations to diverge.”

I don’t get that part. If the ‘key pressed’ and ‘key released’ is part of the lockstep (so the player will have some latency on his own ‘pressed’ and ‘released’ events, why is it a bad idea to additionally send those events? I’m asking this because I’m dealing with a game that also does not do this, but for me it’s quite limiting to not have the keystate.

• Glenn Fiedler (Author)
.

Because not only is it important to send the key presses, it’s important to ensure they are played out on the same frame. If you want to send keypresses feel free, the point of this section was just sending the inputs and applying them at slightly different times is not a deterministic strategy.

8. totox
.

“Floating point determinism is a complicated subject and there is no silver bullet.”

I guess using double would lead to the same problem, but did you made some practical tests ?

why not using integers ?

• Tommassino
.

Well, the problem is operations on floats or doubles don´t necessarily give the same result on different hardware. This happens for example because of rounding modes, operation order settings and transcendental functions (sin,cos,…) differing on different machines. I dont understand the question about integers, you simply need floats for physics simulations.

• Glenn Fiedler (Author)
.
9. I forgot to ask, what software do you use to emulate network conditions? (esp. packet loss and latency)

• Glenn Fiedler (Author)
.

I just write my own packet queue and simulate latency and jitter by adjusting time to deliver per-packet. Packet loss is just a random % roll whether to deliver each packet. It’s not hard.

10. Very helpful article! I just have one question. In the example, we have a sort of one-way communication right? The right side is a delayed replay of the left, and inputs nothing. If this is used for a game, then both sides need to give input. Then in order for the inputs to happen on the same step in the simulation, they would have to tag inputs to happen at an agreed step in the future. That would mean the current lag from left to right side would apply from input on one side, to execution on the same side. Wouldn’t that be terrible? Do we just have to deal with it, and only use lockstep for slow games?

• Glenn Fiedler (Author)
.

Yes. We are approaching first the one way synchronization because otherwise issues like latency compensation and client/server vs. peer-to-peer topology must be discussed. These subjects are complicated and tend to derail/dominate over discussions of how to perform the sync so they will be discussed later

• Glenn Fiedler (Author)
.

This is why you use client side prediction.

11. Hi Glenn, thank you for great articles on game networking!

In 2012 I read your articles and decided to implement networking model, similar to Deterministic Lockstep.

I’d like to share my experience in this kind of network model which is used in our Free to Play First Person Shooter “Survarium”.

On the low level we implemented application level protocol on top of UDP.

This protocol handles unreliable, reliable, unique, reliable + unique, reliable + unique + ordered messages. This protocol may gurantee reliability only in some period of time and several virtual channels inside single UDP “connection”. User of this protocol is enabled to merge several messages into a composite message. Protocol may join several messages into a single packet. Protocol also handles multipacket messages, i.e. messages larger than the maximum packet size. After all a packet may contain several single messages, composite messages and parts of multipacket messages (actually, tails of such messages).

On the game level we must guarantee determinism, which is very difficult because of logic errors and different compiler optimizations for the same C++ code with floating point computations for client and server. There is also one more caveat: not only values must be the same, but their order must be the same as well. We had a bug when weapon modifiers on client and server were the same, but because of the different order, their sum didn’t match.

Our game works in the following way: client generates inputs and sends them to server with time, on which this input must be applied. When client receives inputs of other players, it deserializes game world to the time, on which incoming input must be applied, applies input and processes the game to the time we have to render.

Client also sends crc of the game world state on the time, which becomes a history, i.e. cannot be changed, and server verifies crc and sends back full game state (up to 8 KB in our case) when crc doesn’t match.

We use partial reliability for inputs and send crc unreliably. When server sends players inputs to client, it merges several inputs on the same time into a composite message.

Here are some numbers: our input is 8 bytes long: 2 halves for mouse move and 32 bits mask for game actions. I have measured that if we apply static Huffman coding, we may exceed 3 bytes inputs on average.

Packets acknowledgement works as Glenn described in his article, with one change: acknowledgement bits count is variable.

We have fixed almost all logic errors. It is possible with help of client and server journals. Also in case of crc mismatch server may send to client information about the order and values in the mismatched game state. Client then may print only mismatched parts of the game world state. This is very useful for us, we use this every time we get new “resync”.

To fight against compiler optimizations we try not to use floating point computations or add “volatile” when this is not possible. Probably, the best way would be to have the same .dll for client and server.

Our server never deserializes, it moves only forward, which is why it is very efficient: we may handle more than 128 matches with 16 players each on a single physical server and this is not a limit, I plan to rise this bar up to 256 matches.

More good news: our players don’t depend on pings of other players, only on their ping deviations to some extent, which is controlled by server.

Bad news: server should constantly verify not only ping to clients, but speed of their timers as well. We have players with systems, on which timers give several seconds more than server’s timer in a single minute!

There is also a problem with visual effects: with all these deserializations a single event may be raised several times.

There is a problem with cheaters, since every client knows about the whole world. But the only problems are ESP, Wallhack and Aimbots. No other cheats are possible. We also plan to enlarge our maps and players count and give the clients information about nearby game objects only on a per-object basis.

All in all we have tiny traffic, both upload and download, we duplicate clients inputs’ to fight against packet loss, game is working good on the slowest 3G connections, people with ping 300ms may play with comfort and don’t affect other players.

Now we’re adding physics carefully (we use Bullet) and it still works deterministic.

12. Hi! I have a solid programming background with desktop and web software. But to be honest, I’m completely new to syncing game data in real time over a network with latency. For me, this site is an amazing resource, it’s a great help. Thank you for your effort!

I already fully understand: “use UDP not TCP” and so on. Unfortunately I have a hard case here (must use TCP, or UDP via WebRTC):

I’m trying to develope a 2D space shooter game in browser. Currently the concept is single player and can be found here: http://richrd.github.io/ . My question is, is this even worth it? Should I just forget the browser and create a stand alone game? My main concerns is the network issues and client performance.

My last and most important question is: when is then next article comming? I’ll donate as much as I can for your awesome content!

• Glenn Fiedler (Author)
.

If you would like to support me, you can do that here: http://patreon.com/gafferongames

• I know this is really old, but it’s worth noting that there are some successful browser-based real-time multiplayer games like agar.io and tagpro. They are forced to use TCP and yet still manage to be tons of fun and very playable.

13. Great article! Would you say that lockstep is easier to implement than a server-client setup (with interpolated snapshots)? If you’ve a month to implement the networked portion game, which would you pick?

• Glenn Fiedler (Author)
.

Impossible to say. If your game is deterministic its easy. Otherwise it’s really hard, or even impossible

14. You mentioned sending snapshots for a non deterministic simulation, I am assuming it would be using client side prediction then correcting it with the snapshot. My question is this, will you be covering the process of efficiently comparing two snapshots from any point in time to create a delta snapshot, i.e. only containing the full value of the variables that changed, but not any variables that didn’t change, even if they are on the same physics object? I am thinking of a quake 3 style network model, where the delta world state since the last acknowledged state is sent every N ms and keeps the clients in sync, but comparing every single value on every single object to find out which ones changed just seems far too wasteful, and marking something as changed only works for generating one snapshot, but not for comparing any two snapshots from any two points in time.

• Glenn Fiedler (Author)
.

Yes, this is in covered in the “Snapshot Bandwidth” article

15. Pingback: TCP problems | Yosoygames

16. Interesting technique.. Couple questions:

What if inputs cannot be compressed to so few bits, e.g. a FPS often allows a player to move in any direction you can point a controller’s analogue stick (which is 16 bytes of information)?
From my understanding the screen on the right of your example has to play behind the left x amount of time so as to not “stall” when packets do drop and it waits for the next one that does not drop including the input that dropped. However now what if the right was controlling something too and that had to be shown on the left? The right is effectively running at wall clock time + x whereas the left is running at wall clock time. Now if the right sends the left input too that input was applied at (wall clock time + x) so latency is effectively increased by x and the technique can no longer be applied because you would have to delay the left now buy this amount of time etc.. ?

• I just read your first paragraph “Deterministic lockstep is a method of synchronizing a system from one computer to another by sending only the inputs that control that simulation.”, so that answers my question!

Looking forward to how you do it in a FPS

• Glenn Fiedler (Author)
.

You would never use this technique to network an FPS. Typically FPS use snapshots and interpolation technique with client side prediction and latency compensation. cheers

17. Lloyd Weehuizen
.

I’ve got lockstep working with unity.

You’ve got to use the FixedUpdate for all calculations and I’m not using their Physics or Animation systems.

18. The bar room protocol works fine as long as your aggregate bandwidth is a very small fraction of all bandwidth. Additionally, it is helped because the data you send is very easily RLE compressed.

If Netflix changed to that model for video, though, the internet would be at risk!

I don’t understand why the different network viewpoints can just agree that both of these statements are true, and are not in conflict?

Also, you should tip the hat to the classic “1500 archers on a 28k modem” article from GDMag oh so many years ago!

• Glenn Fiedler (Author)
.

Yes. This technique works because the aggregate inputs encode well relative to each other (similar efficiencies exist for sound codecs) and the worst case bandwidth required is a very fraction of typical available bandwidth. Video frames are too large for this technique to make sense there.

19. There is something that needs to be considered, which I stumbled on while developing Distant Souls (still in development, halted for now).

I like the UDP idea of repeating the data instead of trying to write a reliability layer… however:

Your math only took into account one player.
It is true that the client must only upload to server its own inputs.
However a client must receive the input from all players. A 12 player match can quickly add up to your bandwidth, as the server may need to send up to 15 frames of 12 players to each of those 12 players.

That’s 720 bits * 12 players * 12 clients = 12.6 kb
Sure that’s worse case scenario (if we track per-player acks, for some clients we’ll be re-sending up to 1 frame, while other clients we will be resending all 15 frames).
12.6 kb is still within reasonable thresholds; without accounting compression (and indeed, frame coherence is very high).
But, without compression, it is pushing the limits. As those 12.6kb are not per second.
Furthermore your 6 bits per player estimation may be a bit conservative (depends on the game’s needs).

But take in mind this approach has a quadratic growth. It doesn’t scale with the number of players.
A 64 player server (which should be usually a breeze for a lockstep game in terms of bandwidth) needs 360 kb (720 bits * 64 * 64) of upload bandwidth for the server.

However we would have to see how well it compresses using gzip, lzma, etc. As the stream as a whole for all players should be extremely compressible.
Also a second client is capable of becoming a relay server in order to offload the main server (i.e. you take care of the first 32 players including myself, I take care of re-sending the data to the other 31); but that certainly adds complexity.

A P2P is also possible (but hard, because all clients must be in sync to advance frame, whereas a server can keep ticking the frame and insert the inputs as they arrive), but a server-client architecture makes things much, much easier.

• Glenn Fiedler (Author)
.

Yeah separate to the UDP vs. TCP discussion, I don’t actually recommend deterministic lockstep as a viable networking model.

In this article series I’m first presenting the three main techniques available:

1. Deterministic lockstep,
2. Snapshots and Interpolation,
3. Stateful Synchronization (running sim on both sides and sending state.)

Each technique has their own strengths and weakness when you look just at the synchronization from A -> B, and each have their own pros and cons when you scale to a game with n players, especially the waiting for the most lagged input issue with deterministic lockstep, also CPU cost, bandwidth, difficulty of obtaining determinism, tradeoffs between client/server and peer-to-peer and so on.

There is a also strong trend towards physics to be done on the GPU these days. I don’t believe that physics implemented on the GPU would be deterministic between say different video cards by different manufacturers although somebody with more knowledge could clear this up if I’m wrong.

For this reason alone I think deterministic lockstep is a dead end for networked physics, at least on PC.

For console games however, especially physics heavy ones that only have a single platform may continue to use it because of the ability to synchronize a world with many physics objects by sending just the inputs.

For example, Little Big Planet used deterministic lockstep on the PS3 and did pretty well, but it only ever had to deal with PS3s playing against other PS3s (fixed platform)

cheers

• You’re correct.

GPUs are already implementing standardized IEEE floating point, across devices and drivers the actual microcode is different either due to different architectures or optimizations (i.e. number of registers leads to code optimized differently; special instructions unavailable to other devices, driver updates optimize differently, etc) which deems deterministic GPU floating point across multiple PCs a dream.
Unless the physics ran on the GPU is entirely implemented using integers (who does that? really!?).

However when it comes to CPU physics, this is very much possible as long as you stay away from rcpss and rsqrtss instructions; and some trigonometric functions (i.e. pow).

Deterministic lockstep is always nice to have, it’s by far one of the best debugging tools I’ve seen for analyzing bugs, crashes, or just observing how a particular variable evolves frame-to-frame for better understanding (and that occasional mini heart attack when you realize your build runs differently on every run and need to bisect your repo).

• Glenn Fiedler (Author)
.

Thanks for the info!

20. I just love how the method one might immediately write off as inefficient and/or unhelpful is actually the best method outright.

The redundant sends is akin to an image of blasting someone with a hose until they tell you they’re sufficiently soaked or throwing something at a wall until it sticks(and I guess the wall talks back to you to confirm it did, but anyway…) That kind of gameplan just sounds like the worst thing you can do. I can imagine in some scenarios it most definitely can be, but here it really works out! I love this stuff!

Dare I ask if there are many side effects? Assuming you’re not just sending input data even in seperate packets of information, what if we’re talking a fully working and communicating networked game with all kinds of information being thrown about and multiple players. Does the ‘firehose’ of input nformation make sending other information more difficult? I suppose it will always depend on the client connection anyway, but I’m wondering if even if the packets are small, you’re still pushing out a larger count of packets all together from a single machine.

• Glenn Fiedler (Author)
.

In this case the “firehose” is actually extremely soft and gentle… more like a trickle.

Typical round trip times are 30-250ms and in this window only 2-15 inputs are un-acked and need to be sent per-packet.

Each input is 6 bits, so without delta encoding that means the packet would be:

```16 bits (sequence #)
8 bits (num inputs [0,255])
6 bits per-input
```

That means that the average payload per-packet is 5 – 15 bytes.

It gets even better with the delta encoding…

```16 bits (sequence #)
8 bits (num inputs[0,255])
6 bits first input
+ 1 bit-per unchanged input (best case)
+ 7 bits-per-unchanged input (worst case)
```

Best case with all inputs the same (common): 4 – 6 bytes.

Worst case all inputs different (rare): 5 – 16 bytes.

Notice that in all cases above the payload per-packet is smaller than the UDP header (20 bytes).

That’s hardly shouting. In fact, it’s the most efficient way possible to network this data.

It’s even more efficient bandwidth wise than sending this data over TCP.

(Put that in your congestion pipe and smoke it TCP guys)

• Wow! That information is brilliant. I’m very eager to try this out and compare numbers myself now. Thank you very much for the clarification. That’s pretty wild.

21. The recovery journal mechanism in RFC 4695/6295 uses
this sort of repair mechanism within the compatibility
constraints of RTP/RTCP. This is the standard that is
implemented in Apple’s “RTP MIDI” that ships in OS X
and iOS. See:

http://www.cs.berkeley.edu/~lazzaro/rtpmidi/

• Glenn Fiedler (Author)
.

Super interesting. Thanks John!

.

This is a fine analysis from an individual perspective, but it ignores network effects that the TCP protocol is specifically engineered to address by considering network conditions as simply an environmental fact. If you have 25% packet loss, there’s something seriously wrong with your route. WHAT is wrong with your route? Most likely congestion at some link. It’s overworked. Your solution is to send redundant copies of all data?

Sorry, I’m going to go out on a limb here and say that a protocol that responds to increasing congestion by increasing data transmission is just a bad protocol.

I’ll call your protocol the “bar-room protocol.” You’re in a crowded bar where everyone is talking too loud and you can’t hear what the guy next to you is saying (“WHAAT?”), so you just shout the same thing louder and louder until he gets it. Others respond likewise and pretty soon everyone has burst eardrums and nobody can hear a thing.

• Glenn Fiedler (Author)
.

You sir have clearly never shipped a realtime multiplayer game.

.

Nice ad hominem response. You’re right. However I have lots of experience in real-time embedded systems, low-level networking, and safety-critical systems where the stakes are a little higher. You have nothing else to say about the substance of my comment other than attack my anonymous credentials?

• Glenn Fiedler (Author)
.

Nice appeal to authority but you’re still wrong and provably so. Do the math on the technique described in this article and you’ll notice that with the delta encoding on inputs the average steady state is 6 bytes payload (less than UDP packet header) which leads to an overall average bandwidth of 12kbps, most of which is UDP headers @ 60 pps. I guess if after this response you still think 6 bytes payload is shouting then what can I say?

23. Perfect timing, just started thinking again about deterministic lock step for Satellite Reign, it’s what I used in Syndicate Wars back in 1996 and now I wonder if I can get it to work in Unity3D.

• Glenn Fiedler (Author)
.

I would imagine it is very difficult to obtain determinism within Unity!

24. For more complex cases probably approach like https://en.wikipedia.org/wiki/Linear_network_coding will help. But they are computing intensive compared to traditional one.

25. What I love is how this approach is even simpler than trying to build the reliability mechanism on top of UDP; though you have to have absolute determinism. Great stuff, will definitely keep this in mind for future work.

• Glenn Fiedler (Author)
.

Yes it is in fact the simplest reliability system you can build on top of UDP, but it only works when the data you send is small enough that the redundant sends make sense in terms of bandwidth.

• Thanks for writing this up! We did deterministic lockstep with shared inputs for Red Alert 2, but I’m pretty sure we didn’t think to just send all un-acked inputs in each packet.

• Glenn Fiedler (Author)
.

In some cases sending all un-acked inputs can be too large. That was a while ago so maybe it was a bit too cheeky back then. You can definitely get away with it now though for small messages — cheers!