I have always been looking for an article that explains exactly this. Most explanations of networking are extremely vague and lack concrete examples, but not this one! I have thought about deterministic lockstep for a long time and this article finally sheds new light on some questions that have been lingering in my head for years. Thank you for that.
The point of this article seems to be that high latency and packet loss combined with extremely high available bandwidth is not a scenario that TCP is optimised to handle.
This would be interesting if I could think of a common scenario on the internet where you get that combination of features.
Games and streaming audio/video aren't common scenarios on the internet?
Don't be distracted by latency/packet loss/bandwidth configurations, the main problem here is that TCP was never intended to serve the needs of soft-realtime communication. It's not relevant to deterministic lockstep, but for most realtime problems (I'm thinking of multiplayer FPS, but you can think of Skype if that's not "serious" enough for you), if you miss a packet of data, it might as well have never existed. Better to hear blips and glitches on a phone call over a shitty connection than to start adding seconds of latency.
You missed the point about deterministic lockstep as well, which is intended for scenarios where trying to transmit e.g. the entire state of a virtual army per tick is prohibitive. Designing the program to behave deterministically so that you can send only the player inputs per tick is the exact opposite of an "extremely high available bandwidth" solution.
Audio/video streams are a completely different problem - lost data is just lost. This article is discussing a very specific problem where no data loss can be tolerated.
The solution described only works when very high bandwidth is available, because it aggressively sends duplicate copies of data until acked, using on average 2-15x more bandwidth and 120x more bandwidth in the worst case (numbers taken directly from the article).
If you happened to have that kind of spare bandwidth available then yes, this is an excellent solution. When do you have high latency, packet loss, and bandwidth overprovisioned by a factor of 15?
Deterministic lockstep is primarily used in RTS games, where the only data transmitted per tick is tiny commands like "player 1 is constructing a war factory at this location" or "player 1 told unit 6 to move to this tile." If we're ridiculously generous and say that a kilobyte of commands is sent per tick, then 120x bandwidth would be 120 KB/tick. 10 ticks/s is a common RTS tickrate, so that's about a megabyte per second. In practice, far, far less bandwidth would be used than this, given that this architecture scales down to 90's dialup connections (https://www.codeofhonor.com/blog/the-making-of-warcraft-part...), but even a megabyte/s is not a completely ridiculous amount of upload bandwidth for much of the first world (but sadly not the USA).
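To make the back-of-envelope arithmetic above concrete, here's a quick Python check. The inputs are the deliberately generous numbers quoted in the comment (1 KB of commands per tick, the article's 120x worst case, 10 ticks/s), not measurements from any real game:

```python
# Back-of-envelope check of the worst-case bandwidth figure above.
commands_per_tick = 1024     # bytes per tick; deliberately generous for an RTS
worst_case_factor = 120      # the article's worst-case redundancy factor
tick_rate = 10               # a common RTS tickrate, ticks per second

worst_case_bytes_per_sec = commands_per_tick * worst_case_factor * tick_rate
print(worst_case_bytes_per_sec)                  # 1228800 bytes/s
print(worst_case_bytes_per_sec / (1024 * 1024))  # ~1.17 MB/s
```

So even the pathological case lands at roughly a megabyte per second, and real command streams are orders of magnitude smaller.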
> Audio/video streams are a completely different problem - lost data is just lost. This article is discussing a very specific problem where no data loss can be tolerated.
I know that, which is why I separated deterministic lockstep from other solutions in my comment. You're missing my point, which is that soft-realtime problems aren't suited for TCP, and cannot be addressed by any single protocol because constraints differ.
You appear to be missing my point, which is that this proposed protocol responds to network congestion by increasing the amount of bandwidth that it uses by a factor of more than 10.
Consider what happens when many clients are using this protocol over a shared transit link and it approaches 80% utilisation, so starts to drop packets at a low rate: all the clients will start batching up larger amounts of history and sending them, which increases the utilisation, which increases the packet loss, which increases the amount of history each client sends... it's a destructive feedback loop that launches a DoS attack against the congested link. The absolute size of each individual client stream is unimportant, as we are merely assuming that the number of clients sharing a link is high enough to congest it. The last-mile user connection is not the worst problem here.
Network congestion is not the only reason for packets to be dropped. Wifi and mobile connections can be hampered by radio interference or have trouble transmitting through thick walls, hardware can be faulty, wires and fiber can be damaged, routers (especially home routers) can be poorly configured, cosmic rays can flip bits, etc. The internet is not perfect, and network programmers in a certain soft-realtime, low-bandwidth domain have noted that sending duplicates of their tiny packets instead of waiting around for ACKs results in an improvement in reliability and average latency at negligible cost. You're trying to extrapolate this idea way too far. No one is saying we should swap TCP out tomorrow with an alternative that works that way.
This isn't a theoretical, unproven suggestion he's recklessly advocating -- this technique is implemented in several games played by millions without your DoS scenario playing out. Games are typically low bandwidth applications -- they use that extra wiggle room to increase reliability and responsiveness with reliable real world success.
He's also written about flow control to properly handle congestion when it does happen: http://gafferongames.com/networking-for-game-programmers/rel...
To be clear, not all games are networked using the deterministic lockstep technique but most realtime games (FPS and so on) do use UDP to send time critical data, and implement some aspects of reliability, ordering and congestion avoidance within their own custom protocol.
Yeah sorry -- I should have been more clear. I didn't mean to imply most games use deterministic systems.
You're not correct. The amount of bandwidth increases by a small constant factor, and even then the total packet size likely doesn't come close to doubling.
The point is though that sending 15x more data is not an issue when you're only sending a few bytes per tick.
When optimizing, you always profile and optimize the bottleneck first. For a lot of people in a lot of situations, latency, not bandwidth, is the bottleneck, and that's exactly what this solution addresses. Compared to e.g. standard web usage, most games send very small amounts of data but with very high time criticality, and are hence very sensitive to latency.
In the case of wifi and such this seems like a sensible approach; too bad it's being applied end to end.
All that would be required IMO for this to be a perfect solution for time critical data would be if we had an IP packet header flag to indicate to routers that our packet is time critical, so if it can't be passed on immediately, don't buffer it like a TCP packet, just drop it. -- cheers
The author is a very experienced games network programmer whose recent credits include the multiplatform fps Titanfall.
Then I will make my plea a lot more urgent:
Please do not implement protocols on the internet which respond to congestion by increasing the amount of traffic they send, if there is any chance that they might be used by a large number of people. This is poor internet citizenship and causes real problems. It means your code is a DoS attack against the internet transit links, which can be triggered by anybody capable of inducing low levels of packet loss, and the only effective mechanism to stop it is for ISPs to block the traffic entirely.
Packet loss is how the internet signals that you're sending more bandwidth than it has capacity to forward. Internet protocols need to respond politely to this signal.
Not in this case - packets can be dropped by the router and the game player doesn't care as long as his connection is good enough to play. When it gets really bad the player will likely quit and the packets will stop.
If it is really desired you could implement throttling according to packet loss, but not in the way that TCP does it - by buffering and waiting - instead you'd just send packets every N frames. You can't do that if you're just using TCP since you don't know when packets are dropped.
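A minimal sketch of what that kind of throttling could look like (all thresholds and names here are hypothetical illustrations, not the article's protocol): instead of buffering and waiting like TCP, just stretch the send interval as measured loss rises:

```python
# Minimal sketch of loss-based throttling for a UDP input stream.
# Thresholds are made up for illustration.
def send_interval(loss_rate: float) -> int:
    """Return N, meaning 'send a packet every N simulation frames'."""
    if loss_rate < 0.01:
        return 1    # healthy link: send every frame
    if loss_rate < 0.05:
        return 2    # mild loss: halve the packet rate
    return 4        # heavy loss: quarter the packet rate

# At 3% measured loss, a 60-frame second produces 30 packets instead of 60.
frames_to_send = [f for f in range(60) if f % send_interval(0.03) == 0]
```

The key difference from TCP is that the stream backs off by skipping frames rather than by stalling behind a retransmit.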
> If it is really desired you could implement throttling according to packet loss, but not in the way that TCP does it - by buffering and waiting - instead you'd just send packets every N frames. You can't do that if you're just using TCP since you don't know when packets are dropped.
It is really desired (and there are a bunch of ways to do it, and plenty of libraries that already implement them). Please, please implement protocols in this way, and not in the way described in the article.
You're going to be really disappointed when you learn how most games have been networked over the past 10 years.
I have good news for you. We game designers design these protocols to have a (low) maximum rate and a (low) maximum fixed length, because our games are even more performance dependent than the internet and we frequently have very poor or limited networks.
For example, one of Glenn's earlier protocols kept a maximum of 32 previous message IDs, and operated at a rate of less than 30 UDP packets per second. Unacknowledged mandatory messages were given up on after 1 second of loss.
And lastly: the amount of extra bandwidth that Glenn proposes consuming in order to provide quasi-reliability on top of UDP is about as big as the TCP header. So you could consider his protocol a degenerate case of TCP that just usually doesn't send headers.
> For example, one of Glenn's earlier protocols kept a maximum of 32 previous message IDs, and operated at a rate of less than 30 UDP packets per second. Unacknowledged mandatory messages were given up on after 1 second of loss.
That seems far more reasonable than what is described in this article. The protocol described in this article has terrible worst-case behaviour and should not be used.
> I have good news for you. We game designers design these protocols to have a (low) maximum rate and a (low) maximum fixed length
That doesn't help at all. You can pick any arbitrarily large or small per-stream data rate - it makes no difference to the outcome. Let's suppose the arbitrary internet link under consideration has a capacity of X bits/sec, and your per-stream data rate is D bits/sec under zero packet loss. The number of users that this link can support is X/D. As the number of users approaches X/D, packet loss will begin to occur due to congestion. Suddenly this protocol causes the per-stream data rate to increase towards 120D. This means that X/D users are each sending 120D bits/sec towards the link, for a total of 120*X bits/sec.
Now, recall that X is the capacity of the link, so we have just flooded it with 120 times its capacity. This link is now dropping approximately (1 - 1/120) ≈ 99.2% of packets sent towards it.
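The same arithmetic, spelled out in Python. X and D are arbitrary placeholder values; the point is that the drop rate at the end doesn't depend on their magnitudes:

```python
# The congestion arithmetic above, made concrete.
X = 10_000_000      # link capacity, bits/sec (arbitrary)
D = 10_000          # per-stream rate at zero loss, bits/sec (arbitrary)

users = X // D      # users the link supports at zero loss: X/D
inflated = 120 * D  # per-stream rate once loss triggers full-history resends
offered = users * inflated       # total traffic offered to the link: 120*X
drop_rate = 1 - X / offered      # fraction of packets the link must drop
print(drop_rate)                 # 1 - 1/120 ≈ 0.992
```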
Consider: at no point did it matter whether X or D was large or small. The fatal behaviour is that when congestion is detected, the bandwidth usage increases by a significant factor relative to the norm. The factor of this increase will control the steady-state level of packet loss that the protocol converges on, when the network becomes congested.
I don't think you understand what I wrote. Let me try to rephrase. All of these protocols are tightly bounded in size. There is no way for them to get to 120D over time. What we're talking about is, under packet loss conditions, moving from 20 (IP header) + 8 (UDP header) + 1 byte of command data + N bytes of state/overhead, to 20 + 8 + X bytes of command data + N bytes of state/overhead. N depends on the application, but for my implementation, generally it's around 20. So we're going from 49 bytes to, usually, around 50 bytes, and the absolute max for X is, say, 32; which would be 48 + 32 = 80 bytes, or a little less than 2D.
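The byte accounting in that rebuttal, as a quick calculation (N = 20 is the commenter's own figure for their implementation):

```python
# Packet-size accounting from the comment above.
IP_HEADER = 20
UDP_HEADER = 8
STATE_OVERHEAD = 20   # "N" in the comment: application state/overhead

typical = IP_HEADER + UDP_HEADER + 1 + STATE_OVERHEAD   # 1 byte of commands
worst = IP_HEADER + UDP_HEADER + 32 + STATE_OVERHEAD    # max 32 command bytes
print(typical, worst, worst / typical)   # 49 80 ~1.63
```

So the worst case is under 2x the typical packet, and the headers dominate either way.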
Now, in a fictional universe where a game protocol operating at 20 updates per second and putting down 50 bytes is causing network congestion, I happen to agree with you. That would be terrible! Never, ever, ever happens, ever, not even .0001% of the time, ever; not ever. And further, never will. But I've only been doing this for 30 years.
This is not always true. Some amount of packet loss exists over WiFi and 3G and in this case this packet loss is not indicative of congestion.
Even if there are spurious signals from other sources, it's still true that packet loss is how the internet signals congestion (absent ECN, which my understanding is basically undeployed on the internet proper).
I don't disagree, but the converse is typically even more true. The strong defense of your approach here is "it's a tiny amount of data, period" (which - paraphrasing - you've said elsewhere). I am 100% behind the notion that one should be careful when responding to packet loss with increased traffic.
100ms is not that high, that's less than what you get when you're in Europe and want to play with your American friends. And the issue is with delayed retransmission when you can actually afford to send the packets again preemptively.
I would still agree that you should have a bit in the TCP stack that says "use preemptive retransmission instead of waiting so damn long". Can you actually tune the wait time for the ACKs?
Yeah, but the 100ms/1% version is more or less acceptable - some client-side smoothing would completely conceal those artifacts. It was at 250ms/5% that things really started to fall apart.
I'd say that 1% packet loss is high, and the most likely reason why you'd have packet loss rates that high is because your network link is congested - in which case, sending more packets is the exact opposite of what you want to do.
High latency, high packet loss, spare available bandwidth. I can think of lots of scenarios where you have two out of three of those things. I can't think of any where you get all three, and this idea only seems useful when you have all of them.
I don't want to play a multiplayer game with hitches every few seconds. You can't fix this with "smoothing" under the deterministic lockstep model. You could fix it by increasing the playout delay buffer from 100ms to 250ms, but that's adding another 150ms of latency to the player experience, which is less than ideal. It's a much better option to use UDP in this case to get the best experience for most players.
Satellite or radio based ISPs, mobile networking, and if we get exotic, interplanetary communications all come to mind as environments which fit all three criteria.
Interesting - I'm not sure that mobile networking really has massively overprovisioned bandwidth, but satellite and radio links might. Can you think of a way to identify these links so that similar techniques can be deployed when they are found?
No, cellular data is not massively over provisioned; but in the era of LTE, I frequently have more bandwidth than stability.
Also, this scheme is not really asking for massively over provisioned lines; we're looking at fairly small data sets with infrequent bursts to larger sets, and an ack from the receiver knocks the size back down again.
That said, if I were implementing this, I would probably implement a sliding window for messages I send (waterfall encoding would work great here, though it is patent encumbered), and add the ability for the client to request a set of messages it missed. This would make the traffic a bit more predictable and offer the server & client the ability to make up for traffic which was lost during its send window.
You're not sending more packets, but slightly larger packets - as the article says, even 120 frames of data takes up 90 bytes. When, even with extreme repetition, you're sending single digit kilobytes per second over the Internet, it's unlikely that your stream itself is substantially contributing to congestion, unless you have some completely broken/saturated link that nobody wants to play on anyway. (One might want to send enough bytes per frame that such a buffer would start to bump up against packet size limits, but 120 frames is ridiculous. I'm not even sure where he got it from, since 60 * 5 seconds timeout = 300, not 120 - and you don't really want to time out after 5 seconds, since transient total outages happen when accidentally going out of range of a Wi-Fi network or whatever. Layer the opportunistic resending on top of a TCP-like slow resend mechanism for the worst case, and then there's no point sending each packet more than a handful of times.)
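For anyone curious what the "slightly larger packets" idea looks like in code, here's a rough sketch (the names and structure are my own illustration, not Dolphin's or the article's implementation): every outgoing packet repeats all inputs the receiver hasn't acked yet, and an ack trims the buffer back down:

```python
# Sketch of opportunistic resending: each packet carries every
# not-yet-acked input, so a lost packet costs nothing as long as
# a later packet gets through.
class RedundantInputSender:
    def __init__(self, max_history=120):
        self.unacked = {}             # sequence number -> input payload
        self.max_history = max_history

    def queue_input(self, seq, payload):
        self.unacked[seq] = payload
        # Bound the buffer so a dead link can't grow packets forever.
        while len(self.unacked) > self.max_history:
            del self.unacked[min(self.unacked)]

    def build_packet(self):
        # Every outgoing packet repeats all unconfirmed inputs.
        return sorted(self.unacked.items())

    def on_ack(self, acked_seq):
        # Receiver has everything up to acked_seq: drop it all.
        for seq in [s for s in self.unacked if s <= acked_seq]:
            del self.unacked[seq]
```

A real implementation would delta-encode the inputs so that, as the article notes, even 120 frames fit in ~90 bytes.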
I implemented opportunistic resending last year for netplay in a branch of the Dolphin emulator. The target audience was a group of people playing a fighting game (Super Smash Bros.) at a semi-high level, so it was essential to keep latency to an absolute minimum. Even a few frames of lag is enough to substantially impede high level play (might be mitigated by prediction/rewinding if I ever implement that, but since fighting games are based on very quickly responding to others' actions, there's only so good it can get); the 100ms from the article, for example, would be totally unacceptable for anything serious. Even if it were feasible to implement client-side smoothing in an emulator, it would still be much superior to actually get the real data in time.
The question is then whether packet loss is common/significant enough in practice that opportunistic resending's effects are visible in any significant way. For Dolphin, the main advantage of using UDP over TCP is actually NAT traversal, so there is no reason not to do it, but it would be nice to know whether it helped. Unfortunately, I don't have data here: I didn't include code to explicitly record packet loss statistics or anything, and I haven't had the time to actually clean up the code enough to promote it to master, so there aren't many people using it anymore. Anecdotally, after I implemented opportunistic resending, there was feedback that it was a noticeable improvement; there would certainly be some confirmation bias in there, but the feedback was positive enough to convince me it was likely a real effect. They didn't necessarily have to be losing a lot of packets; as the article mentions, with TCP a single lost packet makes you wait something like 2*RTT, which is certainly noticeable if your input buffer is set to the lowest possible value that produces smooth play under normal conditions.
Sorry, I was doing my calculations with a 2 second timeout.
Let's make it even harder for the TCP truthers. Hold on to your neckbeards.
Let's say that somehow sending 90 byte packets 60 times a second for 2 seconds is bad. Let's also say it would be nicer to have a longer timeout, like 10 seconds. I agree.
Here's how to do it. Clamp the maximum number of inputs per-packet to 60 (1 second's worth). If you don't receive any acks for one second something is clearly wrong. If this happens go into "congested mode". In this mode, send packets only 4 times per-second, and keep resending those 60 inputs until an ack comes back. If no ack comes back within 10 seconds, time out. If an ack comes back, go back to non-congested mode and resume sending at 60 packets per-second.
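A sketch of that two-mode scheme in Python. The constants mirror the comment (60 inputs max, 60 Hz normally, 4 Hz when congested, 10 second timeout); the class and method names are illustrative assumptions:

```python
# Two-mode congestion avoidance over UDP, as described above.
class LockstepSender:
    NORMAL_RATE = 60        # packets per second, healthy link
    CONGESTED_RATE = 4      # packets per second in congested mode
    MAX_INPUTS = 60         # clamp: one second's worth of inputs per packet
    CONGESTION_AFTER = 1.0  # seconds without an ack -> congested mode
    TIMEOUT_AFTER = 10.0    # seconds without an ack -> give up

    def __init__(self, now=0.0):
        self.last_ack_time = now
        self.congested = False
        self.timed_out = False

    def on_ack(self, now):
        self.last_ack_time = now
        self.congested = False      # ack came back: resume full rate

    def update(self, now):
        silence = now - self.last_ack_time
        if silence >= self.TIMEOUT_AFTER:
            self.timed_out = True   # no ack for 10s: time out
        elif silence >= self.CONGESTION_AFTER:
            self.congested = True   # no ack for 1s: back off to 4 Hz

    def packet_rate(self):
        return self.CONGESTED_RATE if self.congested else self.NORMAL_RATE
```

Note that unlike TCP, backing off here only changes the packet rate; each packet still carries the full unacked input history, so nothing stalls behind a retransmit.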
There you go. Congestion avoidance over UDP. Problem solved.
What's a "TCP truther"? Is it hiding under the bed?
> There you go. Congestion avoidance over UDP. Problem solved.
Yes, that sounds entirely reasonable. Please do this, instead of what is described in the article.
The example described in the article sends an average payload of 6 bytes (smaller than the UDP header) and converges to a steady state bandwidth that is a function of RTT and the number of player input changes per-second, and that is generally independent of the amount of packet loss. Given that with this protocol the average stream is 12 kbps, the vast majority of which is UDP packet header, I think your concerns about congestive internet failure are somewhat overblown.
> somewhat overblown.
Possibly one of the most overblown statements I have ever seen. Just try to picture the headline - "Halo 57 multiplayer brings whole ISPs down due to UDP design DDoS". Hilariously overblown.
If we are to be concerned about 12kbps, just wait until my next article when I describe how much UDP traffic is blasted out by non-deterministic game networking models ;)
No, the point of this article is to describe how deterministic lockstep works in the context of networking a physics simulation, and how UDP is provably more efficient for sending time critical data over the internet in the presence of latency and packet loss.