Freebase: Faking Source Engine Game Server Latency
A number of years ago, several friends and I were playing one of the best games ever made: Garry’s Mod. We were playing our bread-and-butter gamemode, DarkRP, and server hopping. As is customary, one of us would load the server list, find one of the first servers - usually meaning lower ping and more players - then join it. After joining one particular server, though, we quickly noticed something was wrong… this time our connection felt horrible. Having hosted servers ourselves, we wondered what the problem was. Was it latency? Unlikely - we picked a server from the top of the list, and those are always nearby! Was it sv, the server’s tick performance? Or some other performance metric? We quickly consulted the Source Engine’s net_graph utility.
Everything looked fairly normal: the net message graphs were ordinary, and sv was a little low but not unusual for a server of its size (128 player cap). However, one thing stood out. The ping was really high! But how could that be? We found this server really high on the server list! You might be wondering what these things have to do with each other, in which case it’s time for a quick explanation of how Garry’s Mod ranks servers on the server list. Let’s consult the following code, taken directly from the game itself.
This is how Garry’s Mod, when you open the server list for a gamemode, ranks each server returned from Valve’s master servers.
function CalculateRank( server )
{
var recommended = server.ping;
if ( server.players == 0 ) recommended += 75; // Server is empty
if ( server.players >= server.maxplayers ) recommended += 100; // Server is full, can't join it
if ( server.pass ) recommended += 300; // Password protected, can't join it
if ( server.isAnon ) recommended += 1000; // Anonymous server
// The first few bunches of players reduce the impact of the server's ping on the ranking a little
if ( server.players >= 4 ) recommended -= 10;
if ( server.players >= 8 ) recommended -= 15;
if ( server.players >= 16 ) recommended -= 15;
if ( server.players >= 32 ) recommended -= 10;
if ( server.players >= 64 ) recommended -= 10;
return recommended;
}
From this function we can glean a few facts about what might improve a server’s ranking - a lower score ranks higher - and thus make it more likely to be recommended to any given player:
- Don’t be empty
- Have as many players as possible without being full
- Specifically, more than 64 of them, meaning you want your max player count set above 64 so you collect every player bonus without ever being full
- Don’t be password protected
- Don’t be an anonymous server - make sure you are logged in with Valve’s master servers
But most importantly…
- BE AS LOW LATENCY AS POSSIBLE!
It’s clear that the intention is for lower-latency - and ostensibly geographically closer / less overloaded - servers to be suggested first. Note that the player-count bonuses add up to at most 60 points, so for a populated, joinable, public server the reported ping dominates: a ping advantage of more than 60ms outweighs every player bonus combined.
With this in mind, we start digging. We disconnect from this server and look at the server list again… it reports 16ms. This is a far cry from the 97ms we just observed in the Source Engine diagnostics. There is generally some nuance and natural spread between these two numbers, attributable to server load, but not 81ms of spread! So what in the world is going on?
When we run a traceroute to get a better look, we start to paint a more geographic picture of the situation. The server is in Dallas, Texas! We are on the East Coast, and while a single packet could physically travel that far in 16ms, ping is a round trip: even a perfectly straight fiber path from Philadelphia to Dallas would take roughly 20ms there and back, before any real-world routing or queuing. 16ms from Philadelphia to Dallas would be unheard of!
We start sleuthing, messaging larger hosting providers and server owners alike to get answers. There is one word on everyone’s lips: anycast. And folks are supposedly paying big bucks for it. But what does that mean? Anycast is already a well-established technology in the web world, used for distributing content closer to users. But it only works because web servers can be horizontally scaled and placed in multiple locations: you can make multiple copies of the same content, so serving it is effectively asynchronous. The same cannot be said for a realtime game server. It can only exist in one location and must be near-perfectly synchronous with its users. So what the hell are they talking about?!
< i might flesh this part out with our troubleshooting steps later but this is just a placeholder for now >
First we must ask ourselves, how does this all work under the hood? How do we even know what servers are available? Or their latency?
Let’s explore. When you first launch Garry’s Mod and want to browse online servers, your client must first contact the Valve master servers. The master server in question will return a huge list of raw IPs and ports, after which it is the client’s job to start individually contacting servers and querying them for their info.
It does this through a little message type called A2S_INFO. The client sends the server this message as, effectively, a “hey, what’s up? who are you?”. The server then returns a response containing lots of fields describing its state: name, player count, gamemode, map, whether it uses anticheat, and much more. However, while the client is doing this, something is also happening in the background… the client is timing the interval between sending the A2S_INFO query and receiving the response. That is to say, calculating the latency to the server!
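To make that concrete, here is a minimal sketch in Go (the language we later used for freebase) of that query-and-time pattern. It is not the game’s actual code - the address is a placeholder and the error handling is bare-bones - but it performs the same probe the browser runs against every address the master server hands back.

package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// The classic (pre-challenge) A2S_INFO query: four 0xFF bytes, the 'T'
// header byte, then the literal string "Source Engine Query" and a null.
var a2sInfoRequest = append([]byte{0xFF, 0xFF, 0xFF, 0xFF, 0x54}, []byte("Source Engine Query\x00")...)

func main() {
	addr := os.Args[1] // e.g. "203.0.113.10:27015" (placeholder address)

	conn, err := net.Dial("udp", addr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	start := time.Now()
	if _, err := conn.Write(a2sInfoRequest); err != nil {
		panic(err)
	}

	buf := make([]byte, 1400)
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	n, err := conn.Read(buf)
	if err != nil {
		panic(err)
	}
	rtt := time.Since(start)

	// 0x49 ('I') marks an A2S_INFO reply; the real browser parses the
	// name, map, player counts, etc. out of the rest of the payload.
	if n >= 5 && buf[4] == 0x49 {
		fmt.Printf("A2S_INFO reply: %d bytes in %s\n", n, rtt)
	}
}

The important detail is that the measured round trip belongs to whatever machine answers the UDP packet - not necessarily the machine that will eventually run your game session.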
If you, dear reader, are particularly crafty, you are probably thinking “this sounds very exploitable” - and it is! This is the part of the process where we can have some fun! Ultimately (at the time) the A2S_INFO query and the response that follows were just chunks of encoded data sent over UDP. We could theoretically cache this data closer to the user… and this is where the magic begins. :)
┌──────┐ ┌────────┐ ┌──────┐
│client│ │bgp edge│ │server│
└──┬───┘ └───┬────┘ └──┬───┘
│ │ │
│ A2S_INFO client query │
│────────────────────────────────────────────>│
│ │ │
│ A2S_INFO reply (100ms) │
│<────────────────────────────────────────────│
│ │ │
│ │A2S_INFO freebase query│
│ │──────────────────────>│
│ │ │
│ │A2S_INFO reply (cached)│
│ │<──────────────────────│
│ │ │
│A2S_INFO client query│ │
│────────────────────>│ │
│ │ │
│A2S_INFO reply (20ms)│ │
│<────────────────────│ │
┌──┴───┐ ┌───┴────┐ ┌──┴───┐
│client│ │bgp edge│ │server│
└──────┘ └────────┘ └──────┘
To test our theory we set up a BGP anycast network on AS137909 (now AS10419) using Vultr VMs as our edge, since they were the cheapest provider that offered drops with BGP peering capability. We then created a very small service written in Go that would query a given SRCDS (Source Dedicated Server) instance using A2S_INFO, store the response, then respond to any A2S_INFO queries it received with the stored response - in essence, an A2S_INFO cache. We dubbed the project “freebase” and spun up a small test system where we had a user in Miami attempt to connect to our server in Beauharnois, Canada.
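A minimal sketch of that cache looks roughly like the following. The origin address, listen port, and refresh interval are placeholders, and getting the real game traffic from the edge back to the origin is assumed to be handled separately by the edge’s routing, so it isn’t shown here.

package main

import (
	"log"
	"net"
	"sync"
	"time"
)

const (
	origin = "203.0.113.10:27015" // placeholder: the real SRCDS server
	listen = ":27015"             // the anycast address announced at every edge
)

var a2sInfoRequest = append([]byte{0xFF, 0xFF, 0xFF, 0xFF, 0x54}, []byte("Source Engine Query\x00")...)

var (
	mu     sync.RWMutex
	cached []byte // last A2S_INFO reply fetched from the origin
)

// refresh re-queries the origin on an interval so the cached reply
// (name, map, player count, ...) stays reasonably fresh.
func refresh() {
	for {
		if conn, err := net.Dial("udp", origin); err == nil {
			conn.SetDeadline(time.Now().Add(2 * time.Second))
			conn.Write(a2sInfoRequest)
			buf := make([]byte, 1400)
			if n, err := conn.Read(buf); err == nil {
				mu.Lock()
				cached = append([]byte(nil), buf[:n]...)
				mu.Unlock()
			}
			conn.Close()
		}
		time.Sleep(5 * time.Second)
	}
}

func main() {
	pc, err := net.ListenPacket("udp", listen)
	if err != nil {
		log.Fatal(err)
	}
	go refresh()

	buf := make([]byte, 1400)
	for {
		n, addr, err := pc.ReadFrom(buf)
		if err != nil {
			continue
		}
		// Answer A2S_INFO ('T') queries straight from the cache; anything
		// else (the actual game traffic) is not our problem in this sketch.
		if n >= 5 && buf[4] == 0x54 {
			mu.RLock()
			reply := cached
			mu.RUnlock()
			if reply != nil {
				pc.WriteTo(reply, addr)
			}
		}
	}
}

Because the reply is a single pre-built UDP datagram, the edge answers in whatever its round trip to the client happens to be - the origin’s real latency never enters the measurement.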
Through this system we were able to prove that we could cache A2S_INFO packets at the edge of our network and intercept requests for them, replying much faster than the origin server ever could, thus making the server seem closer than it actually is and moving its ranking much further up the server list for more players. Latency = arbitraged!
If this sounds scummy, it arguably was. There were impassioned arguments on the now-defunct Facepunch forums and pleas to FP employees to provide some kind of workaround, like they did with the old master server / player spoofing debacle.
This has since been patched out with Valve’s introduction of a challenge-response mechanism in the A2S_INFO message format. It was primarily added to prevent amplification attacks, much like the DNS amplification DDoS attacks of yore, but it has had the added side effect of making this specific hack impossible.
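For the curious, from a querier’s point of view the exchange now looks roughly like the sketch below (assuming the documented behaviour: the server may first return an 'A' challenge packet whose 4-byte value must be echoed back before it will produce the real 'I' reply). The query is no longer a single stateless request/response that an edge box can answer verbatim from a stored packet.

package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

var baseQuery = append([]byte{0xFF, 0xFF, 0xFF, 0xFF, 0x54}, []byte("Source Engine Query\x00")...)

// queryInfo performs a challenge-aware A2S_INFO exchange: if the server
// answers with an 'A' (0x41) challenge packet, the query is re-sent with
// the 4-byte challenge appended before the real 'I' reply comes back.
func queryInfo(addr string) ([]byte, error) {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	req := baseQuery
	buf := make([]byte, 1400)
	for attempt := 0; attempt < 2; attempt++ {
		conn.SetDeadline(time.Now().Add(2 * time.Second))
		if _, err := conn.Write(req); err != nil {
			return nil, err
		}
		n, err := conn.Read(buf)
		if err != nil {
			return nil, err
		}
		if n >= 9 && buf[4] == 0x41 { // challenge issued, retry with it appended
			req = append(append([]byte(nil), baseQuery...), buf[5:9]...)
			continue
		}
		return append([]byte(nil), buf[:n]...), nil // the actual info reply
	}
	return nil, fmt.Errorf("no info reply from %s", addr)
}

func main() {
	reply, err := queryInfo(os.Args[1]) // e.g. "203.0.113.10:27015" (placeholder)
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d byte reply, header 0x%02X\n", len(reply), reply[4])
}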