Players Guide to Unreal Netcode
by =NUB=garfield

Contents:
1. Introduction
2. Internet basics
3. Basic UT stats and settings
4. Advanced stats
5. Server configuration
6. Advanced netcode
7. Credits + Contact

1. Introduction

This guide tries to explain how Unreal works online, what ping and lag mean, and generally how things work and what can (and cannot) be told from the various statistics Unreal offers. It is meant for everyone from the average online player to the large-scale server admin. Don't worry if you don't understand everything - most of it isn't needed if you just want to play (if it were, how could you have played for so long without it? ;)). Actually (as I found out while writing) it's probably far too much to read if you don't really care anyway. ;) But if you always wanted to know whether your server is lagging because of CPU overload, or why that rocket can explode in your opponent's face without hurting him, this will give you the answers.

Some of the facts in this guide are much discussed these days, so to be sure to get everything right, I verified them with people who really know Unreal. In most cases this was Mongo (known to some as the maintainer of RA:Unreal), who modified the Unreal netcode for the UW engine - I doubt anyone outside of Epic knows the Unreal netcode better than him. So unless something is explicitly stated as not known for certain, you can take the information here for granted.

2. Internet basics

This chapter covers basic information about what "speed" and connection quality mean on the internet. If you already know what ping/latency, bandwidth, tracert and packet loss are, you can probably skip it.

Connection speed is usually given in kb/sec or kbit/sec. This "speed" is called bandwidth, and it's not all that matters. The second component is called latency and describes the time that data actually needs to reach its target.
Latency is usually measured in milliseconds (1/1000 of a second, ms) and is often referred to as ping (though ping is actually twice the latency, because it first sends something to the target and then gets an answer back from the target). A good analogy to see the relevance of both factors: imagine a truck loaded with ten thousand hard disks, each holding 10 gigabytes of data. Say the truck makes 50 km/h - if you transported the data with it to some place 100 km away, you'd have a bandwidth of roughly 14 GB/sec! Now, would anyone want to play over that connection? I guess not. :)

As Unreal players know, a lot of different values are called ping - Unreal itself gives two, and neither is the "standard" ICMP ping (often called DOS ping because Windows offers a command-line program called ping). More on the Unreal pings later.

If you want to find out why you have a bad connection to a server, it is a good idea to first check the route to it (the path your data takes to the server and vice versa). Find out the address of the server (either IP or URL), open a DOS box and type "tracert <server address>". You will see the path your data takes to the server, and you will also see the ping to each "hop". (pic 1: tracert output) If you see a ping increase at some point (say, the ping to the first hops is 30, then after some point it's greater than 100 for all following hops), you can be sure it is not the server or your own machine causing problems, but the internet itself. Unfortunately there usually isn't much you can do about that. If the problem persists, try to contact your ISP and show them some tracert results. There are a lot of public domain and shareware tools that are much better at "tracing" than tracert is. I personally like Ping Plotter (http://www.pingplotter.com/), but see for yourself what you like best.

Data on the internet is transferred in packets.
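The truck analogy can be checked with quick arithmetic; a minimal sketch (using the trip numbers given above):

```python
# "Truck bandwidth" sanity check: 10,000 hard disks of 10 GB each,
# driven 100 km at 50 km/h.
payload_gb = 10_000 * 10             # total data on the truck, in GB
trip_seconds = 100 / 50 * 3600       # 100 km at 50 km/h = 2 h = 7200 s
bandwidth_gb_per_s = payload_gb / trip_seconds
latency_minutes = trip_seconds / 60  # the "ping" of this link, one-way

print(f"bandwidth: {bandwidth_gb_per_s:.1f} GB/s, latency: {latency_minutes:.0f} minutes")
```

Enormous throughput, but a two-hour one-way latency - which is exactly why bandwidth alone tells you nothing about how a connection plays.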
Now if for whatever reason (often a capacity shortage or a piece of damaged hardware somewhere along the route) some packets don't reach their target, that is called packet loss. To find out where it happens, use one of the tools mentioned above (Ping Plotter does a fine job at it). If ping problems or packet loss already appear at the first hop, chances are good something is wrong with your local setup and/or hardware. If they only appear at the last hop (the server itself), there is probably something wrong with - you guessed it - the server.

Some (very rough) guidelines on what you can expect as ICMP ping (this only applies to Germany - average connection quality varies a lot from country to country):
ISDN - ~20 ms to the first hop, 30 ms to a server with a good connection to you.
TDSL (no fastpath) - ~30-60 ms to the first hop, 40-80 ms to a good server, depending on the interleaving factor.
TDSL (fastpath) - slightly (~5 ms) better than ISDN.
QDSL - ~10-15 ms to the first hop, 25-35 ms to a server with a good connection to you.
If you don't get even close to these values, there could be a lot of reasons - a crappy ISP, a crappy setup on your machine, crappy or overloaded systems at your dialup node, and crappy routing are the most important ones that come to mind.

3. Basic Unreal stats and settings

In Unreal you can basically see two pings: the ping in the scoreboard (F1 ping) and the ping in the stat net display (press F6 in the default keyboard layout to see it - F6 ping). The two are usually quite different from each other. The reason is simple - they are calculated in different ways. F1 ping is the ping the server gets to you, while F6 ping is what you get to the server. While ICMP ping wouldn't show a difference there, Unreal does, because the server and the client need different times until they send an answer back. More on the exact difference later.

The second basic (because very important) thing in the stat net display is packet loss (pl).
Packet loss is shown twice, in the IN and OUT columns - IN means the traffic you get from the server, OUT is the traffic you send to the server. As you probably know, pl is a very bad thing. If you get constant pl on a server, all kinds of crappy things can happen. Packet loss means that for whatever reason some internet packets don't reach their target - either from you to the server, from the server to you, or (most frequently) both. That in turn means that you as a client might miss important information (like rockets being shot at you, or other players moving around), and/or the server may do so (like knowing where you aim, or that you didn't shoot those rockets right at your own feet). Usually packet loss is caused by some problem on the internet route between you and the server. To find out where exactly, see chapter 2. ;)

About the only thing you can configure client-side for online play in Unreal is netspeed. First thing - forget Epic's recommended settings. Second thing - yes, client-side fps is somewhat capped by netspeed. However, that doesn't really play an important role, because it only places a maximum on your fps. That obviously results in a lower average fps in the display, but during the important situations your fps won't drop, because they are below the cap anyway. Keeping that in mind, you should always set your netspeed according to your real line capacity. Take into account whether your connection is capable of handling that traffic in both directions simultaneously (full duplex) or not (half duplex or some mixture). For Euro-ISDN 64k, 6500 seems like a reasonable netspeed (it isn't really full duplex, but can handle about 6.5 kb/sec up and down simultaneously). For DSL and similarly fast lines, 20000 seems like a reasonable default (you will never get more traffic in either direction anyway), but if you have trouble on some servers, try lowering your netspeed - there might be a bottleneck somewhere between you and the server.
This is especially true when playing over long distances, from country to country or even intercontinentally. While setting your netspeed to extremely high values might get you seemingly better fps, it can get you into problems when your machine is really fast: you could create more traffic to the server than your connection can handle, resulting in all kinds of negative side effects. Even a MaxClientRate appropriate to your connection's capabilities won't help there, because MaxClientRate on the server only affects your downstream (traffic from the server to you), not your upstream.

So now you're playing, your ping seems OK, no pl, and you still find playing laggy? That can have various reasons. If it happens on all servers, the first thing to check is your local configuration. Firewalls, virus scanners, ICQ and other IM clients can all interfere with Unreal online play, so it's a good idea to try switching them off for playing and see if it helps. The next thing to check is drivers - especially for your connection device if you're on ISDN, but also for graphics cards and other hardware. If you're connected to an external (DSL or cable) modem via ethernet, your network card might need other/newer drivers, or it could have IRQ conflicts that cause trouble. Once you have checked your local machine, stopped all background tasks that aren't necessary for survival, upgraded drivers and generally made your system run smoothly, the next thing to do is to check the connection to the server with one of the trace tools mentioned in chapter 2. Run such traces for some time (several minutes when pinging about once a second) and see if ping spikes or pl occur. If you see problems there, try changing ISPs if you can, to see if it helps. If both of the above fail, it's time to have a closer look at the server - see the next chapter for this.

4. Advanced stats

Stat net offers more information than just your ping and pl.
Let's see what we have there, from top to bottom: (pic2: stat net output)
Ping - if you don't know what this means, go read chapters 2 and 3.
Channels - the number of actors on the server that are currently relevant to you. See chapter 6 for more on this.
All of the following are given for both directions - IN means from the server to you, OUT means from you to the server:
Unordered/sec - number of packets that were received in a different order than they were sent. If this is not constantly 0, something with your connection is seriously fucked up.
Packet loss - percentage of packets that didn't reach the target. Should be 0 all the time. If it isn't, this is almost always caused by a connection problem; see chapter 2.
Packets/sec - number of packets received/sent per second. See below for more info.
Bunches/sec - number of actor updates received/sent per second. See chapter 6 for more on this.
Bytes/sec - number of bytes received/sent per second.
Netspeed - guess yourself.

From a player's perspective, the only really interesting value here (other than ping and pl, of course) is bytes/sec. It will always be capped by your netspeed (with very few rare exceptions). On the OUT side it should actually NEVER exceed netspeed - that is what the frame cap is for. If it reaches your netspeed on the IN side, you might start missing information. If that happens only rarely, in extreme situations, it's no reason to worry; however, if it's a permanent condition, you may miss important data the server can't send you because your bandwidth is saturated. This usually happens due to unreasonably high tickrates on the server side.

The tickrate is basically the "fps" the server is running at. It's the single most important variable a server admin can change. There is a tradeoff here - a higher tickrate means more CPU load and more server->client traffic, but also better pings (both displayed and effective) and generally a more precise simulation on the server.
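The latency side of that tradeoff is easy to quantify. A small sketch (simplified: it only models the wait for the next tick, ignoring the network itself):

```python
# Worst case: a client command arrives just after a tick has started,
# so it sits in the queue for one full tick before the server acts on it.
def worst_case_wait_ms(tickrate):
    return 1000.0 / tickrate

for tickrate in (20, 35, 50, 100):
    print(f"tickrate {tickrate:3d}: commands wait up to {worst_case_wait_ms(tickrate):.0f} ms")
```

This wait comes on top of the normal network latency, which is why tickrate matters for "effective" ping, not just the displayed one.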
The default tickrate (20 for internet servers) is what causes the horrible ping in the stat net display - when the server runs at 20 ticks a second, it can take up to 1/20th of a second before it acknowledges a ping request from a client - that's 50 ms! This also affects gameplay - on a server running at tickrate 20, player commands have to wait up to 50 ms in the worst case before they can actually affect the game. And aiming becomes less accurate - just think of playing at 20 fps. If you move your mouse fast, there are large "jumps" in your view rotation at such low rates, and that is exactly what the server does with your aiming if it runs at such a low tickrate, causing "gaps" in your aiming movement. This is also the cause of the seemingly increased damage of the continuously firing weapons (pulse secondary and minigun) on high-tickrate servers - aiming simply gets more precise, so more of the (actually individual) shots hit the target.

In the last year, server admins seem to have discovered this on a large scale. However, some of them went too far when fixing that misconfigured default setting - you can find servers running at tickrate 100 out there. While that is fine for people with a lot of bandwidth (as long as the server CPU and network connection can sustain it), it is NOT fine for the average ISDN player. The server->client traffic increases at nearly the same rate as the tickrate does, and a tickrate of 100 generates too much traffic for a 64 kbit ISDN line. That results either in packet loss (if MaxClientRate on the server is high enough and the client's netspeed is set to more than the line can actually handle, to increase fps - see above) or in the client missing game information, ranging from decals to really important stuff like player movement. As a player you can't change that situation.
However, you can see what tickrate a server is running at: type "inject userflag 1" at the console ("inject userflag 0" turns it off). (pic3: inject userflag output) The first number you see is the (max)tickrate the server is running at. You can NOT, however, always find out the (configured NetServerMax-)tickrate by checking the number of incoming packets per second - Linux servers behave somewhat differently. See chapters 5 and 6 for details on that.

Another use of "inject userflag 1" is to identify the creeping ping bug (known on Win2k and some (by now outdated) Linux versions). The second number displayed is the number of clients connected, and since the creeping ping bug is caused by players not being disconnected properly after a map change, this number will be larger than the number of players (plus specs) actually on the server as long as the creeping ping bug is active.

The last important piece of information (for a player) in the "inject userflag 1" display is the server CPU load. The third and fourth numbers added together give the time the server needed for the last tick. Since that time is measured from "tick started" to "tick finished", it can be taken as a CPU load indicator. The server does (or tries to do) tickrate ticks per second, and each of them needs net+act milliseconds. So if net + act << 1/tickrate * 1000, the server has enough horsepower to sustain the tickrate. On fast servers I usually see those times at about 1/3 or even less of what would be needed, even with multiple servers (Unreal and non-Unreal) running on the same machine. "Inject userflag 1" uses a lot of bandwidth; the traffic generated by it can easily exceed the netspeed/MaxClientRate limits and is not reduced by them. So only use it when analyzing a server, not in the middle of an important match.

On a side note - the reason for the F1 ping being lower than the stat net ping is simple: most clients run at more fps than the server.
That means the server gets an acknowledgement to a ping request faster than the client gets one from the server - it only has to wait up to 1/fps seconds (plus the actual ICMP ping, of course). For some strange reason (probably an attempt to filter out those dependencies) the client also subtracts half its frame time (1/fps) from its own ping - that is why you will always see a lower ping on your own machine than others see for you.

There is one thing left to mention about packets/sec - if your fps drops below this value, you will most probably experience lag; this is the situation called "invisible packet loss", because it feels exactly like pl but no pl is shown. I have yet to see that phenomenon myself, but it seems that Unreal has problems handling two waiting packets in one frame. No solution is known for this - I'd suggest tweaking your Unreal for more fps. If you're willing to give up the "looks", there are many ways to improve your fps; try one of the many, many Unreal tweaking guides out there. Or buy a faster machine. ;)

5. Server configuration

This section is meant for server admins; I will try to explain the main options configurable on Unreal servers. For nearly everything, having an admin password is enough to change settings - use "admin set ipdrv.tcpnetdriver <variable> <value>" after doing an adminlogin. To read the current values, use "get" instead of "set" (and remove the <value> part).

As mentioned above, the relevant settings for Unreal netcode on servers are found in the [IpDrv.TcpNetDriver] section of UnrealTournament.ini (or whatever ini your server uses). It looks like this:

[IpDrv.TcpNetDriver]
AllowDownloads=True // allows clients to download missing packages like maps or mods from the server
ConnectionTimeout=15.0 // the time the server waits after getting no more traffic from a client before it decides "client disconnected"
InitialConnectTimeout=150.0 // basically the same as above, but applies while the client is not yet in game (e.g. while the client is still connecting); if your server is experiencing the "creeping ping" bug, set this to something like 15 to shorten the duration of the bug
AckTimeout=1.0 // does nothing at all in current versions
KeepAliveTime=0.2 // the time after which the server sends a packet even if no updates are actually necessary; see below
MaxClientRate=20000 // maximum bandwidth the server allows for server->client traffic; for each client, the minimum of the client's netspeed and MaxClientRate is used. This does NOT apply to client->server traffic
SimLatency=0 // does nothing at all in current versions
RelevantTimeout=5.0 // leave at default - see chapter 6
SpawnPrioritySeconds=1.0 // leave at default
ServerTravelPause=4.0 // the time the server waits after the end of a map before switching to the next one
NetServerMaxTickRate=20 // server "fps" used for internet servers
LanServerMaxTickRate=35 // server "fps" used for servers started with the ?lanplay url
DownloadManagers=IpDrv.HTTPDownload // don't change
DownloadManagers=Engine.ChannelDownload // don't change

Most of these values don't have to be changed and won't make a big difference in most cases. The major exceptions are InitialConnectTimeout (lower it to 15 if your clients experience the "creeping ping" bug) and of course the much-discussed NetServerMaxTickRate, usually referred to simply as tickrate. When deciding what tickrate to use on your server, there are several things to consider:
- Server OS: While Windows-based servers behave as they should, Linux servers do not. More on this later.
- Client connections: For FFA servers, you may (or may not) want to keep your server playable for modem players with very limited bandwidth; on clan servers, your clients will usually have ISDN or better connections. Also consider whether many international matches are played on the server - long international routes might offer low bandwidth even to clients that have high local bandwidth.
- Maximum players and gametype/mod: Obviously, the more players on the server, the more bandwidth each client needs (more players visible, more other stuff like projectiles to send to the clients). The gametype/mod is also important: CTF, for example, will usually produce a lot more spam and more players visible at the same time than TDM-Pro, while total conversions like TO, SF and others may produce totally different traffic. A well-known example is the "extreme" mod, which generates FAR more traffic than normal Unreal.

So what is it that makes Linux servers different from Windows servers? The results are known to most: the number of packets sent per second is not constantly at NetServerMaxTickRate. By design, Unreal should send a packet every tick (see chapter 6), but even at far below 50% CPU usage, Linux servers tend to send fewer packets, and not even a constant number of them. This also affects the F6 ping - if a client receives only 10 packets per second, it might have to wait 100 ms until its ping request is acknowledged! One (possibly the only) reason for this is a problem in Linux Unreal's threading. Threading is a concept that allows for multitasking: programs tell the OS that they are done with their current task for now and hand CPU time back to it. It seems that the calls doing this in Linux Unreal are not written correctly, so the server "sleeps" (between executing ticks) longer than it should. This means the server actually runs at a lower tickrate than the configured NetServerMaxTickRate, even if there is plenty of CPU time left. The rate at which this happens varies with time and possibly with other, unknown factors. Unfortunately, no real cure for this is known so far. In most cases servers seem to send packets at a nearly constant tickrate at least while there is enough "action" in the game, but even that is not always the case.

With all this to consider, it is not easy to find the "perfect" tickrate for a server.
Before we get to recommendations for some cases, another note on how tickrate relates to traffic. While it is generally true that a higher tickrate causes more traffic, the increase is not linear (2x tickrate does not mean 2x traffic). The reason is simple - while some things (such as player aim) change often enough to produce an update ready to be sent every tick, other things do not (players do not shoot more often, and they don't change movement direction more often, just because the tickrate went up). And when there is nothing to update about a specific thing (no new projectile, no new movement direction, ...), no traffic is generated for it. As a result, increasing the tickrate by a factor x creates less than x times the traffic.

With all this in mind, here are some recommendations for tickrate settings in various cases. They are meant for Windows servers; since Linux servers never run faster than they should, they can be used for Linux servers, too. The values assume a bandwidth of at least 5000 bytes/second full duplex for the players' connections unless mentioned otherwise.
1on1 servers: This is easy. With only two players generating "action", there is no real bandwidth problem; you should be fine running at tickrate 50.
4on4 TDM servers: Tickrates around 40 have proven reasonable. If only players with at least 6500 bytes/sec of bandwidth connect (as in an average German clan match), tickrate 50 seems to make the best use of the bandwidth.
5on5 CTF servers: Tickrate 30-35 should be fine. For international matches, you will have to consider whether there really is a bandwidth of 5 kb/sec available between the clients and the server. If clients start getting packet loss or ping increases that do not appear at lower tickrates, there is most probably a bottleneck in between.
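The sublinear tickrate-traffic relationship described above can be illustrated with a toy model (all numbers are made up for illustration; they are not measured Unreal values):

```python
# Toy model of server->client traffic: per-tick data (packet overhead,
# player aim) scales with tickrate; event-driven updates (shots fired,
# direction changes) happen at a rate set by the game, not the tickrate.
def bytes_per_sec(tickrate, per_tick_bytes=68, event_rate=30, event_bytes=60):
    return tickrate * per_tick_bytes + event_rate * event_bytes

low, high = bytes_per_sec(20), bytes_per_sec(40)
print(f"tickrate 20: {low} B/s, tickrate 40: {high} B/s, factor {high / low:.2f}")
```

In this toy model, doubling the tickrate raises traffic by a factor of about 1.4, not 2.0 - the event-driven share does not grow with the tickrate.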
In case you wonder why Epic themselves recommend much lower values - there are two simple reasons. First, they have to consider the average buyer, who might have a 56k modem (and analog modems don't get close to full duplex), so they assume netspeeds like 2800 - and servers with 16 players. That does not seem reasonable today, and especially not for the average regular online player. The second reason is that Epic never cared much about competitive gaming, so as long as the game is actually playable in some way (in the dictionary meaning of the word, not what a regular online gamer would call playable), it's fine from their perspective.

6. Advanced netcode

This chapter goes into more detail on how Unreal netcode actually works. To understand all the details, knowing the basic concepts of object-oriented programming is helpful, but hopefully not really necessary.

So how does Unreal work online? To understand this, you first have to understand the basic problem that all games played over the internet share. This problem is latency - sending data over the net takes time, so it is absolutely impossible to keep all players totally synchronized. Players will never see exactly the same thing; there is always some delay before player A can see what player B did.

Like most games today, Unreal uses a netcode model centered around an authoritative server. Authoritative means that the server decides what "really" happens; it has the authority over gameplay. If the server "sees" player A hitting player B, player B will take damage - regardless of whether player B saw A hitting him, or even whether A saw himself hitting B. It also means the server decides where players actually are. Clients try to simulate the "real world" of the server as exactly as possible. The accuracy of this simulation obviously depends on ping. To make the simulation as precise as possible, there are basically two approaches used in today's online games.
One of them is to let the client try to predict the state of the "real game" from the information it has (this is what Unreal does); the other is to let the server "backtrack" the actions of players to the time when they actually took place (HL does this; ZeroPing, an Unreal mod, tries to do something similar for Unreal). The second model is mentioned here only for information purposes and for the ZeroPing community; it is not relevant for Unreal itself.

Both of these models have pros and cons. With the first model, clients have an outdated version of the real world - other players will already have moved further on the server, and in addition the server gets the client's movement with some delay (together these are the reason why you have to lead your aim by your ping). While the clients of some other games try to extrapolate other players' movement by ping (they take the current movement of players and act as if this movement continues for ping milliseconds; QW is one of these games), Unreal does not. However, since the framerate on a client is usually higher than the number of updates the client gets from the server, Unreal does extrapolate movement in the same way between updates - it keeps extrapolating until it gets a new update on a player's movement. This extrapolation is the reason for seeing players "sliding" during lag spikes - Unreal keeps extrapolating their "old" movement until it gets updates on their "new" movement.

With the second model, players who see themselves hitting someone on their client will always do so in the "real world" of the server, but since the "real time" on the server is a bit ahead, players notice that they are hit (or even died) some time AFTER they were at the place where they were hit. Any shots they fired after that moment will not actually have been fired, and in some circumstances they can already be out of sight in their own simulation before they are notified that they have been hit.
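Unreal's between-update extrapolation described above can be sketched in a few lines (a simplified model, not actual engine code):

```python
# Dead-reckoning between server updates: keep moving the actor along
# its last known velocity. If the real player changed direction in the
# meantime, this is what produces the "sliding" effect during lag.
def extrapolate(last_pos, last_vel, seconds_since_update):
    return tuple(p + v * seconds_since_update for p, v in zip(last_pos, last_vel))

# Last update: player at x=100, moving at +500 units/s along x.
# No new packet for 100 ms -> the client shows him at x=150.
print(extrapolate((100.0, 0.0), (500.0, 0.0), 0.100))
```

When the next update finally arrives, the extrapolated position is simply replaced by the real one, which is why "sliding" players snap back after a lag spike.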
So now that we know how the basic concept works, let's have a look at how it is implemented. The obvious way to simulate the server world on the client would be to send everything that happens from the server to the client. Unfortunately, this would use far too much bandwidth. To reduce bandwidth usage, several things are done; some of them (those that can affect gameplay) are explained here.

To understand how these methods work, we first have to look at how things are actually represented in Unreal. For everything that exists in the game, there is a template called a "class". Each actual object (objects relevant to gameplay are called "actors" in Unreal) is an "instance" of such a class. E.g. there is a class for rockets, and every time a rocket is created (i.e. fired), it has the properties defined in that class. Unreal uses various properties on actors to define how they are replicated (sent) to the client (or, in the case of player movement, from the client to the server for the player's own client, and server->client for the other clients).

Back to the bandwidth-saving methods. The first is pretty obvious: if the client is not able to see or hear an actor (be it another player, a rocket, a flak shard or whatever), there is no reason at all to tell the client what it does. Unreal actually checks this (it's called the relevancy check) and does not replicate actors that are "out of sight" for the client. The variable RelevantTimeout in [IpDrv.TcpNetDriver] on the server defines how long an actor has to be out of sight before it becomes irrelevant to the client. The effects of this relevancy check can be seen when playing (client-side) demos with ?3rdperson (which allows you to spectate the match in free flight) - players that are not in sight of the player who recorded the demo are not visible; you can actually watch them disappear once they are out of sight.
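The relevancy check described above boils down to a per-actor, per-client decision roughly like this (a rough sketch; the function and parameter names are placeholders, not engine identifiers):

```python
RELEVANT_TIMEOUT = 5.0  # seconds; corresponds to the RelevantTimeout setting

def is_relevant(now, last_visible_time, visible):
    # An actor stays relevant while the client can see/hear it, and for
    # RELEVANT_TIMEOUT seconds after it was last visible.
    if visible:
        return True
    return (now - last_visible_time) < RELEVANT_TIMEOUT

# A player who went around a corner 3 seconds ago is still replicated...
print(is_relevant(now=10.0, last_visible_time=7.0, visible=False))   # True
# ...but after 6 seconds out of sight, his updates stop.
print(is_relevant(now=13.0, last_visible_time=7.0, visible=False))   # False
```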
Of course, as soon as an actor becomes visible (again, or for the first time), the server replicates it to the client.

The next important method is simulating an actor's behaviour on the client where possible. This applies to most projectiles (like rockets). When a projectile is fired, the server tells the client where exactly it was spawned (created) and in which direction it is flying. From that point on, the client simulates the projectile's flight without any additional information from the server - the client knows the speed, and since no direction changes happen, there is no reason for the server to send position/movement updates. This simulation, however, is what causes the "phantom rocket" situations, where a client sees his rockets hitting another player without the player dying when he seemingly should. The reason is simple - since the rockets are simulated on the client, the client also decides when they "explode" (visually - damage is of course decided by the server alone). So if the client sees the rockets hit a player, they blow up - but the client's world is not exact, especially for other players, who may already have moved away a bit in the "real world" on the server. To sum it up - while the rockets miss the "real" player on the server, they hit the slightly mispositioned player on the client. You can be sure that you did damage if you see the other player bleed or get "bumped around", because both the blood effect and the change in player momentum are done server-side.

Another property used to save bandwidth is the NetUpdateFrequency, which is defined per class. While some actors (like player movement) obviously need updates as often as possible, others don't. Gameplay does not suffer when stuff like player scores (which obviously have to be replicated too, since otherwise the client could not show them to the player) is not replicated 50 times per second.
NetUpdateFrequency defines how often actors are replicated to clients (it's a maximum value). NetUpdateFrequency is responsible for the delay players notice in their armor status when taking hits - armor is "only" updated 10 times per second, so there may be up to 100 ms of delay before a player is informed that his armor is going down. With the highest NetUpdateFrequency value in use being 100, it also becomes obvious why tickrates above 100 are totally useless - nothing is updated for clients more than 100 times per second anyway.

There is, of course, much more involved in replication, but some of it would be too complex for the purposes of this guide, and the rest just doesn't directly influence gameplay. With all this information, the "channels" and "bunches" values from stat net can be explained easily - channels is the number of actors currently relevant to you, while bunches is the number of actor updates (not the precise definition, but it basically equals that number) received in the last second. With these numbers it is easy to see how important the "no variable change - no update" rule is - channels often exceeds bunches by a LOT.

Another thing worth knowing is that Unreal servers send a packet to a client at the end of a tick whenever any actor has an update to be sent. That means that most of the time Unreal does send one packet per tick, even though it is not hardcoded to do so - there is almost always some actor with something to replicate. The only reason the number of packets/sec could be lower than the actual tickrate during a real Unreal game (actual, not NetServerMaxTickRate - see the explanation of the problems with Linux servers in chapter 5) is that the packet would push a client over its netspeed bandwidth limit.

7. Credits + Contact

I want to thank a few people who in one way or another helped with this guide:
Mike "Mongo" Lambert - without him, this could never have been written
TNSe - he helped a lot with figuring out what's wrong with Linux servers

contact: garfield@planetnubbel.de