
What causes latency and why more bandwidth (often) does not mean lower latency

September 26, 2022

Magnus Olden

Chief Technology Officer

Network latency, sometimes simply called delay or time, is the time it takes for internet traffic to reach its destination. It is critical to how quickly web pages load, your chance of winning an online game, and how many times you have to repeat yourself in a video conference. The bandwidth of internet access has improved steadily for decades, but latency has not seen anywhere near the same regular improvement.

But what causes latency?

In short, latency can be distilled into three categories (with a road-trip analogy for each):

Physical travel time (travel time assuming no traffic)
Processing time (getting in and out of the car)
Variability (queuing) due to competition with other traffic (extra time due to traffic)

We could have ended this article here, but unfortunately there is quite a bit of complexity, especially in the last point. Variability is also typically the leading cause of high latency in real-life usage.

Surely, then, increasing bandwidth will decrease the queuing caused by competition with other traffic? Well, no, and we will dig into the two main reasons why this mostly isn't the case:

The majority of internet traffic is sent by capacity-seeking algorithms. In essence, these algorithms ensure that there is always insufficient bandwidth. We will explain this later.
The variation in throughput for wireless traffic. When a lane on the highway is suddenly closed, queues form.

Fundamentals of sending data over the internet

Any data transmitted over the internet is divided into packets, which is what allows us to share internet capacity with other users simultaneously. Packets take time to arrive because the data has to travel; this delay is the latency. The number of packets you can send per second multiplied by the size of the packets is the bandwidth. In other words, by bandwidth we don't necessarily mean the bandwidth you pay for from your Internet Service Provider, but the amount of data that can actually be sent per second, which can be constrained by the quality of your WiFi, the bandwidth you pay for, or other things. (Almost) every packet can have a different latency, or it can even get lost, which can be modeled as infinite latency.
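As a rough illustration of these definitions, here is a minimal Python sketch. It computes bandwidth as packets per second multiplied by packet size, and models a lost packet as a latency of `math.inf`. The packet rate, the 1,250-byte packet size and the latency samples are made-up illustrative values, not measurements.

```python
import math
import statistics

def bandwidth_mbps(packets_per_second: float, packet_bytes: int) -> float:
    """Bandwidth = packets sent per second x bits per packet, in Mbps."""
    return packets_per_second * packet_bytes * 8 / 1_000_000

# Hypothetical link: 10,000 packets per second of 1,250 bytes each.
print(bandwidth_mbps(10_000, 1_250))

# Hypothetical per-packet latencies in ms; the lost packet is modeled as infinite latency.
latencies_ms = [21.0, 23.5, 22.1, math.inf, 48.7]
print(statistics.median(latencies_ms))  # the typical packet is unaffected by the loss
print(max(latencies_ms))                # but the worst case is unbounded
```

Modeling loss as infinite latency keeps every packet in the same statistic: percentiles still work, while averages and maxima correctly reveal that something went badly wrong.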

Bandwidth is necessary, but not sufficient to get a smooth internet experience.

For many broadband and mobile subscribers in the developed world, latency matters more than bandwidth for how fast pages load and how smooth video conferencing and gaming are.

A key point that many seem to be unaware of is that Full HD video conferencing, cloud gaming and video streaming require just ~3–5 Mbps.

Causes of Latency

Physical Travel Time (sometimes called the geographical latency)

The speed limit of the internet is primarily governed by the speed of light, that is, how fast light or other electromagnetic waves travel through different media such as fibre cables or air. The path of the cables also matters. The latency caused by this is typically stable and requires new cables or some other physical change to improve. That being said, rerouting due to unbalanced load on the network is fairly common and will change the latency. A rule of thumb is that it takes about 100 ms to cross the Atlantic. Unfortunately, it is really hard to increase the speed of light.
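Physical travel time can be estimated from the path length and the speed of light in the medium. The minimal sketch below assumes a refractive index of about 1.47 for silica fibre (so light travels at roughly two-thirds of its vacuum speed) and a hypothetical 5,600 km cable path. Real routes are longer than any idealized path, so treat the result as a lower bound rather than a prediction.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47      # assumed typical value for silica fibre

def propagation_delay_ms(distance_km: float,
                         refractive_index: float = FIBRE_REFRACTIVE_INDEX) -> float:
    """One-way propagation delay over a path of the given length, in milliseconds."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000

one_way = propagation_delay_ms(5_600)  # hypothetical transatlantic cable length
print(f"one-way: {one_way:.1f} ms, round trip: {2 * one_way:.1f} ms")
```

The result depends only on distance and medium: no amount of extra bandwidth changes it.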

Processing Time (sometimes called serialization)

This is the time it takes for each piece of network equipment (WiFi router, network switch, etc.) between you and the server to get a packet ready for the next part of the trip. The analogy of getting in and out of your car works well if you imagine a road trip with 5–20 stops. The more people getting in and out of the car, the more time it takes; likewise, the bigger the packet, the longer it takes to process. And just as getting in and out of your car is typically not the biggest part of your travel time, processing is typically the least significant cause of latency in a correctly functioning modern network (though your car door may, of course, be broken). Unlike getting in and out of your car, processing latency has improved continuously with faster computers.
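The packet-size effect can be made concrete with serialization delay: the packet size in bits divided by the link rate, paid again at every store-and-forward hop. The sketch below assumes 1,500-byte packets and a hypothetical path of one 100 Mbps WiFi hop plus nine 10 Gbps hops. It ignores lookup and forwarding time inside each device, so it covers only one part of processing time.

```python
def serialization_delay_us(packet_bytes: int, link_bps: float) -> float:
    """Time to clock one packet onto a link, in microseconds."""
    return packet_bytes * 8 / link_bps * 1_000_000

def path_serialization_ms(packet_bytes: int, hop_rates_bps: list[float]) -> float:
    """Total serialization delay over a store-and-forward path, in milliseconds."""
    return sum(serialization_delay_us(packet_bytes, rate) for rate in hop_rates_bps) / 1000

# Hypothetical path: a 100 Mbps WiFi hop followed by nine 10 Gbps hops.
hops = [100e6] + [10e9] * 9
print(serialization_delay_us(1_500, 1e9))  # roughly 12 microseconds on a 1 Gbps link
print(path_serialization_ms(1_500, hops))  # total over the whole hypothetical path
```

The total stays well below a millisecond, which is consistent with processing usually being the smallest contributor in a correctly functioning network.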

Variability (queuing) caused by competition with other traffic

Try driving into a densely populated area during rush hour and you will understand this cause. Just as in rush-hour driving, variability is often the leading cause of latency, and it is the leading cause of poor application performance.

To understand this cause properly, we need to understand some of the fundamental building blocks of the internet.

In essence, bandwidth and latency are intrinsically linked, because insufficient bandwidth causes latency. What happens when there is more data to send than can be sent? It goes into a queue. And waiting in this queue is, well, latency.
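This link between insufficient bandwidth and latency can be sketched as a simple fluid queue: whenever data arrives faster than the link can send it, the excess accumulates, and newly arriving data waits for the backlog ahead of it to drain. The sketch assumes an unlimited buffer and hypothetical rates (120 Mbps offered to a 100 Mbps link for 100 ms).

```python
def queue_delay_ms(arrival_bps: float, link_bps: float,
                   duration_s: float, step_s: float = 0.001) -> list[float]:
    """Queuing delay seen by newly arriving data at each time step (fluid model)."""
    backlog_bits = 0.0
    delays = []
    for _ in range(round(duration_s / step_s)):
        # Data arriving this step minus what the link can send this step.
        backlog_bits = max(0.0, backlog_bits + (arrival_bps - link_bps) * step_s)
        # Waiting time = backlog ahead of you / rate at which it drains.
        delays.append(backlog_bits / link_bps * 1000)
    return delays

delays = queue_delay_ms(arrival_bps=120e6, link_bps=100e6, duration_s=0.1)
print(f"queuing delay after 100 ms of overload: {delays[-1]:.1f} ms")
```

The delay keeps growing for as long as the overload lasts; when arrivals stay below the link rate, the queue (and the queuing delay) stays at zero.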

Insufficient bandwidth, huh? Just get more bandwidth, then. Well, no, because the "insufficient" part is not about how large the bandwidth is, but about how large it is expected to be.

When an application or operating system expects more bandwidth than there is, it sends too much data, which is subsequently queued. This means more latency, often for everyone sharing the same network.

"Expected" really is the key word here. Let's look at the two principles that explain why.

1. Maximizing bandwidth utilization

For an application or operating system, the network is just a black box: the app has no idea how much bandwidth is available. To discover the bandwidth and use it as fully as possible, transport protocols such as TCP and QUIC rely on congestion control algorithms, BBR being one example; together they account for roughly 90% or more of all internet traffic. These algorithms send increasingly large amounts of data until they reach the bandwidth capacity. Depending on the algorithm, they conclude that they've reached maximum capacity when some of the traffic is lost or significantly delayed.

In other words, they are what we call capacity-seeking, and they only know they've reached maximum capacity by waiting to see whether any data goes missing or is delayed. This waiting time is key to why latency occurs: the algorithms don't know they've sent too much data until they have waited a relatively long time without a response.

Because the algorithms always try to maximize bandwidth use, and rely on delay or loss as signals, there will (almost) always be insufficient bandwidth. If you double your bandwidth, the algorithms will simply fill that as well.
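To illustrate capacity seeking, here is a deliberately simplified, Reno-style additive-increase/multiplicative-decrease (AIMD) loop. It is not how BBR works (BBR builds a model of bottleneck bandwidth and round-trip time rather than reacting only to loss), and it ignores slow start and timing details. The bandwidth-delay products (BDP, how many packets fit in flight without queuing) and the 50-packet buffer are hypothetical values.

```python
def simulate_aimd(bdp_packets: int, buffer_packets: int, rounds: int) -> list[int]:
    """Toy Reno-style AIMD: bottleneck queue occupancy (packets) per round trip."""
    cwnd = 1  # congestion window, in packets
    history = []
    for _ in range(rounds):
        # Anything in flight beyond the bandwidth-delay product waits in the queue.
        queue = max(0, cwnd - bdp_packets)
        if queue > buffer_packets:
            # The buffer overflows and packets are dropped; the sender sees the
            # loss and halves its window (multiplicative decrease).
            history.append(buffer_packets)
            cwnd = max(1, cwnd // 2)
        else:
            history.append(queue)
            cwnd += 1  # no loss seen: probe for more bandwidth (additive increase)
    return history

for bdp in (100, 200):  # 200 = "doubled bandwidth" at the same round-trip time
    q = simulate_aimd(bdp_packets=bdp, buffer_packets=50, rounds=1000)
    print(f"BDP {bdp}: peak queue {max(q)} packets, "
          f"rounds with a standing queue: {sum(1 for x in q if x > 0)}")
```

In both runs the sender keeps growing its window until the buffer is full; doubling the bandwidth only changes when the queue starts to build, not whether it does.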

2. Rapidly changing bandwidth (specific to wireless technologies: WiFi, mobile)

For wireless technologies, the amount of data an end-user device can send per second depends on signal strength and congestion levels. In a busy environment, a WiFi device dropping from 800 to 80 Mbps happens all the time.

Combine 1 and 2: what happens when you try to maximize a rapidly changing bandwidth?

You often overshoot and try to send too much data. In our example, the application or operating system expects to be able to send 800 Mbps but can only send 80 Mbps. Data can't be unsent, so it sends ten times more than the link can carry, and the excess is put into queues. Since the data rate is one tenth of what it was, sending this data takes ten times longer than expected. TCP and QUIC, with algorithms such as BBR, cannot respond quickly enough to the wireless changes, because they need to wait and see whether their packets are acknowledged. Wireless bandwidth changes much faster than capacity-seeking algorithms can respond.
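The arithmetic of the 800-to-80 Mbps example can be sketched as follows. The 20 ms reaction time (how long the sender keeps sending at the old rate before it notices the drop) is a hypothetical value, and the buffer is assumed to be large enough to hold everything; in practice a full buffer would drop packets instead.

```python
def backlog_after_rate_drop(expected_bps: float, actual_bps: float,
                            reaction_s: float) -> tuple[float, float]:
    """Return (queued bits, seconds to drain them) after the link rate suddenly drops."""
    sent_bits = expected_bps * reaction_s     # the sender still sends at the old rate
    delivered_bits = actual_bps * reaction_s  # the link only forwards at the new rate
    backlog_bits = max(0.0, sent_bits - delivered_bits)
    drain_s = backlog_bits / actual_bps       # the backlog drains at the reduced rate
    return backlog_bits, drain_s

backlog, drain = backlog_after_rate_drop(expected_bps=800e6, actual_bps=80e6, reaction_s=0.020)
print(f"{backlog / 8e6:.1f} MB queued, {drain * 1000:.0f} ms to drain")
```

Under these assumptions, a 20 ms reaction time turns into roughly 180 ms of queuing, because the excess has to drain at one tenth of the expected rate.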

When the bandwidth drops, too much data is sent. It goes into a queue and causes latency, often for everyone using the same network.

Because of this, a device that can send 80 Mbps while streaming Netflix (which requires 5–20 Mbps) and running a Teams call (which requires 1–5 Mbps) can still experience a lagging Teams call. In fact, nearly all variable latency (also known as jitter) is caused by queuing.
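One common way to quantify jitter is the interarrival jitter estimator defined for RTP in RFC 3550, which smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below applies it to hypothetical transit times: a constant 30 ms path delay contributes no jitter at all, while a varying queuing component does.

```python
def rfc3550_jitter(transit_ms: list[float]) -> float:
    """Interarrival jitter estimate (RFC 3550 style) from per-packet transit times."""
    jitter = 0.0
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        d = abs(cur - prev)          # change in transit time between consecutive packets
        jitter += (d - jitter) / 16  # exponential smoothing with gain 1/16
    return jitter

steady = [30.0] * 50                                    # travel + processing time only
queued = [30.0 + q for q in [0, 15, 40, 5, 25] * 10]    # hypothetical queuing delay added
print(rfc3550_jitter(steady), rfc3550_jitter(queued))
```

Because the stable components (travel and processing time) cancel out in the difference between consecutive packets, what remains is essentially the queuing variation.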

So, can't wireless bandwidth simply be kept from changing? Unfortunately, no. Changing bandwidth is fundamental to wireless technologies; it is what enables mobility and multiple users. How much data can be sent at once depends on signal strength, and the share of time an end-user device gets to send depends on the number of active users.

A key thing to notice in our example is that having a bandwidth of 100 Mbps or 1000 Mbps will not reduce the amount of latency created.
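One way to see why, under the simplified model of the 800-to-80 Mbps example (unlimited buffer, sender keeps its old rate until it reacts), is to write R_exp for the expected rate, R_act for the actual rate after the drop, and T for the time the sender takes to react. These symbols are introduced here for illustration. The backlog B and the time D it takes to drain are:

```latex
B = (R_{\mathrm{exp}} - R_{\mathrm{act}})\,T,
\qquad
D = \frac{B}{R_{\mathrm{act}}} = \left(\frac{R_{\mathrm{exp}}}{R_{\mathrm{act}}} - 1\right) T
```

The queuing delay depends only on the ratio between expected and actual rate. With a tenfold drop and an assumed T = 20 ms, D = 9 × 20 ms = 180 ms, whether the rate falls from 1000 to 100 Mbps or from 100 to 10 Mbps.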

Note that I am not arguing that improving bandwidth isn't a good thing; it just does not equal lower latency.

Queuing and Scheduling Mechanisms

How the queue is managed also matters a lot for latency. There are many techniques that reduce the latency caused by competing traffic and that signal the capacity-seeking algorithms to reduce their throughput in smarter ways.

Queue management is a complex topic and will be the subject of another deep dive.

Summary

Latency is caused by physical travel time, processing time and the queuing that happens along the way. Queuing is often the main factor, and it happens because of how the foundational internet protocols work.

The deviation between the bandwidth that applications and operating systems expect and the actual bandwidth is a major cause of latency. The bandwidth of wireless network technologies changes faster than applications and operating systems can react.

Hence, increasing the bandwidth does not equal lower latency.