Load balancing refers to a set of distribution techniques that spread workload and traffic across multiple network servers. Put in human terms, the concept is simple: the more hands at work, the less work each person has to do and the more efficiently the job gets done. In a computer network, this "community labor" increases overall computing efficiency—raising throughput, optimizing performance, and minimizing downtime.
Local vs. Global Load Balancing
Originally, load balancing referred to distribution of traffic across servers in one locality—a single data center, for example. However, today’s computing is increasingly online, and increasingly global. Thus, load balancing has taken on a much broader meaning.
Global Server Load Balancing (GSLB) is based on the same principle as traditional load balancing. However, workload distribution is no longer confined to a single local network or data center. Rather, workload is distributed planet-wide (cross data center). This creates many new challenges for modern load balancing solutions, since they must take into account key communications parameters—most notably the connection quality between geographically dispersed sites at any given time, and actual requester geographical location.
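To make the site-selection idea concrete, here is a minimal sketch of how a GSLB decision might weigh requester location against measured connection quality. All names, regions, and latency figures below are hypothetical, illustrative values, not a real product's API:

```python
def pick_site(sites, requester_region):
    """Return the site with the lowest measured latency for the requester's region.

    `sites` maps a site name to per-region latency samples in milliseconds.
    All names and numbers are illustrative assumptions.
    """
    return min(sites, key=lambda s: sites[s]["latency_ms"][requester_region])

# Hypothetical measurements for two geographically dispersed data centers.
sites = {
    "us-east": {"latency_ms": {"na": 20, "eu": 90}},
    "eu-west": {"latency_ms": {"na": 95, "eu": 15}},
}

print(pick_site(sites, "eu"))  # eu-west
print(pick_site(sites, "na"))  # us-east
```

A real GSLB service would refresh these measurements continuously; the point here is only that the decision depends on live inter-site conditions, not a static list.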
Early generation GSLB solutions relied on DNS Load Balancing, which limited both their performance and efficacy, as discussed below. Today, cloud computing provides an outstanding solution for both local and global load balancing, enabling one cloud-based service to handle both scenarios.
DNS: Unsynchronized and not load-aware
Once the standard for Global Server Load Balancing, DNS solutions are now considered acceptable mostly for simple applications or web sites. More and more network administrators recognize that DNS Load Balancing cannot meet the requirements of enterprise-class GSLB, for a number of reasons.
First, DNS-based services use load balancing pools, which are defined for different geographic regions. The load balancer administrator defines which web servers are available and how often traffic should be routed to them. This enables—in theory—maximum exploitation of geographically dispersed infrastructure.
However, DNS load balancing is based on a traditional round robin distribution methodology. This means that the load balancer reviews a list of servers and sends one connection to each server in turn. Each time it reaches the end of the list, it starts again at the top.
The problem is that the round robin distribution method is not load-aware—the load balancer has no way of knowing whether the next server in line is actually the optimum choice. This results in either server under-utilization or overburdening.
What’s more, DNS records have no native failure detection. This means that requests can be directed to the next server in the list even if that server is not responsive.
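The two limitations above can be seen in a minimal sketch of plain round robin (server names are hypothetical). The rotation walks the list strictly in turn; nothing in it consults server load or health, so an overloaded or unresponsive server keeps receiving its share of requests:

```python
from itertools import cycle

# Plain round robin: deal requests out in fixed order, wrapping at the end.
servers = ["server-a", "server-b", "server-c"]  # illustrative names
rotation = cycle(servers)

# Eight incoming requests are assigned strictly in turn, even if
# server-b is overloaded or entirely down—no load or health check is made.
assignments = [next(rotation) for _ in range(8)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a',
#  'server-b', 'server-c', 'server-a', 'server-b']
```

A load-aware balancer would instead consult live metrics (connection counts, response times, health probes) before each assignment—precisely what DNS round robin cannot do.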
Lastly, DNS-based load balancing is susceptible to delays. For a DNS change to take effect, it has to be recorded at the ISP level, and ISPs only refresh their DNS cache once every TTL (Time to Live) cycle. Until the cache is updated, ISPs are unaware that the change took place and continue to route traffic to the wrong server.
Workarounds for this issue—such as setting TTL to a low value—have been developed, but these can negatively impact performance and still do not guarantee that all users are correctly directed.
TTL reliance also means that changes to load distribution directives propagate much more slowly, resulting in slow response, delayed failover, and delayed load redistribution.
Moreover, changes are propagated unevenly, since each ISP has its own DNS cache rules in place.
By way of example, consider the following scenario:
- A given application runs on two servers, Server A and Server B. The TTL cycle is set at one hour (a common value).
- At any given moment, the load on Server A can increase such that the DNS load balancer needs to start routing traffic to Server B.
- However, what happens if the load spike hits when the TTL cycle is still 20 minutes from update? In this common case, it would take 20 minutes for the load to be redistributed.
- In the meantime, all traffic would still be routed to the overloaded server, causing progressively greater delays, slowing delivery, and degrading service—even to the point of service failure.
- Even after 20 minutes, some ISPs will retain a DNS cache. This means that an unpredictable percentage of traffic will still be wrongly routed, causing uneven performance.
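The TTL mechanics behind this scenario can be sketched with a toy caching resolver (timings and server names are illustrative only). The resolver keeps serving its cached answer until the record's TTL expires, so a routing change made mid-cycle is invisible until the next refresh:

```python
class CachingResolver:
    """Toy model of an ISP resolver that honors a record's TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cached = None
        self.cached_at = None

    def resolve(self, now, authoritative_answer):
        # Serve from cache while the record is still fresh; only consult
        # the authoritative answer once the TTL has elapsed.
        if self.cached is not None and now - self.cached_at < self.ttl:
            return self.cached
        self.cached = authoritative_answer
        self.cached_at = now
        return self.cached

resolver = CachingResolver(ttl_seconds=3600)  # one-hour TTL, as in the scenario

print(resolver.resolve(0, "server-a"))     # server-a, cached at t=0
# At t=2400 (40 minutes in) the balancer switches the record to server-b,
# but the cache is still valid for another 20 minutes:
print(resolver.resolve(2400, "server-b"))  # still server-a
# Only after the full TTL elapses does the change take effect:
print(resolver.resolve(3601, "server-b"))  # server-b
```

In practice each ISP applies its own caching rules on top of this, which is why, as noted above, the change lands unevenly across users rather than all at once.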
DNS Failover: negatively impacts RTO
DNS-based Failover solutions suffer from the same limitations as DNS-based Load Balancing solutions. However, in disaster recovery scenarios, the effects of the inherent latency of DNS-based solutions are even more severe—markedly lengthening RTO (Recovery Time Objective, the maximum amount of time a business can tolerate the system being unavailable).
In addition, DNS Failover solutions are also not load-aware. This means that partial or uneven recovery, which can be worse than complete continued failure, is not uncommon.
Such an occurrence resembles the scenario above: traffic continues to be incorrectly routed long after actual recovery, artificially lengthening the service disruption and unevenly sending some users to the dead server while others reach the recovered application.
Notably, in the case of mission-critical applications (e.g., stock trading), any situation where some users can trade because they are routed to the correct server, and others cannot, is completely unacceptable.