Load Balancers: Abstraction, Service Discovery, and High Availability

Q: What is "Load Balancers: Abstraction, Service Discovery, and High Availability" about?

How a Load Balancer makes 500 servers look like one, how it tracks which servers are alive, and what happens when the Load Balancer itself becomes a single point of failure.

Q: What topics does "Load Balancers: Abstraction, Service Discovery, and High Availability" cover?

This article covers: load balancers, service discovery, heartbeats, GeoDNS, high availability, and Nginx.

Once you have 500 laptops in a room, you face an immediate problem: which IP address do you give users? You cannot ask them to pick from 500 options. You need a single entry point that hides the complexity behind it. That entry point is the Load Balancer.

What is a Load Balancer?

A Load Balancer (LB) is a server or software layer - like Nginx or HAProxy - that sits between the user and your server fleet.

The Abstraction Principle

The primary job of an LB is abstraction. The user does not care that you have 500 laptops. They just want to visit your site.

Your DNS record points only to the Load Balancer's IP address.
The LB receives every incoming request and decides which backend server should handle it.
It hides your entire internal architecture from the outside world.
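
How the LB "decides" is a policy choice that the text leaves open. As a minimal sketch, assuming the simplest common policy (round-robin, where backends take turns), the selection step could look like this. The class name and the private addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend addresses in a fixed rotation (an assumed policy)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def pick_backend(self):
        # Each call returns the next backend in the rotation.
        return next(self._rotation)


# Hypothetical private addresses standing in for the fleet.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [balancer.pick_backend() for _ in range(4)]
print(picks)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

The user only ever sees the LB's address; the rotation over internal addresses stays invisible to them.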

[Figure: Load Balancer Architecture: The Unified Gateway]

Why an LB Can Handle Far More Than an App Server

An app server running Django or Spring does heavy work: deserializing requests, authenticating users, querying databases, running business logic. This limits a typical server to around 1,000 requests per second (RPS).

An LB does none of that. Its only job is to read an incoming packet and forward it. Because the work is so thin, a single LB can handle 100,000+ RPS - as much traffic as roughly 100 app servers running flat out. Since a fleet rarely runs every machine at its limit, one LB can front many more servers than that.
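
A quick back-of-the-envelope check of these round numbers, sketched in Python. The figures are the article's rough estimates rather than benchmarks, and 100,000 is a floor ("100,000+"), so real headroom may be larger. The function names are made up for this illustration.

```python
APP_SERVER_RPS = 1_000   # the article's rough per-server figure
LB_RPS = 100_000         # the article's rough per-LB figure (a lower bound)
FLEET_SIZE = 500


def servers_at_full_load(lb_rps, app_rps):
    """How many app servers one LB can feed if every server runs at its limit."""
    return lb_rps // app_rps


def max_average_utilization(lb_rps, app_rps, fleet_size):
    """Highest average per-server load (0 to 1) one LB can sustain for the fleet."""
    return min(1.0, lb_rps / (app_rps * fleet_size))


print(servers_at_full_load(LB_RPS, APP_SERVER_RPS))                 # 100
print(max_average_utilization(LB_RPS, APP_SERVER_RPS, FLEET_SIZE))  # 0.2
```

At exactly 100,000 RPS, one LB keeps a 500-server fleet fed as long as the servers average no more than 20% of their peak load.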

Service Discovery: Who Is Alive?

If an LB is going to forward requests, it needs to know which servers are actually running. This is called Service Discovery.

The Heartbeat (Push)

Think of a heart monitor in an ICU. As long as it beeps, the patient is alive. In a distributed system, every server periodically pings the LB:

The Signal: "I am alive."
The Timeout: If the LB does not hear from a server for 15 seconds, it assumes that server crashed.
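The Action: The LB stops routing traffic there as soon as the timeout expires. Users are largely shielded from the failure, although requests sent to the dead server before the timeout expires can still fail.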
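
The signal, timeout, and action above can be sketched as a small registry on the LB side. This is an illustrative sketch, not how any particular load balancer implements it: the class and method names are hypothetical, and the clock is injectable only so the example can simulate time passing.

```python
import time

HEARTBEAT_TIMEOUT_SECONDS = 15  # the timeout used in the text


class HeartbeatRegistry:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_seen = {}  # server address -> time of last heartbeat

    def record_heartbeat(self, server):
        # Called whenever a server sends "I am alive."
        self._last_seen[server] = self._clock()

    def alive_servers(self):
        # A server counts as alive only if it was heard from within the timeout.
        now = self._clock()
        return sorted(
            server
            for server, seen in self._last_seen.items()
            if now - seen <= self._timeout
        )


# Simulated timeline using a fake clock instead of real waiting.
fake_now = [0.0]
registry = HeartbeatRegistry(clock=lambda: fake_now[0])
registry.record_heartbeat("10.0.0.1")
registry.record_heartbeat("10.0.0.2")
fake_now[0] = 10.0
registry.record_heartbeat("10.0.0.1")  # only this server keeps pinging
fake_now[0] = 20.0
print(registry.alive_servers())  # ['10.0.0.1']
```

The LB would route only to the addresses returned by `alive_servers()`, so a silent server drops out of rotation without anyone intervening.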

Heartbeats are a Push mechanism - the servers initiate contact. The alternative is a Health Check, a Pull mechanism where the LB proactively pings each server. In modern systems, health checks are more common because they keep the LB in control.
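
For contrast, here is a hedged sketch of the pull side. It assumes a plain TCP connection attempt as the probe and a threshold of three consecutive failures before a server is marked down; both are illustrative choices that the text does not specify, and the names are hypothetical.

```python
import socket


def tcp_probe(address, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


class HealthChecker:
    def __init__(self, servers, probe=tcp_probe, max_failures=3):
        self._probe = probe
        self._max_failures = max_failures  # assumed threshold, not from the text
        self._failures = {server: 0 for server in servers}

    def run_once(self):
        # One polling round: probe every server and update its failure streak.
        for server in self._failures:
            if self._probe(server):
                self._failures[server] = 0
            else:
                self._failures[server] += 1

    def healthy_servers(self):
        return [s for s, n in self._failures.items() if n < self._max_failures]


# Simulated fleet: a fake probe stands in for real network calls.
down = {("10.0.0.2", 8080)}
checker = HealthChecker(
    [("10.0.0.1", 8080), ("10.0.0.2", 8080)],
    probe=lambda server: server not in down,
)
for _ in range(3):
    checker.run_once()
print(checker.healthy_servers())  # [('10.0.0.1', 8080)]
```

Because the LB owns the schedule and the threshold, it decides how quickly to give up on a server, which is the sense in which health checks keep the LB in control.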

Solving the LB's Own Bottleneck

The load balancer is now the single entry point for all traffic. That raises an obvious question: hasn't the LB itself become our new Single Point of Failure (SPOF)?

Multiple Load Balancers via DNS

We cannot put a second LB in front of the first - that just moves the problem back one step. Instead, we use DNS to point to multiple LBs simultaneously.

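A domain name can have several address records, so a single DNS response can list multiple IP addresses. When a user queries delicious.com, the DNS can return the IPs of both LB-1 and LB-2, often rotating their order so that traffic spreads across them. If one LB fails, clients that retry against the next address in the list reach the other.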
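
From the client's side, failing over between two load balancers can be sketched as follows. The IP addresses come from a documentation-only range, the up/down table is simulated, and `connect_with_failover` is a hypothetical helper; real clients differ in how, and whether, they retry.

```python
def connect_with_failover(addresses, try_connect):
    """Try each address from a DNS answer in order; return the first that works."""
    for address in addresses:
        if try_connect(address):
            return address
    raise ConnectionError("no load balancer reachable")


# Hypothetical DNS answer for the domain: one address per load balancer.
dns_answer = ["203.0.113.10", "203.0.113.20"]  # LB-1, LB-2
lb_is_up = {"203.0.113.10": False, "203.0.113.20": True}  # LB-1 has crashed

chosen = connect_with_failover(dns_answer, lambda ip: lb_is_up[ip])
print(chosen)  # 203.0.113.20
```

In real Python code, `socket.getaddrinfo` returns every address the resolver provides for a name, which is the list a client would iterate over here.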

[Figure: High Availability with Multiple Load Balancers]

GeoDNS: Route by Location

When a user in India queries your site, the DNS does not return a random IP. GeoDNS estimates the user's location from the source IP address of the DNS query (typically that of the user's resolver) and returns the IP of the closest Load Balancer - for example, one hosted in Mumbai.
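
The "closest LB" decision can be sketched as a nearest-site lookup. This simplified illustration skips the IP-to-location step (a real GeoDNS service relies on a geolocation database) and takes the client's coordinates directly. The sites, coordinates, and IPs are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical LB sites: name -> (latitude, longitude, public IP).
LB_SITES = {
    "mumbai": (19.08, 72.88, "203.0.113.10"),
    "frankfurt": (50.11, 8.68, "203.0.113.20"),
    "virginia": (38.95, -77.45, "203.0.113.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))


def closest_lb_ip(client_lat, client_lon):
    # Pick the site with the smallest distance to the client's estimated location.
    site = min(
        LB_SITES.values(),
        key=lambda s: haversine_km(client_lat, client_lon, s[0], s[1]),
    )
    return site[2]


print(closest_lb_ip(28.61, 77.21))  # Delhi -> 203.0.113.10 (Mumbai)
print(closest_lb_ip(48.86, 2.35))   # Paris -> 203.0.113.20 (Frankfurt)
```

Geographic distance is only a proxy for network latency, so production systems may weigh other signals as well.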

If that Mumbai LB goes down, the system replaces it at the same IP address. We avoid updating the DNS record itself during a crash because resolvers cache records until their TTL expires, so a change can take minutes to hours to reach every client.

[!TIP] This is why health-check failover happens at the infrastructure level (replacing the server at the same IP), not at the DNS level. DNS propagation is too slow for crash recovery.

We now have a resilient traffic layer. The next challenge is harder: what happens to the data when requests land on different servers at random?

The Essentials

A Load Balancer solves the entry-point problem by acting as a single public IP that abstracts your entire server fleet - users talk to one address, the LB decides which of the 500 machines handles each request.
Service Discovery (via heartbeat push or health-check pull) is how the LB knows which servers are alive - if a server goes silent for 15 seconds, the LB stops sending it traffic automatically.
The LB itself is a potential SPOF, solved by registering multiple LBs in DNS and using GeoDNS to route users to the closest one - failover happens at the IP level, not the DNS level, because DNS propagation is too slow for crash recovery.