When is a load balancer not really a load balancer?
When it’s an AWS Elastic Load Balancer (Classic ELB or ALB).
Although this has been well documented for at least six years, it’s still not widely known that while one side of a Classic ELB (or ALB) does the load balancing, the other side, the one clients connect to, has no VIP or static IP address, which is what most users expect.
The reasons AWS decided against a static VIP are:
- AWS is likely implementing ELBs with round-robin DNS under the hood, augmented with options like “sticky” sessions and health checking
- AWS doesn’t want users to rely on permanent IP addresses for “AWS internal network management reasons”
- AWS didn’t care about end-user expectations, otherwise they would have called it an “HLB” (Half Load Balancer)
- AWS can tweak their own services to consume the output of ELBs, like endpoints.
The problems with not having a static VIP are:
- client programs (browsers, Java applications, k8s, etc.) that connect to an ELB will see apparently random connection failures, and must re-resolve the endpoint on each connection request (or at least periodically) because the ELB changes its addresses
- client programs cannot cache DNS results if they want reliable connections, which is a significant performance problem
- ELB addresses cannot be whitelisted in firewalls, since they change without notice
- ELBs typically return several IP addresses, similar to round-robin DNS, which some applications don’t expect
- in network engineering terms, ELBs don’t offer a stable Layer 3 (IP) endpoint, only a DNS name, which is immensely unhelpful
- without a VIP, designing static architectures is fruitless: how can you guarantee 5×9’s when devices are changing without any advance notification? I.e., you’re “painted into a corner”
- if traffic ramps up quickly, the ELB topology will scale by returning a varying list of IP addresses in a short amount of time
- connections are broken by the ELB, which introduces corruption into stateful applications such as middleware and orchestration software.
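The churn described above is easy to observe: resolving an ELB’s DNS name returns a set of IP addresses, and resolving it again later can return a different set. A minimal Python sketch (the ELB hostname shown in the comment is hypothetical):

```python
# Sketch: observe that an ELB hostname resolves to multiple, changing IPs.
import socket

def resolve_ips(hostname: str, port: int = 443) -> set[str]:
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}

# Resolving the same ELB name minutes apart can return different sets, e.g.:
#   resolve_ips("my-elb-1234567890.us-east-1.elb.amazonaws.com")
```

Run this against any ELB you own a few minutes apart and compare the sets; any client that cached the first set is now holding stale addresses.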
Additional problems with ELBs are:
- even when clients expect multiple IPs, ELBs return them in randomized order, causing cache misses. You must also run a distributed session manager if you don’t use “sticky” sessions.
- the default behavior is to break connections when addresses change, not to drain them first.
By now, you’re likely horrified as you see your 5×9’s rapidly disappear in the rear-view mirror. 🙂
For most users, the lack of that static IP disqualifies Classic ELBs and ALBs from any HA architecture design.
Solutions if your client program is expecting a static VIP are:
- Network Load Balancers (NLBs), introduced in 2017, support an EIP
- use EIPs (note that replacing a server in your farm will in some cases require re-binding an EIP, which may take up to 120 seconds, so you really should run n+1 servers at all times)
- put EIPs in front of HAProxy
- third-party solutions like F5.
- AWS has an article on combining NLB+ALB+Lambda to track IP address changes
- manually monitor ELB changes and restart your apps, preferably automatically
- tell your browser users to retry failed requests, or close their browser and reopen it, whenever they see errors.
- Route 53 ALIAS records and CNAMEs only help if your client program doesn’t cache lookups. I don’t know why AWS documentation erroneously suggests that ALIAS will somehow help with ELBs, since client programs cache lookups.
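The “monitor ELB changes” option above can be sketched as a simple DNS poller that fires when the address set changes; the hostname and poll interval here are illustrative, and the reaction (restart, firewall update) is left as a stub:

```python
# Sketch: poll an ELB's DNS record and detect when its address set changes,
# so you can trigger app restarts or firewall updates automatically.
import socket
import time

def current_ips(hostname: str, port: int = 443) -> frozenset[str]:
    """Snapshot the IPv4 addresses the hostname resolves to right now."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return frozenset(info[4][0] for info in infos)

def watch(hostname: str, interval: float = 30.0) -> None:
    """Loop forever, reporting whenever the resolved address set changes."""
    seen = current_ips(hostname)
    while True:
        time.sleep(interval)
        now = current_ips(hostname)
        if now != seen:
            print(f"{hostname}: IPs changed {sorted(seen)} -> {sorted(now)}")
            # ...restart apps / update firewall rules here...
            seen = now
```

Note the race is unavoidable: between polls, clients can still connect to addresses that have already been retired, which is exactly why this is a workaround and not a fix.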