Is DNS Redundancy the Right Answer?

Last Friday’s attacks on Dyn rendered several high-profile targets, including Netflix, Twitter, and Spotify, inaccessible to their users, leaving many organizations feeling vulnerable. In the wake of these attacks, many website operators now feel the urge to use multiple DNS providers. However, this may not be a silver bullet.

Adding another DNS provider to your mix will likely reduce the chances and extent of damage to your website if one of your DNS providers is attacked, even though the approach has its drawbacks and limitations.

Our analysis of the Alexa top 1000 websites revealed that even those using multiple DNS providers experienced service degradation, and in some cases outages.

What We Saw

Before we discuss why the degradation and outages happened, let’s look at a breakdown of the websites by category.

Before the attack on Dyn, these were the stats:

  1. Websites that didn’t use Dyn (90%) 
  2. Websites that only used Dyn (5.5%) 
  3. Websites that used Dyn and another DNS provider (4.5%) 

After the attack on Dyn, we saw:

  1. Websites that didn’t use Dyn (91%) — these were not impacted
  2. Websites that only used Dyn (4%) — these were impacted across the board
  3. Websites that used Dyn and another DNS provider (5%) — these experienced service disruption to varying degrees: some recovered quickly, while others remained inaccessible for several hours

[Pie chart: breakdown of the Alexa top 1000 websites by Dyn usage, before and after the attack]

Download the complete .csv file

We can see that while 1 percent of all sites dropped Dyn altogether, 0.5 percent added a second DNS provider following the attack. What really caught our attention was that websites with multiple DNS providers were also affected, experiencing delays and, in some cases, sustained outages.

Why Did This Happen?

Normally, when a client tries to access a domain, it can find the address either through a current A-record cached in the browser or through the ISP’s DNS resolver. If the resolver’s cached A-record for a site has expired, it can query the authoritative name server directly for a fresh A-record (if the cached NS-record is still current), or perform a full recursive lookup (if the NS-record has also expired) to obtain one.
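To make that chain concrete, here is a minimal sketch of the same steps using the third-party dnspython library; example.com and the 2-second timeout are illustrative choices, not values from our analysis.

import dns.message
import dns.query
import dns.resolver

domain = "example.com"  # illustrative domain, not one from our dataset

# Step 1: find the zone's authoritative name servers via its NS records
# (a resolver normally caches these until their TTL expires).
ns_records = dns.resolver.resolve(domain, "NS")
ns_names = [str(rr.target) for rr in ns_records]
print("Authoritative name servers:", ns_names)

# Step 2: ask one authoritative server directly for the A-record,
# the same step a resolver performs once it knows where the zone lives.
ns_ip = str(dns.resolver.resolve(ns_names[0], "A")[0])
query = dns.message.make_query(domain, "A")
response = dns.query.udp(query, ns_ip, timeout=2.0)
print(response.answer)

Against a healthy zone this returns the A-record almost instantly; the trouble begins when step 2 points at a server that no longer answers.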

What happened on Friday was that the authoritative name servers of Twitter, The Guardian, and GitHub, being part of Dyn’s DNS infrastructure, were knocked offline, so lookups for those domains’ A-records went unanswered.

But what about websites that had another DNS provider, like Verisign or Route 53, in addition to Dyn? Why did those sites experience service degradation? Shouldn’t the DNS lookup have found the available authoritative server and returned with the A-record? This is where things get murky.

Resolving Multiple DNS Servers

To understand why the DNS switchover, in many cases, is neither quick nor clean, we have to look at how DNS resolvers behave. According to the guidelines on the selection and operation of secondary DNS servers, published by the Internet Engineering Task Force (IETF),

“First, the only way the resolvers can determine that [certain] addresses are, in fact, unreachable, is to try them.  They then need to wait on a lack of response timeout (or occasionally an ICMP error response) to know that the address cannot be used.  Further, even that is generally indistinguishable from a simple packet loss, so the sequence must be repeated, several times, to give any real evidence of an unreachable server. All of this probing and timeout may take sufficiently long that the original client program or user will decide that no answer is available, leading to an apparent failure of the zone.  Additionally, the whole thing needs to be repeated from time to time to distinguish a permanently unreachable server from a temporarily unreachable one.

And finally, all these steps will potentially need to be done by resolvers all over the network.  This will increase the traffic, and… effectively lower the reliability of the service.”

Basically, in those cases where websites had multiple NS-records, failover between them was impeded by the very way DNS resolvers work. A resolver would keep trying to reach the unresponsive Dyn server for as long as its cached NS-record was current, which it remained until its TTL expired.

It would be akin to having the right address to a house, getting there, finding no one at home, and instead of trying the alternate address provided by the homeowners, going back to the empty home several times just to make sure they hadn’t returned.
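To put a rough number on all that door-knocking, the sketch below (again assuming dnspython) points a stub resolver at the reserved TEST-NET address 192.0.2.1, which will never answer, ahead of a healthy server. The setup uses public recursive resolvers rather than authoritative servers, but the per-server timeout-and-fallback behavior it measures is the same.

import time
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
# 192.0.2.1 is a reserved TEST-NET address that will never respond,
# standing in for an authoritative server knocked offline by the attack;
# 8.8.8.8 plays the part of the second, healthy provider.
resolver.nameservers = ["192.0.2.1", "8.8.8.8"]
resolver.timeout = 2.0    # how long to wait on each server before moving on
resolver.lifetime = 10.0  # total time budget before the lookup fails outright

start = time.time()
answer = resolver.resolve("example.com", "A")
elapsed = time.time() - start
print(f"Resolved {answer[0]} after {elapsed:.1f}s of waiting")

With a 2-second per-server timeout, the successful answer arrives only after roughly two extra seconds spent waiting on the dead address, and a real resolver repeats this probing until the cached NS-record’s TTL finally expires.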

No Silver Bullet

In the longer term, with the continued increase in the scale of DDoS attacks, and especially the unhindered proliferation of insecure IoT devices, the threat of an attack on DNS providers will likely grow more serious. Having multiple DNS providers may be the best option available at this point. At least it reduces the chances of the kind of blackout that can occur if a sole DNS provider gets hit with a DDoS attack.

What our observations highlighted was that while using multiple DNS services to avoid an outage is appealing, and even effective for some organizations, it is still no silver bullet. Even with multiple, redundant DNS providers, service degradation and timeouts can still occur.

For those organizations willing to live with the added cost, complexity, and compromise that come with multiple DNS providers, it is at best a partial solution.