The Long and Short of TTL – Understanding DNS Redundancy and the Dyn DDoS Attack | Imperva

Our recent post about the implications of DNS redundancy during the Dyn DDoS assault generated some interesting questions from readers about time to live (TTL) DNS settings and their impact on website availability in the event of an attack.

Specifically, readers want to know which TTL settings work best when an authoritative DNS service is taken down by DDoS offenders.

In answering, let’s take a closer look at the role TTL plays in DNS server management. In particular, what are the considerations behind using shorter or longer TTL settings?

Understanding Time to Live

The domain name system (DNS) is a service that translates easy-to-remember domain names (e.g., www.example.com) into the numerical IP addresses used on the internet to locate and identify computer services and devices. Such translation occurs every time you type a website name in your browser.

DNS is a hierarchical system built around two server types: authoritative and recursive. The authoritative DNS server is the primary source of domain information; it’s where IP addresses are defined and subsequently updated by a domain owner.

Acting as a middleman, recursive servers periodically fetch domain information from their authoritative brethren and pass it to end users. To speed up response times, recursive servers cache the domain information so they don’t need to request the same data over and over again. This conserves bandwidth and computing resources while also providing quicker user responses.

TTL and DNS

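This brings us to time to live (TTL): a value, expressed in seconds, that the domain owner sets on each DNS record to tell recursive servers how long they may keep that record in cache before fetching a fresh copy. In other words, it determines how often the DNS cache is refreshed. As we’ll see, this setting also had a part to play in the Dyn DDoS assault.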
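To make the caching behavior concrete, here is a minimal Python sketch of a recursive server's cache. It is a toy model rather than a real DNS implementation: the class name `RecursiveCache`, the `authoritative` callback and the explicit `now` argument are all assumptions made for illustration, and the IP address comes from a range reserved for documentation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an authoritative server: given a domain name,
# it returns (ip_address, ttl_in_seconds).
AuthoritativeLookup = Callable[[str], Tuple[str, int]]


class RecursiveCache:
    """Toy model of a recursive server's TTL-based cache."""

    def __init__(self, authoritative: AuthoritativeLookup) -> None:
        self._authoritative = authoritative
        # Maps a domain name to (ip_address, time at which the entry expires).
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str, now: float) -> str:
        entry = self._cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # entry is still fresh, answer from cache

        # Entry is missing or its TTL has elapsed: ask the authoritative server.
        ip, ttl = self._authoritative(name)
        self.upstream_queries += 1
        self._cache[name] = (ip, now + ttl)
        return ip


# Example: a record with a 300-second TTL.
cache = RecursiveCache(lambda name: ("192.0.2.10", 300))
cache.resolve("www.example.com", now=0)    # cache is empty, goes upstream
cache.resolve("www.example.com", now=299)  # served from cache
cache.resolve("www.example.com", now=300)  # TTL elapsed, goes upstream again
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic, which makes it easier to see exactly when a refresh is triggered.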

Shorter vs. Longer TTLs

So what are the considerations for setting TTLs for a website? To a large extent, it depends on the use case and update frequency.

Short TTLs are useful for domains that frequently change their records. One of the most common use cases is where domains rely on DNS-based load balancing and failover services.

Here, as soon as traffic needs to be rerouted to a new server, the IP address is changed on the authoritative DNS. The rate at which the change is propagated is determined by its TTL setting. A short TTL helps update the system more quickly, making the load balancer more effective.
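The link between TTL and propagation can be put into numbers. The sketch below assumes a simplified resolver that caches a record once and keeps it until the TTL runs out; the function name and its parameters are hypothetical, chosen only for this illustration.

```python
def seconds_serving_old_ip(cached_at: float, changed_at: float, ttl: int) -> float:
    """How long after a record change one resolver keeps answering with the old IP.

    cached_at:  time the resolver last fetched the record
    changed_at: time the IP was changed on the authoritative DNS
    ttl:        the record's TTL in seconds
    """
    if cached_at > changed_at:
        return 0.0  # fetched after the change, so it already holds the new IP
    expires_at = cached_at + ttl
    # If the entry expired before the change, the next lookup gets the new IP.
    return max(0.0, expires_at - changed_at)


# A resolver cached the record at t=0 and the IP changed at t=10.
short = seconds_serving_old_ip(cached_at=0, changed_at=10, ttl=30)    # 20.0
long = seconds_serving_old_ip(cached_at=0, changed_at=10, ttl=3600)   # 3590.0
```

In the worst case, where a resolver refreshed its cache just before the change, the stale window is the full TTL.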

Similarly, when moving a domain to a new server, short TTLs direct users to the new IP as soon as possible.

The downside of short TTLs is that they result in frequent lookups, increasing the cost to the recursive server providers. To manage this overhead, ISPs set their own rules for the shortest allowable interval between DNS cache refreshes.

Longer TTLs are mostly appropriate for sites hosted on a single server that don’t frequently change their IP configurations. Longer cache times equate to fewer lookups, lower costs and better performance. A delayed response to any DNS change is one downside, however.
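As a rough illustration of the lookup cost, the following sketch estimates an upper bound on how many times per day a single recursive server would refresh one record that is in constant demand. The helper names and the `resolver_min_ttl` parameter (a stand-in for an ISP-imposed minimum) are assumptions for this example, not settings of any particular resolver.

```python
import math

SECONDS_PER_DAY = 86_400


def effective_ttl(record_ttl: int, resolver_min_ttl: int = 0) -> int:
    """TTL actually honored when the resolver enforces its own minimum."""
    return max(record_ttl, resolver_min_ttl)


def max_refreshes_per_day(record_ttl: int, resolver_min_ttl: int = 0) -> int:
    """Upper bound on daily upstream lookups for one record from one resolver."""
    ttl = effective_ttl(record_ttl, resolver_min_ttl)
    if ttl <= 0:
        raise ValueError("TTL must be positive to estimate a refresh count")
    return math.ceil(SECONDS_PER_DAY / ttl)


print(max_refreshes_per_day(30))                      # 2880
print(max_refreshes_per_day(3600))                    # 24
print(max_refreshes_per_day(5, resolver_min_ttl=30))  # 2880
```

Under these assumptions, a 30-second TTL generates up to 120 times as many upstream lookups as a one-hour TTL, and a TTL set below the resolver's minimum gains nothing over the minimum itself.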

Given the continual increase in internet speeds and the diminishing cost of communications today, the benefits of short TTLs generally outweigh their disadvantages, which makes them the preferred option in most cases. As a result, it’s not unusual for ISPs to refresh their DNS cache as often as every 30 seconds.

So How Did TTLs Impact the Dyn Attack?

When the Dyn authoritative DNS servers targeted in the attack became unavailable, its effects spread across the world at TTL speed.

That is, as soon as a recursive server attempted to refresh its cache and discovered that Dyn wasn’t available, it had no supply of IP addresses to provide.

Since most of the websites in question relied on short TTLs, the impact was almost immediate.

But what would have happened if the same sites used a longer TTL – say, one hour? In that event, the impact of the Dyn outage would have been delayed until the next DNS cache refresh.

Even with the authoritative servers down, during that period DNS data would still have been served from cache, letting recursive servers continue to point client users in the right direction.
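The difference can be sketched with some simple arithmetic. The function below is a simplified model: it assumes the resolver drops a record the moment its TTL expires and cannot refresh it while the authoritative servers are down. The name `visible_downtime`, its parameters and the 30-minute outage used in the example are hypothetical and do not reflect measured figures from the Dyn incident.

```python
def visible_downtime(ttl: int, cache_age: int, outage_duration: int) -> int:
    """Seconds during which one resolver cannot answer for a domain.

    ttl:             the record's TTL in seconds
    cache_age:       seconds since the resolver last refreshed the record,
                     measured at the moment the outage begins
    outage_duration: how long the authoritative servers stay unreachable
    """
    if not 0 <= cache_age < ttl:
        raise ValueError("cache_age must be in the range [0, ttl)")
    remaining = ttl - cache_age  # how long the cached copy is still valid
    return max(0, outage_duration - remaining)


OUTAGE = 30 * 60  # hypothetical 30-minute outage

print(visible_downtime(ttl=30, cache_age=10, outage_duration=OUTAGE))      # 1780
print(visible_downtime(ttl=3600, cache_age=600, outage_duration=OUTAGE))   # 0
print(visible_downtime(ttl=3600, cache_age=3000, outage_duration=OUTAGE))  # 1200
```

With a 30-second TTL, users of that resolver are affected for almost the entire outage. With a one-hour TTL, the outcome depends on how recently the cache was refreshed, and a resolver with a fresh enough copy rides out the outage entirely.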

Ironically, the same TTL configuration meant to ensure business continuity backfired and caused near-instant unavailability.

A Final Word to the Wise

The Dyn attack highlights the vulnerability of DNS as a single point of failure, making a strong case for the adoption of DNS DDoS protection.

It also showed that, when it comes to TTL settings, the key is in finding the right balance.

Shorter TTLs are generally considered to be best practice, especially if you expect frequent changes to your DNS settings.

However, in the extreme scenario of a DNS outage, it’s the longer TTLs that provide your domain with some measure of resilience, offering a chance to respond to the outage or simply to wait it out.