Who Says Behemoths Can’t Dance? Building an Agile 170Gbps DDoS Mitigation Appliance

Update (3.28.2016)

Behemoth scrubbers are now deployed in 28 Incapsula data centers:

EU

  • Amsterdam
  • Frankfurt
  • London
  • Madrid
  • Milan
  • Paris
  • Stockholm
  • Tel Aviv
  • Warsaw
  • Zurich
Americas

  • Ashburn
  • Atlanta
  • Chicago
  • Dallas
  • Los Angeles
  • Miami
  • New York
  • San Jose
  • São Paulo
  • Seattle
  • Toronto
APAC

  • Auckland
  • Hong Kong
  • Melbourne
  • Osaka
  • Singapore
  • Sydney
  • Tokyo

Update (3.10.2015)

Behemoth scrubbers are now deployed in 13 Incapsula data centers:

EU

  • Amsterdam
  • Frankfurt
  • London
  • Paris
  • Stockholm
  • Zurich
US

  • Chicago
  • Dallas
  • Los Angeles
  • Miami
  • New York
  • San Jose
APAC

  • Tokyo

Update (7.7.2014)

Behemoths are deployed in five Incapsula data centers:

  • Los Angeles
  • San Jose
  • London
  • Frankfurt
  • Miami

So far, the largest attacks mitigated by a single machine were:

  • By Packet Rate: 67 million packets per second, mitigated by a Behemoth server in LA.
  • By Bandwidth: 60 gigabits per second, also mitigated by a Behemoth server in LA.

As a provider of DDoS mitigation services, we frequently see attacks exceeding 50 million packets per second and consuming 100 gigabits per second of bandwidth, attacks we would have considered monstrous only two years ago.

Towards a Terabit Scale Attack

The increase in intensity and scale of DDoS attacks is fueled by the inevitable growth of the Internet itself. Every day, more and more servers, with access to plenty of CPU and bandwidth, are coming online. However, as a whole, their security is not improving. As a result, there are an increasing number of resources available to be hijacked and exploited for DDoS attacks.

Contributing to the security issue are inherent limitations in the BGP routing protocol, which is the underlying plumbing of the Internet. BGP is limited to directing traffic according to destination IP, and is completely agnostic to content, source, and protocol.

The protocol suits ISPs, who need a simple solution that lets them operate at scale and that can be implemented on relatively limited, but fast, ASICs. That same simplicity makes en-route filtering difficult, bringing the problem right to the doorstep of the targeted parties.

Data Plane and Control Plane

The inherent limits of current routing protocols have brought about a rethinking of the way the network handles routing decisions. Traditionally, routers along the Internet’s backbone have performed two distinct functions:

  1. Working with their peers to build a routing decision table (a.k.a. the “Control Plane”)
  2. Performing the actual routing of packets (a.k.a. the “Data Plane”, or “Data Forwarding Plane”)

Today, there is a lot of work being done to separate the Data Plane from the Control Plane, and to make the Data Plane more dynamic by allowing it to identify “flows”. These flows are based on information about source and destination ports, source and destination IPs or subnets, and protocols being used.

Flow identification enables granular decision-making on the Data Plane, using technologies like OpenFlow or FlowSpec, to achieve a generic, flow-aware Data Plane that can handle large packet loads.
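To make the idea concrete, here is a minimal sketch in C (our own illustration, not production code) of the classic 5-tuple flow key such a data plane matches on, with a simple software hash standing in for what hardware data planes do in silicon:

```c
#include <stddef.h>
#include <stdint.h>

/* The classic 5-tuple a flow-aware Data Plane classifies on.
 * Zero-initialize instances so the compiler's padding bytes are
 * deterministic before hashing. */
struct flow_key {
    uint32_t src_ip;    /* IPv4 source address */
    uint32_t dst_ip;    /* IPv4 destination address */
    uint16_t src_port;  /* TCP/UDP source port */
    uint16_t dst_port;  /* TCP/UDP destination port */
    uint8_t  proto;     /* IP protocol number (TCP = 6, UDP = 17) */
};

/* FNV-1a hash of the key, used to pick a flow-table bucket.
 * Hardware data planes do the equivalent in ASICs (e.g., NIC RSS
 * hashing); this software version is only for illustration. */
static uint32_t flow_hash(const struct flow_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}
```

FlowSpec and OpenFlow express matches on these same fields, which is what lets routers and switches act on flows rather than just destination prefixes.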

From a DDoS mitigation point-of-view, the ability to make flow-related decisions is a huge improvement, but it’s still not enough. To ensure a low level of false positives, there is no alternative but to do actual protocol analysis, including handling streams with packet modification and generation (think SYN cookies, DNS protocol content, and TCP segmentation).
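As an illustration of the packet-generation side, here is a simplified sketch of the SYN cookie technique: the server encodes connection parameters into the initial sequence number of its SYN-ACK instead of allocating connection state, and validates the echoed value when the final ACK arrives. The mixing function below is a stand-in of our own; real implementations use a keyed cryptographic hash (e.g., SipHash in the Linux kernel) and also encode an MSS index, which we omit here.

```c
#include <stdint.h>

#define COOKIE_BITS 24
#define COOKIE_MASK ((1u << COOKIE_BITS) - 1)

/* Stand-in keyed mix; a real implementation would use a keyed
 * cryptographic hash with a per-boot secret. */
static uint32_t cookie_hash(uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport,
                            uint32_t count, uint32_t secret)
{
    uint32_t h = secret ^ saddr;
    h = (h * 0x9e3779b1u) ^ daddr;
    h = (h * 0x9e3779b1u) ^ ((uint32_t)sport << 16 | dport);
    h = (h * 0x9e3779b1u) ^ count;
    return h;
}

/* Build the SYN-ACK initial sequence number: an 8-bit time counter
 * in the top bits plus a keyed hash of the 4-tuple in the low bits,
 * so no per-connection state is stored until the final ACK arrives. */
static uint32_t make_syn_cookie(uint32_t saddr, uint32_t daddr,
                                uint16_t sport, uint16_t dport,
                                uint32_t minute_count, uint32_t secret)
{
    return ((minute_count & 0xffu) << COOKIE_BITS) |
           (cookie_hash(saddr, daddr, sport, dport,
                        minute_count & 0xffu, secret) & COOKIE_MASK);
}

/* Validate the echoed value (ack_seq - 1) from the client's ACK:
 * recover the counter, check freshness, and recompute the hash. */
static int check_syn_cookie(uint32_t cookie,
                            uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport,
                            uint32_t now_count, uint32_t secret)
{
    uint32_t count = cookie >> COOKIE_BITS;    /* 8-bit counter */

    if (((now_count - count) & 0xffu) > 2)     /* stale cookie */
        return 0;
    return (cookie & COOKIE_MASK) ==
           (cookie_hash(saddr, daddr, sport, dport,
                        count, secret) & COOKIE_MASK);
}
```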

Programming the Data Plane

Traditionally, working on the data plane (i.e., handling the forwarding of packets) was not a job for off-the-shelf server hardware. Indeed, our own experience with DDoS mitigation kernel modules showed that Linux can only handle up to about 3 million packets per second per machine, which wasn’t cost-effective.

Scalable packet processing is usually done using custom ASICs, FPGAs or Network Processors. These tools can certainly handle large packet rates, but we were reluctant to trade our dynamic programming environment for what inevitably is a slower, more cumbersome development process. We wanted the comfort of off-the-shelf server hardware, but without its limitations.

It turns out that the bottleneck is not the machine, but the Linux kernel itself. And so, using libraries such as Intel’s DPDK, PF_RING/DNA or Netmap, which bypass the kernel completely, it is possible to process raw packets at line rate.
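To show what kernel bypass looks like in practice, here is a heavily condensed sketch of a DPDK-style poll-mode receive loop (illustrative only, not Behemoth code; queue counts, TX setup, and exact APIs vary by driver and DPDK version):

```c
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define NUM_MBUFS    8191
#define BURST_SIZE   32

int main(int argc, char **argv)
{
    /* Initialize the EAL: hugepages, PCI probing, core pinning. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Packet buffers live in a hugepage-backed pool, not in the kernel. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("mbuf_pool",
        NUM_MBUFS, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* One RX queue on port 0 with the default device configuration. */
    uint16_t port = 0;
    struct rte_eth_conf conf = {0};
    if (rte_eth_dev_configure(port, 1, 0, &conf) != 0 ||
        rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE,
                               rte_socket_id(), NULL, pool) != 0 ||
        rte_eth_dev_start(port) != 0)
        rte_exit(EXIT_FAILURE, "port setup failed\n");

    /* The poll-mode loop: burst-read raw frames straight from the
     * NIC ring, with no interrupts and no kernel network stack. */
    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++) {
            /* inspect / mitigate / forward the packet here ... */
            rte_pktmbuf_free(bufs[i]);  /* this sketch just drops it */
        }
    }
}
```

The loop never sleeps: each core spins on its queue, trading CPU cycles for the latency and throughput that interrupt-driven kernel networking cannot reach.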

So we built the machine we call “Behemoth” using off-the-shelf servers. In essence, it’s a dual-socket Intel E5-2690v2 machine, which provides us with 20 cores. On top of that, our friends at Silicom provided us with cards with sufficient port density to fit eighteen 10GbE interfaces. The software is a massively parallel userland application that processes raw packets on the Data Plane (e.g., mitigation and tunneling) and also handles Control Plane activities (e.g., routing and traffic diversion).
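The “massively parallel” part typically follows the run-to-completion pattern: one worker pinned per core, each polling its own NIC queue, with the NIC’s RSS hashing keeping every flow on a single core. A sketch of how that might look with DPDK’s lcore launch API (our own simplified illustration; enum names such as CALL_MAIN are from DPDK 20.11 and later):

```c
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_launch.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Run-to-completion worker: each core polls its own RX queue and
 * handles every packet it reads to completion, with no locks and no
 * hand-off between cores. NIC RSS spreads flows across queues, so
 * packets of one flow always land on the same core. */
static int lcore_worker(void *arg)
{
    uint16_t port  = *(uint16_t *)arg;
    uint16_t queue = rte_lcore_id();  /* assumes queue i belongs to lcore i */
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t n = rte_eth_rx_burst(port, queue, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++) {
            /* classify, mitigate, tunnel, or forward ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;  /* not reached */
}

/* After port setup (one RX queue per core), launch the worker on
 * every available core, the main core included. */
static void launch_workers(uint16_t port)
{
    rte_eal_mp_remote_launch(lcore_worker, &port, CALL_MAIN);
}
```

Because each core owns its queue and its share of the flow state, the design scales roughly linearly with core count, which is what makes 20 commodity cores add up to a line-rate appliance.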

Behemoth Sampling and Filtering Processes

The capacity of each Behemoth machine is 170Gbps, with an ability to process up to 100 million packets per second. To put this in perspective, this means that a single Behemoth machine is sufficient to mitigate the largest attacks we have seen to date. But this giant machine is not just big, it’s agile.

It’s All About Deployment Speed

Behemoth sits next to our edge switches, which aggregate all incoming traffic from multiple providers. It inspects traffic on the edge switches, rerouting relevant traffic so it can be processed inline. Its internal BGP speaker communicates with both our routing infrastructure and that of external providers, to effect system-wide decisions.

Behemoth is fully operational in three Incapsula PoPs, and we expect to roll it out to six more over the next few weeks. It is the underlying technology behind our recent feature releases, and it is already mitigating large-scale attacks with unprecedented accuracy.

For us, its most important aspect is not its capacity, but the fact that it is just a userland application, which is incorporated into our existing development and release cycle.

Enter the Behemoth

[Photo: one of our active Behemoth scrubbers]

The combination of a service environment, where we get visibility into attacks as they happen, and a programmable environment in the place where we traditionally had very little control (the data forwarding plane), is tremendously important. It allows us to evolve our DDoS protection at DevOps speed, giving us the ability to stop today’s massive attacks as well as tomorrow’s.