When our first custom hardware, affectionately nicknamed Behemoth, was introduced in 2014, Incapsula's DDoS mitigation throughput increased to 170 gigabits per second (Gbps). That platform soon mitigated a DDoS attack that was considered huge at the time, and it has continued to mitigate growing attack volumes. With the growth of the internet and the number of connected devices, the scale of attacks has steadily increased. After accomplishing our initial goals with Behemoth, our engineers set to work on further improving the platform. The result of their efforts is the recent introduction of Behemoth 2.
The platform was designed with the following goals in mind:
- Performance: to keep up with the rising attack volumes, we needed to increase our scrubbing capacity.
- Time to mitigation: perform detection and mitigation at a sub-second resolution.
- Scale: eliminate bottlenecks which presented scalability issues.
The Wisdom of Hindsight
In the original design shown below, traffic is constantly monitored via the sampling interface. Once a DDoS event has been identified, Behemoth uses the CLI-based (command line interface) control path to request that the traffic suspected of being malicious be diverted for mitigation. The switch then diverts the suspected traffic based on the control instructions it receives.
To understand the improvements, we must understand the limitations of the original Behemoth. In revisiting the schematic, we were able to identify some weaknesses in the design:
- Flexibility: A divert request is basically a TCAM (ternary content-addressable memory) entry matching a specific traffic pattern coupled with a redirect action. If several attacks are detected at the same time, more of these TCAM entries get occupied, but typical switching fabric has its limitations:
  - Switch TCAM has limited capacity for rules
  - Limited options are available for advanced traffic matching
- Time to mitigation: Since the switch and Behemoth are separate platforms, messages are passed between them over SSH. CLI over SSH is a textual interface designed for human interaction rather than machine-to-machine communication, which hurts both usability and speed. Since we aim to provide near-zero time to mitigation, this was unacceptable.
- Scale: Our ability to scale out was limited, as adding a new appliance would mean two separate switch masters, each potentially making conflicting decisions as to which traffic should be scrubbed.
To resolve the issue, it was clear that each appliance must be able to make decisions independently of the others, but assigning a dedicated switch to each Behemoth platform is not cost effective. It requires additional hardware, transceivers, cabling and datacenter costs, and it adds the operational complexity of managing these appliances. We decided this was not the way to go.
- Performance: Behemoth mitigation is executed by a very high-performance user-space application which can achieve up to 170 Gbps when mitigating large packets. With small packets, the appliance is instead capped by its packets-per-second (PPS) performance, because the more packets go through the system, the more mitigation decisions need to be made.
In light of the growing number of PPS-heavy attacks, we wanted to improve PPS performance as well as bandwidth capacity.
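To make the TCAM limitation above concrete, here is a minimal Python sketch of ternary matching with a fixed rule capacity. It is an illustration only, not the switch's actual data path: the `TcamEntry` and `Tcam` classes, the `"divert"` and `"forward"` action names, and the capacity of 2 are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TcamEntry:
    value: int   # bit pattern to compare against
    mask: int    # 1 = bit must match, 0 = "don't care"
    action: str  # hypothetical action name, e.g. "divert"

    def matches(self, key: int) -> bool:
        # Ternary match: only the bits selected by the mask are compared.
        return (key & self.mask) == (self.value & self.mask)


class Tcam:
    """Toy rule table with a hard capacity, searched in priority order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []

    def install(self, entry: TcamEntry) -> bool:
        if len(self.entries) >= self.capacity:
            return False  # table full: the new divert rule cannot be added
        self.entries.append(entry)
        return True

    def lookup(self, key: int) -> str:
        for entry in self.entries:
            if entry.matches(key):
                return entry.action
        return "forward"  # no rule matched: traffic is left alone


def ip_key(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def prefix_entry(cidr: str, action: str) -> TcamEntry:
    net = ipaddress.IPv4Network(cidr)
    return TcamEntry(int(net.network_address), int(net.netmask), action)


tcam = Tcam(capacity=2)
first = tcam.install(prefix_entry("203.0.113.0/24", "divert"))
second = tcam.install(prefix_entry("198.51.100.7/32", "divert"))
third = tcam.install(prefix_entry("192.0.2.1/32", "divert"))  # rejected: full
```

Once the table is full, a third simultaneous attack cannot get its own divert rule, which is the flexibility problem described in the list above.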
Introducing Behemoth 2
Under the hood, Behemoth 2 is a custom-tailored appliance that pairs the Intel E5-2690 CPU with the Intel Alta, a high-performance, highly programmable switch silicon. The platform supports up to 64 × 10 GbE ports: 20 internal ports to the host and 44 external connections, which provide a total capacity of 440 Gbps and a packets-per-second capacity of over 650 Mpps. The internal switch has the TCAM capacity for thousands of rules, each of them able to match packets based on layers 2-7.
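The capacity figures can be sanity-checked with a little arithmetic. The sketch below assumes the 650 Mpps figure refers to minimum-size (64-byte) Ethernet frames at line rate on the 44 external ports. The article does not say how the number was derived, so this is one plausible reading rather than the official calculation.

```python
PORT_SPEED_BPS = 10 * 10**9   # one 10 GbE port
EXTERNAL_PORTS = 44           # external connections, per the text
MIN_FRAME_BYTES = 64          # minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20      # 7 preamble + 1 start delimiter + 12 inter-frame gap


def line_rate_pps(port_speed_bps: int, frame_bytes: int) -> float:
    """Maximum frames per second on one port for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return port_speed_bps / bits_on_wire


total_bps = EXTERNAL_PORTS * PORT_SPEED_BPS
total_pps = EXTERNAL_PORTS * line_rate_pps(PORT_SPEED_BPS, MIN_FRAME_BYTES)

print(f"{total_bps / 1e9:.0f} Gbps")   # 440 Gbps
print(f"{total_pps / 1e6:.1f} Mpps")   # 654.8 Mpps
```

Under that assumption, each port tops out at roughly 14.88 Mpps, and 44 ports give about 654.8 Mpps, consistent with "over 650 Mpps".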
As shown in the figure, all the traffic entering the PoP is diverted to Behemoth 2, which will then apply internal sampling for DDoS detection. It also controls the Alta switch to divert the traffic for software mitigation. Alternatively, the mitigation service can install rules on the switch to perform the packet drop in hardware.
Meeting Our Design Goals
With the implementation of the Behemoth 2 platform, we were able to make the following changes.
- Time to mitigation: The switching fabric is incorporated in the same hardware platform as the CPU. All control functions are now performed internally over the PCIe interface, via a software SDK, which makes control operations very fast. Error handling and recovery are straightforward, and the system in general is much more robust.
- Scale: The external switch is now relieved of the control functions. With this decoupling in place, we no longer have to worry about how several Behemoth appliances will interact with the switch concurrently.
This has allowed us to scale horizontally. Currently, each Incapsula PoP houses between two and eight appliances, and counting…
- Flexibility: The Alta switching platform offers far greater TCAM capacity than off-the-shelf switching platforms. This has let us greatly reduce false positives, as we have enough rules to divert specific IP addresses (/32) for mitigation rather than whole subnets (/24). Traffic which is not diverted for mitigation flows uninterrupted inside the PoP.
- Performance: The switching platform’s TCAM capacity, together with its ability to perform hardware deep packet inspection of header fields for layers 3-7, has given us the tools to offload to hardware some of the mitigation logic that was previously performed in software. Offloading, in addition to increased port connectivity (440 Gbps), has led to a dramatic increase in the performance of each appliance.
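The difference between /24 and /32 divert rules mentioned under Flexibility can be illustrated with a short sketch. The addresses come from documentation ranges, and the `diverted_hosts` helper is hypothetical. The example only shows how rule granularity changes which hosts get pulled into mitigation, not how the appliance itself decides.

```python
import ipaddress


def diverted_hosts(hosts, divert_rules):
    """Return the hosts whose address falls inside any divert rule."""
    networks = [ipaddress.ip_network(rule) for rule in divert_rules]
    return {h for h in hosts
            if any(ipaddress.ip_address(h) in net for net in networks)}


# Three protected hosts in the same /24; only one is under attack.
hosts = ["203.0.113.10", "203.0.113.20", "203.0.113.30"]
under_attack = {"203.0.113.20"}

coarse = diverted_hosts(hosts, ["203.0.113.0/24"])
fine = diverted_hosts(hosts, [f"{ip}/32" for ip in under_attack])

print(len(coarse - under_attack))  # 2 hosts diverted needlessly by the /24 rule
print(len(fine - under_attack))    # 0 hosts diverted needlessly by /32 rules
```

The finer rules cost one TCAM entry per attacked address, which is why the larger rule capacity matters here.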
Taking a deeper look into the appliance, we can identify the different flows for traffic traversing Behemoth 2.
- Flow A – In the case of clean traffic when the sampling function doesn’t detect any threat, traffic flows in and out of the appliance without any interruption.
- Flow B – When the sampling function detects a potential DDoS event, a divert command is written to the switch over the PCIe interface and traffic is routed to the host CPU for mitigation.
- Flow C – When the sampling function detects a potential DDoS event (as in Flow B) but the attack has a very high PPS volume, the service does more than identify the pattern and send the traffic to the CPU: it instructs the switch to perform the mitigation itself. To save TCAM resources, the application does not offload small attacks to the switch, and the offloaded rules are reassigned dynamically so that they always cover the heaviest hitters. This frees the application to focus on mitigating the remaining lower-volume traffic (Flow B), and thereby handle a much larger overall capacity. The CPU can also focus on threats that require more application-layer packet analysis or actions such as challenges.
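The choice between Flow B and Flow C can be sketched as a simple ranking problem. The code below is a rough illustration under stated assumptions: the `plan_flows` function, the PPS threshold and the TCAM budget are all hypothetical, and the real service's selection logic is not described in this article. Traffic with no detected attack pattern (Flow A) never enters the plan.

```python
def plan_flows(attack_pps, tcam_budget, offload_threshold_pps):
    """Assign each detected attack pattern to Flow B (CPU) or Flow C (switch).

    attack_pps: dict mapping a pattern name to its observed packets per second.
    tcam_budget: how many offload rules may be installed (assumed value).
    offload_threshold_pps: attacks below this rate stay on the CPU.
    """
    # Only attacks above the threshold are candidates for hardware offload.
    heavy = sorted(
        (p for p, pps in attack_pps.items() if pps >= offload_threshold_pps),
        key=lambda p: attack_pps[p],
        reverse=True,
    )
    # The limited rule budget goes to the heaviest hitters first.
    offloaded = set(heavy[:tcam_budget])
    return {p: ("C" if p in offloaded else "B") for p in attack_pps}


# Made-up sample rates, for illustration only.
attacks = {"udp-flood": 90_000_000, "syn-flood": 40_000_000, "dns-amp": 500_000}
plan = plan_flows(attacks, tcam_budget=1, offload_threshold_pps=10_000_000)
print(plan)
```

Re-running the plan on every sampling interval lets the offloaded rules follow whichever attacks are currently the heaviest, while smaller attacks stay in software.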
Behemoth 2 in Action
During the recent 650 Gbps DDoS flood, more than 150 Mpps (million packets per second) were received at the Incapsula PoPs.
The attackers started out with ~100 Mpps, which had no impact. They then took a few minutes to regroup (11:15-11:21) and came back in full force. The green line depicts offloaded PPS: roughly 100-115 Mpps were offloaded to the switch, leaving the CPU mostly idle.
So far, not even the full capacity of a single Behemoth 2 has been required to fend off a DDoS attack, but it’s good to be ahead of the game. With this capacity and scalability, Incapsula is positioned to protect its customers from DDoS attacks that will continue to grow in scale and sophistication.
And finally, here’s what Behemoth 2 looks like in action!