PoP Architecture: Swiss Army Knife vs. the Tool Belt Approach

The Incapsula PoP (point of presence) is a good example of a north-south data center design: incoming traffic from the internet is scrubbed by various layer 3/4 techniques and forwarded to the website protection service for securing layer 7 content. Once secured, traffic is eventually sent to the origin server.

Architecting a PoP is a complex task, with scale and cost being only some of the dominant factors. In this blog post, I will share some design options and highlight the ones that we’ve decided to adopt for Incapsula.

Incapsula Services

The Incapsula SaaS provides several complementary services, all of which are aimed at enhancing the usability and security of our customers' services and networks.

The following sections discuss load patterns for some of the components involved in making this happen. We'll cover the general hardware requirements and propose some customized hardware to really make each service shine.

Traffic Filtering Service

Filtering is aimed at collecting the low-hanging fruit, such as dropping packets that aren't supposed to be there in the first place.

Blocking port scanners is a classic use case for traffic filtering. A port scanner sends traffic to various ports to identify the services behind them. Once a service is identified, the attacker will attempt to find vulnerabilities in it, enabling infiltration into the system. This risk can easily be dealt with by crafting a filtering policy that allows inbound traffic on specific ports and drops all other traffic.

Let’s consider a 5-tuple (source/destination IP, source/destination port, L4 protocol type) based ACL (access control list) as the tool for filtering out unwanted traffic. Implementation could be software- or hardware-based.
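As a minimal sketch of the matching model (the class names and rule values below are illustrative, not Incapsula's actual policy), a software ACL can be represented as an ordered list of 5-tuple rules evaluated against each packet, with a default drop at the end:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str                       # "tcp", "udp", ...

@dataclass
class Rule:
    action: str                      # "accept" or "drop"
    src_ip: Optional[str] = None     # None means "match any"
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    proto: Optional[str] = None

    def matches(self, p: Packet) -> bool:
        fields = [(self.src_ip, p.src_ip), (self.dst_ip, p.dst_ip),
                  (self.src_port, p.src_port), (self.dst_port, p.dst_port),
                  (self.proto, p.proto)]
        return all(want is None or want == have for want, have in fields)

def filter_packet(rules: list[Rule], p: Packet) -> str:
    """First matching rule wins; anything unmatched is dropped."""
    for rule in rules:
        if rule.matches(p):
            return rule.action
    return "drop"

# Allow inbound web traffic only; every other port a scanner probes is dropped.
acl = [Rule("accept", dst_port=80, proto="tcp"),
       Rule("accept", dst_port=443, proto="tcp")]
```

A production implementation such as iptables adds connection tracking, counters and many more match types, but the basic ordered-match model is the same.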

Software

Software ACL implementations are abundant, with iptables being the best-known and most feature-rich. However, a pure software approach severely limits the PPS (packets per second) performance of such a service.

Hardware

Hardware ACL implementations are available on most basic switching silicon, where ACLs are implemented using TCAM memory. For those of you unfamiliar with the term, no sweat: TCAM (ternary content-addressable memory) matches binary content where each bit in the filter can be either zero, one or a "don't care" condition. TCAM memory offers very high performance but is very small compared to modern CPU memory.
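The ternary match itself is easy to picture in software. Here's a toy sketch (real TCAMs evaluate all entries in parallel in silicon, which is where the wire-speed performance comes from):

```python
def tcam_match(key: int, value: int, mask: int) -> bool:
    """A TCAM entry is a (value, mask) pair: bits where the mask is 1 must
    match exactly, bits where the mask is 0 are "don't care"."""
    return (key & mask) == (value & mask)

# Match any key whose low 16 bits equal 80 (say, a destination port),
# ignoring every other bit.
assert tcam_match(key=0x1234_0050, value=0x0000_0050, mask=0x0000_FFFF)
```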

A pure hardware approach will not be able to accommodate a large number of ACL rules required for scale, while a pure software approach will not be able to provide the performance required to handle DDoS attacks. A hybrid approach is the best bet, utilizing TCAM to filter out the bulk of the traffic, while handling the long tail in software.
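A rough sketch of that hybrid split, reusing the Rule sketch from above (the capacity figure and class name are invented for illustration): the heaviest-hitting rules are pushed into the small hardware table, while the long tail stays in software:

```python
class HybridFilter:
    """Splits one logical ACL across hardware and software: the heaviest-hitting
    rules go into the small, fast TCAM table, the long tail stays on the CPU.
    Reuses the Rule/Packet sketch from the traffic filtering example above."""

    def __init__(self, rules, tcam_capacity=1024):
        # Assume rules arrive sorted by expected traffic volume, heaviest first.
        self.hw_rules = rules[:tcam_capacity]   # offloaded to switching silicon
        self.sw_rules = rules[tcam_capacity:]   # long tail, evaluated in software

    def classify(self, packet) -> str:
        for rule in self.hw_rules:              # wire speed in real hardware
            if rule.matches(packet):
                return rule.action
        for rule in self.sw_rules:              # CPU handles whatever is left
            if rule.matches(packet):
                return rule.action
        return "drop"
```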

  • Basic requirements – Strong CPU for intensive packet processing, backed by significant RAM and a large cache
  • Turbo charged – A simple silicon TCAM-based ACL implementation will provide wire speed processing power for the bulk of the traffic

Attack Mitigation Service

The term “attack mitigation” is very commonly confused with traffic filtering and sometimes obfuscated by the general term “scrubbing.” The role of the mitigation service is to perform more complex traffic analysis and detect possible signs of malicious traffic. Commonly used techniques include signature detection, anomaly detection and deep inspection of some layer 7 protocols.
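To make the contrast with plain 5-tuple filtering concrete, here is a toy sketch of two of those techniques, payload signature matching and a per-source rate anomaly check (the signatures and threshold are made up for illustration):

```python
import time
from collections import defaultdict

# Toy byte-pattern signatures; real engines carry large, curated rule sets.
SIGNATURES = [b"/etc/passwd", b"\x90\x90\x90\x90"]

def signature_hit(payload: bytes) -> bool:
    """Flag a packet whose payload contains a known malicious pattern."""
    return any(sig in payload for sig in SIGNATURES)

class RateAnomalyDetector:
    """Flag a source that exceeds a packets-per-second baseline within a
    one-second window (a crude stand-in for real anomaly detection)."""

    def __init__(self, pps_threshold: int = 10_000):
        self.pps_threshold = pps_threshold
        self.counts = defaultdict(int)
        self.window_start = time.monotonic()

    def observe(self, src_ip: str) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 1.0:      # roll over to a new window
            self.counts.clear()
            self.window_start = now
        self.counts[src_ip] += 1
        return self.counts[src_ip] > self.pps_threshold
```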

The hardware/software question persists. Since mitigation requires much more than simplistic 5-tuple inspection, ACL hardware acceleration is insufficient. On the other hand, a pure software packet processing implementation will not be able to keep up with very high (and rising) attack PPS rates.

A DPI (deep packet inspection) capable ASIC can perform some of the mitigation techniques at wire speed. Such ASICs will typically be a high end switching silicon or a network processor.

  • Basic requirements – Strong CPU for intensive packet processing, backed by significant RAM and a large cache
  • Turbo charged – Throw in a DPI-capable networking silicon to contain very high PPS attacks

Website Protection Service

The website protection service is a layer 7 service, which terminates a client connection and protects against threats such as SQL injection, cross-site scripting, illegal resource access, layer 7 DDoS and remote file inclusion.

  • Basic requirements – A strong CPU is required to run the protection engines on thousands of requests going through the service at any given time. Content caching is performed both on disk and in memory, therefore large RAM and cache are required as well as very large and fast disks (typically SSDs)
  • Turbo charged – Possible enhancements would include TCP segmentation offloading/full TCP offload, as well as SSL hardware acceleration
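As a deliberately simplistic illustration of the kind of layer 7 inspection such a service performs (the patterns and helper below are made up for this sketch; a production WAF uses far richer rules, normalization and request context), a request check might look like:

```python
import re
from urllib.parse import parse_qs, urlparse

# Simplistic patterns, for illustration only.
SQLI = re.compile(r"(\bunion\b.+\bselect\b)|('\s*or\s+1\s*=\s*1)", re.IGNORECASE)
XSS = re.compile(r"<\s*script", re.IGNORECASE)

def inspect_request(url: str) -> str:
    """Return 'block' if any query parameter looks like SQL injection or XSS."""
    params = parse_qs(urlparse(url).query)
    for values in params.values():
        for value in values:
            if SQLI.search(value) or XSS.search(value):
                return "block"
    return "allow"

assert inspect_request("https://example.com/item?id=1' OR 1=1--") == "block"
assert inspect_request("https://example.com/item?id=42") == "allow"
```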

Putting It All Together

Now that we’ve established the components involved, the question arises as to how the PoP should be architected to accommodate these services. Let’s consider two approaches:

Single Server for Multiple Functions: the Swiss Army Knife Approach

This design takes horizontal scalability to the limit. We assume that all servers can run all the services. A schematic setup would look something like:

In this architecture, scaling up is a no-brainer. If you need more firepower, just throw in a few more servers. Simple! There's also the added value of simplified DevOps: no need to maintain different servers and no need to manage the complexity of deploying multiple software components.

But, wait … when describing the various services didn’t we agree that different services have different hardware requirements? What kind of server should we build to accommodate all these services running in parallel? Should it have the PPS power of a filtering/mitigation service? Or perhaps the RAM/SSD volume required by CDN services?

This is where this model runs into difficulties: services scale differently. If more DDoS mitigation power is required, you would not want to spend your money on SSDs just for the sake of uniformity. Conflicting hardware requirements will ultimately lead to compromise, which will either impact the service or become very expensive to scale. Infrastructure-as-a-service (IaaS) providers avoid this anti-pattern by providing different instance flavors (check out Amazon EC2 Instance Types).

Matching the Server to the Service: the Tool Belt Approach

This design acknowledges that there are different tools for different jobs. Each piece of dedicated hardware performs a single service, but is much better at it. The different services form a pipeline. A typical setup would look something like:

In this design, scale-up is performed “per service”. Just drop in whatever machine best suits the required service. If layer 3/4 attacks get worse, just drop in another filtering machine. There’s no need to invest in hardware suitable for other services and no need to compromise on the machine specification.
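Conceptually, the pipeline is a composition of the services described earlier, each running on hardware picked for that job. A minimal sketch (the stage functions are stand-ins, not the actual services):

```python
from typing import Callable, Optional

# Each stage either passes traffic on to the next machine or drops it (None).
Stage = Callable[[dict], Optional[dict]]

def traffic_filter(pkt: dict) -> Optional[dict]:
    return pkt if pkt.get("dst_port") in (80, 443) else None

def attack_mitigation(pkt: dict) -> Optional[dict]:
    return None if pkt.get("suspicious") else pkt

def website_protection(pkt: dict) -> Optional[dict]:
    return None if "<script" in pkt.get("payload", "") else pkt

PIPELINE: list[Stage] = [traffic_filter, attack_mitigation, website_protection]

def process(pkt: dict) -> Optional[dict]:
    """Run traffic through each dedicated service in turn; scaling one stage
    means adding machines to that stage only, not to the whole pipeline."""
    for stage in PIPELINE:
        pkt = stage(pkt)
        if pkt is None:
            return None          # dropped somewhere along the pipeline
    return pkt                   # forwarded on to the origin server
```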

Besides overcoming scale issues, this design also leads to other advantages:

  • CPU performance improves when different applications do not share the same memory cache
  • Service decoupling greatly enhances system stability, so that a failed or buggy mitigation machine will not impact website protection functionality
  • Introducing new services is not likely to impact existing services, which boosts agility, productivity and innovation

But … this approach isn’t for everyone. Rolling out a new software version for many components requires some serious DevOps. Plus, it’s very difficult to do if you’re not the technology owner of all the components in the pipeline.

Having Déjà Vu?

There’s a good reason for that. What we’ve described is basically one aspect of the microservices architectural pattern, with the “unified server” approach being the monolith, which is then broken down into microservices in the pipelined approach.

The monolith is a single application performing several different functions; this architecture is useful in some circumstances, as adding a new use case is very easy: just add new code to the existing application. It's simple to deploy, and (if there are no conflicting resource requirements) it's fairly easy to scale.

A bloated monolith is a whole different story. It typically becomes too large for one person to understand and manage, impacting agility and productivity. The operational difficulties described above will impact performance, stability and scale. Hence, we believe that the “unified server/monolith” approach is a pattern to avoid.

The arguments for a specific implementation may be influenced by many factors, ranging from legacy components and ownership of technology to operational complexity.

At Incapsula, all components are developed in house, and we make every effort not to adopt any organizational or architectural pattern that will slow us down. We are constantly striving to walk down this sometimes difficult, but rewarding, path.