Imperva Incapsula (now Imperva Cloud Application Security) is a cloud-based application delivery service that provides protection for websites and increases global website performance, safeguarding web applications and data from attacks. The Incapsula service uses proprietary technologies that give us the flexibility to make changes and adapt our service according to new trends and needs.
Until recently, our data processing happened partially in the PoP the traffic passed through, and partially in a centralized US data center. With growing attention to data privacy and data regulations like GDPR and new data localization laws, we revisited how we handle data at rest and the control our customers would need to meet ongoing data privacy demands.
The result is that our customers now have the option of regional data storage, providing regional isolation control via site-level settings. Logs can be isolated per region and per site to facilitate compliance with global data localization and data privacy requirements. Customers can now answer the question of “where is my data” confidently, since they can see and change the assigned region with the click of a button. The same configuration is also available through the APIs for those who prefer a fully automated approach.
So, let’s get into the architecture changes our team made to the Incapsula service, allowing us to address the growing need around data privacy by enabling the new self-service regional isolation control feature.
Incapsula handles terabytes of compressed log files per day. Since our network is globally distributed (40 PoPs), data handling poses a challenge locally, and a much bigger one when the data is processed and transferred to other locations later in the flow.
Figure 1 illustrates the data flow in our system from creation within the WAF proxy through provision to and consumption by a customer account admin or analyst looking at the data either in the management UI or via SIEM integration.
When we started collecting customer feedback around data localization and data privacy requirements and looked at the different regulations, we quickly noticed a change was needed. Our development team went to work on a new architecture, one that supports geo-fencing—the ability to isolate data in multiple regions.
Before and After
Previously, our data collection system comprised two setups. The first consisted of four machines, centralized in a US-based PoP. It processed and stored data from our entire network, and was responsible for the information presented in the management console (on the Events page) and for threat alert notifications.
The second setup consisted of 30 machines, roughly one in each PoP. It was responsible for generating our SIEM integration web logs, with each machine handling the data originating in its hosting PoP.
Each proxy in our network generates a file every minute, containing request data in Protobuf format. An internal distribution service sends these files to the relevant machines in each setup, decoupling log delivery from the proxy's real-time service so that it cannot affect request processing.
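To make the decoupling concrete, here is a minimal sketch of that shipping flow. The directory paths, file extension, and function names are illustrative assumptions, not our actual implementation: the proxy drops one file per minute, and a separate process ships pending files later, so the request path never waits on log delivery.

```python
import os
import shutil

# Illustrative paths: the proxy drops one Protobuf file per minute here.
MINUTE_DIR = "/var/spool/proxy-logs"
SHIPPED_DIR = "/var/spool/proxy-logs/sent"

def pending_files(directory):
    """Return minute files waiting to be shipped, oldest first."""
    return sorted(f for f in os.listdir(directory) if f.endswith(".pb"))

def ship_all(directory=MINUTE_DIR, done=SHIPPED_DIR):
    """Ship each pending file, then move it aside. Because this runs
    outside the proxy's request path, log delivery can lag or retry
    without ever blocking request processing."""
    os.makedirs(done, exist_ok=True)
    for name in pending_files(directory):
        src = os.path.join(directory, name)
        # The real service streams the file to storage machines; moving
        # it here simply simulates a successful hand-off.
        shutil.move(src, os.path.join(done, name))
```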
From a data localization perspective, the historical limitation of the first setup is clear: data from all regions is stored in the US. The problem with the second setup is less obvious and more complicated. Customers want to store data in the region where their site is located, not the region from which the traffic originated.
The new architecture (Figure 3) provides more flexibility in terms of performance, load, and compliance with data privacy regulations.
By default, we assign a region to each protected site (EU, US or APAC) and logs collected for that site are automatically assigned to their regional PoP based on the geo-location of the origin server registered behind the given site. In some cases, the admin may wish to override the default region assignment with a manual selection. They can do so by selecting a different region from the Site settings -> General tab -> Data Storage drop-down list (see Figure 4).
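The assignment logic above can be sketched as follows. The country-to-region table, dictionary keys, and function names are assumptions for illustration only; the real mapping is far more complete. The key behavior is that a manual selection, when present, always wins over the geo-derived default.

```python
# Illustrative (incomplete) country-to-region tables.
EU_COUNTRIES = {"DE", "FR", "NL", "IE"}
APAC_COUNTRIES = {"JP", "SG", "AU", "IN"}

def default_region(origin_country: str) -> str:
    """Derive the default storage region from the geo-location of the
    origin server registered behind the site."""
    if origin_country in EU_COUNTRIES:
        return "EU"
    if origin_country in APAC_COUNTRIES:
        return "APAC"
    return "US"  # assumed fallback region

def storage_region(site: dict) -> str:
    """A manual selection (the Data Storage drop-down, or the API)
    overrides the geo-derived default."""
    override = site.get("data_storage_override")
    return override or default_region(site["origin_country"])
```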
Once the change has been applied, logs will arrive at the new region.
The change in the central storage setup was relatively easy. Instead of one cluster, we now have a cluster per region; each proxy is aware of all three clusters and sends the relevant data to each.
The change in the PoPs' storage, however, was a bit more challenging. Before the change, each proxy only had to send data to its local PoP's storage. Now, each proxy must be aware of three different PoPs' storage, one per region, which means maintaining a mapping from each proxy to the relevant storage machines in each region. This complicates our configuration and file-sending service, but also helps us better control the load balancing between machines.
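One way to picture that mapping, with machine names and a hash-based spread that are purely hypothetical, is a per-region machine list plus a deterministic choice of machine per site. Because the choice is a stable function of the site, every proxy routes a given site's data to the same machine, and the hash spreads sites evenly for load balancing.

```python
import hashlib

# Hypothetical storage machines per region; in practice each proxy's
# configuration maps it to concrete machines in each region's PoP.
STORAGE = {
    "EU":   ["eu-store-1", "eu-store-2"],
    "US":   ["us-store-1", "us-store-2"],
    "APAC": ["ap-store-1", "ap-store-2"],
}

def storage_target(site_id: int, region: str) -> str:
    """Pick the storage machine for a site deterministically, so all
    proxies agree on the target while sites spread across machines."""
    machines = STORAGE[region]
    digest = hashlib.sha256(str(site_id).encode()).hexdigest()
    return machines[int(digest, 16) % len(machines)]
```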
Another hurdle we overcame was PoP maintenance. Previously, when a PoP was under maintenance, both the PoP's storage and the PoP's proxies went into maintenance together, meaning there was no data to collect in that storage during the maintenance window. With the new flow, each PoP's storage is also responsible for processing data from proxies in other regions, which are likely not under maintenance at the same time.
To avoid any impact on processing times, we wanted to divert the data to a standby storage in the same region, pushing us to incorporate two new abilities. The first is to publish a configuration change globally with minimal effort, and the second is to transfer the cache of live sessions from one storage center to another to avoid data loss in the process.
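The two abilities combine into a failover step that can be sketched as below; all names are illustrative, and the global configuration publish is reduced to a dictionary update. The order matters in spirit: the region is repointed at the standby, and the cache of live sessions is handed over so partially assembled data continues on the standby rather than being lost.

```python
def failover(config: dict, region: str, standby: str,
             live_sessions: dict, standby_sessions: dict) -> None:
    """Divert a region's log traffic to a standby storage.

    A simplified model of the maintenance flow: `config` stands in for
    the globally published routing configuration, and the session dicts
    stand in for each storage center's cache of live sessions.
    """
    # 1. Publish the configuration change: proxies now send this
    #    region's logs to the standby storage.
    config[region] = standby
    # 2. Transfer the live-session cache so in-flight sessions resume
    #    on the standby instead of being dropped mid-processing.
    standby_sessions.update(live_sessions)
    live_sessions.clear()
```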
The Result: Regional Isolation Control
The new regional data storage option is a significant first step on our path to offering more capabilities around data privacy.
Next, we plan to introduce new regions in order to provide more granular control to customers based on emerging regulations and market dynamics globally.
Regional isolation is a strong focus area for Imperva, and we’re currently working on additional features to help our customers gain better control with other self-managed features.