At Imperva Incapsula, we process vast amounts of attack data every day. Collected by hundreds of proxy servers in over 30 data centers throughout the world, the data is invaluable both to our customers and to a number of internal Incapsula teams.
Most significantly, it provides you with relevant details about your attackers, such as IP addresses and user agents, so that you can enact preventative measures against them. Additionally, the data is used by both our security and support teams to better understand the online threat landscape and to assist other customers who may come under attack.
Because this data is essential in analyzing and responding to any kind of web attack, we recently revamped our systems to better provide it to our customers and internal teams.
What Constitutes Attack Data?
Attack data informs you of everything you need to know while your site is being assaulted. For each attack, the source IP, user agent, attack type, country of origin and targeted URL are provided so you can take the appropriate mitigation steps.
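To make the fields concrete, an attack event can be modeled as a simple record. The field names and sample values below are our own illustrative choices, not Incapsula’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class AttackEvent:
    # Illustrative fields only; the real event schema is internal to Incapsula.
    source_ip: str
    user_agent: str
    attack_type: str
    country: str
    url: str

event = AttackEvent(
    source_ip="203.0.113.7",   # RFC 5737 documentation address
    user_agent="BadBot/1.0",
    attack_type="SQL Injection",
    country="US",
    url="/login.php",
)
```

Each of these fields maps directly to a mitigation decision: block the IP, challenge the user agent, or tune the rule that caught the attack type.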
Until recently, Incapsula’s data processing was done by a single central system (code named “Graceland”), which used an old-fashioned replication solution and had limited scaling capabilities.
Graceland processed and stored data received from our proxy servers, which was then consumed by our management console and made available to our customers, our security research team and others.
That system was effective, but new capability requirements—including SIEM (Security Information and Event Management) system integration, as well as an expanding customer base and increased site traffic—meant we had to reinvent our data infrastructure architecture.
In doing this, Incapsula developed a state-of-the-art system to provide customers with the best solution for cyber-attack logs, in addition to SIEM integration. We aimed to build a scalable, fault-tolerant system that collects attack logs in real time without ever losing any data.
Three areas were identified for improvement:
- Data processing had to scale out from one to 30 systems.
- Data collection had to be fully synchronized between the systems.
- Data logs had to be industry compliant.
Scaling Out Data Processing
Incapsula’s first step was to scale out data processing. A Graceland instance was installed in each of our PoPs to collect, process and store that PoP’s own proxy data. Distributing the processing load globally enabled each PoP to scale independently.
In addition, we reinvented our transfer mechanism so that data could be sent to different locations. Most of the time the destination is the local PoP’s Graceland instance. But because fault tolerance is part of the new architecture, we also implemented the ability to dynamically reroute data in case any Graceland instance stops functioning.
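A minimal sketch of how such dynamic routing might work, assuming an ordered set of Graceland instances and a health check. All names and addresses here are hypothetical:

```python
# Prefer the local PoP's Graceland instance; if it is down, fall back to the
# next healthy instance. This mirrors the rerouting described above, but the
# selection logic and addresses are our own illustration.
def pick_destination(local_pop, instances, is_healthy):
    """instances: ordered mapping of PoP name -> instance address."""
    preferred = instances.get(local_pop)
    if preferred and is_healthy(preferred):
        return preferred
    for pop, addr in instances.items():
        if pop != local_pop and is_healthy(addr):
            return addr
    raise RuntimeError("no healthy Graceland instance available")

instances = {"ams": "10.0.1.5", "sfo": "10.0.2.5", "tko": "10.0.3.5"}
down = {"10.0.1.5"}
dest = pick_destination("ams", instances, lambda addr: addr not in down)
# With the local instance down, data is rerouted to the next healthy PoP.
```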
Within a month, we successfully deployed over 30 worldwide systems that collect, process and store attack data.
Next, we had to address how to synchronize the collected data.
Data Synchronization: Introducing Lumberjack
Lumberjack, a new data processing system Incapsula developed, simultaneously collects attack data from the multiple Graceland instances and generates standard log files in CEF or W3C formats.
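The CEF side of this can be illustrated with a small formatter. CEF lines have the shape `CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|extensions`; the event fields below are illustrative, not Incapsula’s actual log schema:

```python
def to_cef(event):
    """Format an attack event as a CEF (Common Event Format) line.

    Header fields are pipe-delimited; extensions are space-separated
    key=value pairs. Escaping follows the CEF convention: pipes are
    escaped in the header, equals signs in the extension.
    """
    def esc_header(value):
        return str(value).replace("\\", "\\\\").replace("|", "\\|")

    def esc_ext(value):
        return str(value).replace("\\", "\\\\").replace("=", "\\=")

    header = "|".join(esc_header(v) for v in (
        "Incapsula", "WAF", "1.0",
        event["sig_id"], event["name"], event["severity"]))
    ext = " ".join(f"{k}={esc_ext(v)}" for k, v in event["ext"].items())
    return f"CEF:0|{header}|{ext}"

line = to_cef({
    "sig_id": "1001", "name": "SQL Injection", "severity": "7",
    "ext": {"src": "203.0.113.7", "request": "/login.php"},
})
```

W3C extended log format output would follow the same pattern with a different serializer, which is why the two formats can share one collection pipeline.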
When developing Lumberjack, we were certain to include the following features:
- It had to be 100% fault tolerant
- It could never lose a single event
- It had to simultaneously collect data from 30+ systems
- It had to scale to handle any load increases
An Apache ZooKeeper cluster was deployed to manage the distributed synchronization. We chose this system because it is a fast, reliable and fairly simple framework, while still being an industry standard and supported by an active community.
ZooKeeper is used for:
- Work distribution: To ensure complete fault tolerance, a ZooKeeper master/worker pattern manages log collecting tasks between the different Lumberjack systems. At any given time, one Lumberjack system acts as the master, assigning log collecting tasks to the other worker systems. Should the master system go down, another one takes its place.
- Log collecting synchronization: To guarantee that no events are lost and that data is simultaneously collected from all systems, ZooKeeper synchronizes all relevant information.
After a Lumberjack instance reads the data and processes it into a log file, ZooKeeper is updated. For each customer, ZooKeeper holds a map of every PoP’s Graceland instance, containing the relevant file index and offset coordinates. In this way, the next Lumberjack instance receiving a task for that customer starts reading the data from the correct point.
- Scalability: Both work distribution and log collection synchronization ensure that the Lumberjack architecture can scale to any load increase.
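The checkpoint bookkeeping described above can be sketched in miniature. In production these maps would live in ZooKeeper znodes; here they are modeled in memory, and all names are our assumptions:

```python
# Per-customer, per-PoP checkpoints: the log file index and byte offset at
# which the next Lumberjack worker should resume reading. In the real system
# this state would be stored in ZooKeeper so any worker can pick up a task.
checkpoints = {}  # (customer_id, pop) -> {"file_index": int, "offset": int}

def get_checkpoint(customer_id, pop):
    """Where the next worker assigned this customer/PoP should start reading."""
    return checkpoints.get((customer_id, pop), {"file_index": 0, "offset": 0})

def commit_checkpoint(customer_id, pop, file_index, offset):
    """Called after a worker finishes writing a log file, so no event is
    read twice and none is skipped by the next worker."""
    checkpoints[(customer_id, pop)] = {"file_index": file_index, "offset": offset}

commit_checkpoint("cust-42", "ams", file_index=3, offset=1_048_576)
resume = get_checkpoint("cust-42", "ams")  # next worker resumes at file 3, 1 MiB in
```

Because the checkpoint, not the worker, is the source of truth, any Lumberjack instance can take over a customer task after a failure without losing or duplicating events.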
To secure data and guarantee that customers can view only their own log files, Incapsula uses hybrid encryption. This involves generating a symmetric key for each log file and using it to encrypt all content. This key is then encrypted with the customer’s public key and placed in the log file header.
Once a Lumberjack instance finishes processing and writing a log file, it compresses and encrypts it, then uploads it to a central storage repository.
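A hedged sketch of the seal step, using the Python cryptography package. The specific algorithms (AES-GCM for the content, RSA-OAEP for wrapping the key) and the header layout are our assumptions; the source only specifies the hybrid scheme:

```python
import os
import zlib

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal_log(log_bytes, customer_public_key):
    """Compress a log file, encrypt it with a fresh per-file symmetric key,
    and prepend the RSA-wrapped key (plus nonce) as a header."""
    compressed = zlib.compress(log_bytes)
    sym_key = AESGCM.generate_key(bit_length=256)   # fresh key per log file
    nonce = os.urandom(12)
    ciphertext = AESGCM(sym_key).encrypt(nonce, compressed, None)
    wrapped_key = customer_public_key.encrypt(
        sym_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    # Only the holder of the matching private key can recover sym_key
    # from the header and decrypt the body.
    return wrapped_key + nonce + ciphertext

# Demo with a throwaway 2048-bit customer key pair.
priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
sealed = seal_log(b"2024-01-01 attack event ...", priv.public_key())
```

The design choice is the standard one for bulk data: symmetric encryption handles arbitrarily large log files cheaply, while the asymmetric wrap ensures each customer can open only their own files.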
Providing Data Logs to Our Customers
As a final implementation step, Incapsula needed to ensure that the collected data logs are industry compliant and readily available to you, our customer.
Previously, you would have had to make an API call via our management interface. You would then get a JSON-formatted response with partial event data.
To comply with SIEM, we’ve enabled you to directly download log files in CEF or W3C format over HTTPS, provided you have the proper authentication.
To give customers direct access to our data storage, we installed an nginx server on every Lumberjack system, dynamically configured with each customer’s relevant credentials. Each nginx instance acts as an intermediary: it authenticates every log file request and, if valid, retrieves the log file from storage and sends it to the customer.
To serve multiple API clients, the request workload is distributed across multiple endpoints: the Incapsula load balancer spreads incoming log requests among the Lumberjack systems.
Incapsula’s architecture team took a huge leap forward by designing and building a state-of-the-art, fully synchronized and highly available solution. Our new generation of data infrastructure is already successfully handling all of the attack data pouring through our network.
We’re not done, however. Incapsula is already designing and implementing the next set of infrastructure updates to ensure that you continue receiving the best possible service.