How Imperva CounterBreach uses machine learning to combat insider threats


I’d like to introduce Lior Fisch, our resident expert on everything machine learning. Few people I know are as knowledgeable as Lior or have as much fun working with big data. He applies his extensive domain expertise in data mining (did I forget to mention he has a Ph.D. in data mining?!) and machine learning to Imperva CounterBreach to help solve the burgeoning insider threat problem. I had a fun discussion with him, in which he shared his passion for complex algorithms and how to apply them to everyday life.

Q: What is machine learning?

Machine learning is a process in which a computer becomes an expert in solving a certain type of problem – such as distinguishing malicious hacker activity from legitimate activity – by learning from examples it has digested. The analogy from everyday life is an expert physician who can diagnose a patient’s condition by integrating various bits of information about the patient’s history that would mean little to a non-expert. Notice the emphasis on domain expertise, which is required to execute a successful application of machine learning. During the learning process, the computer independently creates rules about how to use these bits of information to solve the specific type of problem.

Q: Why is the user behavior space a good candidate for machine learning?

I have nothing useful to say <grin>. Just kidding! I’d like to highlight a few reasons why machine learning is so useful in the user behavior space. First, it brings important context to security events. Second, security teams are inundated with alerts. Third, because users have legitimate access to data, it’s tough to distinguish between normal and abnormal user data access. Finally, while prevention is an important part of enterprise security, detection is of utmost importance when it comes to identifying insider threats.

The importance of context

When doing detection, it’s not enough to work at the single event level. An event cannot be normal or abnormal in itself, but only in the context of the normal or baseline behavior. That is where machine learning comes in – it enables us to learn this context.
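To make the idea of context concrete, here is a minimal sketch in Python. It is not how CounterBreach works internally; it simply assumes that a baseline is a list of past daily row counts per actor and that "abnormal" means more than three standard deviations from the baseline mean. The actor names, the numbers and the threshold are all hypothetical.

```python
from statistics import mean, pstdev


def is_abnormal(value, baseline, threshold=3.0):
    """Return True if `value` deviates from the baseline mean by more
    than `threshold` standard deviations (an assumed, simplistic rule)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any different value is abnormal.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical baselines: rows read per day by two different actors.
reporting_service = [480, 510, 495, 505, 490]
help_desk_user = [10, 12, 9, 11, 8]

# The very same event (500 rows read) is judged against each baseline.
event_rows = 500
print(is_abnormal(event_rows, reporting_service))  # False
print(is_abnormal(event_rows, help_desk_user))     # True
```

The event itself is identical in both calls; only the baseline it is compared against changes the verdict.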

Inundation of alerts

We believe that it is very important for the anomaly detection system not to “cry wolf”. Although the “safest” way would be to alert on anything that is slightly abnormal according to predefined rules, that would be counterproductive as the user would very quickly start ignoring the alerts altogether irrespective of the severity. In other words, too many alerts are like no alerts at all. Machine learning is what allows us to achieve the required accuracy.

Understanding normal vs. abnormal data access

Distinguishing normal from abnormal activity across databases, file servers, and other data repositories is a daunting challenge. It is essentially a fuzzy task that has a lot of gray areas. Some normal (i.e. harmless) scenarios may have ostensibly abnormal symptoms, while abnormal (malicious) scenarios may disguise themselves as normal-looking situations. Making that distinction is the role of machine learning in detecting malicious insiders.

“Detection” vs. “Prevention”

Prevention is possible when it is clear as day what an attack looks like and we can formulate hard-and-fast rules to identify it. This is the case in the realm of defending websites, and it is how our SecureSphere Web Application Firewall works. Since the identification of an attack is immediate, we can block malicious actors mid-strike.

On the other hand, defending against insider threats is a whole different ballgame. Here the keyword is “detection” rather than immediate prevention because both normal and malicious users are already inside the system. We cannot keep them out for obvious reasons, so it’s important to monitor their data access and detect inappropriate or abusive activity.

There are no known hard-and-fast rules for detecting abnormal insider activity; machine learning allows us to track and analyze users’ activity until a sufficient amount of evidence has piled up to indicate a breach.
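The notion of evidence piling up can be illustrated with a short Python sketch. This is an assumption-laden toy, not the CounterBreach algorithm: each observed event is given a hypothetical suspicion score, older evidence fades by a decay factor, and an alert fires only once the running total crosses a threshold. The class name, the scores, the decay and the threshold are all made up for illustration.

```python
from collections import defaultdict


class EvidenceTracker:
    """Accumulate per-user suspicion and alert only past a threshold."""

    def __init__(self, alert_threshold=1.0, decay=0.9):
        self.alert_threshold = alert_threshold
        self.decay = decay  # older evidence counts a little less each time
        self.scores = defaultdict(float)

    def observe(self, user, suspicion):
        """Add one event's suspicion score; return True if an alert is due."""
        self.scores[user] = self.scores[user] * self.decay + suspicion
        return self.scores[user] >= self.alert_threshold


tracker = EvidenceTracker()
# Three mildly suspicious events from the same (hypothetical) user.
alerts = [tracker.observe("user_a", 0.4) for _ in range(3)]
print(alerts)  # [False, False, True]
```

No single event is enough to raise an alert here; only the third one tips the accumulated score over the threshold.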

Q: What separates our approach in Imperva CounterBreach from the current solutions?

Since the context of an event is very important, CounterBreach first learns everything it can about the main actors – the various processes and activities going on ordinarily in the environment. Obviously, the challenge lies in the fact that every customer’s environment is different. This is where machine learning comes in. Once the actors are classified, their “normal” activity is learned, and abnormal activity gets flagged only following the establishment of baseline context. We refer to this whole process as Behavior Analytics.
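As a rough sketch of that sequence (classify the actors, learn their normal activity, then flag deviations), consider the Python below. Everything in it is hypothetical: the classification heuristic, the thresholds, and the table and application names are placeholders chosen for illustration and say nothing about how CounterBreach actually classifies actors or stores baselines.

```python
from collections import defaultdict


def classify_actor(queries_per_hour, distinct_query_shapes):
    """Crude, assumed heuristic: applications issue many repetitive
    queries, while humans issue few and varied ones."""
    if queries_per_hour > 1000 and distinct_query_shapes < 20:
        return "application"
    return "human"


class AccessBaseline:
    """Remember which (actor type, client application) pairs normally
    touch each table, then flag pairs never seen during learning."""

    def __init__(self):
        self._seen = defaultdict(set)  # table -> {(actor_type, client_app)}

    def learn(self, table, actor_type, client_app):
        self._seen[table].add((actor_type, client_app))

    def is_abnormal(self, table, actor_type, client_app):
        return (actor_type, client_app) not in self._seen.get(table, set())


# Step 1: classify the actors (hypothetical activity figures).
service = classify_actor(queries_per_hour=5000, distinct_query_shapes=8)
person = classify_actor(queries_per_hour=12, distinct_query_shapes=10)

# Step 2: learn the baseline from ordinary activity.
baseline = AccessBaseline()
baseline.learn("case_records", service, "records_service")

# Step 3: flag only what departs from the learned baseline.
print(baseline.is_abnormal("case_records", service, "records_service"))  # False
print(baseline.is_abnormal("case_records", person, "admin_console"))     # True
```

A real system would learn far richer baselines than a set of seen pairs, but the ordering is the point: nothing is flagged until the actors and their normal behavior are known.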

Granular and high-quality data logs

All of the above is achievable thanks to the quality of the logs collected by our Imperva SecureSphere database and file monitoring products, as well as Imperva Skyfence, which monitors access to SaaS applications. The logs we maintain with regard to user data access are highly granular, and we’re able to collect this detail without losing any information about the original activity.

Domain expertise

Imperva has been protecting applications, databases, files and SaaS applications for over a decade. We have a strong understanding of how applications and other consumers of data work with databases and file shares. We are also experts at the technical aspects of efficiently collecting, storing and managing large volumes of database and file access audit data at various levels of granularity.

Q: Tell us about the most interesting use case from a customer engagement.

We recently worked with a customer that had a database containing confidential data belonging to a federal law enforcement authority. CounterBreach identified that in the normal run of events, this confidential data was being accessed from various network addresses by a certain type of legitimate application. Then, CounterBreach detected and flagged an incident on a specific occasion where a human user was directly accessing the same sensitive application data, but from a different administrative application. This alert warranted an investigation by the customer. The traditional approach would have failed here as this event would typically get lost in a sea of similar-looking events.

The final discussion in this blog series will focus on the Deception Token technology, which adds unique, deterministic detection capabilities to CounterBreach. More information about CounterBreach is available here.

Find out more about the “Next-Generation Insider Threat Protection” in our upcoming Webinar.