Detecting Data Breaches: Why Understanding Database Types Matters
Different data characteristics and access patterns found in different database systems lead to different ways of detecting suspicious data access, which are indicators of potential data breaches. To accurately detect data access abuse we need to classify the database processing type. Is it a transactional database (OLTP) or a data warehouse (OLAP)?
OLTP vs. OLAP – What’s the Difference?
Today, in the relational database world there are two types of systems. The first is online transactional processing (OLTP) and the second is online analytical processing (OLAP). Although they look the same, they have different purposes. OLTP systems are used in business applications. These databases are the classic systems that process data transactions. The queries in these databases are simple, short online transactions and the data is up-to-date. Examples include retail sales, financial transaction and order entry systems.
OLAP systems are used in data warehouse environments whose purpose is to analyze data efficiently and effectively. OLAP systems work with very large amounts of data and allow users to find trends, crunch numbers, and extract a ‘big picture’ from the data. OLAP systems are widely used for data mining and the data in them is historic. As OLAP’s number-crunching usually involves a large data set, the interactions with the database last longer. Furthermore, with OLAP databases it’s not possible to predict what the interactions (SQL queries) will look like beforehand.
Figure 1: OLAP and OLTP data flow
The different nature of OLTP and OLAP database systems leads to differences in users’ access patterns and variations in the characteristics of the data that is stored there.
Comparing Access Patterns
With OLTP we expect that users will access the business data that is stored in the database using the application interface. Interactive (or human) users are not supposed to access the business application data directly through the database. One exception might be DBAs who maintain the database, but even in this case there is no real reason that a DBA should access business applicative data directly. It is more likely that DBAs will only access the system tables (which store the metadata of the data store) in the database.
With OLAP the situation is different. Many BI users and analysts regularly access the data in the database directly and not through the application interface to produce reports and analyze and manipulate the data.
The Imperva Defense Center worked with dozens of databases across Imperva enterprise customers to analyze the data access patterns for OLTP and OLAP databases over a four-week period. We used audit data collected by SecureSphere and insights gathered from CounterBreach. Figure 2 shows the average number of new interactive users who accessed these databases during the four-week period.
Figure 2: The number of new interactive users who accessed OLTP and OLAP databases over time.
As indicated in Figure 2, there were almost no new interactive (or human) users who accessed OLTP databases over time. However, this was not the situation for OLAP databases.
Comparing Data Characteristics
The data in OLTP systems is up-to-date. In most cases, the tables that hold the business application data are not deleted and repeatedly re-created – they’re stable.
On the other hand, in OLAP systems the data that is saved in the database is historic data. There are ETL (extract, transform, load) processes which upload and manipulate data on the database periodically (hourly/daily/weekly). In many cases, the data is uploaded to new tables each time, for example each day the data is uploaded to a table, which contains the date of the data upload. This leads to many new tables in the database, including temporary tables which help manipulate the data, and tables which are deleted over time.
Again, the Imperva Defense Center analyzed the characteristics of data stored in OLTP and OLAP databases using Imperva enterprise customers’ audit data collected by SecureSphere and insights gathered from CounterBreach. Figure 3 shows the average number of new business application data accessed by interactive users over a four-week period. The average number of new business application tables in OLTP is very low, whereas in OLAP this amount is much higher.
Figure 3: The number of new business application tables in the database over time.
Incorporating OLTP and OLAP Differences to Improve Detection of Suspicious Data Access
Detecting potential data breaches in a relational database requires identifying suspicious activity in the database. To identify suspicious activity successfully—without missing attacks on one hand and not identifying many false positive incidents on the other—the detection should be based on the story behind the database.
We need to ask ourselves, what is the purpose of the database? How should we expect interactive users to act in the database? How do we expect applications to act in the database? What can we tell about the data in the database? To answer these questions a deep understanding of databases – user types, data types and database types – is required.
The latest release of Imperva CounterBreach adds further understanding of the database types and factors database types into its detection methods. Leveraging the Imperva Defense Center research on behaviors of interactive users for OLTP and OLAP databases, CounterBreach uses machine learning to classify database types based on access patterns of interactive users to the database. The machine learning algorithm analyzes a number of different aspects…the number of business intelligence (BI) users and DBAs who access the database, which data is accessed by those interactive users, the amount of new business application data that is created in the database, and more.
With an understanding of the database type, CounterBreach determines the best method to detect suspicious activity. In databases that act like OLTP systems, it detects and alerts on any abnormal access by an interactive user to business applicative data.
In OLAP systems where interactive users access business application data as part of their day-to-day work, CounterBreach won’t alert on such behavior because it’s legitimate. In these systems, it will let BI users do their jobs and use other indicators, such as an abnormal amount of records exfiltrated from the database’s business application tables, to detect data abuse. This helps keep data driven business processes functioning and reduces the number of false positives detected.
Imperva data scientists continue to research and identify additional characteristics that distinguish OLTP and OLAP systems. These characteristics go beyond the access patterns of interactive users to the database and the data stored in the database. They include the names of the tables that are stored in the database, the source applications which are used to access the database, ETL processes, diversity between operations in the database, ratio between different entities access to the database and much more. This ongoing research will further refine the detection accuracy needed to detect potential data breaches.
Learn more about data breach detection. Read our paper on the top ten indicators of data abuse and understand how to identify insider threats.