Imperva Camouflage Data Masking Solves the Data Transformation Problem – and Sets Your Data Free.


With all of the recent attention around the release of Imperva CounterBreach, it may have been easy to miss our announcement of a recently inked partnership with Camouflage Software to provide a best-of-breed data masking solution to our customers. The purpose is to protect the actual data while providing a functional substitute for occasions when the real data is not required. I had the opportunity to dive deep into the world of data masking, and I thought I’d share why we partnered with Camouflage Software and why Imperva Camouflage data masking is critical when it comes to securing your sensitive data, particularly in the brave new world of DevOps.

The Problem — Copies of Data Everywhere
Before we can fully understand what a data masking solution can do, we need to understand the problem. As you know, databases can contain massive amounts of information, and some of that information is regulated for compliance purposes by HIPAA, SOX, and PCI. Additionally, some of that data might be considered intellectual property or otherwise sensitive. Organizations need to sift through gobs of out-of-scope data to find the locations of data considered sensitive or within the scope of regulations.
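
To make the discovery problem concrete, here is a minimal sketch of pattern-based discovery: sample column values and flag columns whose contents look like SSNs or card numbers. This is a toy illustration only, not how the Imperva Camouflage discovery engine actually works, and the patterns and thresholds below are assumptions chosen for readability.

```python
import re
import sqlite3

# Hypothetical patterns for two common classes of regulated data.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{13,16}$"),
}

def discover_sensitive_columns(conn: sqlite3.Connection, sample_size: int = 100):
    """Sample each column and flag those whose values match a sensitive pattern."""
    findings = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        for col in columns:
            rows = conn.execute(
                f"SELECT {col} FROM {table} LIMIT ?", (sample_size,)).fetchall()
            for label, pattern in PATTERNS.items():
                hits = sum(1 for (value,) in rows
                           if isinstance(value, str) and pattern.match(value))
                # Flag the column if most of the sampled values match the pattern.
                if rows and hits / len(rows) > 0.5:
                    findings.append((table, col, label))
    return findings
```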
Once the sensitive data is identified and labeled, appropriate security controls can be wrapped around that data, and risk can be effectively managed. This is where Imperva SecureSphere comes into play: it protects that live data by wrapping security around it, giving organizations a tight grip on the crown jewels of the enterprise. As long as the data stays within the environment SecureSphere protects, it’s covered.

However, data is valuable, and the more it’s leveraged, the more value an organization can derive from it. That’s why it’s challenging to maintain barriers around information: users on both the technology and the business sides have legitimate reasons to reach it. Organizations want to mine their data, conduct research within it, tally columns, count records, run reports, test things, and experiment with new ideas. The problem is that organizations end up with a highly secured production environment where the data is locked down, alongside hundreds of insecure copies of that data that are untracked, undiscoverable, and out of their control.
A few examples on the technology side of an organization: a DBA might need access to all of the data to test backup procedures. Teams responsible for disaster recovery want to practice failovers and develop metrics using real databases. The Quality Assurance team needs to test new versions of software bound for production, so it needs data. And, as is particularly common in DevOps environments, developers writing code that accesses the organization’s data need access to the database.

On the business side of an organization, departments like Marketing want to research the customer base to determine marketing campaign effectiveness. Executive and Product Management teams need access to data for strategic decisions like opening new locations, analyzing growth patterns, and evaluating performance in individual markets.

In all of the above, the business needs are not addressed by piercing holes in the security (e.g., firewalls, SecureSphere, filesystem ACLs) of production environments hosting live data. Instead, many disparate copies of data float around within an organization. In fact, some organizations have a hard time identifying all of the sources and destinations of copies, and of copies of copies. Copies of the data could reside on servers under IT’s control, but they may also end up in development labs, in the hands of partners, at outsourcing companies, on a USB key downloaded by marketing, or on a developer’s laptop stored in the trunk of a car.
[Figure: copies of production data spreading beyond IT’s control]

The practice of deploying additional boundary controls is ineffective at mitigating these risks. A tight perimeter around production will not protect against a thief breaking into the trunk of that developer’s car, stealing the laptop, and thereby gaining access to massive amounts of sensitive or regulated data in a local copy of your database.
[Figure: (1) protected production database, (2) QA/staging copy, (3) developer copies, (4) leaked copies]

As you can see above, building lock-tight security around the production environment and its data (1) is only the first step to securing critical data. The many copies, and copies of copies, within an enterprise (2 and 3), and the subsequent leaks of those copies (4), pose massive, unmitigated risks to organizations and their adherence to regulatory obligations.
In the above example, a single database contains sensitive data, with tight controls over data access (1). Most often, a QA/staging/UAT environment exists to test code, so a copy of the data usually resides there too (2). DBAs and application developers (3) need data to develop against. Many application teams are dispersed, outsourced, “cloudified,” and duplicated in a multitude of ways. The further we get from the protected source of the sensitive data (1), the less control we have over the copies, who gets access to them, and how they’re stored. There may even be a #5 and #6 to add to the above diagram, resulting in hundreds or thousands of copies of your sensitive data.

The Solution: A Win-Win for Data Users and Security
It’s clear that trying to control access to sensitive data is a difficult endeavor, likely to see diminishing returns as data is copied and moved around within the enterprise. Instead of clamping down and creating obstacles to the data, we can free it from restriction, remove risks, take it out of scope of regulations, and encourage the business to keep mining it. All at the same time! This is why I like data masking: you’re setting your data free within the organization.
By destroying, in an organized and controlled fashion, the data that poses risks (e.g., credit cards, SSNs, PII), while maintaining the integrity of the data sets, you can let the relevant business data go unchanged. The result is a dataset that’s rich with real business value yet void of sensitive data leakage risk. It’s absolutely a win-win for organizations that want to work with real data without the associated risks. Security can remain focused on what’s critical: the sensitive production data. Meanwhile, leveraging copies of masked sensitive information to address other use cases lets the enterprise have its cake (er, data) and mine it, too.
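
To sketch the idea (this is one common approach, not necessarily Imperva Camouflage’s algorithm): a keyed, deterministic transform maps each sensitive value to a stable, same-shaped substitute, so the risky original is destroyed while every occurrence of the value masks identically. The key name below is a hypothetical placeholder.

```python
import hashlib
import hmac

# Hypothetical secret that lives only in the masking environment; without it,
# masked values cannot be linked back to the originals.
MASKING_KEY = b"replace-with-a-securely-stored-key"

def mask_id(value: str, digits: int = 4) -> str:
    """Deterministically map a sensitive value to a same-shaped substitute.

    The same input always yields the same output, so every table and every
    copy of the dataset masks the value consistently and joins keep working.
    """
    digest = hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()
    return str(int(digest, 16) % (10 ** digits)).zfill(digits)

# The same employee ID masks to the same substitute everywhere it appears.
assert mask_id("0011") == mask_id("0011")
```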
[Figure: (1) protected production data, (2) masked backup copy on the Imperva Camouflage system, (3, 4) masked copies dispersed through the data ecosystem]

Business-critical data remains intact where it’s needed, usually in production and protected by SecureSphere (1). Then a backup copy is placed on a separate system running Imperva Camouflage (2), the data in that copy is masked, and the masked copy is dispersed through the data ecosystem (3, 4).
Why We Partnered with Camouflage Software
While there are a couple of other masking products on the market, Imperva chose to work with Camouflage Software for several reasons that will become obvious once customers deploy our data masking solution. The workflow is well designed, the product conveys quality and intuitiveness, and the GUI is well appointed and powerful without becoming overwhelming. The process is well defined but still allows each step to be configured if needed.
[Figure: the Imperva Camouflage discovery and masking process]

The Process — Find and Mask the Sensitive Data
To begin, we need to know where the sensitive data resides. The first phase of our endeavor is to use the Imperva Camouflage Discovery and Masking tool to find the sensitive data. Next, we classify the data, then decide what to mask and how to mask it.

Afterwards, we execute the masking job and, finally, we set the data free!
From a high level, the process to mask the data with Imperva Camouflage is as follows (a hypothetical sketch of such a masking project follows the list):

1. Discover
• locate sensitive data in databases, data warehouses, flat files, medical systems, PeopleSoft, Oracle Financials, and mainframes
• automate and run regularly
• report changes to location of sensitive data
2. Assess and Classify
• optionally manipulate the functional masking document to control the baseline, or use the defaults
• evaluate and rank risks
• map relationships
• document the masking method required
3. Set Policies
• reusable projects
• control source and destination (databases, Big Data, flat-files, mainframes)
• policies and rules control data transformers
4. Deploy
• maintain referential integrity and consistency of data between tables, databases, and masking operations
5. Manage and Report
• compliance reporting
• before-and-after reports generated with each run
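
As promised above, here is a hypothetical sketch of how a reusable masking project covering these five steps might be expressed. None of these keys or values come from the actual Imperva Camouflage configuration format; they simply mirror the flow of the list.

```python
# Hypothetical policy definition mirroring the five-step flow above.
# These names are illustrative, not the product's real configuration schema.
masking_project = {
    "discover": {
        "sources": ["hr_db", "crm_db", "exports/*.csv"],  # databases and flat files
        "schedule": "weekly",                             # automate and run regularly
    },
    "classify": {
        "ssn":       {"risk": "high",   "transformer": "random_valid_ssn"},
        "emp_id":    {"risk": "medium", "transformer": "deterministic_number"},
        "last_name": {"risk": "low",    "transformer": "substitution_list"},
    },
    "deploy": {
        "destination": "qa_staging_db",
        "preserve_referential_integrity": True,  # keep foreign keys matching
    },
    "report": {
        "before_after_snapshot": True,  # generated with each run
        "compliance_format": "PCI",
    },
}
```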

Masking the Data
Masked data still appears valid, and its referential integrity is preserved. To illustrate, here’s a simple example of masking in practice:

[Figure: an employee table before and after masking]

The real data exists in the first table: emp_id ‘0011’ has a username of ‘smithr’ and an SSN of ‘123-21-9812’. Once that table is masked by Imperva Camouflage, the data resembles the second table. The emp_id column, our EmployeeID, is considered sensitive, so we masked it to another, similar number; in the above example it became ‘2012’. The same goes for the SSN: it was masked with a new SSN that is random, but valid. In other tables that need to match this EmployeeID, the values are masked with the same masking value, so all of the foreign keys continue to match, referential integrity is preserved, and queries on the data still return results. The sketch below illustrates the idea.
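
Here is a minimal sketch of that cross-table consistency (a toy, not the product’s implementation): build one substitution mapping for the employee ID, then apply it to every table that references it, so foreign keys still line up afterwards. All table and column names here are hypothetical, and the generated SSN is merely plausibly formatted rather than guaranteed valid.

```python
import random

# Two toy tables that share emp_id as a key.
employees = [{"emp_id": "0011", "username": "smithr", "ssn": "123-21-9812"}]
payroll   = [{"emp_id": "0011", "salary": 72000}]

rng = random.Random()
emp_id_map = {}

def masked_emp_id(emp_id):
    """Return the same random, same-shaped substitute every time we see emp_id."""
    if emp_id not in emp_id_map:
        emp_id_map[emp_id] = f"{rng.randrange(10_000):04d}"
    return emp_id_map[emp_id]

for row in employees:
    row["emp_id"] = masked_emp_id(row["emp_id"])
    # Replace the SSN with a plausibly formatted substitute, not a scramble.
    row["ssn"] = (f"{rng.randrange(100, 900):03d}-"
                  f"{rng.randrange(100):02d}-"
                  f"{rng.randrange(10_000):04d}")

for row in payroll:
    row["emp_id"] = masked_emp_id(row["emp_id"])  # same substitute as in employees

# A join on emp_id between the two masked tables returns the same pairing as before.
assert employees[0]["emp_id"] == payroll[0]["emp_id"]
```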
There are many methods available to mask the data, as shown below:

[Figure: the available data transformation methods]

Reporting
After data is masked, it’s important to validate that everything went well. For this purpose, we use the Imperva Camouflage Data Masking Reports console in the GUI. Here, you can validate that the masking jobs completed with expected results and know when new data appears; for example, when someone copies a sensitive column in the dataset to a new field or a temporary table.

[Figure: the Imperva Camouflage Data Masking Reports console]
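
As a flavor of what such validation might check (a hypothetical sketch, not the actual report logic): after each run, confirm that none of the original sensitive values survived anywhere in the masked copy, including newly added or temporary columns.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_unmasked_ssns(masked_rows, original_ssns):
    """Hypothetical post-masking check: report any original SSN that survived
    the masking job, wherever it appears in the masked copy."""
    leaks = []
    for row in masked_rows:
        for column, value in row.items():
            if (isinstance(value, str) and SSN_PATTERN.search(value)
                    and value in original_ssns):
                leaks.append((column, value))
    return leaks

# Usage: run after every masking job; an empty result means the job is clean.
print(find_unmasked_ssns(
    masked_rows=[{"emp_id": "2012", "ssn": "857-22-0461"}],
    original_ssns={"123-21-9812"},
))  # -> []
```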

Summary -> Su**ary
In summary, data masking is a key component of securing an organization’s sensitive data. Without it, the cost of securing sensitive data can be exponentially higher, and the effort will still fail to deliver the expected benefits. As the technology around us adapts to concepts like DevOps and Big Data, the need for access to data will be as prevalent as ever, and an organization’s hunger for information will trump the need to secure it. Masking sensitive data to make information available to the organization without risk therefore provides great advantages in security, cost savings, and adherence to regulatory compliance obligations, while speeding the flow of data from a secured silo to the various parts of the organization.

It’s a win-win for our customers and an easy win for security in general.