Data masking is an effective way to protect a large majority of your organization's data. It replaces original data with realistic but fictional data. This reduces production data sprawl and your attack surface while keeping the data useful for development, testing, and analytics modeling.
In this Whiteboard Wednesday session, Steve Pomroy, chief technologist for the Imperva Camouflage product line, walks through the fundamentals of data masking — what it is, how it works, and why you should use it.
Data Masking 101 – Video Transcription
Hi, welcome to Whiteboard Wednesday. I'm Steve Pomroy, chief technologist for Imperva Camouflage. Today I'd like to talk to you about data masking: what it is, how it works, and why you should use it.
The Challenge – Exponential Amounts of Data
To start, I’d like to talk about the challenge that many of our customers are facing. As most people know, the volume of data that organizations have to manage is growing quite quickly. The production data used to run organizations is what most people think of when they think about that data growth. However, there’s also significant growth in the number of copies that are being made of those production databases.
Copies of Production Data Increase Attack Surface and Lack Security Controls
In a study by IDC, that multiplier was found to be 10 on average. So for every production database, there are on average 10 copies being made. Obviously, that's a huge amplification of the attack surface.
Another thing most people aren't aware of is that even though the sensitive data is being copied, the security controls around that data don't necessarily follow. The controls we have around production environments are not typically in place around those lower, non-production environments.
So why are all these copies being made? There are a number of legitimate business reasons driving those copies:
- Development and testing.
- Granting access to third parties, such as contractors.
- Training, because of the realistic scenarios the data represents, as well as analysis.
The Solution – Map, Unlock, Accelerate
So, let's talk a little bit about the solution. At Imperva, we conceptualize this as map, unlock, and accelerate. In the map phase, we help you understand your sensitive data landscape: what databases you have and what's in them. We help classify the data, essentially creating an inventory of your sensitive data. We then feed that inventory to our data masking engine, which removes the sensitive data and replaces it with a realistic, fictional equivalent. That unlocks the data, which in turn accelerates all of these downstream processes.
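To make the "map" step more concrete, here is a minimal sketch of pattern-based column classification that builds a small sensitive-data inventory. It is not how Imperva Camouflage classifies data. The function `classify_columns`, the `PATTERNS` table, and the 80% match threshold are hypothetical choices made for illustration. Real discovery tools also use column metadata, dictionaries, and other detectors. For example, names like "Alice Smith" cannot be reliably spotted with a regular expression, so this sketch leaves them unclassified.

```python
import re

# Illustrative detectors only (an assumption for this sketch); real discovery
# tools combine many more detectors with metadata and dictionaries.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def classify_columns(table, threshold=0.8):
    """Return {column: label} for columns whose sampled values mostly match a detector.

    `table` maps column names to a list of sampled string values (hypothetical format).
    """
    inventory = {}
    for column, values in table.items():
        non_empty = [v for v in values if v]
        if not non_empty:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in non_empty if pattern.match(v))
            if hits / len(non_empty) >= threshold:
                inventory[column] = label
                break  # first detector that clears the threshold wins
    return inventory


customers = {
    "name": ["Alice Smith", "Bob Jones"],  # no regex detector fires on names
    "email": ["alice@example.com", "bob@example.org"],
    "birth_date": ["1985-04-12", "1990-11-03"],
}
print(classify_columns(customers))  # {'email': 'email', 'birth_date': 'date'}
```

The resulting inventory is what a masking step would consume: it says which columns hold which kind of sensitive data, so each one can be replaced with an appropriate fictional equivalent.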
DevOps and the Move to the Cloud
Just to touch on a couple of them: in DevOps, having realistic but fictional data helps drive the process faster. When moving to the cloud, many organizations start by moving their dev workloads there first, and data masking is a great way to start that adoption. For example, you can mask your sensitive data on premises before moving it to the cloud.
Masking also helps drive compliance with standards and regulations such as PCI DSS and HIPAA, and GDPR is another common one today. It works in concert with understanding the scope of sensitive data within your organization, as well as minimizing that scope.
Of course, security is also important. We view data masking as an important additional security layer for protecting sensitive data. Just imagine applying masking to all of those copies: if a hacker were to breach one of them, they'd essentially be stealing fictional data.
How it Works
Let's talk a little bit about how it works. With masking, we connect directly to the database and replace the sensitive data, as I said, with a realistic, fictional equivalent. In this scenario, we have things like names, birth dates, email addresses, and so on; you see the real data here. Once it's masked, it's essentially indistinguishable from the original data. Names still look like names, birth dates still look like birth dates, and so on. That retains the realism, the high quality of the data. Masking also retains all of the data relationships. For example, a customer masked in one table still matches its references in other tables, which is critical when you are working with those copies of data.
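One common way to keep masked values realistic and relationships intact is deterministic masking. Each real value goes through a keyed hash, and the result selects a fictional replacement. Because the same input always produces the same output, a value masked in one table matches the same value masked in another. The sketch below illustrates that general technique. It is not Imperva's masking engine. `SECRET_KEY`, the name lists, and the `mask_*` functions are hypothetical, and the ±180-day birth-date shift is an arbitrary choice.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key; in practice it would be managed securely and never
# stored alongside the masked copy.
SECRET_KEY = b"replace-with-a-managed-secret"

# Small illustrative pools of fictional replacement values.
FIRST_NAMES = ["Avery", "Jordan", "Morgan", "Riley", "Taylor", "Casey"]
LAST_NAMES = ["Nguyen", "Garcia", "Patel", "Okafor", "Larsen", "Kim"]


def _digest(value: str) -> int:
    """Keyed hash: the same input always maps to the same integer."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def mask_name(name: str) -> str:
    """Replace a real name with a fictional but realistic-looking one."""
    d = _digest("name:" + name)
    first = FIRST_NAMES[d % len(FIRST_NAMES)]
    last = LAST_NAMES[(d // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(email: str) -> str:
    """Replace an email address with a well-formed fictional address."""
    d = _digest("email:" + email.lower())
    return f"user{d % 1_000_000:06d}@example.com"


def mask_birth_date(dob: date) -> date:
    """Shift a birth date by a keyed offset of -180..+180 days so it stays plausible."""
    offset = _digest("dob:" + dob.isoformat()) % 361 - 180
    return dob + timedelta(days=offset)


# Two tables linked by the customer's email address.
customers = [{"name": "Alice Smith", "email": "alice@corp.com", "dob": date(1985, 4, 12)}]
orders = [{"order_id": 1, "customer_email": "alice@corp.com"}]

masked_customers = [
    {"name": mask_name(c["name"]), "email": mask_email(c["email"]), "dob": mask_birth_date(c["dob"])}
    for c in customers
]
masked_orders = [{**o, "customer_email": mask_email(o["customer_email"])} for o in orders]

# The relationship survives: the masked order still points at the masked customer.
print(masked_customers[0])
print(masked_orders[0]["customer_email"] == masked_customers[0]["email"])
```

With replacement pools this small, two different real values can map to the same fictional value. A production masking tool would also have to handle uniqueness constraints, key management, and many more data types.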
Why Use Data Masking
So why should you use masking? To summarize some of the things I've touched on in the past few minutes:
Data Minimization Limits the Scope of Sensitive Data
It's about data minimization. When you apply masking to all of those copies, you're essentially removing the risk from them. You reduce the scope of the sensitive data, minimizing the amount of sensitive data you hold, which decreases the associated risk.
Realistic Data for Development
Why is realistic data so important? In functions like development and testing, the earlier in the cycle you catch bugs, the cheaper and easier they are to fix. The resulting software, and the system that's deployed, is much more likely to be high quality and successful, because it was developed with quality data and its defects were caught early. That's why the realism of the masked data is critical.
Realism matters for analytics as well. Analysts don't need access to individuals' personal information, just a realistic equivalent of it, and they can still get valid analysis from it.
Enables Need-to-Know Access
Masking also helps limit access to sensitive data on a need-to-know basis. Without masking, all of these functions essentially run against copies of production data when they really shouldn't. They should operate against masked data. That removes the risk and takes sensitive data out of reach of those job functions, while still letting them proceed successfully.
So, thanks for tuning in. I really appreciate you taking the time to hear me talk through data masking: what it is, how it works, and why you should use it.