Challenges of Big Data Security – Whiteboard Wednesday [Video]
Database security best practices are also applicable for big data environments. The question is how to achieve security and compliance for big data environments given the challenges they present. Issues of volume, scale, and multiple layers/technologies/instances make for a uniquely complex environment. Not to mention some of the big data stored and processed can also be sensitive data. Who has access to that data within your big data environment? Are the environment and the data vulnerable to cyber threats? Do your big data deployments meet compliance mandates (e.g., GDPR, HIPAA, PCI, and SOX)?
Drew Schuil, Vice President of Global Product Strategy, returns to talk about big data security in today’s Whiteboard Wednesday. Learn about the challenges associated with securing big data and requirements for protecting it as you build out your plan.
Hi, welcome to Whiteboard Wednesday. My name is Drew Schuil, Vice President of Global Product Strategy at Imperva, and today’s topic is Challenges of Securing Big Data.
I meet with a lot of customers and chief security officers and we talk about protecting databases and file systems, so structured and unstructured data. And when I bring up big data, it’s often something that’s an afterthought or really hasn’t been looked at yet. So, we want to talk about some of the issues and things to get in front of this problem—this opportunity—as it arises.
The Big Data Trend
Let’s look at some of the trends. The biggest thing to note here is that big data is growing and it’s coming fast. IDC is predicting double-digit growth in big data lakes within large enterprises, and part of the reason that data collection is exploding is we’re seeing a proliferation of IoT, or Internet of Things, devices. Whether it’s the consumer market or the business environment, these devices are collecting metadata that’s very valuable to organizations for data analytics, for market trends, for consumer activity. More and more of the data that’s being collected is being thrown into these big data lakes.
That leads us to our next trend here, which is sensitive data. Most organizations I talk to say, “Look, we’re not storing credit card numbers. We’re fairly/100% certain about that.” However, when we start looking at some of the newer regulations that have teeth, like Europe’s GDPR, now the scope is potentially wider when we talk about personally identifiable information (PII). Things like first name, last name, email address, address, some of these little pieces of information that perhaps were benign before are coming into compliance, into scope, for data protection and [it’s important to make] sure that we’ve got a security strategy in mind.
Big Data Security Requirements
Let’s look at the framework. As you can see, access control, threat filtering, etc.—really the same kind of concepts that we had [for relational database security], but there’s some spin. There are some new things when we talk about big data.
- Access Control and Threat Filtering: Specifically, with the first one, access control. When we talk about database environments as an example, they are fairly locked down. You’ve got DBAs, you’ve got least permission, auditing and entitlements reviews if you’re in financial services. However, within big data environments, because of the nature of big data and the analytics and the people that need to have access to it, a lot of times permissions are granted on a very wide basis. It’s a little different when we’re thinking about production databases versus production big data environments, because more and more people have access. With that, it increases the landscape for threats. Whether it’s endpoint threats and malware infections and account takeover, whether it’s malicious insider use cases—someone gaining access to data that they shouldn’t have. Or a DDoS attack, someone that says, “Hey, this is a big data environment that’s critical to the business, I’m going to extort you by threatening to DDoS that environment.” The same types of threats that we see with other business applications.
- Activity Monitoring and Alerts: That leads to activity monitoring. I mentioned GDPR, Europe’s data privacy regulation, and activity monitoring and auditing. Being able to understand who is accessing what. Is that appropriate? Does that violate some regulation or data security standard within the organization? And then being able to get this information to the team that’s responsible for securing it. A lot of times that means feeding it from the monitoring tools into a SIEM or into a SOC or some other monitoring mechanism.
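As a rough illustration of the two requirements above, the sketch below checks each access event against a role-based policy and emits violations as JSON lines that a SIEM could ingest. The role names, classifications, the `ALLOWED` table, and the JSON-lines format are all assumptions made for the example; they are not taken from the talk or from any particular product.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical policy: which roles may read which data classifications.
ALLOWED = {
    "analyst": {"public", "internal"},
    "compliance_officer": {"public", "internal", "pii"},
}

@dataclass
class AccessEvent:
    user: str
    role: str
    dataset: str
    classification: str  # e.g. "public", "internal", "pii"

def is_appropriate(event: AccessEvent) -> bool:
    """Return True if the user's role is allowed to read this classification."""
    return event.classification in ALLOWED.get(event.role, set())

def to_siem_record(event: AccessEvent) -> str:
    """Serialize a violation as one JSON line (assumed transport format)."""
    record = asdict(event)
    record["alert"] = "policy_violation"
    return json.dumps(record, sort_keys=True)

def violations(events):
    """Keep only the events that break the policy, ready to forward."""
    return [to_siem_record(e) for e in events if not is_appropriate(e)]

events = [
    AccessEvent("alice", "analyst", "web_clicks", "internal"),
    AccessEvent("bob", "analyst", "customer_profiles", "pii"),
]
for line in violations(events):
    print(line)
```

Only the second event is forwarded here, because an analyst reading PII falls outside the assumed policy. Unknown roles get no access by default, which is the least-privilege stance described for traditional databases.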
[We’ve got the] trends, it’s taking off. Big data is not something to ignore. We’ve got the same requirements. In the next section we’ll talk about some of the challenges that are introduced inherently by big data.
Big Data Security Challenge #1 – the Data Itself:
So, we have the same security requirements, but some very different challenges when looking at how to secure big data and it starts with the three V’s: volume, velocity, and variety.
- Volume: One of the benefits of a big data environment is that it can handle massive amounts of data and actually make sense of it and crunch it in a lot of different ways to produce valuable results for the business.
- Velocity: The other challenge is velocity. Particularly within high tech environments or retail or banking, where decisions need to be made very quickly on this data, having a security solution that is real time not only for alerting and monitoring, but also blocking—to keep up with that becomes a challenge when we’re balancing cost versus risk.
- Variety: The third issue is variety. Because of the amount of, I’d say, the relaxed permissions that we have within big data, the number of people from different departments and access points coming in and doing different things to the data, it really becomes a challenge when we start talking about data discovery and classification. Which data is sensitive, so that I can have some focus and scope? And then how do I apply policies against that if I’m having a challenge classifying the data …and I’m also having a challenge in terms of classifying the users and permissions and whether they should or shouldn’t be accessing the data? It really compounds the problem when we look at big data in the context of these three V’s.
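To make the discovery and classification problem a bit more concrete, here is a minimal sketch that scans free-form text for two kinds of sensitive values: email addresses and card-like numbers. The regular expressions and label names are assumptions chosen for illustration, and the Luhn checksum is used only to cut down on false positives. A real classifier would need many more patterns and would have to cope with the variety of formats described above.

```python
import re

# Illustrative patterns only; real-world discovery needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(text: str) -> set:
    """Return the set of sensitive-data labels found in the text."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("email")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            labels.add("card_number")
    return labels

print(classify("contact jane@example.com, card 4111 1111 1111 1111"))
```

Labels like these give the focus and scope mentioned above: once a dataset is tagged, policies can be written against the tag rather than against each individual field.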
Big Data Security Challenge #2 – the Environment:
The second challenge here is the environment.
- Multiple Layers: When we look at big data environments, it’s not as simple as our traditional, let’s say, database environment, where we’ve got an application talking to an Oracle database and a pretty clear, crisp understanding of where we need to put in controls and blocking points. If we look at our diagram over here, we’ve got multiple different layers from distributed storage and querying layers to different management applications. Look at this environment and just the complexity of it. It should look much more difficult from a security perspective than something, again, like an Oracle or DB2 or SQL Server stand-alone type of an application to protect.
- Different Technologies: We’ve also got different technologies mixed into each big data environment, so you may have NoSQL, NSQL, data warehouse, BI tools. You’ve got all these different types of technologies within the environment, so again it’s not as cookie cutter as what we’ve been used to in the past.
- Multiple Instances/Dispersed Data Stores: You may have different instances and it may be dispersed over a wide geography, particularly if we’re dealing with a large multi-national, like a retailer, and we’re crunching data across multiple different regions. Now, you’ve got to not only look at the complexity of a single environment, but replicate that, be able to have the security environment talk to other security environments across a wide geography. You start to see some of the challenges when we talk about securing big data environments.
Big Data Security Challenge #3 – People:
All right. The third challenge is people, and like we’ve talked about before in other sessions, people can often be the weakest link when we’re talking about security, especially in a complex environment like big data. If we look at the people that are most adept at administering and dealing with a big data environment, we’re talking about computer scientists, PhD types. We’re talking about people that are going to be really focused on anything but security. They’re going to be focused on making the system work fast, getting accurate results. Really the last thing on their mind is going to be security, and compounding that, again, is the privileged access problem.
The nature of these environments is very different from what we’re used to with a traditional database environment, where, let’s say, you’ve got production that’s very locked down and then maybe you’re doing some data masking, a best practice, for your pre-production and test. Within a big data environment, a lot of times it’s much more open and you’ve got developers with very unrestricted access to potentially sensitive data, and security, again, is an afterthought when we talk about people accessing these systems.
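The data masking practice mentioned above can be sketched in a few lines. This example replaces sensitive fields with a keyed hash (HMAC-SHA256) so that developers see stable tokens instead of real values. The key, the field names, and the 16-character truncation are assumptions for illustration only. This is one possible masking technique, not a description of how any specific product does it.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable token; the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability (an arbitrary choice)

def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with the sensitive fields replaced by tokens."""
    return {
        k: pseudonymize(str(v)) if k in sensitive_fields else v
        for k, v in record.items()
    }

row = {"email": "jane@example.com", "clicks": 3}
print(mask_record(row, {"email"}))
```

Because the tokens are deterministic, joins and counts on the masked column still work in a test environment, while the original values stay out of developers’ hands.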
Where to Start?
We’ve talked about big data trends. We’ve talked about some of the challenges in securing big data. Let’s talk about what you can do next. Where you can start. This section is a little bit of motherhood and apple pie, but some interesting tidbits that we’ve heard from customers that we’ve talked to.
- Raise Awareness: We’re actually seeing financial services and retail, some of the early adopters, implement solutions like Imperva to address the security and compliance requirements for big data. And what they’ve told us is they’ve started with raising awareness within the organization. Basically saying, “Hey, we’ve got databases, we’ve got file systems, we’ve got cloud that we need to deal with. Big data also falls into our data security strategy.” Just raising awareness within the organization, so it doesn’t come as a surprise.
- Proactively Interview Business Units: Then proactively interviewing the business units. So, talking to marketing, talking to the CRM teams, the customer support teams, talking to any of the business units that may be early adopters of big data so that you can get in front of and be aware of those projects and not be reacting later on to surprise projects.
- Develop a Strategy/Build a Plan: Developing a strategy and building a plan. It’s much easier to be able to respond very quickly to the executives and say, “Hey, I knew this was coming. Here’s our plan. I’ve had this plan in place for the last six months, 18 months.” Really to just get ahead of these issues.
Additional Security Requirements for Big Data
Some of the requirements that we’ve heard from early adopters that are rolling out Imperva to protect big data go back to the complexities we covered: it’s got to be able to address the three V’s. It’s got to be scalable, it’s got to be able to address very high performance environments, it’s got to be able to be deployed in a distributed environment across multiple different geographies. We’ve got to be able to integrate with other pieces of the security ecosystem, much like we see in protecting our structured and unstructured data. We’ve got to be able to integrate with SIEM. We’ve got to be able to pull information in about the risk profiles of users who are interacting with data, profiles that may contain information not seen by Imperva, but by some of your other security tools.
And they want to be able to leverage existing solutions. So, if they’ve already deployed a solution like Imperva to audit their databases, to audit their file systems in SharePoint, to audit their cloud systems, to secure their web applications…why not be able to take that same console, the same policy engines, the same framework they’ve already developed and apply that to the next data type? To apply that to big data. So, that’s something that a lot of organizations are looking for, not yet another vendor but to be able to consolidate their vendor portfolio.
Then, finally, actionable alerts. This really goes back to being able to provide context. In a database environment, we’re talking about millions and millions of events, let’s say in Oracle or DB2. Now, when we shift to talking about big data, it could be billions or trillions of events. So, it becomes even more important that we have things like machine learning that can understand and make sense of good versus bad and inappropriate behavior so that we can send actionable alerts, that we can send single digit alerts to the rest of the ecosystem, the SIEM and the SOC and so forth.
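To show the idea of boiling a huge event stream down to a handful of alerts, here is a deliberately simple sketch. It does not use machine learning. It swaps in a plain statistical baseline, flagging any user whose event count is far above the median across all users. The `(user, table)` event shape and the `factor` threshold are assumptions made for the example.

```python
from collections import Counter
from statistics import median

def actionable_alerts(events, factor=10.0):
    """Return users whose activity exceeds `factor` times the median per-user count.

    `events` is an iterable of (user, table) pairs, a hypothetical event shape.
    """
    counts = Counter(user for user, _table in events)
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(user for user, count in counts.items() if count > factor * baseline)

# Twenty ordinary users with 5 events each, plus one account with 500.
events = [(f"user{i}", "orders") for i in range(20) for _ in range(5)]
events += [("mallory", "customer_profiles")] * 500
print(actionable_alerts(events))
```

Six hundred raw events collapse into a single name to investigate. A production system would use richer behavioral models, but the goal is the same: hand the SIEM and the SOC a short list rather than the full stream.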
So, that’s our big data talk. I hope you found it helpful and please tune in for additional whiteboard sessions. Thanks.
Learn more about big data security solutions from Imperva.