By the very nature of business, some of the big data that is stored and processed is sensitive. Who has access to that data within your big data environment? Are the environment and the data vulnerable to cyber threats? And there’s the issue of compliance as well. Big data deployments are subject to the same compliance mandates (e.g., GDPR, HIPAA, PCI, and SOX) and require the same protection against breaches as traditional databases and their associated applications and infrastructure.
Same Security Requirements, Different Challenges
All the best practices for data security are still applicable for big data environments. The most critical ones are:
- Access control
- Threat filtering
- Activity monitoring
The problem is HOW to achieve security and compliance for big data environments given the unique challenges they present.
Part 1 of the Big Data Security Challenge: The Data Itself
Much of the challenge of securing big data is the nature of the data itself. Consider the impact on security of the well-known three Vs of big data:
- Volume: Enormous volumes of data require security solutions built to handle them. This means highly scalable solutions, at minimum an order of magnitude beyond what traditional data environments require.
- Velocity: Your security solutions must be able to keep up with big data speeds. You’ll need to focus on data parsing and collection throughput, the degree of automation that is available, and the ability to deliver real-time visibility of policy violations and other events.
- Variety: Mixing multiple sources and types of data with different access permissions compounds classification and policy-setting challenges, elevating the need for robust audit capabilities.
Part 2 of the Big Data Security Challenge: The Environment
It’s not necessarily the associated infrastructure and technology within big data environments that makes them more challenging to secure; it’s the multiplicity, which dramatically increases complexity:
- Multiple layers: For example, the open source Hadoop framework has different layers of the stack serving a variety of purposes, from distributed storage at the bottom, to table and schema management, distributed programming, and querying/interface options at the middle tiers, and a wide range of management tools along the top. There is no single logical point of entry or resource to guard, but many different ones, each with an independent lifecycle.
- Multiple technologies: Big data environments often use multiple technologies for data storage and retrieval. For example, it’s not uncommon for an implementation to include relational stores and query tools for analytical workloads alongside non-relational technologies—also known as NoSQL technologies—for real-time, interactive workloads.
- Multiple instances: Many big data environments include multiple instances or versions of the same core building blocks, but from different vendors, such as different Hadoop distributions and NoSQL offerings. This means greater diversity and complexity for security tools and staff to address.
- Multiple, dispersed data stores: Big data deployments typically have a multitude of geographically distributed data stores and, therefore, numerous physical nodes requiring protection. This inherently increases the potential for inconsistent security policies and practices, suggesting the need for solutions that feature strong, centralized administration capabilities.
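One way centralized administration can cope with this multiplicity is to normalize access events from each technology into a single audit schema before applying policy. Here is a minimal sketch of that idea; the event field names (`ugi`, `atype`, `ns`, and so on) are hypothetical stand-ins, not any vendor's actual log format:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    # Common schema that every source technology is mapped into
    timestamp: datetime
    user: str
    source: str      # e.g. "hive", "mongodb"
    resource: str    # table, collection, or file path
    action: str      # "SELECT", "find", "write", ...

def from_hive(event: dict) -> AuditRecord:
    # Hypothetical Hive-style event shape
    return AuditRecord(
        timestamp=datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        user=event["ugi"],
        source="hive",
        resource=event["obj"],
        action=event["cmd"],
    )

def from_mongo(event: dict) -> AuditRecord:
    # Hypothetical MongoDB-style event shape
    return AuditRecord(
        timestamp=datetime.fromtimestamp(event["time"], tz=timezone.utc),
        user=event["users"][0],
        source="mongodb",
        resource=event["param"]["ns"],
        action=event["atype"],
    )

# A single monitoring pipeline can then apply one access policy to all records
events = [
    from_hive({"ts": 1700000000, "ugi": "alice", "cmd": "SELECT", "obj": "sales.orders"}),
    from_mongo({"time": 1700000060, "users": ["bob"], "atype": "find", "param": {"ns": "crm.leads"}}),
]
for rec in events:
    print(rec.user, rec.action, rec.resource)
```

The design point is that policy and reporting logic run against `AuditRecord`, so adding a new store means writing one adapter rather than duplicating the whole security stack.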
Part 3 of the Big Data Security Challenge: The People
Finally, there’s the challenge presented by the lack of security knowledge and understanding in the people working most closely with the data: data scientists and developers. Data scientists, with their skills and experience working with structured and unstructured data to deliver new insights, don’t necessarily think about the security of the data. It’s not surprising given that new technologies have encouraged data scientists to view big data as a giant sandbox where they are the owners and can decide how the data will be used.
While most development projects rely on access to non-sensitive, test data instead of live, production data, big data application development by its nature often falls outside of the more secure processes set up within IT. And with higher-access privileges than many others in the organization, developers also present a greater security risk either through accidental means or malicious intent.
The Appropriate Security Solution for Big Data
There’s no time to waste when it comes to rethinking security for big data environments. The number and breadth of data breaches continues to grow unabated, with a 40% increase in data breaches in 2016 reported by the Identity Theft Resource Center. Everyone from the CIO on down needs to understand and prioritize implementing better security for big data—after all, the last thing you want to hear is that there’s been a big breach in your big data.
Look for a big data security solution that lets you:
- Continuously monitor and audit all access to sensitive data.
- Uncover unauthorized access and fraudulent activity by maintaining baselines of normal usage patterns and transactions, then flagging any deviations.
- Alert and respond to attacks and unauthorized activities in real time.
- Stop targeted attacks and other advanced cyber threats through out-of-the-box integration with leading anti-malware solutions.
- Accelerate incident response and forensic investigations with advanced techniques for visualizing and analyzing detected events.
- Automate reporting and compliance activities across both traditional and big data environments.
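The baseline-and-deviation approach in the list above can be sketched very simply. This is an illustrative toy, not a production anomaly detector: it builds a per-user baseline from historical daily access counts and flags any day that deviates more than a chosen number of standard deviations:

```python
import statistics

def build_baseline(history: list[int]) -> tuple[float, float]:
    """Mean and standard deviation of a user's historical daily access counts."""
    return statistics.fmean(history), statistics.pstdev(history)

def is_anomalous(count: int, mean: float, stdev: float, threshold: float = 3.0) -> bool:
    """Flag a day whose access count deviates more than `threshold` standard
    deviations from the baseline (a flat baseline falls back to exact match)."""
    if stdev == 0:
        return count != mean
    return abs(count - mean) / stdev > threshold

# Example: a user who normally reads ~100 records a day suddenly reads 5,000
history = [98, 102, 95, 110, 101, 99, 97]
mean, stdev = build_baseline(history)
print(is_anomalous(104, mean, stdev))   # within the normal range
print(is_anomalous(5000, mean, stdev))  # flagged as a deviation
```

Real DCAP products layer far more context onto this (time of day, query type, peer-group comparison), but the core idea is the same: learn what normal looks like, then alert on departures from it.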
To help IT security and compliance teams choose the right data-centric audit and protection (DCAP) solution for their big data environments, Imperva created a white paper that identifies five key requirements for evaluating vendor offerings. Download the white paper to learn more about what you need to look for in a solution to overcome the scalability, speed, diversity, and complexity of securing the big data environment.