As digital information continues to accumulate worldwide, big data solutions are growing more and more popular. The introduction of IoT into our daily lives, which turns appliances into smart data-logging machines, along with organizations tracking behavior for data science and research purposes, has made the move to big data storage inevitable.
Non-relational databases are built to handle the volume, velocity, variety and veracity of big data, making them a strong fit for storing huge amounts of complex data without excessive computing power or high costs. Moreover, the wide selection of stable and trusted big data solutions makes them an accessible option for virtually everyone. This is all well and good, but when the stored big data is sensitive or private it requires protection, and while big data solutions are rapidly growing in popularity, the public understanding of big data security is lagging, leaving tremendous amounts of data exposed.
When it comes to relational databases, which have been around since the 70s, data security is already a mature discipline with common practices, security protocols and solutions that cover almost every type of breach. Big data should be no different, as it faces the same threats: external breaches, malicious insiders, and compromised insiders. So how do we take the trusted, successful data security methods used on RDBMS and make them compatible with big data servers? How can we close the gap between the growing demand for big data servers and the inadequate security measures used to protect them?
Discovery: taking control of data
One of the many advantages of big data is its ability to spread across distributed environments in order to maximize the available computing power and to allow data backup. The downside is that many organizations and companies can’t even tell which of their servers contain big data clusters.
As a first step, a discovery process should be performed to single out which servers contain big data. The discovery process allows IT teams to find unknown databases by scanning the enterprise networks. Moreover, database discovery should be automated and configured to scan specific network segments on demand or at scheduled intervals.
An additional aspect of the discovery process is to be aware of the services the organization uses to access big data. Services such as Hive or Impala, which can read and write big data, function as a doorway to it, and as such should be protected against infiltration by malicious parties.
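A minimal sketch of such a scan might look like the following. It is an illustration only, not how any particular discovery product works: the port map lists commonly used default ports that real deployments often change, and the names `DEFAULT_PORTS`, `tcp_probe` and `discover` are hypothetical.

```python
import socket
from ipaddress import ip_network

# Assumption: commonly used default ports for a few big data services.
# Real clusters are frequently reconfigured, so this map is illustrative only.
DEFAULT_PORTS = {
    10000: "HiveServer2",
    21050: "Impala",
    27017: "MongoDB",
    9042: "Cassandra",
}


def tcp_probe(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def discover(segment, probe=tcp_probe, ports=DEFAULT_PORTS):
    """Probe every host address in a network segment for the listed ports.

    Returns a list of (host, port, likely_service) tuples. An open port is
    only a hint that the service is present, not proof.
    """
    findings = []
    for address in ip_network(segment).hosts():
        host = str(address)
        for port, service in ports.items():
            if probe(host, port):
                findings.append((host, port, service))
    return findings
```

Calling `discover("10.0.0.0/24")` from a scheduler would cover the "scheduled intervals" case, while a manual call covers scanning on demand. The `probe` parameter is injectable so the logic can be tested without touching a network. Scans like this should only be run against networks you are authorized to scan.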
Classification: identifying sensitive data
The discovery process most likely left us with a map of huge amounts of data, but not all of it is sensitive and requires protection. In fact, attempting to monitor all of the information will lead to excessive costs and redundant effort. To avoid this, the next step is to classify which parts of the discovered information are sensitive and require protection.
The term sensitive information not only refers to trade secrets the organization wishes to keep to itself, but also includes data that is defined as sensitive by regulations, such as medical information, financial information, private identity details, and more. In order to comply with these regulatory requirements, the organization must know which parts of its big data clusters are considered sensitive.
The data classification process tags data according to its type, sensitivity, and value to the organization if altered, stolen, or destroyed. Upon its completion, the organization understands the value of its data, whether the data is at risk, and which controls should be implemented in order to mitigate risks. Data classification also helps an organization to comply with relevant industry-specific regulatory mandates such as SOX, HIPAA, PCI DSS, and GDPR.
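As a rough illustration of tagging data by type, the sketch below samples column values and applies pattern rules. The rules, the 80% threshold, and the names `RULES`, `classify_column` and `classify_table` are all assumptions made for this example; real classifiers use far richer detection logic.

```python
import re


def luhn_valid(number):
    """Check a digit string against the Luhn checksum used by payment cards."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Hypothetical rules: (tag, pattern, optional extra validator).
RULES = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), None),
    ("us_ssn", re.compile(r"^\d{3}-\d{2}-\d{4}$"), None),
    ("card_number", re.compile(r"^\d{13,19}$"), luhn_valid),
]


def classify_column(values, threshold=0.8):
    """Return the tags whose rule matches at least `threshold` of the samples."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return set()
    tags = set()
    for tag, pattern, validator in RULES:
        hits = sum(
            1 for s in samples
            if pattern.match(s) and (validator is None or validator(s))
        )
        if hits / len(samples) >= threshold:
            tags.add(tag)
    return tags


def classify_table(columns):
    """Map each column name to its set of sensitivity tags (empty if none)."""
    return {name: classify_column(values) for name, values in columns.items()}
```

Columns that come back with an empty tag set can be left out of monitoring, which is how classification keeps the later steps from covering everything.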
Assessment: detecting database vulnerabilities and misconfigurations
Now that the organization knows where its sensitive information resides, it can assess the current security status of that data. This process tells teams where the organization stands today in terms of using the protection tools already available. Teams work through a predefined checklist, checking whether each security item is used correctly, not used at all, or could be used better.
The assessment process is an automated test which is able to detect database vulnerabilities and misconfigurations such as default passwords and required version updates. To achieve maximum security results, this process should use pre-defined vulnerability tests, based on CIS and DISA STIG benchmarks that are updated regularly.
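The idea of an automated checklist can be sketched as below. The three checks and the configuration keys (`auth_enabled`, `admin_password`, `version`, `minimum_version`) are invented for illustration and are not taken from the CIS or DISA STIG benchmarks; a real assessment would load its tests from those sources.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a small sample of passwords treated as "default" for this sketch.
KNOWN_DEFAULT_PASSWORDS = {"", "admin", "password", "changeme"}


def version_tuple(text):
    """Turn a dotted version such as '3.1.2' into (3, 1, 2) for comparison."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Check:
    check_id: str
    description: str
    passed: Callable[[dict], bool]
    remediation: str


# Hypothetical checks; IDs and config keys are illustrative only.
CHECKS = [
    Check("AUTH-01", "Authentication is enabled",
          lambda c: c.get("auth_enabled") is True,
          "Enable authentication for all client connections."),
    Check("AUTH-02", "Admin account does not use a default password",
          lambda c: c.get("admin_password", "") not in KNOWN_DEFAULT_PASSWORDS,
          "Set a strong, unique admin password."),
    Check("PATCH-01", "Server runs at least the minimum required version",
          lambda c: version_tuple(c.get("version", "0"))
          >= version_tuple(c.get("minimum_version", "0")),
          "Upgrade to the minimum required version or later."),
]


def assess(config, checks=CHECKS):
    """Run every check and return one finding per failed check."""
    return [
        {"id": check.check_id,
         "failed_check": check.description,
         "remediation": check.remediation}
        for check in checks
        if not check.passed(config)
    ]
```

Each finding carries its own remediation text, so the list returned by `assess` can feed directly into a report.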
Once the assessment portion is complete, detailed reports need to be produced, including recommended remediation steps that will help the organization maximize data security using its available resources.
Monitoring: audits, alerts and reporting
Once the assessment process has been completed successfully, the organization knows that the data is secure for the time being. But how can it be protected from future breaches? How can this security status be maintained in the future?
This needs to be done by tightly monitoring the sensitive information in order to achieve three goals:
- The first one is to have an audit trail, which logs all actions performed on the sensitive data and the people performing these actions. This will be useful for forensic investigation purposes, if a breach occurs, and will help to comply with regulatory policy.
The audit trail provides complete visibility into all database transactions, including local privileged user access and service accounts. It also continuously monitors and audits all logins/logouts, updates, privileged activities and more.
- The second is alerts. Dynamic profiling can automatically build a whitelist of the data objects regularly accessed by individual database accounts. Admins can create policies that generate an alert when a profiled account attempts to access a data object that is not whitelisted.
- And last is reporting. All monitored data, including alerts, is logged and can be accessed at any time for reporting purposes.
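The three goals above can be sketched together in a few lines. This is a simplified model under stated assumptions: the `AccessMonitor` class, its `learning` flag, and the event fields are hypothetical, and a production system would persist the trail in tamper-resistant storage rather than an in-memory list.

```python
from datetime import datetime, timezone


class AccessMonitor:
    """Toy audit trail with whitelist-based alerts (illustrative only)."""

    def __init__(self):
        self.audit_trail = []   # every event, in order
        self.whitelist = {}     # account -> set of data objects seen while learning
        self.learning = True    # True while the access profile is being built

    def record(self, account, action, data_object):
        """Log one access; raise an alert flag if it falls outside the profile."""
        event = {
            "time": datetime.now(timezone.utc).isoformat(),
            "account": account,
            "action": action,
            "object": data_object,
            "alert": False,
        }
        if self.learning:
            # Profiling phase: remember what each account normally touches.
            self.whitelist.setdefault(account, set()).add(data_object)
        elif data_object not in self.whitelist.get(account, set()):
            # Enforcement phase: object is not whitelisted for this account.
            event["alert"] = True
        self.audit_trail.append(event)
        return event

    def report(self):
        """Return all logged events that raised an alert."""
        return [event for event in self.audit_trail if event["alert"]]
```

For example, after recording normal activity with `learning` set to `True` and then switching it to `False`, a call such as `record("etl_service", "SELECT", "hr.salaries")` would be flagged if that account never touched `hr.salaries` during profiling, while the event itself is still written to the audit trail.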
Analytical insights: building behavioral patterns and detecting anomalies
Every piece of information logged and audited by the monitoring tools is gathered and saved. Once enough data has been accumulated, it needs to be processed in order to create individual and organizational behavioral patterns.
This process utilizes machine learning to automatically uncover unusual data activity, surfacing actual threats before they become breaches. How? It first establishes a baseline of typical user access to database tables and files, then detects, prioritizes, and alerts on abnormal behavior. Moreover, it gives the ability to analyze the data access behavior of particular users with a consolidated view of their database and file activity, investigate incidents and anomalies specific to the individual, view the baseline of typical user activity, and compare a given user with that user’s peer group.
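To make the baseline idea concrete, here is a deliberately simple statistical stand-in for the machine learning described above: a z-score of today's access count against each user's own history. The function names, the threshold of 3.0, and the choice of daily access counts as the only feature are assumptions for this sketch, not a description of any product's model.

```python
from statistics import mean, pstdev


def anomaly_score(history, today):
    """Z-score of today's access count against a user's non-empty history."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly constant history: any deviation at all is unusual.
        return 0.0 if today == baseline else float("inf")
    return (today - baseline) / spread


def find_anomalies(histories, today_counts, threshold=3.0):
    """Return (user, score) pairs above the threshold, highest score first.

    Only unusually high activity is flagged; users missing from
    today_counts are treated as having zero accesses today.
    """
    flagged = []
    for user, history in histories.items():
        score = anomaly_score(history, today_counts.get(user, 0))
        if score > threshold:
            flagged.append((user, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Sorting by score gives a crude severity ordering. A peer-group comparison could follow the same pattern by scoring a user's count against the pooled history of that user's peers instead of the user's own.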
Once these baselines are established, the highest-risk users and assets need to be spotlighted so the data security team can prioritize the most serious data access incidents, investigate events by filtering open incidents by severity, and then take a deeper look into specific incident details about the user and the data that was accessed.
Achieving a successful security process, including the steps mentioned above, can be a daunting task when you are using insufficient tools and protocols. That is why Imperva developed SecureSphere, one unified security platform that monitors and protects all enterprise data, managed using a single pane of glass. The combination of SecureSphere, along with CounterBreach, Imperva’s behavioral analytics solution which detects risky data access behaviors, will provide you with maximized security for your big data.