Leveraging Imperva Solutions for GDPR Compliance Part II: Pseudonymization
Down to the wire: the GDPR compliance deadline is here.
It’s May 25 and the EU’s General Data Protection Regulation (GDPR) is live. As you know by now, the risk and potential costs associated with a failure to comply are substantial.
GDPR non-compliance penalties can be severe, and they apply to any organization of any size that collects or processes personal data originating in the EU. As of May 25, 2018, these rules are officially enforced, which is likely causing some angst for organizations that have been procrastinating on their readiness efforts.
However, as outlined in our first blog, Imperva data protection solutions can help organizations address key GDPR data security requirements, including sensitive data discovery, classification, and monitoring. Specifically, we showed how Imperva SecureSphere supports these efforts through its new out-of-the-box Data Classification Profile for GDPR.
The new functionality allows you to easily scan your databases, classify sensitive information pertinent to the GDPR, and output this information through comprehensive reporting capabilities. So, we’ve helped you understand how to easily discover and classify your information, a key requirement under GDPR Article 35: Data Protection Impact Assessment. But now what?
The need for “minimizing data” and the role of pseudonymization
Once you’ve taken the necessary steps to understand your sensitive data landscape and have a full inventory of what is on site, you can start looking at methods and technologies to help reduce the compliance impact and risk associated with that data.
In fact, the GDPR requires that organizations practice data minimization and purpose limitation: they may collect and use only the data necessary for a specific, defined purpose. Article 32, Security of Processing, reinforces this by requiring appropriate technical and organizational security controls to protect personal data against accidental or unlawful loss, destruction, alteration, access or disclosure.
In practice, one key area where data minimization comes into play is DevOps environments. Many organizations copy production database content for use in development, testing, QA and analytics environments, a practice that can easily run counter to the data minimization principle. In addition, copying data for these purposes significantly amplifies an organization’s attack surface, particularly because these replicated data sources tend to be less protected than the source production environments.
While the GDPR doesn’t call out any specific technology (as technology evolves over time) to support data minimization, it does encourage “pseudonymization” of personal data.
Pseudonymization is a security technique for replacing sensitive data with realistic fictional data; it generally means removing the direct identifiers (names, addresses, email, etc.) associated with the data in question. In particular, pseudonymizing your data helps facilitate processing of the data in ways beyond (but still compatible with) the original collection purposes. Pseudonymization also supports other key aspects of the GDPR, including data protection (privacy) by design, security requirements, and safeguards when processing data for scientific, historical and statistical purposes (analytics).
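To make the idea concrete, here is a minimal sketch of pseudonymizing a direct identifier. This is our own illustration, not Imperva’s implementation: the replacement pool, the key, and the keyed-hash approach are all assumptions made for the example.

```python
import hmac
import hashlib

# Hypothetical pool of realistic fictional replacement names.
FAKE_NAMES = ["Tom Young", "Ann Lee", "Sam Carter", "Eva Brooks"]

def pseudonymize_name(name: str, secret: bytes = b"masking-key") -> str:
    """Deterministically map a real name to a fictional one.

    The same input always yields the same replacement, so records stay
    consistent across runs, but the original identity cannot be read
    back without the mapping."""
    digest = hmac.new(secret, name.encode(), hashlib.sha256).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]

record = {"name": "John Smith", "age": 65, "ssn": "123-21-9812"}
# Only the direct identifier is replaced in this simple sketch.
masked = dict(record, name=pseudonymize_name(record["name"]))
```

A real solution would of course cover every identifying field and preserve statistical properties, but the core idea is the same: realistic fiction in, identity out.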
How Imperva supports “pseudonymization”
Imperva can help your organization provision secure pseudonymized sensitive data, thereby supporting data minimization and risk reduction, through its Imperva Camouflage data masking solution. Imperva Camouflage is an industry-recognized and highly versatile software solution that helps effectively discover, classify, and ultimately obfuscate your organization’s sensitive data through the process of data masking. This high-value process replaces real data with realistic fictional data that is functionally and statistically accurate, achieving pseudonymization and other variations of data replacement and minimization.
So, what exactly is data masking and how does it support the pseudonymization requirement? Let’s use Figure 1 below as our guide. In the original production data set, a record shows that a man named John Smith, who is 65 years old, has a Social Security number (SSN) of 123-21-9812. After the data masking solution is configured and deployed according to the needs of the end-users and the requirements of the GDPR, John Smith might become Tom Young, who is 58 years old and has an SSN of 531-51-5279.
Figure 1: Data masking replaces original data with fictitious, realistic data and supports efficient pseudonymization
The key point here is that the solution automatically ensures that the masked data maintains referential integrity and operational accuracy, so that personal data can be securely processed for scientific, historical and statistical purposes across ALL databases and applications. This means that for end-users and the various applications and test functions involved, the data will have all the realism of the original production copy: all systems will be fully functional, and any necessary complex relationships will be maintained. So, not only do you have a tool purpose-built to support privacy compliance requirements such as GDPR, but one that can also improve security and efficiency within the DevOps function.
Implementing Pseudonymized Data for GDPR using Imperva Camouflage
So, Imperva Camouflage can provide significant value for GDPR compliance, overall data security, and DevOps by applying fine-grained classification of data and pseudonymization rules via the masking engine. But how do you go about making that happen?
Within the solution, a GDPR discovery and classification policy is available out of the box that allows you to quickly identify sensitive data that falls within the scope of the GDPR regulations. No customization or configuration involved. Furthermore, the detailed sensitive data inventory is easily transported from the data discovery engine (CX-Discover™) to the data masking engine (CX-Mask™) within the solution to apply the chosen and automated pseudonymization rules to your data. The overall process is quite simple and involves the following core steps:
- Define your data sources– the first step involves logging into Imperva Camouflage and inputting basic connection information and the credentials required to access your databases. These will be the Datasources that you wish to discover and classify.
- Select and run the desired search rules– Once the data source has been connected and confirmed, simply select and run the GDPR discovery policy from the Search Rule Library.
- Review and refine search results- reviewing the Search Results helps determine the data to be masked and allows for easy export of the information in a variety of formats for review.
- Export and run masking rules to pseudonymize the data- once the classification review is complete and the desired data has been approved for masking, the inventory is exported to the CX-Mask™ masking engine via an MS Excel-based Functional Masking Document, where the desired masking strategies are easily configured. Running the resulting project then completes the pseudonymization effort automatically.
Let’s take a look at each of these steps in a little more detail.
1. Defining a Datasource
When you first log in to CX-Discover, you’ll be presented with a basic (empty) dashboard view. To create a data source including appropriate connection information:
- Click New Datasource in the left-hand navigation pane. The New Datasource page is displayed.
- Provide a name (label) for the new Datasource.
- Select the Type appropriate to this Datasource (Oracle for example). CX-Discover uses the database type and related information provided on this page to connect to and communicate with the specified database via JDBC.
- Enter the Host on which this database resides.
- Provide the Port number for this database. Common JDBC port numbers include 1521 for Oracle and 1433 for Microsoft SQL Server.
- Provide the Username and Password for the account CX-Discover will use when searching the database. This database-level user must have the appropriate permissions on the underlying database because these permissions will determine which objects and data the search engine has access to.
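As a point of reference, the connection details collected above map onto standard JDBC URLs. This small sketch is our own illustration (not part of the product) showing the conventional thin-driver URL formats for the two database types mentioned:

```python
def jdbc_url(db_type: str, host: str, port: int, name: str) -> str:
    """Assemble a JDBC connection URL from the Datasource fields
    (type, host, port) plus the database/SID name."""
    formats = {
        # Oracle thin-driver convention: jdbc:oracle:thin:@host:port:SID
        "oracle": f"jdbc:oracle:thin:@{host}:{port}:{name}",
        # SQL Server convention: jdbc:sqlserver://host:port;databaseName=...
        "sqlserver": f"jdbc:sqlserver://{host}:{port};databaseName={name}",
    }
    return formats[db_type]

print(jdbc_url("oracle", "db.example.com", 1521, "ORCL"))
# -> jdbc:oracle:thin:@db.example.com:1521:ORCL
```

Whichever tool does the connecting, the same four pieces of information (type, host, port, credentials) are what determine whether discovery can reach the database at all.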
2. Selecting and Running GDPR Search Rules
Prior to searching a Datasource for sensitive data, you must specify the search rules to use. A standard library of common search rules is provided and includes the comprehensive policy specifically designed for discovering and classifying data according to the GDPR requirements. To select and run GDPR search rules:
- Select the GDPR search policy to run. You can view the underlying search categories by expanding the hierarchy. This search policy can also be easily tailored to fit your needs by navigating to System > Search Rules and selecting the GDPR policy.
- Select the Schemas to Search to define the scope of the search. This list is populated automatically, or by clicking get schemas. If the list is empty, click get schemas and then refresh the page once the get schemas job completes.
- Click Next. (The defaults for the remaining options are appropriate for most searches. Refer to Search Rule Configuration in the online user guide for details.) The Overview Page for this Datasource is displayed.
- Click Search. A message indicating that the search job has been submitted is displayed. You can view the progress of the search in the Jobs Page. Once the search completes, the next step is to Review/Refine the Search Results.
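Under the hood, a discovery search of this kind amounts to sampling column values and matching them against classification patterns. A rough sketch follows; the patterns and the simple majority threshold are our own simplification, not the product’s actual rule library.

```python
import re

# Illustrative patterns only, not the product's GDPR rule set.
SEARCH_RULES = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(values):
    """Return the categories whose pattern matches a majority of the
    sampled values in a column, with the match counts."""
    hits = {}
    for category, pattern in SEARCH_RULES.items():
        matches = sum(1 for v in values if pattern.match(v))
        if matches > len(values) / 2:
            hits[category] = matches
    return hits

sample = ["123-21-9812", "531-51-5279", "n/a"]
print(classify_column(sample))  # -> {'ssn': 2}
```

A production engine adds much more (metadata heuristics, sampling strategies, confidence scoring), but the shape of the problem is the same: patterns over sampled data yield a classified inventory.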
3. Reviewing/Refining Search Results
A number of different options exist for viewing and refining Search Results including the capability to view results by Category, Table, Column, and Data Type. To view the results of your GDPR search:
- Click Search Results in the left navigation pane to display the list of tables and columns that were identified by the search process as containing sensitive data.
- Click the Approve check-box to mark all of the new Search Results as approved/accepted as being accurate. This page is also used for modifying the automatically assigned Categories, Subcategories, and Masking Strategies as appropriate.
- Click Save to associate your edits with the underlying Data Model.
- Note: you also have the option to view/export various automatically generated reports and charts to aid in reviewing and refining the Search Results, by clicking the Reports link in the Datasource Overview page of the selected Datasource.
4. Exporting/Running Masking Rules
Now that the classification review is complete, and the inventory is ready for masking, you can easily transfer the approved Search Results (including preliminary masking strategies) in spreadsheet (MS Excel) format. This spreadsheet will serve as a starting point for pseudonymizing your data using the CX-Mask engine. This spreadsheet export is also known as a Functional Masking document (FMD).
To export and run masking rules:
- Click Overview in the left-hand navigation pane to display summary Connection and Search information.
- Click FMD to generate the spreadsheet. Depending on your browser settings, you may be prompted to download/save -or- it may open directly in Excel.
- Review the FMD in MS Excel and make any necessary adjustments to the masking configuration. The process for configuring and running the FMD is covered in more detail in the FMD User Guide.
- Launch the CX-Mask Power User GUI to convert the FMD into an executable masking configuration file.
- Navigate to Tools > Project Generator
- Specify the name/location of the FMD generated in the previous steps.
- Specify the name/location of the masking configuration (project) file to generate.
- Launch the masking engine via the GUI or automate it via command line execution.
- After launching the CX-Mask Power User GUI, open the masking configuration (project) file generated in the previous steps.
- Click Run in the left navigation pane. A list of masking rules (targets) is displayed in the run panel.
- Click Start to initiate the masking engine and apply the configured masking rules to the selected database. Status messages are displayed in the Log window and will show when the masking process has been successfully completed.
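Conceptually, the run step iterates over the configured masking rules and applies each strategy column by column. A simplified sketch of that loop follows; the shuffle strategy and the FMD-style column-to-strategy mapping are illustrative assumptions, not the engine’s actual internals.

```python
import random

def shuffle_mask(values):
    """Shuffle strategy: reuse the real values but break the link
    between each value and its original row."""
    out = list(values)
    random.Random(42).shuffle(out)  # fixed seed for a repeatable demo
    return out

# Hypothetical FMD-style configuration: column name -> strategy.
MASKING_CONFIG = {
    "name": shuffle_mask,
}

def run_masking(rows, config):
    """Apply each configured strategy column by column, leaving
    unconfigured columns untouched."""
    columns = {k: [r[k] for r in rows] for k in rows[0]}
    for col, strategy in config.items():
        columns[col] = strategy(columns[col])
    return [dict(zip(columns, vals)) for vals in zip(*columns.values())]
```

In the real product the strategies come from the Functional Masking Document and run against the database itself, but the principle is the same: a declarative configuration drives an automated, repeatable masking pass.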
So, now that we have Pseudonymized data- what do we do with it?
Once the databases provisioned for masking have been discovered and classified, and the various masking techniques have been applied, you’re in a position to report on progress towards GDPR compliance, in particular the data minimization requirements under Article 32.
From an operational standpoint, the next step in the process simply involves provisioning the now pseudonymized data copies to the DevOps target environments. Once configured, this replication process is typically automated using the existing tools already in use within the customer’s environment. It is quite likely that a single masking process can support both the analytics and application test use cases saving time and cost.
Further, while GDPR compliance is a key requirement and focus, it’s also worth noting that using pseudonymized data helps optimize the DevOps function by removing the risk associated with using “real” data. This allows organizations to reduce the overhead and approval processes involved in copy generation, and enables more users (including third parties) to access the data and complete test and development work more efficiently and effectively. With the sensitive data removed, the risk is greatly reduced and data-driven processes are accelerated.
Finally, with the help of Imperva Data Security Solutions, support for the other data security-specific Articles can also be achieved. We’ll get into those use cases and solutions in the next blog in this series.