Pseudonymization is a security technique for replacing sensitive data with realistic fictional data that:
- Cannot be attributed to a specific individual without additional information which, according to GDPR Article 4 (5), “is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
- Maintains referential integrity and statistical accuracy, thereby enabling business processes, development and testing systems, training programs, and analysis to operate normally.
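The two properties above can be illustrated with a minimal sketch. It assumes keyed hashing (HMAC-SHA256) as the pseudonymization technique, which is one option among several and not something this article prescribes. The key, field names, and sample records are hypothetical. Note that this variant produces opaque identifiers rather than realistic fictional values; it is meant only to show that the same input always maps to the same pseudonym, so joins across tables keep working.

```python
import hashlib
import hmac

# Hypothetical secret. In practice it would live in a separate,
# access-controlled store, apart from the pseudonymized data.
SECRET_KEY = b"example-key-kept-separately"


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable pseudonym for `value` using keyed HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:16]


# Hypothetical source tables that share the same identifier.
customers = [
    {"customer_id": "alice@example.com", "country": "DE"},
    {"customer_id": "bob@example.com", "country": "FR"},
]
orders = [
    {"customer_id": "alice@example.com", "amount": 42.0},
    {"customer_id": "bob@example.com", "amount": 17.5},
    {"customer_id": "alice@example.com", "amount": 8.0},
]

# Replace the identifier in both tables with the same keyed function.
pseudo_customers = [
    {**c, "customer_id": pseudonymize(c["customer_id"], SECRET_KEY)}
    for c in customers
]
pseudo_orders = [
    {**o, "customer_id": pseudonymize(o["customer_id"], SECRET_KEY)}
    for o in orders
]

# Referential integrity: totals per pseudonym still add up per customer.
totals = {}
for o in pseudo_orders:
    totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
```

Because the pseudonym depends on the key, anyone holding only the pseudonymized tables cannot recompute it from a guessed email address, which is why the key plays the role of the separately kept additional information.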
Pseudonymization can be used when realistic data is needed for application development and testing environments, data warehousing, analytical data stores, training programs, or other business processes. It can also be used when exporting data to non-EU/EEA countries.
The GDPR both encourages pseudonymization and distinguishes pseudonymized data from anonymous data. Details are below.
GDPR encourages pseudonymization
The GDPR encourages pseudonymization for the following reasons.
- Article 6 (4) (e) lists “the existence of appropriate safeguards, which may include encryption or pseudonymization” among the factors for deciding whether personal data may be processed for a purpose other than the one for which it was collected. Other purposes can include profiling, business analysis, outsourcing data processing to non-EU/EEA countries, and use for scientific, historical, and statistical purposes.
- Article 11 (2) exempts the Data Controller from complying with an individual’s rights to access, rectification, erasure, and data portability of his or her personal data (Articles 15 to 20), if the Data Controller can demonstrate that it is no longer in a position to identify the individual, unless the individual provides additional information that enables identification.
- Article 25 (1) makes pseudonymization a central feature of the requirement for data protection by design and by default.
- Article 32 (1) (a) makes pseudonymization an appropriate technical measure for ensuring the security of processing personal data.
- Article 34 (1) requires that, in the event of a security breach, Data Controllers notify identified individuals impacted by the breach. Since pseudonymized data is not linked to an identified individual, notification is not required unless the individual is identifiable because:
  - The pseudonymization key is disclosed in a security breach.
  - The individual can be identified by linking pseudonymized and additional, non-pseudonymized information (e.g., birth date, gender, zip code).
- Article 40 (2) (d) encourages the use of Codes of Conduct that include pseudonymization.
- Article 89 (1) enables processing personal data for scientific, historical, and statistical purposes if the data is safeguarded by pseudonymization.
Pseudonymized data is not anonymous
Anonymization permanently de-links personal data from a specific identified or identifiable person. For example, personal data is encrypted and the encryption key is destroyed. As such, the GDPR does not apply to anonymous data.
However, pseudonymized data is not considered anonymous, since a specific individual can be identified if:
- The pseudonymized and additional, non-pseudonymized information are combined to identify the individual. For example, an employee receives performance review feedback where reviewers are identified by a number (e.g., Reviewer 123, Reviewer 456). The employee won’t know which specific co-worker said what, unless the employee can apply additional information (e.g., the feedback includes certain key phrases that are always used by a specific person).
- The pseudonymization key is disclosed in a security breach or other manner. Using the performance review example, the employee’s manager receives the performance review report where the actual name of each reviewer is visible. In this case, the pseudonymization key was made visible to the manager.
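The first risk, linking pseudonymized records with additional information, can be sketched as a simple join on quasi-identifiers. All records, field names, and the `link` helper below are hypothetical and exist only for illustration; real linkage attacks are usually fuzzier than an exact match.

```python
from collections import Counter

# Hypothetical pseudonymized records: names are replaced,
# but quasi-identifiers are left intact.
pseudonymized = [
    {"pseudonym": "Reviewer 123", "birth_date": "1980-04-12", "gender": "F", "zip": "10115"},
    {"pseudonym": "Reviewer 456", "birth_date": "1975-09-30", "gender": "M", "zip": "10115"},
    {"pseudonym": "Reviewer 789", "birth_date": "1975-09-30", "gender": "M", "zip": "10115"},
]

# Hypothetical auxiliary data that someone might already hold.
auxiliary = [
    {"name": "Jane Doe", "birth_date": "1980-04-12", "gender": "F", "zip": "10115"},
]

QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def quasi_key(record):
    """Tuple of the quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(pseudonymized_rows, auxiliary_rows):
    """Map pseudonym -> name where the quasi-identifier combination is
    unique among the pseudonymized rows and matches exactly one
    auxiliary row."""
    counts = Counter(quasi_key(r) for r in pseudonymized_rows)
    names_by_key = {}
    for row in auxiliary_rows:
        names_by_key.setdefault(quasi_key(row), []).append(row["name"])

    matches = {}
    for row in pseudonymized_rows:
        key = quasi_key(row)
        names = names_by_key.get(key, [])
        if counts[key] == 1 and len(names) == 1:
            matches[row["pseudonym"]] = names[0]
    return matches


reidentified = link(pseudonymized, auxiliary)
```

In this sketch only "Reviewer 123" is re-identified, because the other two reviewers share the same birth date, gender, and zip code and so cannot be told apart from the quasi-identifiers alone.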
To address the fact that pseudonymized data is not anonymous, the GDPR requires the following:
- Recital 26 requires that pseudonymized data be treated as personal data if a specific individual can be identified “by the use of additional information.” As such, appropriate and effective technical and organisational measures must be implemented to protect the pseudonymized data.
- Recital 29 requires that pseudonymized data and the “additional information for attributing the personal data to a specific data subject” be kept separate.
- Recital 75 requires implementing appropriate technical safeguards (e.g., encryption, hashing, or tokenization) and organizational policies to prevent unauthorized reversal of pseudonymization.
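The separation and reversal controls described above can be sketched with tokenization. The `TokenVault` class, its method names, and the boolean `authorized` flag are hypothetical; a real deployment would keep the mapping in a separate, access-controlled system and use proper authentication rather than a flag.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault holding the token-to-value mapping.

    It stands in for the separately stored additional information
    needed to attribute a token to a person.
    """

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for `value`, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        """Reverse a token, but only for an authorized caller."""
        if not authorized:
            raise PermissionError("caller may not reverse pseudonymization")
        return self._token_to_value[token]


vault = TokenVault()

# The working data set carries only the token, never the original value.
working_record = {"name": vault.tokenize("Jane Doe"), "department": "Finance"}
```

Because the token is random, it reveals nothing about the original value; re-identification requires access to the vault, which is the component the organizational policies need to protect.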
Data masking and hashing are examples of pseudonymizing sensitive data. Data masking is the de facto standard for achieving pseudonymization. It replaces sensitive data with fictitious yet realistic data, which helps reduce data risk while preserving data utility. An example of data masking is below.
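As a rough illustration of data masking, the sketch below replaces a name with a fictional one and hides all but the last four digits of a card number. The name lists, salt, and function names are hypothetical, and the substitution scheme is a simplified assumption rather than a description of any particular masking product.

```python
import hashlib

# Hypothetical pools of fictional replacement values.
FAKE_FIRST = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
FAKE_LAST = ["Rivers", "Stone", "Hale", "Marsh", "Quinn"]

# Hypothetical salt; it would be stored apart from the masked data.
SALT = "example-salt"


def _bucket(value: str, salt: str, size: int) -> int:
    """Map a value deterministically to an index in range(size)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % size


def mask_name(name: str, salt: str) -> str:
    """Replace a real name with a fictional one, stable per input."""
    first = FAKE_FIRST[_bucket(name, salt + "first", len(FAKE_FIRST))]
    last = FAKE_LAST[_bucket(name, salt + "last", len(FAKE_LAST))]
    return f"{first} {last}"


def mask_card(card_number: str) -> str:
    """Keep the last four digits and replace the rest with asterisks."""
    digits = [c for c in card_number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


record = {"name": "Jane Doe", "card": "4111 1111 1111 1111", "city": "Berlin"}

masked = {
    "name": mask_name(record["name"], SALT),
    "card": mask_card(record["card"]),
    "city": record["city"],  # non-sensitive field left unchanged
}
```

The same input name always yields the same fictional name, which preserves joins across data sets. With pools this small, different people can collide on the same fictional name, so a realistic setup would need much larger pools or a lookup table.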
The pseudonymized data can now be safely used in application development and testing environments, training programs, and business and analysis processes within and beyond EU/EEA locations.