Data masking, also referred to as data de-identification, pseudonymization, anonymization or obfuscation, is a method of protecting sensitive data by replacing original data with fictitious but realistic data. By masking data, organizations enable data to be safely used in situations where real data is not needed.
Common examples of business processes that require access to realistic data, but not the actual data, include:
- Application development and testing – application developers and QA teams need data to make sure applications work when they roll into production environments.
- Outsourcing – companies often rely on outside service providers and suppliers that need access to data for research, analysis, training, testing or development.
- Training – training departments require data to populate training systems that customers or partners access.
- Business intelligence (BI) and analytics – business analysts and researchers need to aggregate and analyze data.
Why Mask Data?
Organizations mask data for two primary reasons:
- Protect sensitive data: It is common practice for organizations to copy data from production systems for use in non-production environments. These copies of sensitive data increase the potential attack surface and unnecessarily expose sensitive data to employees who may not be authorized to access it. Data masking removes sensitive data from non-production environments, reducing the risk of a data breach. It is a critical tool in a multi-layered data security strategy.
- Compliance: Many data privacy and protection regulations focus on safeguarding personal data such as Personally Identifiable Information (PII), health records and financial data. Many of these regulations call for limiting access to sensitive data on a need-to-know basis. For example, the EU General Data Protection Regulation (GDPR) establishes data minimization as a key data protection principle and names pseudonymization as a safeguard organizations can use to meet it. Other regulations, such as PCI DSS Requirement 6.4.3 (as numbered in PCI DSS version 3.2.1), specifically prohibit the use of production data for test and development. Data masking is an essential tool to help organizations avoid unwanted data access, reduce sensitive data exposure, and improve their compliance posture.
The purpose of data masking is to provide data that looks and acts like the original data, but lacks the sensitivity of the original data. This way the masked data does not pose a risk of exposure or unauthorized access. It also allows organizations to employ fewer security controls for the masked data repositories and to reduce the scope of compliance audits.
How Does Data Masking Work?
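Unlike encryption and tokenization, data masking is a non-reversible process in which data goes through a one-way transformation. Encryption, by contrast, is reversible: it scrambles data at rest, and anyone holding the decryption key can unscramble the data when it is accessed. Tokenization is likewise reversible, because the tokenization system can map each token back to its original value.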
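The difference between a reversible and a one-way transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TokenVault` and `mask` are hypothetical names, the "vault" is an in-memory dictionary rather than a real tokenization service, and the masking rule (random digit substitution) is just one possible choice.

```python
import secrets
import string


class TokenVault:
    """Toy tokenization service: reversible because it stores token -> original."""

    def __init__(self):
        self._originals = {}

    def tokenize(self, value):
        token = secrets.token_hex(8)  # random 16-character hex token
        self._originals[token] = value
        return token

    def detokenize(self, token):
        # The stored mapping is what makes tokenization reversible.
        return self._originals[token]


def mask(value):
    """One-way masking: swap each digit for a random digit and keep no mapping."""
    return "".join(
        secrets.choice(string.digits) if ch.isdigit() else ch
        for ch in value
    )


vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(vault.detokenize(token))  # the original value comes back
print(mask("123-45-6789"))      # random digits; nothing is stored to undo this
```

Because `mask` keeps no key and no lookup table, there is nothing an attacker (or the organization itself) could use to recover the original value from the masked one.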
Several methods can be used to alter sensitive data, including character or number substitution, character shuffling, or the use of algorithms to generate random data that has the same properties as the original data.
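A minimal Python sketch of these three methods follows. The function names (`substitute`, `shuffle_chars`, `random_int_like`) are made up for illustration, and the standard `random` module is used for brevity; it is not a cryptographic source, so a real masking tool would likely use a stronger generator.

```python
import random
import string


def substitute(value, rng):
    """Substitution: replace each letter or digit with a random one of the same kind."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep spaces and punctuation where they are
    return "".join(out)


def shuffle_chars(value, rng):
    """Shuffling: reorder the letters and digits, leaving separators in place."""
    chars = [ch for ch in value if ch.isalnum()]
    rng.shuffle(chars)
    shuffled = iter(chars)
    return "".join(next(shuffled) if ch.isalnum() else ch for ch in value)


def random_int_like(number, rng):
    """Generation: produce a random integer with the same number of digits."""
    digits = len(str(abs(number)))
    low = 10 ** (digits - 1) if digits > 1 else 0
    return rng.randint(low, 10 ** digits - 1)


rng = random.Random()  # unseeded, so the output differs between runs
print(substitute("Jane Doe, 555-0142", rng))
print(shuffle_chars("555-0142", rng))
print(random_int_like(52340, rng))
```

Note that shuffling keeps the original characters and only reorders them, so on its own it hides less than substitution or generation does.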
With data masking, the data values are changed while data formats remain unchanged. For example, many credit card numbers have a 16-digit format that looks like this: 1234-5678-9123-4567. Masking data changes the numbers, but maintains the same 16-digit format. Using the example above, the masked credit card number could become: 9876-5432-1987-6543.
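The credit card example can be sketched as follows. This is an assumption-laden illustration, not a production routine: `mask_card_number` and `CARD_FORMAT` are hypothetical names, and only the hyphenated 16-digit layout shown above is accepted.

```python
import re
import secrets
import string

# The 16-digit, hyphen-separated layout used in the example above.
CARD_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")


def mask_card_number(card_number):
    """Replace every digit with a random digit; hyphens stay where they are."""
    if not CARD_FORMAT.fullmatch(card_number):
        raise ValueError("expected a card number like 1234-5678-9123-4567")
    return "".join(
        secrets.choice(string.digits) if ch.isdigit() else ch
        for ch in card_number
    )


masked = mask_card_number("1234-5678-9123-4567")
print(masked)                                  # random digits, same layout
print(bool(CARD_FORMAT.fullmatch(masked)))     # the format check still passes
```

The sketch does not recompute a check digit, so an application that validates card numbers beyond their layout may reject the masked value; whether that matters depends on what the test environment checks.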
Generally speaking, the process for data masking is to 1) use a backup copy of the production database, 2) assess and classify the data to identify the sensitive data, 3) mask the sensitive data using defined masking policies, and 4) distribute the masked copies.
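The four steps above can be sketched end to end on a small in-memory table. Everything here is illustrative: the detector patterns, the policy functions, and names such as `classify` and `build_masked_copy` are assumptions, and a real deployment would work against database backups rather than Python lists.

```python
import copy
import random
import re
import string

# Hypothetical classification rules: data class -> pattern its values must match.
DETECTORS = {
    "card_number": re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify(rows):
    """Step 2: map each column to a data class if all its values match a detector."""
    if not rows:
        return {}
    sensitive = {}
    for column in rows[0]:
        values = [str(row[column]) for row in rows]
        for data_class, pattern in DETECTORS.items():
            if all(pattern.fullmatch(v) for v in values):
                sensitive[column] = data_class
    return sensitive


def mask_digits(value, rng):
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in value)


def mask_email(value, rng):
    local = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return local + "@example.com"


# Hypothetical masking policies: data class -> masking function.
POLICIES = {"card_number": mask_digits, "email": mask_email}


def build_masked_copy(production_rows, rng):
    working = copy.deepcopy(production_rows)   # 1) work on a copy, not production
    sensitive = classify(working)              # 2) find the sensitive columns
    for row in working:                        # 3) apply the masking policies
        for column, data_class in sensitive.items():
            row[column] = POLICIES[data_class](row[column], rng)
    return working                             # 4) this copy is what gets distributed


production = [
    {"id": 1, "email": "ana@corp.example", "card": "1234-5678-9123-4567"},
    {"id": 2, "email": "raj@corp.example", "card": "1111-2222-3333-4444"},
]
for row in build_masked_copy(production, random.Random()):
    print(row)
```

Keeping classification separate from the policies means a new kind of sensitive column needs only a new detector and a new policy entry; the copy and distribution steps are unaffected.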