Most participants in the trench warfare of IT security agree that the best way to protect data is to apply a layered approach to security. Data masking is a security and privacy enhancing technology recommended by industry analysts as a must-have data protection layer. While terminology varies across the industry, let’s start by defining data masking as replacing sensitive data with a realistic fictional equivalent for the purpose of protecting data from unwanted disclosure.
Data masking comes in two basic flavors: static and dynamic. Static data masking (SDM) permanently replaces sensitive data by altering data at rest. Dynamic data masking (DDM) aims to replace sensitive data in transit leaving the original at-rest data intact and unaltered. This post reviews the capabilities, use cases, advantages and disadvantages of both.
Static Data Masking
Although the unfortunate “static” moniker may imply sluggishness (perhaps “masking-at-rest” is better?), SDM is an established technology capable of protecting a large swath of the data within your organization. See this recent blog post touching on the size of the problem.
Why Use Static Data Masking?
SDM is primarily used to provide high quality (i.e., realistic) data for development and testing of applications without disclosing sensitive information. Realism is important because it allows dev and test teams to be more effective at identifying defects earlier in the development cycle, therefore driving down costs and increasing overall quality.
Additional uses include protecting data for use in analytics and training as well as facilitating compliance with standards and regulations (such as GDPR, PCI, HIPAA) that require limits on the use of data that identifies individuals.
SDM also facilitates cloud adoption because DevOps workloads are among the first that organizations migrate to the cloud. Masking data on premises prior to uploading it to the cloud reduces risk for organizations concerned with cloud-based data disclosure.
How is SDM Applied?
There is no substitute for the subtleties and nuances of data that has evolved and grown through normal usage of an application. (The alternative is synthetic data generation, which tends to be uniform and lacking in realism because it hasn’t been created through years of usage.) SDM therefore starts with the original production data and applies a series of data transformations to produce high-fidelity masked data. There are varying approaches, but this is what it typically looks like (see Figure 1):
Figure 1: High-Level SDM Architecture
A copy of production data is used to create a golden masked copy of the database, which is then replicated to the various environments. The diagram also captures the multiple copies of production data that static masking helps protect. What is often overlooked is that many organizations apply no (or minimal) masking before replicating data to less secure environments.
As depicted below, SDM changes sensitive data in a realistic manner (see Figure 2). Notice how names, birth dates and SSNs on the left are changed, but still look realistic in the masked example on the right.
Figure 2: Data Masking Example – Before and After
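To make the transformation step concrete, here is a minimal Python sketch of static masking applied to a copy of a table. It is an illustration under stated assumptions, not how any particular product works: the column names, the tiny name pools, the salt and the helper functions are all hypothetical, and only the table name `pers_data_tbl` is borrowed from the query example later in this post. Masked SSNs use area numbers 900-999, which the Social Security Administration does not assign, so a masked value cannot collide with a real SSN.

```python
import hashlib
import sqlite3
from datetime import date, timedelta

# Hypothetical replacement pools; a real tool would use much larger dictionaries.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Casey", "Taylor"]
LAST_NAMES = ["Nguyen", "Garcia", "Olsen", "Patel", "Kim", "Baker"]


def _seed(value: str, salt: str) -> int:
    """Derive a repeatable integer so the same input always masks the same way."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return int(digest, 16)


def mask_name(first: str, last: str, salt: str):
    n = _seed(first + " " + last, salt)
    return FIRST_NAMES[n % len(FIRST_NAMES)], LAST_NAMES[(n // 7) % len(LAST_NAMES)]


def mask_birth_date(iso_date: str, salt: str) -> str:
    # Shift the date by -180..+180 days so ages stay plausible.
    n = _seed(iso_date, salt)
    shifted = date.fromisoformat(iso_date) + timedelta(days=(n % 361) - 180)
    return shifted.isoformat()


def mask_ssn(ssn: str, salt: str) -> str:
    n = _seed(ssn, salt)
    area = 900 + n % 100            # 900-999 is never assigned to real SSNs
    group = 1 + (n // 100) % 99     # 01-99
    serial = 1 + (n // 10_000) % 9999  # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_copy(conn: sqlite3.Connection, salt: str) -> None:
    """Overwrite sensitive columns in place. Run this on a COPY, never on production."""
    rows = conn.execute(
        "SELECT id, first_name, last_name, birth_date, ssn FROM pers_data_tbl"
    ).fetchall()
    for row_id, first, last, dob, ssn in rows:
        new_first, new_last = mask_name(first, last, salt)
        conn.execute(
            "UPDATE pers_data_tbl SET first_name = ?, last_name = ?, "
            "birth_date = ?, ssn = ? WHERE id = ?",
            (new_first, new_last, mask_birth_date(dob, salt), mask_ssn(ssn, salt), row_id),
        )
    conn.commit()
```

Deriving each replacement from a salted hash is one possible design choice: the same input always maps to the same masked value, which helps keep joins across tables consistent, while the salt has to be kept secret for the mapping to resist guessing.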
Advantages of SDM
- Sensitive data is permanently removed because the data transformations are applied to the data store. If an attacker compromises a statically masked database, the sensitive data simply isn’t there.
- No per-transaction performance penalty. All data transformations are applied up front, so there is no performance impact once the masked database is made available to the various functions.
- Protects copies of production data in a wide range of scenarios including access via applications and back-end native queries.
- Greatly simplifies security of copy data. There’s no need to implement fine-grained object-level security because all sensitive data has been replaced.
Disadvantages of SDM
- Masking is applied to a data store via a batch process (not real time) that may take minutes or hours to complete depending on the size of the data.
- It cannot be used to protect the production database because it permanently alters the underlying data. As described above, it operates against copies of production databases.
Dynamic Data Masking
DDM is a relative newcomer to the data masking space. It gets its name from the way it applies changes dynamically to data in transit.
Why Use Dynamic Data Masking?
DDM is primarily used to apply role-based (object-level) security for databases/applications. In practice, the complexities involved in preventing masked data from being written back to the database essentially mean DDM should only be applied in read-only contexts such as reporting or customer service inquiry functions. DDM is sometimes viewed as a means to apply role-based security to (legacy) applications that don’t have a built-in, role-based security model or to enforce separation of duties regarding access. However, there are limitations to this usage as noted below.
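As a rough illustration of role-based masking in a read-only context, the Python sketch below applies per-role rules to a single result row. Everything in it is an assumption made for the example: the role names, the column names, the `RULES` matrix and the helper functions are hypothetical and do not come from any specific DDM product.

```python
from typing import Callable, Dict, Mapping

Masker = Callable[[str], str]


def last_four(ssn: str) -> str:
    """Keep only the last four digits of an SSN, with or without dashes."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return "XXX-XX-" + digits[-4:]


def redact(_value: str) -> str:
    return "****"


# Hypothetical rule matrix: role -> column -> masking function.
RULES: Dict[str, Dict[str, Masker]] = {
    "analyst": {"ssn": last_four, "birth_date": redact},
    "support": {"ssn": last_four},
    "auditor": {},  # sees everything unmasked
}


def mask_row(role: str, row: Mapping[str, str]) -> Dict[str, str]:
    """Return a masked copy of the row; the stored row itself is never modified."""
    if role not in RULES:
        # Fail closed: an unknown role gets every column redacted.
        return {col: redact(val) for col, val in row.items()}
    rules = RULES[role]
    return {col: rules[col](val) if col in rules else val for col, val in row.items()}


# Made-up sample row for illustration only.
row = {"name": "Jane Doe", "birth_date": "1980-02-29", "ssn": "123-45-6789"}
analyst_view = mask_row("analyst", row)
auditor_view = mask_row("auditor", row)
```

Even in this toy form, every new role or column means another entry in the rule matrix, which hints at the configuration effort discussed under the disadvantages.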
How is DDM Applied?
There are a number of different approaches taken to implement DDM including database and web proxies. RDBMS vendors are also beginning to offer DDM directly within the database engine. The database proxy approach shown below usually works by modifying SQL queries, but can also modify query result sets (see Figure 3):
Figure 3: High-Level DDM SQL Proxy Architecture
The sensitive data remains within the reporting database that is queried by an analyst. All SQL issued by the analyst passes through the DB proxy, which inspects each packet to determine which user is attempting to access which database objects. The proxy then modifies the SQL before issuing it to the database, so that masked data is returned via the proxy to the analyst. In other words, a query like the one below that retrieves SSNs from the database:
select SSN from pers_data_tbl
Gets modified to be something like the query below, which, instead of returning full SSNs, returns only the last four digits of each SSN with the leading five digits redacted with X’s (the offset used here assumes SSNs are stored as nine digits without dashes):
SELECT concat('XXX-XX-', substring(SSN, 6, 4)) FROM pers_data_tbl
Visually, these queries would produce something similar to the following, keeping in mind that the SSNs stored in the database are not changed:
|SSN (Unmasked)||SSN (Masked)|
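The rewriting step can be sketched in a few lines of Python against an in-memory SQLite database. This is a toy under several assumptions: `MASK_RULES` and `rewrite_select` are hypothetical names, the regular expression only handles a bare single-table `SELECT`, the sample SSN is made up, and SQLite’s `||` operator and `substr` function stand in for the `concat` and `substring` calls shown above. A real proxy would parse SQL properly and resolve the connected user before choosing rules.

```python
import re
import sqlite3

# Hypothetical masking rules: column name -> SQL expression that replaces it.
MASK_RULES = {"SSN": "'XXX-XX-' || substr(SSN, 6, 4)"}


def rewrite_select(sql: str) -> str:
    """Rewrite the select list of a simple 'SELECT cols FROM table' query."""
    match = re.fullmatch(
        r"\s*select\s+(.+?)\s+from\s+(\w+)\s*;?\s*", sql, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError("only simple single-table SELECTs are handled in this sketch")
    columns, table = match.groups()
    rewritten = []
    for col in (c.strip() for c in columns.split(",")):
        expr = MASK_RULES.get(col.upper())
        # Masked columns keep their original name via an alias.
        rewritten.append(f"{expr} AS {col}" if expr else col)
    return f"SELECT {', '.join(rewritten)} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pers_data_tbl (SSN TEXT)")
conn.execute("INSERT INTO pers_data_tbl VALUES ('123456789')")  # made-up sample value

# What the analyst receives through the "proxy".
masked = conn.execute(rewrite_select("select SSN from pers_data_tbl")).fetchall()
# What is actually stored, seen by anyone who connects directly.
stored = conn.execute("SELECT SSN FROM pers_data_tbl").fetchall()

print(masked)  # [('XXX-XX-6789',)]
print(stored)  # [('123456789',)]
```

The second query also shows why bypassing the proxy matters: the stored value is untouched, so a direct connection returns the original SSN.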
Advantages of DDM
- Adds an additional layer of security and privacy control to protect sensitive data.
- Protects data in read-only (reporting) scenarios.
- Works in near real-time.
- Does not require up-front batch processing to mask all data in advance.
Disadvantages of DDM
- Not well suited for use in a dynamic (read/write) environment such as an enterprise application because masked data could be written back to the database, corrupting the data.
- Performance overhead associated with inspecting all traffic destined for the database.
- A detailed mapping of applications, users, database objects and access rights is required to configure masking rules. Maintaining this matrix of configuration data requires significant effort.
- The proxy is a single point of failure and can be bypassed by users connecting directly to the database, potentially exposing the original data stored there.
- Organizations may be hesitant to adopt DDM if there is a risk of corruption or adverse production performance impacts. In addition, relative to SDM, DDM is a less mature technology for which customer success stories are not as well known and use cases are still being defined.
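The write-back risk named in the first disadvantage above can be reproduced with a short Python sketch. It assumes a hypothetical application that reads a record through a masking layer and then saves every field back, as many form-based applications do; the table layout, the function names and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pers_data_tbl (id INTEGER PRIMARY KEY, SSN TEXT, phone TEXT)")
conn.execute("INSERT INTO pers_data_tbl VALUES (1, '123456789', '555-0100')")


def read_masked(conn: sqlite3.Connection, row_id: int) -> dict:
    """What the application sees through the masking layer."""
    ssn, phone = conn.execute(
        "SELECT 'XXX-XX-' || substr(SSN, 6, 4), phone FROM pers_data_tbl WHERE id = ?",
        (row_id,),
    ).fetchone()
    return {"id": row_id, "SSN": ssn, "phone": phone}


def save_record(conn: sqlite3.Connection, record: dict) -> None:
    """A 'save the whole form' update that writes every field back."""
    conn.execute(
        "UPDATE pers_data_tbl SET SSN = ?, phone = ? WHERE id = ?",
        (record["SSN"], record["phone"], record["id"]),
    )


record = read_masked(conn, 1)
record["phone"] = "555-0199"  # the user only edits the phone number
save_record(conn, record)

stored_ssn = conn.execute("SELECT SSN FROM pers_data_tbl WHERE id = 1").fetchone()[0]
print(stored_ssn)  # XXX-XX-6789: the real SSN has been overwritten
```

The user changed only the phone number, yet the original SSN is gone, which is why read-only contexts are the safer fit for DDM.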
To end where I began, data masking is a must-have data protection technology that has been commercially available for over a decade. Static data masking in particular has evolved from stand-alone, single-database point solutions into an integrated component of broader data management and data security offerings. It is one of the best ways, if not the best way, to protect copy data, particularly when that data is used for secondary purposes such as application development and testing, training, analytics, etc. Alternatives such as synthetic data generation exist, but cover a much narrower set of use cases, such as when original data sources are not robust or readily available.
Dynamic data masking, as a relative newcomer, is not as broadly applicable, although it continues to evolve rapidly. It is currently best suited to read-only scenarios, which avoid the risk of corrupting databases by inadvertently writing masked data back to data stores. Additionally, DDM may be perceived as an easy way to apply role-based security for applications, but the read-only restriction, coupled with the complexity of rule configuration, makes ongoing rule management a burdensome task. Alternatives to DDM may include database/application firewalls, blocking, etc. that prevent unwanted access to sensitive data using methods other than SQL rewriting.