Data Classification

Data classification tags data according to its type, its sensitivity, and the impact on the organization if it is altered, stolen, or destroyed. It helps an organization understand the value of its data, determine whether the data is at risk, and implement controls to mitigate risks. Data classification also helps an organization comply with relevant industry-specific regulatory mandates such as SOX, HIPAA, PCI DSS, and GDPR.

Data Classification Essentials

Successful data classification requires a basic understanding of the following concepts.

Data States. Data exists in one of three states—at rest, in process, or in transit. Regardless of state, data classified as confidential must remain confidential.

Data Format. Data can be either structured or unstructured. Structured data follows a predefined schema, so it is straightforward to index and query. Examples of structured data are database objects and spreadsheets. Unstructured data has no predefined schema and is harder to index and search. Examples of unstructured data are source code, documents, and binaries. Classifying structured data is less complex and time-consuming than classifying unstructured data.
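
The difference in effort can be illustrated with a minimal sketch. It assumes a hypothetical policy in which certain column names and text patterns indicate sensitivity; the names SENSITIVE_COLUMNS and SENSITIVE_PATTERNS and both regular expressions are illustrative assumptions, not part of any standard or product. For a table, the schema alone often suggests a label, while free text has to be scanned.

```python
import re

# Hypothetical column names mapped to labels; a real policy would define these.
SENSITIVE_COLUMNS = {"ssn": "high", "card_number": "high", "email": "medium"}

# Hypothetical, deliberately simplistic patterns for free text.
SENSITIVE_PATTERNS = {
    "high": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN-like number
    "medium": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email-like string
}

RANK = {"low": 0, "medium": 1, "high": 2}


def classify_table(column_names):
    """Structured data: look up each column name and keep the highest label."""
    labels = [SENSITIVE_COLUMNS.get(name.lower(), "low") for name in column_names]
    return max(labels, key=RANK.__getitem__, default="low")


def classify_text(text):
    """Unstructured data: scan the content itself, most sensitive pattern first."""
    for label in ("high", "medium"):
        if SENSITIVE_PATTERNS[label].search(text):
            return label
    return "low"
```

Pattern matching of this kind misses anything the patterns do not anticipate, which is one reason unstructured data usually takes more time to classify.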

Data Discovery. Classifying data requires knowing the location, volume, and context of data on premises, in the cloud, and in legacy databases. (See Data Discovery for more information.)

Data Sensitivity. Data is classified according to its sensitivity level—high, medium, or low.

  • High sensitivity data, if compromised or destroyed in an unauthorized transaction, would have a catastrophic impact on the organization or individual(s). High sensitivity data includes personal data, financial records, legal data, authentication data, and business data such as intellectual property.
  • Medium sensitivity data is for internal use only, but if compromised or destroyed, would not have a catastrophic impact on the organization or individual(s). Examples of medium sensitivity data are emails and documents that do not include confidential data.
  • Low sensitivity data is for public use. Examples include press releases, marketing materials, and website content.

Since the high, medium, and low labels are somewhat generic, a best practice is to use labels that make sense for your organization. Two widely used models are shown below.

Sensitivity   Model 1             Model 2
High          Confidential        Restricted
Medium        Internal Use Only   Sensitive
Low           Public              Unrestricted

Note: If a database, file, or other data resource includes data that can be classified at two different levels, it’s best to classify all the data at the higher level. For example, if a file includes both unrestricted and restricted data, classify the file as restricted.
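
The rule in the note amounts to taking the maximum sensitivity found in a resource. The sketch below is one way to express it, using the two label models described above. The names Sensitivity, LABELS, and resource_label are hypothetical, and rejecting an empty resource is an assumption rather than something the rule itself specifies.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Organization-specific labels for each generic level (the two models above).
LABELS = {
    "model_1": {
        Sensitivity.HIGH: "Confidential",
        Sensitivity.MEDIUM: "Internal Use Only",
        Sensitivity.LOW: "Public",
    },
    "model_2": {
        Sensitivity.HIGH: "Restricted",
        Sensitivity.MEDIUM: "Sensitive",
        Sensitivity.LOW: "Unrestricted",
    },
}


def resource_label(element_levels, model="model_2"):
    """Label a resource by the most sensitive element it contains."""
    levels = list(element_levels)
    if not levels:
        # Assumption: an unscanned or empty resource is an error, not "low".
        raise ValueError("no classified elements to derive a label from")
    return LABELS[model][max(levels)]
```

For a file that mixes unrestricted and restricted data, resource_label([Sensitivity.LOW, Sensitivity.HIGH]) yields "Restricted", matching the example in the note.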

Compliance Requirements. Data classification must comply with relevant regulatory and industry-specific mandates, which may require classification of different data attributes. For example, the Cloud Security Alliance (CSA) requires that data and data objects include data type, jurisdiction of origin and domicile, context, legal constraints, and sensitivity, among other attributes. PCI DSS, by contrast, does not require origin or domicile tags.
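
One way to keep track of differing tag requirements is a per-mandate set of required attributes. The sketch below is illustrative only: the field names are hypothetical, and the required sets are derived solely from the two statements in the paragraph above, not from the text of the CSA guidance or PCI DSS. Consult the mandates themselves for the authoritative lists.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataTags:
    """Classification tags attached to a data object (hypothetical field names)."""
    data_type: Optional[str] = None
    jurisdiction_of_origin: Optional[str] = None
    domicile: Optional[str] = None
    context: Optional[str] = None
    legal_constraints: Optional[str] = None
    sensitivity: Optional[str] = None


ALL_TAGS = frozenset(f.name for f in fields(DataTags))

# Assumption: required sets mirror only what the paragraph above states.
REQUIRED_TAGS = {
    "CSA": ALL_TAGS,
    "PCI DSS": ALL_TAGS - {"jurisdiction_of_origin", "domicile"},
}


def missing_tags(tags, mandate):
    """Return the required tag names that are still unset for a mandate."""
    return sorted(
        name for name in REQUIRED_TAGS[mandate] if getattr(tags, name) is None
    )
```

A data object tagged without origin or domicile would pass the "PCI DSS" check in this sketch but would be reported as incomplete for "CSA".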

Data Classification Process

  1. Execute a data discovery process to determine the location, volume, and context of data on premises, in the cloud, and in legacy databases.
  2. Define data classification policies.
  3. Execute the data classification process.
  4. Implement enforcement technologies to protect classified data (e.g., user rights management, privileged user monitoring, sensitive data auditing, and separation of duties).
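
The four steps above can be sketched end to end as follows. Everything here is a stand-in under stated assumptions: discover returns a hard-coded inventory rather than scanning anything, the keyword rules in POLICY are a hypothetical policy, the fallback to "low" is an assumption, and CONTROLS only names the enforcement technologies to apply rather than configuring them.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    location: str   # where the data lives
    volume_mb: int  # how much of it there is
    context: str    # short description of what the data is


def discover():
    """Step 1: stand-in for a discovery scan; returns a fixed inventory."""
    return [
        Asset("onprem://hr-db/employees", 1200, "payroll records"),
        Asset("cloud://bucket/press", 40, "press releases"),
        Asset("legacy://mainframe/mail", 900, "internal email archive"),
    ]


# Step 2: a hypothetical policy as ordered (keyword, level) rules.
POLICY = [("payroll", "high"), ("email", "medium")]
DEFAULT_LEVEL = "low"  # assumption: unmatched assets fall back to low


def classify(asset):
    """Step 3: apply the first policy rule whose keyword appears in the context."""
    for keyword, level in POLICY:
        if keyword in asset.context:
            return level
    return DEFAULT_LEVEL


# Step 4: controls to apply per level (names only, no real enforcement).
CONTROLS = {
    "high": ["user rights management", "privileged user monitoring",
             "sensitive data auditing"],
    "medium": ["user rights management"],
    "low": [],
}


def run():
    """Run all four steps and report the level and controls per location."""
    report = {}
    for asset in discover():
        level = classify(asset)
        report[asset.location] = (level, CONTROLS[level])
    return report
```

Keeping the policy as data rather than code means it can be reviewed and revised without touching the classification logic.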

Learn how Imperva solutions can help with the data classification process.
