Implementing Privacy in a Real World Application

Background

Whenever Personally Identifiable Information (PII) is involved, it is wise to encrypt it from the get go. Strong encryption coupled with need-to-know access is key to gaining the trust of your customers and protecting their privacy. And it’s often required by ever-evolving data privacy laws and regulations such as GDPR.

In building Imperva’s Account Takeover Protection, we wanted to show usernames of potentially compromised accounts. Because usernames often contain first names, last names, and ages, we treat them as PII.

This post is a detailed account of how we built privacy into Account Takeover Protection, including technical solutions to the challenges of encryption across operating systems, services, JDK versions, export limitations, and programming languages. In today’s ecosystem, many teams have similar needs. If you’re in such a team, whether in the security space or not, you may find this post helpful. We learned a lot of new things about encryption, and so will you.

Requirements on Privacy & Encryption Design

Our requirements were as follows:

  1. Customers have a way to see their private data in cleartext. Insiders within Imperva with access to DBs and encryption keys (admins, engineers, etc.) cannot see it.
  2. PII is encrypted at the source. It is never stored unencrypted.
  3. Key rotation. Users must be able to replace the secret key needed to view the usernames, e.g., if the key is compromised.
  4. If users prefer not to set PII encryption, they can still see masked usernames.
  5. Like any other service which stores non-public data, the service is entrusted with the data, and must not leak the passwords/keys which protect it.
  6. Auxiliary services need to maintain PII-related statistics.

Environment and Assumptions

Imagine two services: Data source and Data sink.

[Figure: Data source and data sink]

The source originates the PII, and the sink is an API service which returns the PII to the user via a web UI. Additional services might read the PII. Imagine, for example, an auxiliary service which simply counts unique usernames.

In our case, the data source is based on the V8 JavaScript engine, and the data sink is in Java, based on Oracle JDK 8. During development, our microservices need to run on Linux and MacOS, both with OracleJDK and OpenJDK. This becomes important later.

Design

We started simple – the classic solution for requirements (1) and (2) is for the data source to encrypt the PII using a public key and for the user/browser to hold the private key.

We thought the user could hold on to a PEM file containing the private key in the same way AWS EC2 works. However, customers said they prefer passwords to key files, since they already have flows to deal with passwords.

It’s possible to encode the private key in base64 and use the result as a very long password, but requirement (3) meant that we should be able to replace the password without replacing the private key.

Eventually, our approach was to encrypt the private key using a symmetric key derived from a user-supplied password.

Here’s the full picture:

[Figure: Encryption design]

The data source encrypts the PII with the public key. When the user wants to see the PII, they enter a password, which is used to decrypt the private key which, in turn, is used to decrypt the PII.

  • The unencrypted private key is never persisted.
  • The password is never persisted, not even hashed.
  • The password can be changed without re-encrypting all the PII, by only re-encrypting the private key.
  • If the password is forgotten, the PII is lost.

Because neither the unencrypted private key nor the password is ever persisted, a DB leak would not compromise the encrypted data.
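The flow described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the library described in this post: it relies only on the JDK's default providers, it swaps in PBKDF2 plus AES-GCM for the password-based layer, and names such as `EnvelopeSketch` and `WrappedKey` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.spec.PKCS8EncodedKeySpec;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the design; class names and algorithm choices are assumptions.
final class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-1AndMGF1Padding";

    /** The only key material that is persisted: salt, IV, and the encrypted private key. */
    static final class WrappedKey {
        final byte[] salt;
        final byte[] iv;
        final byte[] ciphertext;

        WrappedKey(byte[] salt, byte[] iv, byte[] ciphertext) {
            this.salt = salt;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Derives a fixed-length (128-bit) AES key from the password with PBKDF2. */
    private static SecretKeySpec deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 128);
        byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        return new SecretKeySpec(raw, "AES");
    }

    /** Encrypts the private key under a password-derived key. */
    static WrappedKey wrap(PrivateKey privateKey, char[] password) throws GeneralSecurityException {
        byte[] salt = new byte[16];
        byte[] iv = new byte[12];
        RANDOM.nextBytes(salt);
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt), new GCMParameterSpec(128, iv));
        return new WrappedKey(salt, iv, aes.doFinal(privateKey.getEncoded()));
    }

    /** Recovers the private key; a wrong password fails the GCM tag check and throws. */
    static PrivateKey unwrap(WrappedKey wrapped, char[] password) throws GeneralSecurityException {
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, deriveKey(password, wrapped.salt), new GCMParameterSpec(128, wrapped.iv));
        byte[] pkcs8 = aes.doFinal(wrapped.ciphertext);
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(pkcs8));
    }

    /** Password change: only the private key is re-encrypted; stored PII is untouched. */
    static WrappedKey changePassword(WrappedKey wrapped, char[] oldPassword, char[] newPassword)
            throws GeneralSecurityException {
        return wrap(unwrap(wrapped, oldPassword), newPassword);
    }

    /** Data source side: encrypt a username with the public key. */
    static byte[] encryptPii(PublicKey publicKey, String username) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_OAEP);
        rsa.init(Cipher.ENCRYPT_MODE, publicKey);
        return rsa.doFinal(username.getBytes(StandardCharsets.UTF_8));
    }

    /** Data sink side: password, then private key, then cleartext username. */
    static String decryptPii(WrappedKey wrapped, char[] password, byte[] encrypted)
            throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_OAEP);
        rsa.init(Cipher.DECRYPT_MODE, unwrap(wrapped, password));
        return new String(rsa.doFinal(encrypted), StandardCharsets.UTF_8);
    }
}
```

Note how `changePassword` never touches the PII ciphertexts: records encrypted before the change stay readable with the new password because the RSA key pair itself is unchanged.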

So, in order to access unencrypted PII, an insider would have to eavesdrop on traffic on production nodes before it even reaches our data source. That’s requirement (5), which is outside the scope of this post, but there are generally other compliance controls in place to mitigate this threat.

To address requirements (4) and (6), the data source also sends a hash of each username, plus a version of the encryption algorithm, with 0 meaning “only hashed usernames are available”.
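As a rough sketch of what such a record could look like, the Java snippet below pairs a hash with a version number and shows how an auxiliary service could count unique usernames from hashes alone. The post does not say which hash function is used or how records are laid out, so SHA-256, the hex encoding, and all names here (`UsernameRecord`, `VERSION_HASH_ONLY`, `countUnique`) are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical record shape; field names and the SHA-256 choice are assumptions.
final class UsernameRecord {
    /** Version 0: no encryption configured, only the hash is available. */
    static final int VERSION_HASH_ONLY = 0;

    final int encryptionVersion;
    final String usernameHash;
    final byte[] encryptedUsername; // null when encryptionVersion == VERSION_HASH_ONLY

    UsernameRecord(int encryptionVersion, String usernameHash, byte[] encryptedUsername) {
        this.encryptionVersion = encryptionVersion;
        this.usernameHash = usernameHash;
        this.encryptedUsername = encryptedUsername;
    }

    /** Hex-encoded SHA-256 of the UTF-8 username. */
    static String sha256Hex(String username) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(username.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Record for a customer who has not set up PII encryption. */
    static UsernameRecord hashOnly(String username) {
        return new UsernameRecord(VERSION_HASH_ONLY, sha256Hex(username), null);
    }

    /** Auxiliary-service style statistic: unique usernames, computed without any cleartext. */
    static long countUnique(List<UsernameRecord> records) {
        return records.stream().map(r -> r.usernameHash).distinct().count();
    }
}
```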

Asymmetric Encryption Challenges

Our go-to asymmetric encryption is RSA:

Cipher cipher = Cipher.getInstance("RSA");
cipher.init(Cipher.DECRYPT_MODE, privateKey);
return cipher.doFinal(encrypted);

This cipher must be able to decrypt what our JavaScript data source encrypts. Our data source uses crypto-browserify:

crypto.publicEncrypt(publicKey, inputBuf)

That turned out to be more challenging than anticipated. Let’s take a deeper dive.

Sending the Public Key to the Data Source

The simplest way to generate the asymmetric key was:

KeyPairGenerator keyPairGenerator = KeyPairGenerator.getInstance("RSA");
keyPairGenerator.initialize(2048);
KeyPair keyPair = keyPairGenerator.genKeyPair();
PublicKey publicKey = keyPair.getPublic();
PrivateKey privateKey = keyPair.getPrivate();

But now we need to send the key to the Data Source. publicEncrypt() expected a PEM file, but what does our publicKey look like? It turns out that, on our JDKs, it’s always an X.509 key in DER format, although this isn’t guaranteed or documented anywhere.

So we needed to convert DER to PEM. The JDK has no built-in helper for this, and we didn’t want to pull in a whole new library. Fortunately, PEM is just base64-encoded DER with a header and a footer, so we encode the DER bytes with base64 and send the result to the data source, which does:

const pemLineRegex = new RegExp('(.{64})');
const pemLines = base64Der.split(pemLineRegex).filter(x => x).join("\n");
const publicKey = `-----BEGIN PUBLIC KEY-----\n${pemLines}\n-----END PUBLIC KEY-----`
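On the Java side, the base64 step could look roughly like the sketch below. It is illustrative only: `PemSketch` and its methods are hypothetical names, and `toPem` is an optional variation that builds the full PEM text in Java with the JDK's MIME encoder instead of wrapping lines in JavaScript. The `getFormat()` check makes the X.509 assumption explicit rather than silent.

```java
import java.nio.charset.StandardCharsets;
import java.security.PublicKey;
import java.util.Base64;

// Hypothetical helper; not part of the JDK or of the library described in this post.
final class PemSketch {
    private PemSketch() {}

    /** Fails fast if the provider hands us something other than X.509 DER. */
    private static byte[] x509Der(PublicKey key) {
        if (!"X.509".equals(key.getFormat())) {
            throw new IllegalArgumentException("unexpected key format: " + key.getFormat());
        }
        return key.getEncoded();
    }

    /** What is sent to the data source: the DER bytes as a single base64 line. */
    static String toBase64Der(PublicKey key) {
        return Base64.getEncoder().encodeToString(x509Der(key));
    }

    /** Variation: build the PEM text in Java, wrapping the base64 body at 64 characters. */
    static String toPem(PublicKey key) {
        Base64.Encoder wrapping = Base64.getMimeEncoder(64, "\n".getBytes(StandardCharsets.US_ASCII));
        return "-----BEGIN PUBLIC KEY-----\n"
                + wrapping.encodeToString(x509Der(key))
                + "\n-----END PUBLIC KEY-----\n";
    }
}
```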

Encrypting in JavaScript, Decrypting in Java

If you watched carefully, you noticed that publicEncrypt() is pretty simplistic. It refused to work with our Java cipher, which failed with padding errors. We tried:

Cipher cipher = Cipher.getInstance("RSA/None/NoPadding");

That also didn’t work.

After some quality Stack Overflow time, it became apparent that publicEncrypt() applies OAEP padding with specific parameters by default, and that the Java cipher has to be initialized with exactly those parameters.

So we tried:

Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPWithSHA1AndMGF1Padding");

and

Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPWithSHA1AndMGF1Padding", "BC");

The former, and other similar variants, worked on Oracle JDK but not on OpenJDK.
The latter uses Bouncy Castle (“BC” security provider), but Bouncy Castle refused to parse this transformation specification.

What did work across JDKs was:

Cipher cipher = Cipher.getInstance("RSA", "BC");
cipher.init(Cipher.DECRYPT_MODE, privateKey, new OAEPParameterSpec("SHA-1", "MGF1", MGF1ParameterSpec.SHA1, PSource.PSpecified.DEFAULT));
return cipher.doFinal(encrypted);

Note that the default Java provider may be sensitive to the version of OpenSSL installed on the machine, if any.
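To check the OAEP parameters without involving the JavaScript side, a Java-only round trip can serve as a quick sanity test. The sketch below is an assumption-laden stand-in: it uses the JDK's default provider rather than Bouncy Castle so that it runs without extra dependencies, it encrypts in Java instead of with publicEncrypt(), and `OaepRoundTrip` is a hypothetical name. The parameter spec is the same one passed to `cipher.init` in the decryption snippet above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.MGF1ParameterSpec;
import javax.crypto.Cipher;
import javax.crypto.spec.OAEPParameterSpec;
import javax.crypto.spec.PSource;

// Hypothetical test harness; the default provider stands in for "BC" here.
final class OaepRoundTrip {
    // SHA-1 digest, MGF1 with SHA-1, empty label: both sides must agree on all three.
    private static final OAEPParameterSpec OAEP_SHA1 =
            new OAEPParameterSpec("SHA-1", "MGF1", MGF1ParameterSpec.SHA1, PSource.PSpecified.DEFAULT);

    static byte[] encrypt(PublicKey publicKey, byte[] clear) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPPadding");
        cipher.init(Cipher.ENCRYPT_MODE, publicKey, OAEP_SHA1);
        return cipher.doFinal(clear);
    }

    static byte[] decrypt(PrivateKey privateKey, byte[] encrypted) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPPadding");
        cipher.init(Cipher.DECRYPT_MODE, privateKey, OAEP_SHA1);
        return cipher.doFinal(encrypted);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keyPair = generator.generateKeyPair();

        byte[] encrypted = encrypt(keyPair.getPublic(), "jane.doe42".getBytes(StandardCharsets.UTF_8));
        byte[] clear = decrypt(keyPair.getPrivate(), encrypted);
        System.out.println(new String(clear, StandardCharsets.UTF_8));
    }
}
```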

Symmetric encryption and pitfalls

Now it’s time to encrypt the private key. We started our symmetric transformation with Blowfish:

SecretKeyFactory.getInstance("Blowfish")

However, this immediately failed the encryption strength requirements of our static analyzers, as did RC4. Naturally, we went to AES: “AES/GCM/NoPadding”. But our static analyzers required a salt/initialization vector.

With those, we were using the password as the key. That, however, stopped working with some input lengths, because what we actually needed was to generate a constant-length key from the password.
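The difference between using password bytes directly as an AES key and deriving a key from them can be sketched as follows. This is only an illustration: PBKDF2 from the JDK is an assumption chosen so the example has no dependencies, and `KeyLengthSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.InvalidKeyException;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of why raw passwords make poor AES keys.
final class KeyLengthSketch {
    private KeyLengthSketch() {}

    /** Returns true if AES accepts these bytes as a key, false on an invalid key length. */
    static boolean aesAccepts(byte[] keyBytes) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
            return true;
        } catch (InvalidKeyException e) {
            return false;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Derives a 128-bit key regardless of how long the password is. */
    static byte[] deriveKey(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 128);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16]; // fixed salt for the demo only; use a random salt in practice
        System.out.println(aesAccepts("short".getBytes(StandardCharsets.UTF_8)));
        System.out.println(aesAccepts(deriveKey("short".toCharArray(), salt)));
    }
}
```

A password that happens to be exactly 16 bytes long is accepted as a raw key, which is why the failure only shows up for some inputs.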

Stack Overflow led us to a list of PBE (Password-Based Encryption) algorithms, of which we chose “PBEwithSHA256and256BitAES-CBC-BC”.

This one also uses Bouncy Castle, because we don’t want to depend on differences between Oracle JDK and OpenJDK:

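SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("PBEwithSHA256and256BitAES-CBC-BC", "BC");
PBEKeySpec keySpec = new PBEKeySpec(password.toCharArray(), salt, 1000);
SecretKey key = keyFactory.generateSecret(keySpec);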

This algorithm worked on OpenJDK, but Oracle JDK failed with “Illegal key size” errors. After a lot of digging, experimenting, blood, sweat, and tears, we discovered that our Oracle JDK enforced a limited-strength cryptography policy, a legacy of export regulations, which caps AES keys at 128 bits. Even tinkering with the relevant JAR files didn’t help.

Dropping to 128-bit AES did the trick:

SecretKeyFactory.getInstance("PBEwithSHA256and128bitAES-CBC-BC", "BC")
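One way to find out which limit a given runtime enforces, before hitting “Illegal key size” in production, is to ask the JDK directly. The sketch below uses `Cipher.getMaxAllowedKeyLength`, a standard JDK call; the class name `AesPolicyCheck` and the fall-back-to-128 rule are assumptions for illustration, not what our library does.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical diagnostic helper.
final class AesPolicyCheck {
    private AesPolicyCheck() {}

    /** Picks 256-bit AES when the installed policy allows it, otherwise 128-bit. */
    static int usableAesKeyBits() throws NoSuchAlgorithmException {
        // Returns Integer.MAX_VALUE when the unlimited-strength policy is in effect.
        int maxAllowed = Cipher.getMaxAllowedKeyLength("AES");
        return maxAllowed >= 256 ? 256 : 128;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("Max allowed AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
        System.out.println("Key size this sketch would use: " + usableAesKeyBits());
    }
}
```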

Other Considerations

Library

We’ve built a Java library expressly designed to accommodate our requirements, with an additional requirement:

The Java object which represents an unencrypted private key, or a reference to it, may not be leaked outside the library. In this way, no developer can persist the private key, even by mistake.

For this reason, the library also provides a streaming interface to decrypt multiple records. Without it, decrypting tens of thousands of records would require decrypting the private key for each record. With it, the private key is decrypted once, still without being exposed unencrypted outside the library.
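The shape of such a streaming interface might look like the sketch below. It is not our library's actual API: `PiiDecryptor`, `KeyUnwrapper`, and `decryptAll` are hypothetical names, and the default JDK provider is used so the example is self-contained. The point it illustrates is that the private key is obtained once per stream and is only ever held in a local variable, while callers receive cleartext strings and nothing else.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.util.stream.Stream;
import javax.crypto.Cipher;

// Hypothetical sketch of a streaming decryption facade.
final class PiiDecryptor {
    private PiiDecryptor() {}

    /** Hypothetical library-internal hook that turns a password into the private key. */
    interface KeyUnwrapper {
        PrivateKey unwrap(char[] password) throws GeneralSecurityException;
    }

    /**
     * Decrypts every record lazily as the returned stream is consumed.
     * The private key is unwrapped exactly once and never returned to the caller.
     */
    static Stream<String> decryptAll(KeyUnwrapper unwrapper, char[] password, Stream<byte[]> encryptedRecords)
            throws GeneralSecurityException {
        PrivateKey privateKey = unwrapper.unwrap(password);
        return encryptedRecords.map(record -> decryptOne(privateKey, record));
    }

    private static String decryptOne(PrivateKey privateKey, byte[] encrypted) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPWithSHA-1AndMGF1Padding");
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(encrypted), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            // Stream lambdas cannot throw checked exceptions, so wrap them.
            throw new IllegalStateException("could not decrypt record", e);
        }
    }
}
```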

Browser side

As security professionals, we know the pain of having to type passwords all the time. We didn’t want to impose that on our users.

If the password is correct, the data sink returns a cookie to the browser. The cookie contains the same password, encrypted with yet another secret. On subsequent calls, the browser sends the cookie. This gives the data sink a way to decrypt the private key without asking for a password.
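A minimal sketch of that cookie mechanism, under stated assumptions, is shown below. The post does not describe the cookie format or the cipher, so AES-GCM, the IV-prefixed layout, the base64url encoding, and the names `PasswordCookie`, `seal`, and `open` are all hypothetical; `serverSecret` stands for the “yet another secret” held only by the data sink.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical cookie codec; format and algorithm are assumptions.
final class PasswordCookie {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private PasswordCookie() {}

    /** Encrypts the password under the server-side secret and returns a cookie-safe string. */
    static String seal(SecretKey serverSecret, char[] password) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, serverSecret, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = aes.doFinal(new String(password).getBytes(StandardCharsets.UTF_8));

        // Cookie value layout: IV followed by ciphertext (which includes the GCM tag).
        byte[] value = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, value, 0, iv.length);
        System.arraycopy(ciphertext, 0, value, iv.length, ciphertext.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(value);
    }

    /** Recovers the password from a cookie; throws if the cookie was altered or the secret differs. */
    static char[] open(SecretKey serverSecret, String cookieValue) throws GeneralSecurityException {
        byte[] value = Base64.getUrlDecoder().decode(cookieValue);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, serverSecret, new GCMParameterSpec(TAG_BITS, value, 0, IV_BYTES));
        byte[] clear = aes.doFinal(value, IV_BYTES, value.length - IV_BYTES);
        return new String(clear, StandardCharsets.UTF_8).toCharArray();
    }
}
```

With a layout like this, the browser only ever stores an opaque value, and the data sink can recover the password on each call without persisting it.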

Conclusion

Privacy by design in a mixed environment proved to be more challenging than expected, but being able to safely provide this value to our customers was totally worth it. If you need to build something similar, hopefully this post will help you walk through the challenge.

Looking forward, we’ll implement a “password lost” scenario, which generates a new public key for new usernames generated by the data source. If a new public key is generated, then the data sink will have a mix of PII encrypted with two different public keys. Ideally, we’d like to be able to delete PII encrypted with the old key, and not have the data sink spend expensive cycles trying to load and decrypt it.