When talking to customers about data protection in Snowflake, a few things get a little mixed up with one another. Snowflake’s Tri-Secret Secure and masking are sometimes considered redundant with ALTR’s tokenization and format-preserving encryption (FPE) – or vice versa. What we’ll do in this piece is untangle the knots by clarifying what each of these is, when you would use each, and the advantages you have because you can choose which option to apply to each challenge you come across.
Snowflake’s Tri-Secret Secure is a built-in feature, and it requires that your Snowflake account be on the Business Critical Edition. Tri-Secret Secure is a hybrid of the “bring your own key” (BYOK) and the “hold your own key” (HYOK) approaches to using customer-managed keys for the encryption of data at rest. [ProTip for the Snowflake docs: Tri-Secret Secure is essentially a brand name for the customer-managed keys approach, and the docs read a little more clearly once you understand that.] When you use customer-managed keys, there is often a choice between supplying the key to the third party (Snowflake in this case) on an ongoing basis or only giving it when needed – BYOK and HYOK respectively. Snowflake effectively combines these approaches by having you provide an encrypted version of the key, which Snowflake can only decrypt by calling back to your key management system. So, you bring an encrypted version of the customer-managed key to Snowflake but hold the key that can decrypt it. Tri-Secret Secure applies to the actual files that rest on disks in your chosen Snowflake cloud provider and is a form of transparent data encryption, meaning users don’t need to be aware of the encryption involved. It protects the files on disk without affecting anything at run time.
Snowflake’s Dynamic Data Masking is a very simple yet powerful feature. This feature requires Enterprise Edition (or higher). When a masking policy protects a column in Snowflake, a decision is made at run time to return either the contents of the column or a masked value (e.g., a set of “****” characters). You can apply this protection to a column either directly as a column policy or via a tag placed on the column and associated with a tag-based policy. When you need to ensure that certain individuals can never see the legitimate values in a column, Dynamic Data Masking is a perfect solution. The canonical example is ensuring that database administrators can never see the values of sensitive information when performing administrative tasks. However, there are slightly more complex instances of hiding information where masking falls short. You can easily imagine a circumstance where users may be identifiable across many tables by values that are sensitive (e.g., credit card numbers, phone numbers, or government ID numbers). You want users doing large analytics work to be able to join these objects by the identifiers, but simultaneously, you’re obligated to protect the values of those identifiers in the process. Clearly, turning every identifier into the same “****” won’t do that job, because once masked, the values can no longer be told apart.
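To make the join problem concrete, here is a minimal sketch in plain Python. It is a simulation of the idea only, not Snowflake’s masking policy syntax or API; the role names (ANALYST_PII, DBA), the sample rows, and the helper functions are all hypothetical.

```python
MASK = "****"

# Hypothetical role that is allowed to see real values.
ALLOWED_ROLES = frozenset({"ANALYST_PII"})


def mask_policy(value: str, role: str) -> str:
    """Return the real value for an allowed role, otherwise a fixed mask."""
    return value if role in ALLOWED_ROLES else MASK


# Two made-up tables that share a sensitive identifier.
customers = [
    {"card": "4111111111111111", "name": "Ada"},
    {"card": "5500000000000004", "name": "Grace"},
]
orders = [
    {"card": "4111111111111111", "total": 40},
    {"card": "5500000000000004", "total": 75},
]


def join_on_card(left, right, role):
    """Join the two tables on the card column as the given role sees it."""
    return [
        (l["name"], r["total"])
        for l in left
        for r in right
        if mask_policy(l["card"], role) == mask_policy(r["card"], role)
    ]


# An allowed role joins on real values and gets one row per customer.
authorized = join_on_card(customers, orders, "ANALYST_PII")

# A masked role sees "****" everywhere, so every row matches every other row.
masked = join_on_card(customers, orders, "DBA")
```

In this toy setup the authorized join pairs each customer with their own order, while the masked join pairs every customer with every order. The values are hidden, but the join result is no longer meaningful.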
This is where ALTR’s Tokenization and Format-Preserving Encryption (FPE) enter the story. We could spend hours parsing out the debate about whether tokenization is a superclass of FPE, vice versa, or neither. There are people with strong arguments on every side of this. We’ll focus on the simpler questions of what each feature is and when it is best applied. First, let’s define what they are:
– Tokenization replaces values with tokens in a deterministic way. This means that you can rely on the fact that if there is a value “12345” in a cell and it’s replaced by the token “notin” in one table, then if you encounter that value in another table, it will also be “notin” each time it started as “12345.” So now you can join the two tables by those cells and get the correct result. A key concept here is that the token (“notin” in this example) contains no data about the original values in any way. It is a simple token that you swap in and out.
– Format-Preserving Encryption (FPE) is like tokenization in that you’re also swapping values. However, the “tokens” in this case are created through an encryption process where the resulting value maintains both the information and its format. FPE might replace a phone number value of “(800) 416-4710” with “(201) 867-5309.” Like the tokens, that replacement will be consistent, so one can use it in joins and other cross-object operations. Unlike the tokens, these values are in the same “format” (hence the name, and hence the phone number token looking exactly like a different phone number), which means they will be usable in applications and other upstream operations without any code changes. In other words, FPE won’t break anything; it only protects information.
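The two definitions above can be sketched side by side in Python. This is a toy illustration under stated assumptions, not ALTR’s implementation: TokenVault is a hypothetical in-memory lookup table, and fpe_transform is a small Feistel network over decimal digits that uses HMAC-SHA256 as its round function. Production FPE typically relies on standardized modes such as NIST FF1, which this sketch does not implement, and the key shown is a placeholder.

```python
import hashlib
import hmac
import secrets

DIGITS = "0123456789"
ROUNDS = 8  # even, so the two halves end up back at their original lengths


class TokenVault:
    """Toy vault: each distinct value gets one random token, reused afterwards."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        if value not in self._to_token:
            token = secrets.token_hex(8)  # random, carries no data about value
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str) -> str:
        return self._to_value[token]


def _round_value(key: bytes, round_no: int, half: str) -> int:
    """Keyed pseudo-random integer derived from one half of the digit string."""
    msg = f"{round_no}:{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    """Encrypt or decrypt a digit string, keeping its length."""
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in rounds:
        m = u if i % 2 == 0 else v  # width of the half rewritten this round
        if decrypt:
            c = (int(b) - _round_value(key, i, a)) % 10**m
            a, b = f"{c:0{m}d}", a
        else:
            c = (int(a) + _round_value(key, i, b)) % 10**m
            a, b = b, f"{c:0{m}d}"
    return a + b


def fpe_transform(text: str, key: bytes, decrypt: bool = False) -> str:
    """Transform only the digits; punctuation and spacing stay where they are."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    new_digits = iter(_feistel(digits, key, decrypt))
    return "".join(next(new_digits) if ch in DIGITS else ch for ch in text)


vault = TokenVault()
token_a = vault.tokenize("12345")
token_b = vault.tokenize("12345")  # same value, same token, so joins still work

key = b"demo-key-not-for-production"  # placeholder key for illustration
phone = "(800) 416-4710"
protected = fpe_transform(phone, key)  # still shaped like a phone number
restored = fpe_transform(protected, key, decrypt=True)
```

The design difference shows up in what each side needs in order to reverse the protection. The vault has to keep a lookup table because the token itself carries nothing, whereas the FPE value can be reversed by anyone holding the key, with no table at all.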
ALTR has both Tokenization and Format-Preserving Encryption solutions for Snowflake, which are cloud-native and immensely scalable. In other words, they can both keep up with the insane scale demands of Snowflake workloads. The application-friendly FPE often seems like the only solution you need at first glance. However, there are reasons for choosing to use only Tokenization or perhaps both Tokenization and FPE in combination. The most common reason for going Tokenization-only is regulatory constraints. Since the ALTR Tokenization solution can be run in a separate PCI scope, it gives folks the power to leverage Snowflake for workloads that need PCI data without having to drag Snowflake as a whole into PCI auditing scope. The most common reason we see folks run both Tokenization and FPE together is to stick to a strict least-privilege model of access. Since Tokenization removes all the information about the data it protects, some will choose to tokenize data while it flows through pipelines into and out of Snowflake and transform it to FPE while inside Snowflake to get the most out of the data in the trusted data platform.
Hopefully, it’s clear by now that the answer to the question “Which one of these should I use?” is: it depends. If you’re already on Snowflake’s Business Critical Edition, then using Tri-Secret Secure seems like a no-brainer. The extra costs involved are nominal, and the extra protection afforded is substantial. The real questions come when choosing between Snowflake’s Dynamic Data Masking and either or both of ALTR’s Tokenization and Format-Preserving Encryption (FPE). Masking is a great option for many administrative use cases. If you need to hide the data from a user and aren’t concerned about that user being able to do cross-object operations like joins, then masking is easily the best choice. The moment there is a need for joins or similar operations, ALTR’s Tokenization and FPE are the right places to turn. Picking between them is mostly a matter of technical questions. If you have concerns about application compatibility with the protected data, then FPE is your choice. If you want to keep the protected data away from the data platform, then Tokenization is the best option since FPE runs natively in Snowflake. And there are clearly times when you may have workloads complex enough that all of these can be used in combination for the best results. You’ve got all the options you could ever need for Snowflake data protection. So now it’s time to get to work making your data safer than ever.