Summary: Open table formats like Iceberg and Delta Lake make it easy for multiple compute engines to share the same data, but that flexibility quietly undermines platform-specific security controls like masking policies. The fix is format-preserving encryption applied at the data layer, before it’s written into the open table format, so protection travels with the data regardless of which engine accesses it.
If you’ve been to any data conference this year, you’ve definitely heard about open table formats. Apache Iceberg and Delta Lake are everywhere right now. Snowflake is talking about them. Databricks is talking about them. Everyone on LinkedIn is talking about them. And for good reason. These formats are genuinely changing how organizations structure and access their data.
But there’s something nobody talks about, and it really frustrates me: open table formats also create a security gap that most teams haven’t thought through yet.
In true Chris Struttmann form, let’s break down the problem, solution and outcome.
What Are Open Table Formats?
Strip away the hype, the slides, and the lasers that go off every time someone on stage says the words “Generally Available,” and an open table format is really just a standardized way of organizing a collection of data files so they behave like a single database table. Instead of defining how individual records are structured at the byte level, open table formats work at a higher level, tracking things like schema, partitioning, file locations, and snapshots across potentially thousands of underlying files. Any engine that supports the format can read and query the table, and many can write to it as well, so multiple data platforms can work with the same data without clumsy connectors or slow, bulky import processes.
Think of it like a shared index. The data itself typically lives in columnar files like Parquet, and the open table format sits on top, managing all the metadata that makes those files appear as one logical table.
The two big names here are Iceberg and Delta Lake. Both are open source and vendor-neutral (-ish; looking at you, Databricks), but in practice Iceberg tends to be more closely associated with Snowflake, while Delta Lake is more aligned with Databricks.
Why Everyone’s Suddenly Excited
The practical advantage, and what the conference speakers want you to hear, is clear: instead of manually loading data into a specific platform like Snowflake, customers can now drop their data into a central location, usually something like an S3 bucket, already structured in an open table format. Then they point a compute engine at that bucket and start querying.
This works through technical catalogs, like Polaris for Snowflake or Unity Catalog for Databricks. These catalogs manage the relationship between a logical table (think “customers”) and the actual files sitting in that bucket. When someone runs a query like “select * from customers,” the catalog knows how to resolve “customers” to the files sitting in S3.
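Here is a hedged sketch of that resolution step. The catalog is modeled as an in-memory mapping from a table name to the location of its current metadata file, and a plain dictionary stands in for the S3 bucket. Real catalogs such as Polaris or Unity Catalog expose this through their own APIs; `InMemoryCatalog`, `resolve_table`, `OBJECT_STORE` and the paths are hypothetical names for illustration.

```python
import json

# Hypothetical object store: path -> file contents (stands in for an S3 bucket).
OBJECT_STORE: dict[str, str] = {
    "s3://lake/customers/metadata/v2.json": json.dumps({
        "table": "customers",
        "current_files": [
            "s3://lake/customers/data/part-0001.parquet",
            "s3://lake/customers/data/part-0002.parquet",
        ],
    }),
}


class InMemoryCatalog:
    """Maps logical table names to the location of their current metadata file."""

    def __init__(self) -> None:
        self._pointers: dict[str, str] = {}

    def register(self, name: str, metadata_location: str) -> None:
        # In an Iceberg-style catalog, a commit atomically swaps this pointer.
        self._pointers[name] = metadata_location

    def resolve_table(self, name: str) -> list[str]:
        # 1. Look up the pointer, 2. read the metadata file, 3. return the data files.
        location = self._pointers[name]
        metadata = json.loads(OBJECT_STORE[location])
        return metadata["current_files"]


catalog = InMemoryCatalog()
catalog.register("customers", "s3://lake/customers/metadata/v2.json")
files = catalog.resolve_table("customers")  # what "select * from customers" scans
```

Note that nothing in this lookup consults any engine’s own security policies: in this sketch, any engine that can reach the catalog and the bucket gets the same file list.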
It’s flexible, it’s efficient, and it lets multiple compute engines share the same underlying data without duplicating it. No wonder it’s all anyone wants to talk about.
The Trap: Security Has Been Displaced
Here’s the part that ALWAYS gets glossed over in the keynotes: Snowflake, Databricks, Amazon Athena, and any other engine that supports the format may be able to access those files directly, depending on how permissions and governance controls are configured.
That sounds great for flexibility. For security, it’s an absolute betrayal. Let’s entertain a hypothetical: say you set up a masking policy in Snowflake to protect sensitive fields in one of these tables. That policy only applies when Snowflake is the one accessing the data! If the data isn’t actually living inside Snowflake (and with open table formats, it usually isn’t), there’s no longer a guarantee that the policy gets enforced at every point where the data is read or consumed. Someone accessing the same data through Databricks, Athena, or another engine may not be subject to those Snowflake-specific masking policies, potentially exposing sensitive fields in ways you didn’t intend, or don’t even know about!
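The gap is easier to see in a toy model. Below, two hypothetical engines read the same shared rows, but only one of them has a masking policy attached. `EngineWithMasking`, `PlainEngine` and `mask_ssn` are illustrative stand-ins, not Snowflake’s or any other vendor’s API.

```python
from collections.abc import Callable

Row = dict[str, str]

# Shared storage: both engines read the very same rows (think: the same Parquet files).
SHARED_TABLE: list[Row] = [
    {"name": "Ada", "ssn": "123-45-6789"},
    {"name": "Grace", "ssn": "987-65-4321"},
]


def mask_ssn(value: str) -> str:
    # Platform-specific masking rule: keep only the last four digits.
    return "***-**-" + value[-4:]


class EngineWithMasking:
    """Applies its masking policies, but only to queries it executes itself."""

    def __init__(self, policies: dict[str, Callable[[str], str]]) -> None:
        self.policies = policies

    def select_all(self, table: list[Row]) -> list[Row]:
        # Columns without a policy pass through unchanged.
        return [
            {col: self.policies.get(col, lambda v: v)(val) for col, val in row.items()}
            for row in table
        ]


class PlainEngine:
    """A second engine pointed at the same files, with no knowledge of those policies."""

    def select_all(self, table: list[Row]) -> list[Row]:
        return [dict(row) for row in table]


masked = EngineWithMasking({"ssn": mask_ssn}).select_all(SHARED_TABLE)
unmasked = PlainEngine().select_all(SHARED_TABLE)
```

The policy lives in the engine, not in the data, so the second access path returns plaintext SSNs.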
In other words, your security policy is only as strong as the platform enforcing it. As more engines gain access to the same underlying data, maintaining consistent security controls across every access path becomes significantly more challenging.
Solution
Three words, one acronym: format-preserving encryption (FPE). If sensitive data is encrypted with FPE at the moment it’s written into the open table format, it stays protected no matter where or how the data is accessed later.
Why is that important? Encrypted data looks like gibberish (apart from its preserved format, which is the “FP” in FPE) unless you have the means to decrypt it. So even if Databricks, Snowflake, or Athena can technically read the file, they can’t make sense of the protected fields unless they’re set up and permissioned to decrypt them. That decryption capability has to be deliberately built into each system accessing the data, which means protection travels with the data itself rather than depending on which platform happens to be querying it.
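A minimal sketch of that idea, assuming decryption is gated by a grant check that sits alongside the key rather than inside any one engine. `DECRYPT_GRANTS`, `read_column` and the principal names are hypothetical, and `demo_fpe_decrypt` is a placeholder that merely undoes a digit shift; it is not real cryptography.

```python
from collections.abc import Callable

# Hypothetical grants: which principals may decrypt which protected column.
DECRYPT_GRANTS: dict[str, set[str]] = {"ssn": {"fraud_analyst"}}


def demo_fpe_decrypt(value: str) -> str:
    # Placeholder ONLY: undoes a "+3 per digit" shift. Not real cryptography.
    return "".join(str((int(ch) - 3) % 10) if ch in "0123456789" else ch for ch in value)


def read_column(column: str, stored: list[str], principal: str,
                decrypt: Callable[[str], str]) -> list[str]:
    """What any engine returns: plaintext only for granted principals, else ciphertext."""
    if principal in DECRYPT_GRANTS.get(column, set()):
        return [decrypt(v) for v in stored]
    # No grant: the engine can read the file but only ever sees protected values.
    return list(stored)


stored_ssns = ["456-78-9012"]  # what actually sits in the Parquet files
analyst_view = read_column("ssn", stored_ssns, "fraud_analyst", demo_fpe_decrypt)
dashboard_view = read_column("ssn", stored_ssns, "bi_dashboard", demo_fpe_decrypt)
```

Whichever engine issues the read, an ungranted principal gets the same protected value that is stored in the table.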
Unlike traditional encryption approaches, FPE preserves the original format of the data. A credit card number still looks like a credit card number. A Social Security number retains its expected structure. That means analytics workflows, schemas, and downstream applications can keep functioning without extensive redesign while sensitive values remain protected, and because deterministic FPE maps the same input to the same output under a given key, joins on protected columns still line up. It also avoids breaking data-quality checks, applications that validate input formats, and other format-specific constraints.
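To show what “format-preserving” means mechanically, here is a toy Feistel-network cipher over decimal digits. It borrows the general shape of NIST’s FF1 mode (SP 800-38G: split the digits into two halves and run ten rounds of modular addition) but uses HMAC-SHA256 as the round function. It is an illustration only, not FF1 and not analyzed; real deployments should use a vetted FF1/FF3-1 implementation with proper key management. The function names and key are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # fixed, even number of Feistel rounds


def _round_value(key: bytes, tweak: bytes, i: int, half: str, width: int) -> int:
    # Keyed round function: HMAC-SHA256 over (round index, tweak, other half),
    # reduced to a number with `width` decimal digits.
    msg = i.to_bytes(1, "big") + tweak + b"|" + half.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def encrypt_digits(key: bytes, digits: str, tweak: bytes = b"") -> str:
    """Map a digit string to another digit string of the same length."""
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected at least two ASCII digits")
    u = len(digits) // 2
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        width = len(a)
        c = (int(a) + _round_value(key, tweak, i, b, width)) % (10 ** width)
        a, b = b, str(c).zfill(width)
    return a + b


def decrypt_digits(key: bytes, digits: str, tweak: bytes = b"") -> str:
    """Invert encrypt_digits by running the rounds backwards."""
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected at least two ASCII digits")
    u = len(digits) // 2
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        width = len(b)
        c = (int(b) - _round_value(key, tweak, i, a, width)) % (10 ** width)
        a, b = str(c).zfill(width), a
    return a + b


def protect(key: bytes, value: str, tweak: bytes = b"") -> str:
    """Encrypt only the digits of a formatted value; separators stay in place."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    out = iter(encrypt_digits(key, digits, tweak))
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


def unprotect(key: bytes, value: str, tweak: bytes = b"") -> str:
    """Reverse protect() for holders of the same key and tweak."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    out = iter(decrypt_digits(key, digits, tweak))
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


KEY = b"demo-key-not-for-production"  # hypothetical; real keys belong in a KMS
card = "4111-1111-1111-1111"
token = protect(KEY, card)  # still 16 digits with dashes in the same places
```

Because the mapping is deterministic for a given key and tweak, the same card number always yields the same token, which is what keeps joins and group-bys working on protected columns. The flip side is that equal values stay linkable, and very short inputs have small domains that are easy to brute-force, which is one reason the NIST modes impose minimum domain sizes.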
This is the part that really matters for anyone adopting open table formats. The format gives you flexibility and interoperability. FPE gives you the security and control that flexibility can otherwise undermine.
What This Means for Your Data Strategy
If your organization is exploring Iceberg, Delta Lake, or any open table format setup, I would encourage you to ask a few questions. Where exactly does your sensitive data live once it’s in this open format? What systems can access that storage layer? And if you’ve built masking or access policies in one platform, are you confident those policies hold up if the data is queried somewhere else?
For a lot of teams, many of whom are still digesting the firehose of information from these large conferences, these questions haven’t come up yet, mostly because the conversation around open table formats has focused almost entirely on performance, cost, and flexibility. Security tends to come up later, often after something is already in production.
One effective way to reduce this risk is to protect sensitive data before it’s written into the open table format. By applying FPE at the transformation or data layer, protection remains intact regardless of which engine ultimately accesses the table. It means your sensitive data stays protected as your architecture evolves, as new compute engines get added, and as more teams gain access to your data lake.
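In pipeline terms, “protect before write” means the transformation step encrypts the sensitive columns and only the protected batch reaches the table writer, so no engine reading the table afterwards ever sees plaintext. The sketch below keeps the cipher pluggable. `SENSITIVE_COLUMNS`, `protect_batch`, `write_to_table` and `demo_fpe` are hypothetical placeholders; `demo_fpe` only shifts digits to keep the example runnable and is NOT encryption.

```python
from collections.abc import Callable, Iterable

Row = dict[str, str]

# Hypothetical policy: columns that must never land in storage as plaintext.
SENSITIVE_COLUMNS = frozenset({"ssn", "card_number"})


def demo_fpe(value: str) -> str:
    # Placeholder ONLY: shifts each digit by 3. It preserves format but is not
    # encryption; swap in a vetted FF1/FF3-1 implementation in practice.
    return "".join(str((int(ch) + 3) % 10) if ch in "0123456789" else ch for ch in value)


def protect_batch(rows: Iterable[Row], encrypt: Callable[[str], str],
                  sensitive: frozenset[str] = SENSITIVE_COLUMNS) -> list[Row]:
    """Encrypt sensitive columns in every row; leave other columns untouched."""
    return [
        {col: encrypt(val) if col in sensitive else val for col, val in row.items()}
        for row in rows
    ]


def write_to_table(rows: list[Row], sink: list[Row]) -> None:
    # Stand-in for the real open-table-format writer (an Iceberg or Delta append).
    sink.extend(rows)


incoming = [{"name": "Ada", "ssn": "123-45-6789", "card_number": "4111-1111-1111-1111"}]
table: list[Row] = []
write_to_table(protect_batch(incoming, demo_fpe), table)
```

Keeping the cipher as a parameter means this step stays the same no matter which engines are added downstream later.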
I never thought I’d say this, but open table formats are here to stay. If anything, they’re going to become more central to how organizations manage data over the next few years. The teams that think about security at the data layer now will be in a much better position than those who wait until an audit or an incident forces the conversation.
