ALTR Blog

The latest trends and best practices related to data governance, protection, and privacy.

In today’s environment, you know that data security and governance are critical to preventing the data breaches that can leave your company fined, sued, and possibly shut down. This kind of data security is especially important in cloud data platforms like Snowflake, where large volumes of sensitive data are consolidated into a single data pool. A reliable way to outsmart bad actors who attempt to compromise your data is a “switcheroo” tactic otherwise known as Snowflake Data Tokenization. If you’re unfamiliar with it, think of data tokenization as the valet service at an upscale restaurant or formal event: the attendant takes your car keys and hands you a valet ticket in exchange. If someone steals your valet ticket, it is useless to them because it isn’t the actual key to your car. The ticket only serves as a substitute for your keys and a ‘marker’ that helps the valet attendant (who has your keys) identify which car to return to you.

This blog provides a high-level explanation of what Snowflake tokenization is, the benefits it offers, how it can be done manually, and the steps to follow if you use ALTR to automate the process. There are other ways to secure data, including Snowflake data encryption, but Snowflake tokenization with ALTR provides many benefits that encryption cannot.

What is Data Tokenization and Why is It Important?

Data tokenization is a process in which an element of sensitive data (for example, a Social Security number) is replaced by a value called a token. This technology adds an extra layer of protection for your sensitive data. In the simplest terms, your data is replaced with a non-sensitive ‘substitute’, a random character string called a ‘token’, that has no exploitable value of its own and serves only as a marker that maps back to your sensitive data when someone queries it. For example, a customer’s bank account number would be replaced with a token, making it impossible for someone on your staff to make purchases with that information. This added safeguard helps your company minimize data breaches and remain compliant with data governance laws.

Benefits of Data Tokenization 

  • Maximized security: Data tokenization substitutes your original data with a randomly generated token for increased security. If your tokens are compromised, they are completely useless to bad actors and cannot be reverse-engineered to recover the original values. This helps maintain your customers’ trust that their information will not be exposed.
  • Highly operational: Tokenization offers determinism, which allows people to perform accurate analytics on the data in the cloud: a given input always produces the same token. Deterministic tokens let you perform SQL operations (such as joins or WHERE clauses) without detokenizing the data, protecting consumer privacy without interrupting analyst operations (see the sketch after this list).
  • Scalability and lower overhead costs: ALTR lowers your overhead costs by eliminating the need for your company to build and scale its own tokenization infrastructure to meet ever-changing compliance requirements. We take care of this for you through our highly scalable Vaulted Tokenization solution, which works with Snowflake when you need to tokenize or detokenize datasets that contain millions or billions of values at a time.
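
To make the determinism point concrete, here is a rough sketch in Snowflake SQL. It assumes two hypothetical tables whose email columns were tokenized with the same deterministic scheme, so equal email addresses produce equal tokens and the join still matches without ever exposing the raw values:

    -- Hypothetical tables: EMAIL_TOKEN holds deterministic tokens in both.
    -- Equal raw emails yield equal tokens, so the join works without detokenizing.
    SELECT o.order_id, c.loyalty_tier
    FROM   orders o
    JOIN   customers c
      ON   o.email_token = c.email_token
    WHERE  o.order_date >= '2023-01-01';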

Why Data Tokenization is Better than Encryption for Many Use Cases

Sometimes you might hear data tokenization and data encryption used interchangeably; however, while both technologies help secure data, they are two different approaches to consider as part of your data security strategy. Tokenization replaces your sensitive data with a ‘token’ that cannot be deconstructed, whereas encryption converts your data (using an encryption algorithm and key) into a format that cannot be read or understood without the key.

One benefit of data tokenization is that it can be more secure than encryption, because a token represents a value without being a function or derivative of that value. Another benefit is that it is simpler to manage, since there are no encryption keys to oversee. When deciding whether to use tokenization or encryption, however, consider your specific business needs. Because of the benefits tokenization offers in today’s environment, businesses in different industries are using it for a wide array of reasons: commerce transactions that accept credit and debit card payments, the sale and tracking of assets such as digital art recorded on a blockchain platform, and the protection of personal health information, to name a few.

How Snowflake Tokenization Works if You DIY

If you’re wondering how to tokenize your data manually inside of Snowflake, then the answer is, “You can’t.” 

Snowflake does not have native, built-in tokenization capabilities, but it can support custom tokenization through its external function and column-level security features, provided you have the resources available to write the code needed to implement tokenization, storage, and detokenization.

Let’s take a look at what that would entail:

Implementing a remote Snowflake tokenization service

1) First you will need to write and deploy a remote service that can handle tokenization, storage and detokenization.

This service will need to be implemented in Amazon Web Services, Microsoft Azure, or Google Cloud Platform depending on which of those cloud providers you chose for your Snowflake instance.

This step takes significant effort and requires not just programming expertise but also expertise in the storage, compute, and networking capabilities of your cloud provider.

Snowflake also expects data passed to and received from external functions to be provided in a specific format, so you will need to invest time in understanding this format and in architecting a solution that optimizes the exchange of large amounts of data.

Fig. 1 Remote Tokenization service

2) Next you will need to configure a gateway endpoint in your cloud provider to receive the HTTP requests and responses required by Snowflake for external functions.

This layer is also where you implement authentication to ensure that only valid requests from your Snowflake instance are processed.

Fig. 2 Gateway Endpoint

3) After implementing your external tokenization function, you will need to create two objects in your Snowflake instance.

One is a user-defined external function, which is called from within your SQL statements to tokenize or detokenize data.

The other is an API Integration object that holds the credentials allowing your Snowflake instance to connect to and call the external function’s implementation in your cloud provider’s environment. Both objects can be created using SQL.

Fig. 3 User-defined external function
Fig. 4 API Integration Object
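
For a sense of what that SQL involves, here is a minimal sketch for an AWS-hosted service. The integration name, role ARN, gateway URL, and function name are placeholders for illustration, not values from any real deployment:

    -- API Integration object: credentials and the allowed endpoint for the remote service
    CREATE OR REPLACE API INTEGRATION tokenization_api
      API_PROVIDER = aws_api_gateway
      API_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/snowflake-tokenization-role'
      API_ALLOWED_PREFIXES = ('https://example.execute-api.us-east-1.amazonaws.com/prod/')
      ENABLED = TRUE;

    -- User-defined external function that forwards values to the remote tokenization service
    CREATE OR REPLACE EXTERNAL FUNCTION tokenize(val VARCHAR)
      RETURNS VARCHAR
      API_INTEGRATION = tokenization_api
      AS 'https://example.execute-api.us-east-1.amazonaws.com/prod/tokenize';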

4) After these three steps, you will be able to call your tokenize and detokenize functions from your Snowflake client.

As you can see, that's a lot of extra time and effort. Let's see how you can avoid it by using ALTR's tokenization solution.

How Snowflake Tokenization Works Using ALTR 

NOTE: A tokenization API user is required to access our Vaulted Tokenization. Enterprise Plus customers can create tokenization API users on the API tab of the Applications page. 

Let's compare this with using ALTR. If you use ALTR and Snowflake together, tokenization is much easier because ALTR has already done the implementation work for you.

To use tokenization in ALTR you only need to create the Snowflake Integration object that points to our service and define an External Function in your database. We provide a SQL script that does this work for you with just a single SQL command.

You will need to generate an API key and secret from the ALTR portal.

This key and secret value are inputs to the SQL script we provide. Just run the script to create a Snowflake Integration object that represents a connection to ALTR’s external tokenization/detokenization implementation in the cloud. This script also creates two external functions that use this service. One to tokenize data, and another to detokenize.
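
Once the script has run, the functions behave like any other Snowflake function and can be called straight from SQL. Here is a minimal sketch; the table, column, and function names are illustrative rather than ALTR’s exact identifiers:

    -- Tokenize sensitive values as they are loaded
    INSERT INTO customers (customer_id, name, ssn)
    SELECT customer_id, name, tokenize(ssn)
    FROM   staging_customers;

    -- Detokenize on read (subject to the policy you set in ALTR)
    SELECT customer_id, detokenize(ssn) AS ssn
    FROM   customers
    WHERE  customer_id = 42;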


Fig. 5 Create a Tokenization Key

Now we can look at tokenization in action.

As mentioned previously, a best practice is to have sensitive values tokenized at rest in the database, preferably before they land in Snowflake. ALTR supports this through a library of open-source integrations with data movement tools like Matillion, BigID, and others.

If we run this query as an account admin, we can see that two columns are tokenized: NAME and SSN. The tokens you see here are the values stored on disk within Snowflake in the cloud.

Fig. 6 Data Tokenized in Snowflake


When it comes to detokenizing, we want to detokenize a value only “on the fly,” when the data is queried, and only for roles that are allowed to see the original values.

With ALTR tokenization we do this for you automatically.

If we run this next query with the DATA_SCIENTIST role, the values are detokenized and we see the original sensitive values instead of tokens. This is because, in the ALTR portal, we have allowed the Data Scientist role to see these values.
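
Conceptually, this follows the pattern Snowflake calls external tokenization: a masking policy that calls a detokenize function only for approved roles. With ALTR you never write this yourself, but a hand-rolled sketch with hypothetical object names would look roughly like:

    -- Detokenize only for approved roles; every other role sees the token
    CREATE OR REPLACE MASKING POLICY ssn_detokenize AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'DATA_SCIENTIST' THEN detokenize(val) ELSE val END;

    ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY ssn_detokenize;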

Fig. 7 ALTR roles

Fig. 8 Untokenized Data in Snowflake

If you use ALTR for tokenization you do not need to write any code or invest in developing a solution.

We can ensure your data is tokenized before it lands in Snowflake with our open-source integrations for your ETL/ELT pipelines, and we can automate detokenization so that only the users you authorize through the ALTR portal see the original values.

ALTR Snowflake Tokenization Use Cases

Here are a couple of use case examples where ALTR’s automated data tokenization capability can benefit your business as it scales with Snowflake usage.

Use Case 1. Your new research company needs to conduct a clinical trial.

A pharmaceutical company wants your new research company to conduct clinical trials on their behalf. The personally identifiable data and research information from clinical study participants needs to be secured to stay in compliance with HIPAA laws and regulations. ALTR’s data tokenization would be an ideal method to incorporate as part of your compliant data governance strategy.

Use Case 2. Your new retail store needs to accept credit cards as a payment method.

Your newly launched store accepts credit cards as one of your methods of payment from shoppers. To remain compliant with Payment Card Industry (PCI) standards, you need to ensure that your customers’ credit card information is handled securely. Data tokenization helps you reduce administrative overhead and satisfy PCI standards while storing the data in Snowflake.

Automate Snowflake Tokenization with ALTR

As your business collects and stores more sensitive data in Snowflake, it is critical that Snowflake data tokenization is included as part of your data governance strategy. ALTR helps you with this process by providing our Vaulted Tokenization, BYOK for Vaulted Tokenization, and other capabilities to leverage. 

Since Snowflake announced the general availability of Snowpark in November 2022, we've heard more and more ALTR customers express interest in utilizing it as part of their Snowflake platform. It provides developers with the capabilities to eliminate complexity and drive increased productivity by building applications and models, or even data pipelines, within one single data platform. We've done some validation and are happy to demonstrate that ALTR's policy enforcement carries over from Snowflake to Snowpark without a hitch.

What is Snowflake Snowpark?

It's essentially a separate execution environment within Snowflake where you can write data-intensive applications. You can use third-party dependencies, and you can process data with very complete programmatic capabilities. This execution environment runs next to the Snowflake SnowSQL interface. ALTR applies, automates, and enforces Snowflake's native data governance features (without requiring SQL) in the Snowflake environment so that when the data flows into Snowpark, the Snowpark environment gets the benefit of all those same policies and controls.  

All the powerful ALTR data governance and security capabilities, including access monitoring, query logs, role-based access controls, dynamic data masking, rate limits, real-time access alerts, and even tokenization, are carried over to the users and data in Snowpark.

Who uses Snowflake Snowpark?

Data scientists are the most common users. If you wanted to use historical data to make a prediction, for example, using the last ten years of rainfall info to predict the next ten years, you might use something like a statistical or machine learning module. Rather than writing your own, you can pull an existing, already-written module into Snowpark as a dependency and just plumb Snowpark through to build your analysis model. Because ALTR's policies carry over, the data scientist running these models and analyses in Snowpark can only leverage the data they have permission to access in Snowflake. This kind of protection over sensitive data, so approved users can still utilize it, is critical to financial services organizations. They need to access very sensitive financial data to build models to identify and prevent fraud. Have you ever been contacted by your bank when on vacation in another country to confirm it's really you making the transaction? That's probably a data model identifying anomalous activity.

How does ALTR benefit Snowflake Snowpark users?

It allows the business to place controls over sensitive data that can be used for data modeling activities. So, the data scientists no longer have to take a chunk of data into their own data silo, crunch the data, and spit out an answer. Through ALTR's single pane of glass, data admins will have total control over all data access in both Snowflake and Snowpark. This also allows data scientists to get the benefits of Snowpark's powerful data processing capabilities with all the Snowflake data they have permission to use – for a streamlined and secure experience.

Our compatibility with Snowpark is another example of ALTR's goal of providing governance and security over data wherever it is, but it's just the beginning of our Snowpark journey. Keep your eyes out for more details coming at Snowflake Summit 2023 in June.  

Anywhere you want to or need to work with data, ALTR will be there.  

See how ALTR's policy applies to data accessed using Snowpark:

In today’s digital age, data governance is essential for organizations looking to maintain the security of their sensitive data while maximizing data productivity. With large and dynamic data environments, it can be challenging to implement an effective data governance strategy that protects data from threats and promotes business growth. That’s where Matillion comes in, with our integration with ALTR, the only automated data access control and security solution for governing and protecting sensitive data in the cloud.

In this blog, we’ll explore the benefits of this integration and how it enables organizations to manage and safeguard their data more effectively. We’ll dive into how it can help organizations automate data access control, increase data visibility, and build deep trust through compliant and reliable data for confident data-driven decision-making. With Matillion’s integration with ALTR, organizations can enhance their data governance initiatives without ever leaving the Matillion interface.

What is Matillion’s Integration with ALTR?  

Matillion’s integration with ALTR empowers organizations to manage and secure sensitive data assets and set centralized data security policies so they can be confident their data is protected at all times. It offers several key features to ensure comprehensive data governance, including automated, granular data access controls, real-time data monitoring and analytics, comprehensive audit trails, and secure tokenization.

How Matillion integrates with ALTR

Matillion’s integration with ALTR allows for the implementation of classification-based policies to control access to sensitive data and is user-friendly for multi-skilled teams, with no coding required. With interactive Data Usage Heatmaps and Analytics Dashboards, organizations can track data usage and see who is accessing what data. Additionally, the integration offers flexible data masking options for private information and provides auditable query logs to ensure privacy controls are working correctly.

Secure Sensitive Data with Matillion’s ALTR Integration

Supercharge Your Governance Policy

By integrating with ALTR, Matillion offers data-driven organizations a competitive advantage.

  • Automate it

Turn policies into practice with automated data governance to manage risk and safeguard your bottom line. Easily control and secure sensitive data by using a central area to set policy, so organizations can proactively mitigate risk before it negatively affects them. ALTR’s shared job allows organizations to utilize ALTR’s data security and policy capabilities within Matillion, so data engineers can mask and tokenize data assets at the beginning of the data journey — even from the most privileged admin users — to protect highly sensitive data.

  • Increase Transparency

Improve observability to monitor and secure the data landscape. With Matillion’s integration with ALTR, organizations can quickly get visibility into their organization’s data usage. This helps break down silos and make previously hidden data visible so organizations can quickly spot abnormalities and reduce vulnerabilities. ALTR provides a detailed audit trail, including who accessed the data, when they accessed it, and what they did with it. This information is essential for regulatory compliance and helps organizations detect and investigate any suspicious activity, so they can visualize and understand their entire data landscape to secure it at scale.  

  • Make Confident & Compliant Decisions

Matillion’s integration with ALTR makes it simple to enforce data compliance around the clock by providing granular access controls, detailed audit trails, and data protection measures. Abiding by data regulations and laws helps safeguard data and establish standards for its access, use, and integrity while ensuring the entire organization is compliant when working within its own teams and cross-functionally. Matillion’s integration with ALTR uses real-time insights into data access and usage, enabling organizations to make informed and confident data-driven decisions based on accurate, up-to-date information.

Automated Data Access Control & Scalable Security Increases Data Productivity

Matillion’s integration with ALTR is designed to automate data access control and provide scalable security to help organizations increase their data productivity, security, and ultimately, revenue. This integration is a powerful tool for data-driven organizations looking to supercharge their governance policy, increase transparency, and make confident decisions.

Maybe you're just getting started with Snowflake, maybe you're well into your Snowflake project but are running into the "sensitive data roadblock," or maybe (and we won't tell your security team) you already have all your data (including that sensitive customer PII) in Snowflake, ready to be used and optimized.

Regardless of your data project maturity, Snowflake data governance and security must be on your mind. And perhaps you're at different stages with this as well. You may be leveraging Snowflake's native data governance features to tackle some tasks with SQL but leaving others on the back burner. Or you find it difficult to keep up with all the new data coming in and the users requesting access.

Wherever you are in your journey, it's never too late to think about how you're managing Snowflake data governance and how you and your team can leverage data governance best practices to most efficiently ensure your data stays private and secure. We developed this Snowflake Data Governance Best Practices Guide to help you review your checklist and ensure your bases are covered.

Step 1: Data Classification

Data Classification within ALTR

An essential Snowflake best practice in your data governance program is to examine the data and databases coming into your cloud data warehouse to identify sensitive or regulated data. It may seem self-evident that a column labeled "Social Security Numbers" contains, well, social security numbers, but you might be surprised! Data can be accidentally commingled, column headers can follow a completely unintelligible formula, or you might find email addresses in a column called "Username." If you have just a handful of columns or rows, digging through your data might take an hour of your morning. But if you have hundreds or thousands of columns, with new databases being continually added, this data classification task becomes not just a time suck but practically impossible. That doesn't make it any less important, unfortunately. You can't govern or secure sensitive data if you don't know where it is.

See a comparison of how you might do this yourself using Snowflake's native capabilities versus ALTR's automated solution.
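
If you do take a first manual pass, a crude starting point is to scan column metadata for likely-sensitive names. Here is a minimal sketch, assuming a database named MY_DB; it only catches well-named columns, which is exactly why automated classification matters:

    -- Hunt for columns whose names suggest sensitive data
    SELECT table_schema, table_name, column_name
    FROM   my_db.information_schema.columns
    WHERE  column_name ILIKE ANY ('%ssn%', '%social%', '%email%', '%dob%', '%card%');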

Step 2: Data Usage Monitoring

Data Usage Monitoring within ALTR 

Once you've identified (and hopefully tagged) columns holding your sensitive data, the next Snowflake best practice is to ensure that you have a way to see who is accessing that data, when and how much. Some companies have pushed so hard to become "data-driven" they might have opened up the data floodgates to the rest of the company clamoring for insights into their business units. While you can check data access manually with query logs in Snowflake, it can be an arduous task to turn that unstructured data into valuable insights. Having this visibility at your fingertips can make complying with data privacy regulations and audits much, much more manageable. And it can be incredibly insightful in allowing you to get a baseline sense of what normal data use looks like in your company. For example, are your marketing users accessing customer emails once a week for relevant outreach? Once you have that insight, setting appropriate policies and identifying anomalies becomes much easier.
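
For reference, the raw material for the DIY version of this step lives in Snowflake's ACCOUNT_USAGE views. A minimal sketch of the kind of query you would have to run and interpret by hand (the table-name filter is illustrative):

    -- Who has queried the CUSTOMERS table in the last 7 days, and how often
    SELECT user_name, COUNT(*) AS query_count
    FROM   snowflake.account_usage.query_history
    WHERE  start_time > DATEADD(day, -7, CURRENT_TIMESTAMP())
      AND  query_text ILIKE '%customers%'
    GROUP  BY user_name
    ORDER  BY query_count DESC;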

Step 3: Data Access Controls and Policy Enforcement

Data Access Controls and Policy Enforcement within ALTR

This is the next critical Snowflake data governance best practice: deciding what roles should have access to what data and then enforcing that policy. Some groups need unfiltered access to the most sensitive data - think HR accessing payroll data. Other groups only need access to data that is relevant and critical to doing their jobs - the marketing team might need to cross-reference purchase info with date of birth and email address to send a targeted offer. But the HR team doesn't really need access to customer PII. A helpful concept to follow is the "principle of least privilege" (PoLP). This is a risk-reduction strategy of giving a user or entity access only to the specific data, resources, and applications needed to complete a required task. Snowflake data governance, then, is all about setting these access controls by Snowflake database columns or rows.

As more and more data is added to Snowflake and more and more users request access, the task of setting access controls for users can become both time-consuming and risky. The process becomes more onerous as additional Snowflake databases or even additional Snowflake accounts are added. And of course, the roles, policies, and access controls need to be consistent across your whole Snowflake ecosystem.
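
For context, this is what a single hand-written, least-privilege control looks like using Snowflake's native row access policies. The table, column, and role names are hypothetical, and you would need to write and maintain one of these for every rule:

    -- Only HR_ADMIN can see payroll rows; every other role sees the rest
    CREATE OR REPLACE ROW ACCESS POLICY hr_only AS (department STRING) RETURNS BOOLEAN ->
      CURRENT_ROLE() = 'HR_ADMIN' OR department <> 'PAYROLL';

    ALTER TABLE employee_records ADD ROW ACCESS POLICY hr_only ON (department);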

See a comparison of how you might implement row-level security using Snowflake's native capabilities versus ALTR's automated solution.

Step 4: Data Masking

Data Masking within ALTR

A further refinement of the data access control best practice is data masking. This is the process of not completely excluding the data but obfuscating it so that its format stays recognizable while the sensitive portion is hidden. For example, an email address like contact@altr.com could be masked as c****t@a**r.com, or a social security number could be shown as ***-**-1234. This allows users to run analyses on data in multiple databases by cross-referencing sensitive data like email addresses without knowing exactly what the email address is. Data masking is fundamental to Snowflake data governance programs.
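
Written by hand in Snowflake, a partial mask like the Social Security example above might look something like the sketch below. The policy, table, and role names are hypothetical, and ALTR applies the equivalent policy for you without any SQL:

    -- Show full SSNs to HR_ADMIN; every other role sees ***-**-1234 style values
    CREATE OR REPLACE MASKING POLICY ssn_partial AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN val
           ELSE '***-**-' || RIGHT(val, 4)
      END;

    ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY ssn_partial;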

See a comparison between writing data masking controls using SnowSQL in Snowflake versus automatically with ALTR.
And see how a multinational retailer used ALTR's custom masking policy to ensure the highest level of security for its customer PII data.

Step 5: Data Rate Limiting

Data Rate Limiting within ALTR

The next, and one of the most important, Snowflake best practices is to limit the amount of data even an approved user can access. Even when data should be accessible to a specific group of users, it's improbable that they would need all the data at once. Can you imagine a marketing person downloading all the personal information - first name, last name, email, DOB, etc. - for every single customer? That sounds like a threat to me. To ensure that no users get carried away, intentionally or unintentionally, you should set up limits for the amount of data each role can access over a specific time period. This lowers the risk to your data by stopping credentialed access threats before they do unrecoverable damage and ensuring even the most privileged users don't access data they don't need.

See how you could set data access rate limits manually in Snowflake (or not) vs. automatically in ALTR.
Read how Redwood Logistics combined data rate limiting with alerting to ensure that privileged Snowflake admins couldn't access sensitive payroll data.

Bonus: Business Intelligence User Governance

One of the primary purposes for migrating data to Snowflake is to enable analytics through business intelligence tools like Tableau and Looker. Once you have your data governed and secured in Snowflake, you'll want to make it available to line-of-business users throughout your organization. But how do you make sure you know who's accessing what data and that only authorized users get the sensitive stuff? You could create a Snowflake user for every Tableau user so that there's a one-to-one relationship, and Snowflake can track the individual's query. But this causes two issues: you have to manage Tableau and Snowflake accounts for every user, which can run into the thousands at the largest companies, and you have the same data monitoring issue listed above - you're digging through query logs.

See how ALTR's Tableau user governance integration can help avoid both these issues.

Snowflake Data Governance Best Practices

There are some data governance best practices you need to implement no matter where your data resides, on-premises or in the cloud. But there are some specific quirks to Snowflake you'll need to know in order to ensure your sensitive and regulated data stays secure while your users use it. Many of these can be accomplished using SnowSQL to activate Snowflake native features, but that's often not a feasible solution when managing multiple accounts, hundreds of databases, or thousands of users. ALTR's platform can help you scale your Snowflake data governance policies across your Snowflake ecosystem.

Level up your Snowflake Data Governance game with ALTR. See how easily you can get started in Snowflake Partner Connect:

Here at the end of Q1 2023, I’ve had a chance to look back not just on two quarters as CEO but also on our decision to focus on Snowflake two years ago. And what has become even more crystal clear after talking to companies large and small over that time is that while they’re moving massive amounts of data to the cloud, they’re not completely moving away from existing systems. Basically, businesses have data everywhere.

We’re currently talking to a large enterprise with 17 to 20 different systems storing data: some are older legacy boxes while others are brand new micro-services. They have a lot of systems and they’re all different. What does this mean for data security? The right solution needs to support multiple and varying integration methodologies to meet data where it is and where it’s going. The end goal is for sensitive data to be classified and tokenized on ingest, travel around the business tokenized, and be de-tokenized when needed so it can be safely consumed downstream governed by logging, policy, and alerting.  

And what does that mean for ALTR?  

  • Centralized SaaS solution: Businesses need a solution like ALTR that can be a hub or central spot from which data controls and security expand out as spokes to all the places data lives both in the cloud and on-premises. We have that with our SaaS platform and satellite Client Database Manager (CDM) on-prem components. That’s the future of data security.  
  • Focused on structured data: From the beginning, we focused on structured data where most of the sensitive and regulated data reside. We’ve been investing in this area from the beginning – we understand latency in a way that many other solutions don’t.  
  • Build on data governance and security fundamentals: Data classification and tagging, automated policy enforcement with dynamic data masking, patented rate limits and alerting, and tokenization. These are the building blocks of data security we excel at.  
  • Expand control and security via a library of integrations: We’ve done an excellent job of this already with our database drivers, network proxies and open-source integrations, but we need to expand further into other data repositories like S3, Hadoop, Databricks – all the places data proliferates.

Snowflake vs Databricks

Yes, that’s a clickbait subhead. As I said, we’ve been extremely focused on Snowflake since February 2021 when we announced availability of the market's first native cloud platform delivering observability, governance, and protection of sensitive data on Snowflake. But the truth is that a number of businesses have a use for both Snowflake and Databricks and so far, they’ve really served two different purposes:  

  • Databricks has traditionally been very data science focused, allowing data scientists to share their work, especially code blocks, very easily. 
  • Snowflake made its name by being very analytics-focused, allowing data users to load as much data as they want and share deep insights via analytics and dashboards. However, Snowflake is expanding into other data solution areas with tools such as Snowpark which enable data science workloads and applications.

Because the work use cases are primarily different as of now, the use cases for data governance and security are different as well.  

In Snowflake, we see a focus on masking sensitive data and role-based access controls to keep it from users who don’t need the full information – like a data analyst working in Snowflake or on a Tableau report who can do what they need to with masked email addresses or masked phone numbers.

On Databricks, data scientists need full access to plain text production data so they can fully understand the distribution of the data. This means the focus is more about privacy protection and breach prevention. We see data scientists create multiple copies of the exact same data set with a light format variation. The model may produce slightly different results from the same data. That makes data governance and security more complicated because you’re basically playing cat and mouse with data constantly being copied, moved and shared – without oversight. So, we expect the solution will be focused on sensitive data discovery and classification, then automatic access logging and reporting – a kind of “overwatch” mode with tokenization underlying all of it.  

ALTR’s feature set can be applied to both to solve their specific governance needs. Because we have that SaaS hub, we can now just pull the capabilities and controls we need down to wherever policy enforcement is needed.

ALTR’s Database Connections and Integration Methods

Your Single Source for Data Access Truth

So, what’s ALTR’s focus for the rest of this year? Building out the remaining connections to all the different data sources and data stores. Because even though the data governance and security use cases might be different across different data platforms and systems, companies don’t want to add, implement and manage different vendors. No one has time for that. They’re looking for one solution to the entire data security knot – one pane of glass, one single source for data access truth.

ALTR will make this happen this year. And if you’re interested in working with us on Databricks, Amazon Redshift, BigQuery or any other database or solution integrations, get in touch with us to help drive the features we implement and roll out. We can’t wait.  

Want to try ALTR today? Sign up for our Free Plan. Or get a Demo to see how ALTR’s enterprise-ready solution can handle securing your data wherever it is.  

Data catalogs are an essential component of your broader data strategy. Modern data catalogs “make it easier for your analysts to find, understand and trust the data” within your database. Your data catalog is essential for synthesizing information about your data and making that information available across your company, but it isn’t the actual data. Your catalog may give users the ability to search and index meta-information about your data, but it won’t go all the way to providing access control and security on your data.

This past quarter, many of our partners shared their guidance on data catalogs and the value they add to your data solution. We checked in on what a few of our partners were saying and compiled the best in thought leadership for you here.

Alation: How to Get Immediate Value from Your Data Catalog

This blog post, written by GT Volpe, describes factors that will help your organization see immediate value in your data catalog. Volpe begins this piece by defining what it means for an organization to be data-driven and how data catalogs fit seamlessly into the push to be a data-driven organization. Volpe shares considerations that you should think through prior to applying a data catalog, including focusing on planned outcomes and user skills, considering culture, industry, and regional needs, and embracing scalability and expansion.

Volpe talks of the importance of ensuring that your data catalog works for you and will continue to work for you as you scale, and as priorities may change. He writes of the importance of seeing immediate value from your data catalog, and that the one way to ensure success and accelerate time to value is to implement a data catalog plus data governance solution from day one – he writes that with the combination of a good data governance solution and a good data catalog, you will see immediate business value.

Alation: What Is the True Value of a Data Catalog?

Aaron Bradshaw, Data Governance & Enablement Specialist - Solutions Engineer at Alation, wrote about the specific value adds of implementing a data catalog. Bradshaw explains the reasons one may consider implementing a data catalog and what each implementor may expect from their catalog. He goes on to describe the difference between offensive and defensive approaches to data strategy and how that may play part in the decisions your organization makes surrounding investment and implementation of your data strategy.

Bradshaw concludes this blog post by outlining the strategic decision to implement a data catalog by quantifying its monetary value. Through a study done in partnership with Forrester, Alation discovered that “Adopting data catalogs has both quantitative and qualitative advantages, including a 364% return on investment (ROI).”

Collibra: Data Catalogs, Data Governance, and the Journey to Data Intelligence

This blog post, written by Paul Ewasuik, Director of Cloud Partnerships at Collibra, takes readers through the key points of a data governance plus data catalog solution. This post breaks down the often-intimidating components of data governance and data catalogs and turns them into understandable, digestible, actionable items for your organization to follow. Collibra mentions a critical component of successfully implementing a data catalog: your data catalog won’t be complete unless you partner it with a data governance solution, like ALTR. “Data governance allows your data citizens — and that’s everyone in your organization — to create value from data assets.”

Collibra continues on by explaining that data catalogs and data governance go hand in hand, but you won’t have complete data protection and accessibility without one or the other. They mention that “A data catalog creates and maintains an inventory of an enterprise’s data assets across its entire digital environment… A data catalog provides a reliable solution for the discovery, description, and organization of data sets,” while, “Data governance is the practice of managing and organizing data and processes to enable collaboration and compliant access to data.”

Snowflake: How Implementing a Data Catalog Optimizes Your Snowflake Data Cloud Migration

Juan Sequeda, Principal Scientist at data.world, a Snowflake partner, wrote about how you can optimize your Snowflake Data Cloud migration with a data catalog. Sequeda explains that migration to Snowflake’s Data Cloud is often done without intention or thoughtfulness, leaving many people to fall into a “lift and shift” approach, meaning they sloppily copy all data to the cloud, likely including errors and messy data. The solution to this, Sequeda explains, lies in implementing a data catalog — empowering your organization to have a better inventory of your data before, during, and after migration to the cloud.

Sequeda explains the key differentiators of data catalogs and describes a few key capabilities you should be mindful of when choosing one. A unique topic that this blog expands on is the intersection of high-value and low-complexity data. Sequeda writes that the sweet spot for your cloud data migration lies in figuring out, “what data is most important, what data is of the highest business value, and what data sees the most use.”

Conclusion

There is no doubt that data catalogs are beneficial, but they are just the first step to a fully operational data governance solution, and our partners agree that the best implementation of a data catalog is side-by-side with a scalable, simple, SaaS based data governance solution.

ALTR’s cloud data governance solution allows you to automatically discover, classify, and tag sensitive data with a checkbox in our no-code interface. We allow you to see what data is used, by whom, when, and how much with our industry-first interactive data usage heat maps and drill-down analytics dashboards. You’re able to control access to your sensitive data with classification-based policies so only approved users can see it, while quickly applying flexible, dynamic data masking over PII like social security numbers or email addresses to keep sensitive data private. It’s all part of ensuring data is governed securely from database to data catalog to data user.

Many of us have become more aware of the power of increased knowledge around our activities over the last few years – whether it’s a FitBit monitoring our steps or an energy audit delivering a detailed view of how everyone in your home uses lights, appliances, electronics, and other things that need power. Each month your utility company monitors your usage, and the details can help you recognize ways to lower your bill and identify current problems that are making your home less energy efficient. You can do the same with data usage observability. By capturing and monitoring who is running queries on data and when, you can make informed decisions to prevent data breaches and leaks. ALTR Heat Maps and Analytics Dashboards can make this information easily viewable and digestible so you can see where the issues might be. 

This blog provides a high-level explanation of what data observability is and why it’s important, how it works if you do it manually in Snowflake, and how it works if you use ALTR to automate the process.  We’ve also included a few use cases and a how-to demonstration video. We hope you find it helpful! 

What is Data Observability and Why is It Important?  

At a high level, data observability means presenting information about how users are accessing data in an easy-to-consume visual format. Operationalizing this through data observability tools is critical to helping you understand what’s occurring and to spotting abnormal events. The payoff for using ALTR's data usage observability tools is that they provide the information needed to meet two key data security policy goals:

  • Ensure that you have policies in place for all roles who access sensitive data 
  • Help you understand what normal access looks like because you can't identify what is abnormal without a baseline to compare to 

Snowflake logs capture this access information, and the events shown can help your Data Security team spot issues and minimize time spent on bottlenecks, speed, or other problems. But this information is delivered in a plain text format that requires a lot of work to extract those insights from.  

How Snowflake Data Observability Works if you DIY 

Snowflake provides the foundational query history data needed for data usage analytics via Snowflake logs; however, to be useful, the data must be processed to get it in a visual form that is easy to interpret. 

To do data observability manually in Snowflake, you must follow the steps below.

  1. Parse the SQL query text to extract the list of columns each query accessed, and then filter it to include only columns that contain sensitive data. NOTE: This will require you to write SQL statements.
  2. Next, tabulate the count of records that each user has accessed in each column for each minute, hour, and day of the past 24 hours. NOTE: This will require even more SQL coding.
  3. Last, convert the data set into an interactive visual chart that displays the information in a more understandable format so you can view the results and drill through them. NOTE: This will require full-stack development skills to implement.
Figure 1. Snowflake Query History (left) and the unstructured SQL query strings that must be parsed

As you can see from the required steps, doing data observability manually takes significant time and a lot of coding; the sketch below gives a taste of what just one of those steps involves.
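
Here is a minimal sketch of the kind of query step 2 alone would require, run against Snowflake's ACCESS_HISTORY view to tally column-level access by user over the last 24 hours. The fully qualified table name is a placeholder, and you would still need to parse query text and build the charts yourself:

    -- Count queries per user against each column of one sensitive table, last 24 hours
    SELECT col.value:"columnName"::STRING AS column_name,
           a.user_name,
           COUNT(*) AS queries
    FROM   snowflake.account_usage.access_history a,
           LATERAL FLATTEN(input => a.base_objects_accessed) obj,
           LATERAL FLATTEN(input => obj.value:"columns") col
    WHERE  a.query_start_time > DATEADD(hour, -24, CURRENT_TIMESTAMP())
      AND  obj.value:"objectName"::STRING = 'MY_DB.PUBLIC.CUSTOMERS'
    GROUP  BY 1, 2
    ORDER  BY queries DESC;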

How Snowflake Data Observability Works Using ALTR to Automate  

ALTR's Dashboard provides a high-level view of everything that’s happening in ALTR. For example, it will show you how many locks and thresholds you have, open anomalies, databases that are connected, columns that are governed, and other detailed data.

Figure 2. ALTR Dashboard

The built-in ALTR Heatmap (also referred to as ‘Data Analytics’ or ‘Query Analytics’) delivers data observability in a visual representation of how your users are accessing data. It shows you the roles, and the specific users in those roles, who are querying different data sets. The analytics give insight into how your data is being used, helping you identify where you need to assign policy, or, if you’ve already assigned policy, confirm how people are querying data within those policies.

Figure 3. ALTR Analytics, with a drill-down option to see a breakdown of analytics per user

When you hover over the heatmap (as shown in figure 4) you can see the total number of values accessed by your assigned user groups in the columns you’re governing. You also can drill down and view a more granular level of what data is accessed by your specific users, user groups, and data types.

Figure 4. Data Usage Heatmap example showing the number of values queried by each user

If you check Add Data Usage Analytics when connecting your data source, ALTR will import the past 30 days of Snowflake's history to show you your organization's usage trends. From there, your query history will automatically sync daily on all columns in your connected database. See figure 3 for context. 

Snowflake Data Observability Use Cases

Here are a couple of use case examples where ALTR’s automated data observability capability can benefit your business as it scales with Snowflake usage. 

Use Case 1. You want to determine a typical consumption pattern and restrict access to no more data than is normal for the user’s role.

You’re an Administrator who has already created policies on your different column-level data but want to determine whether you should create an additional one for your Credit Card column. You could view the data usage over the last 7 or 30 days to see what a typical consumption pattern is and then decide what to set your time- or access-based threshold to.

Use Case 2. You want to determine if anything looks strange that may require action and ensure all roles accessing data have policies that cover them.

You’re a Data Scientist and want to confirm that the right user groups are accessing column-level data that you’ve created policies for. If anything looks strange (for example, certain roles are querying data on the weekends instead of your business hours) then you can determine if you need to block access or trigger an alert (anomaly) to protect your data. 

Automate Snowflake Data Observability with ALTR

By operationalizing metrics through ALTR’s data observability tools, you can minimize data breaches and make informed decisions to stay in compliance with SLAs and certain regulations. The detailed data that the ALTR Dashboard and Analytics (Heatmap) provide is a must-have for an effective data security strategy. We’ve made it so convenient to view everything going on in ALTR and your analytics that you don’t have to write any SQL code. It’s a simple point-and-click in ALTR and you’re done!

See how you would get data observability by doing it yourself in Snowflake vs automatically in ALTR: 

In a study performed ahead of last week’s Gartner Data and Analytics Conference, researchers found that data governance is a top initiative that business leaders and data officers plan to focus on in 2023 and into 2024. We’re glad to see companies choosing this as a top priority. Data governance is only increasing in urgency and demand, yet we’ve seen many organizations falling behind in establishing a proper data governance practice.  

Data Governance Definition

At its center, data governance is, “the process of overseeing the integrity, security, usability, and availability of data within an enterprise so that business goals and regulatory requirements can be met.”

According to the Gartner Glossary, data governance is, “the specification of decision rights and accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” A well-run data governance practice can improve your data quality by setting the standard for how data is received, stored, and processed within your database.

A successful data governance strategy will guarantee that your data is dependable and that data users are being held accountable for their access levels. Data governance is a critical business component for all industries and will only continue to increase in traction as a standard business practice as more rules and regulations are set in motion surrounding data governance.  

Building a Successful Data Governance Strategy

Building a data governance strategy ahead of implementing a data governance solution is the best way to set your data plan up for success. Your data governance strategy serves as the base for the decisions that your organization makes. Once you have that strategy in place, you can better identify which data governance solution best fits your needs. A solid strategy ensures you are building a strong foundation for your data.

Our partner, Alation, says that a successful data governance strategy must decide four things: data availability, data consistency, data accuracy, and data security.

Data Availability: A good data governance strategy allows the correct data to be available to the correct people at the appropriate times. Your data governance solution should be structured in such a way that the people who the data is made available to can easily find and access the data. When strategizing, your team should determine what data availability you will need within a data governance solution.

Data Consistency: One key point your organization should consider when discussing your data governance strategy is the standardization of data points across your database. Determining these key data points from the beginning will help streamline the decision-making process down the road and will ensure consistent decisions are being made across your organization regarding your data.

Data Accuracy: Determining how data comes into your database, what you do with it once it’s there, and how data will exit your database will establish ground rules for the future of how your data is managed. It’s important to determine ahead of time the values, tags, and lifecycle that will be associated with data points to ensure consistency and accuracy, and guarantee that your dataset is error-free.

Data Security: Companies are responsible for protecting the sensitive data entrusted to them. Should your organization need to pass regulatory audits for any reason, a good data governance strategy and solution will ensure your data is safe and that you have the audit logs available to confirm regulatory compliance.

6 Must-Haves to Master Data Governance

A quick Google search of “Data Governance Solutions” will prove there isn’t a one-size-fits-all approach to data governance tools. By taking the time to pre-determine your data governance strategy, you are better positioned to find a data governance solution that is flexible and scalable enough to fit your needs.

ALTR’s data governance solution delivers both scalability and flexibility, and we’ve seen data governance success in organizations from a multinational retailer governing PII to a regional credit union protecting PCI to a unique healthcare organization safeguarding PHI.

Our data governance features matrix outlines 18 key points that we think are critical when evaluating whether a data governance solution will suit your needs. We’ve broken down key differences that ALTR brings to the table in each of these categories:

1) Flexible Pricing

Starting price and scalable pricing are points to consider when choosing a data governance solution. While some data governance solutions charge six figures to start, ALTR is proud to offer a free solution for one database within Snowflake and scalable pricing when you decide to upgrade to our Enterprise or Enterprise Plus plans.

2) Access Monitoring

See what data is used, by whom, when, and how frequently with ALTR’s industry-first interactive Data Usage Heatmaps and drill-down Analytics Dashboards. Access monitoring is helpful for understanding normal data usage, identifying abnormalities, and preparing your organization for an audit request.

3) Data Masking

Within ALTR, you can quickly and seamlessly apply flexible, dynamic data masking over PII like social security numbers or email addresses to keep sensitive data private.

4) Rate Limiting

ALTR’s patented, rate-limiting data access threshold technology can send real-time alerts, and slow or stop data access on out-of-normal requests. The control is then in your hands to stop individual access when it exceeds normal limits. This is helpful for mitigating the risks associated with credentialed access and data breaches.

5) Tokenization

Tokenization is the process of replacing actual sensitive data elements with random and non-sensitive data elements (tokens) that have no exploitable value, allowing for a heightened level of data security during the data migration process.  ALTR’s tokenization allows you access to secure-yet-operable data protection with our patented data tokenization-as-a-service. 

6) Open-Source Ecosystem Connectors

As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities for very simple data governance integrations to be created. We found open source to be an ideal distribution method as it allows companies more flexibility in building their integrated and secure data stacks.  

What’s Next for Data Governance

The need for data governance is guaranteed to only increase in the coming years. IDC predicts that the global datasphere will double in size from 2022 to 2026. We predict that in the near future, companies that don’t have a data governance strategy in place will need one soon.

From legacy on-premises providers to new cloud-based start-ups to lateral players in the data ecosystem, it seems almost everyone offers a “data governance” solution these days. But there’s no actual data governance without data control and protection, and DIY and manual approaches can only take you so far. To ensure they don’t fall behind, companies should be evaluating data governance solutions now to find those that meet the requirements of their data governance strategy and deliver those six “must-haves” for mastering data governance.

Sometimes it seems like data governance and security is everyone’s and no one’s job. When that’s the case, there can be cracks in your data governance and security posture, and that can open the door to data risk. One way to overcome this is to ensure that the entire company is supportive of the initiative. But how?  

As part of our Expert Panel Series, we asked some experts in the modern data ecosystem how data governance and security leaders can better engage with the rest of the organization.

Here’s what we heard….

“Data governance is not a joyful exercise."

"If there is data governance needed, there is a pain, threat or a business opportunity..."

"Implementing data governance requires changes in processes, roles and responsibilities ... and by definitions humans are reluctant to change..."

"So my key advice to data governance and security leaders is to emphasize the WHY data governance is a must, why it is needed by the organization and make sure you have a strong change management plan in place, not just tools to roll out!”
- Damien Van Steenberge, Managing Partner, Codex Consulting
“I think the best way for data governance leaders to engage the rest of the organization is to show them a world where it's easy for data consumers to access data and then show them the power and value of the data they're able to access. Show them how it makes their job easier, better, faster.”
- Chris Struttmann, Founder and CTO, ALTR
“Data governance and security initiatives can fall flat if they are perceived as top-down mandates that are out of touch with the work that’s being done. The best way to get buy-in from across the organization is to tie data governance and security initiatives with use case-based deployment. When stakeholders can see the positive impact and relevance of new technology or practices, they won’t just be more engaged – they will become champions.”
- Pat Dionne, Founder and CEO, Passerelle
“From an ETL point of view, it’s ensuring that data engineers are given the freedom to automate their pipelines in ways that ensure they adhere to governance policies. Too much process in this area can slow them down. They want to just run and know that it’s all going to fall into place.”
- John Bagnall, Sr Product Manager, Matillion

Watch out for the next installment of our Expert Panel Series on LinkedIn!  

With our recent open-source data governance integration initiative announcement, I wanted to take this opportunity to explain in a little more detail why ALTR decided to go down the path of open-source data governance integrations. One of our principles has always been that because data governance is so critical, it has to be easy and accessible across the data ecosystem. In fact, strong governance and security are a requirement for adding many workloads to the cloud...which has led to the necessity of many of the products in our ecosystem. Other vendors in the space, though, perhaps coming from a more traditional enterprise software mindset, are focusing on their own proprietary marketplaces for connectors or charging for custom integrations to connect various data tools. While this approach may work well for short-term bookings, it doesn’t serve the long-term customer mission to get the most value out of their data.  

Furthermore, that approach just doesn’t align with ALTR’s DNA. We built our SaaS platform to be accessible via the cloud, we removed the need for SQL authoring with our point-and-click interface, we built an incredibly powerful automation interface on top of that, and we introduced the first and only completely free data access control solution in the market. Doing the status quo just because it is the status quo is antithetical to the founding mission of ALTR.

As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities to create very simple open-source data governance integrations. Once we built a few of them, we realized that open source was an ideal distribution method.

How open-source integrations help our customers:

  • Unbind the buying process: Free, open-source data governance connectors allow customers the flexibility to choose the solutions they want to implement on their own schedule and timeline, rather than having to research, select, and onboard the full stack at once as part of a consolidated purchase process. Customers can move at their own pace, choose the tools they prioritize for their budget, and add data access control and security when ready.  
  • Flexible implementation: Open-source integrations enable customers to implement their unique use case and configure the solutions in a way that works for their infrastructure, without custom code or manual implementation, rather than being bound by the limitations of a fixed integration delivered by a partner marketplace. This also allows users with resource constraints to do more with fewer solutions, optimizing their stacks for increased efficiency.  
  • Enterprise-level features for free: ALTR’s enterprise-ready integrations work with data catalogs, ETLs (Extract, Transform, Load), and other data ecosystem tools customers already use. This increases the data access control features available to customers while decreasing the number of tools they’re required to manage.
  • Community development and improvement: We’ve found over the years that almost all our customers look to solve the same problems repeatedly, so like any open-source initiative, we’re enabling end users to contribute their own solutions to the ALTR GitHub library. For example, if a user wanted to send ALTR data classification information into a specific field in a data catalog, they could build that feature and submit that back to the repository for others to benefit from. Because all the customers are solving the same problems, we’ve created an environment where peers across organizations can gain from the experience of others, which makes everyone’s job easier.  

ALTR’s open-source data governance integrations are available through our GitHub open source library.  


With this initiative, ALTR offers non-proprietary connectors to extend the powerful features we provide in the ALTR Free forever plan into leading partner stacks, including Alation and Matillion. These open-source integrations enable seamless data governance, with access control and security spanning from database to data catalog to ELT to cloud data platforms. Complexity is removed by merely plumbing together the already in-market solutions in our ecosystem. Nothing proprietary or complex—just simple and thoughtful connectors which bring ALTR’s value and feature set into the adjacent tools of our ecosystem.

Our end goal is to facilitate interoperability and remove barriers so that customers can build an integrated cloud data stack that allows data to flow freely and securely, and ultimately allows the customer to get more value from more data more quickly and with fewer resources.

See how our open-source data governance integration works with Alation:  

See how our open-source data governance integration works with Matillion:  

Hear Chris and James explain why open-source data governance integrations are the best approach:

Try it yourself now with the ALTR Free plan

Determining whether a data lake or a data warehouse is the best fit for your organization’s data is likely one of the first in a long line of data-driven decisions you’ll make in your data governance journey. We’ve outlined four key differences between data lakes and data warehouses and explained factors that may impact your decision.

By definition, a data lake is a place where data can be stored in a raw and unstructured format. This data is accessible at any time and by anyone, from data scientists to line-of-business execs. On the other hand, a data warehouse stores structured data that has been organized and processed and allows the user to view data in digestible formats based on predefined data goals. Due to their nature, there are a few key differentiators between these two data storage options.

1) Data Format

First, the format in which data can be viewed after import varies between data lakes and data warehouses.

A data warehouse requires data to be processed and formatted upon import, which means more work on the front end but allows for more organized and digestible data to be viewed at any point in the data’s lifecycle once the schema is defined. Data typically flows into data warehouses from multiple sources on a regular, consistent cadence. Once the data is collected in the warehouse, it is sorted based on pre-determined schemas that your data team sets.

Data lakes allow you to store data in its native, raw format for the entire time the data is housed within the lake. This makes the import process quick and scalable and lets your organization store a lot of data in one place and access the raw form at any point. Data lakes are typically optimized to store massive amounts of data from multiple sources, whether that data is unstructured, semi-structured, or structured.

2) Processing

The way in which data is processed is a critical differentiator between a data lake and a data warehouse.

Data warehouses use a process called schema-on-write and data lakes use a process called schema-on-read. A schema within data governance is a collection of objects within the database, such as tables, views, and indexes.

Schema-on-write, which is used in data warehouses, has the data scientist develop the schema when writing, or importing, the data, so that database objects, including tables and indexes, can be viewed in a concise way once imported. This may mean more front-end work writing SQL code and determining the objectives of your data warehouse, but it allows for a more digestible view of your data after import.

On the other hand, schema-on-read lets you forgo developing the schema when importing data into the data lake, but requires you to develop the schema when accessing the data later down the road. Schema-on-read is what allows your data to be stored in unstructured, semi-structured, or structured formats within your data lake.
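To make the distinction concrete, here is a minimal Snowflake SQL sketch (the table, stage, and column names are hypothetical): with schema-on-write you define the table structure before loading anything, while with schema-on-read you land the raw data as-is and apply structure only when you query it.

-- Schema-on-write: define the structure up front, then load into it
CREATE TABLE customers (
    customer_id INT,
    email       VARCHAR,
    signup_date DATE
);
COPY INTO customers
  FROM @my_stage/customers.csv
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Schema-on-read: land raw JSON untouched, apply structure at query time
CREATE TABLE raw_events (payload VARIANT);
SELECT
    payload:customer.id::INT       AS customer_id,
    payload:customer.email::STRING AS email
FROM raw_events;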

3) Flexibility

The benefit of schema-on-read is that the schema can be created on a case-by-case basis to suit each data set. Many who opt to store their data in a data lake prefer the flexibility that schema-on-read allows for each unique data set.

Alternatively, schema-on-write interprets all imported data the same way and does not allow for variance once imported. What a data warehouse gives up in flexibility it returns in immediacy: because you’ve already done the front-end work of determining the schema, your data is immediately accessible and readable after import.

4) Users

Finally, accessibility and user control may be the deciding factor for how and where your company stores data.

A data lake is more accessible to day-to-day business execs and makes it easy to add new raw data to your lake. A data lake is traditionally less expensive due to the nature of the format, and because you likely won’t need additional manpower to import and maintain your data within the lake. The nature of a data lake is such that data can regularly be added in its original format, and the end outcome of the data can be determined down the road, at any point in the data’s lifecycle.

A data warehouse will likely only be accessible to, and updatable by, data engineers within your organization. It is more complicated to update and may be more costly because of the manpower required to make changes. When setting up your data warehouse, your data team will need context on what your data must do in order to correctly write the SQL code that will make your warehouse successful.

It's important to note that you can have a data warehouse without a data lake, but a data lake is not a direct replacement for a data warehouse and is often used to complement a data warehouse. Many companies who use a data lake will also have a data warehouse.

Regardless of where you store your data, you’ll need to set up access rules to govern and protect it. Implementing a cloud data security solution has never been easier.

“To rate limit with data usage thresholds or not to rate limit with data usage thresholds? That is the question.”

Even though this twist on the famous line “To be, or not to be…” in William Shakespeare’s Hamlet is playful, protecting sensitive and regulated data from credentialed threats is very serious. We might all trust our data users, but even the most reliable employee’s credentials can be lost or stolen. The best approach is to assume that all credentials are compromised all the time.

So the question is not if you should put a limit on the amount of sensitive data that credentialed users can access, but how.

In this blog, I’ll explain what rate limiting is, how you can apply rate limiting in Snowflake, and how you’ll save time by automating data thresholds through ALTR. To show how you will benefit from using ALTR to limit access to data and reduce risk from credentialed access threats, this blog also includes a couple of use cases and a demonstration video.

What is Rate Limiting?

In a nutshell, rate limiting enables you to set an ‘access-based’ data threshold for specific user groups (roles), capping how much sensitive column-level data they can retrieve. For example, you might want to limit your company’s Comptroller to querying only 1,000 values per hour.

Another type of data threshold you could set is ‘time-based’. For example, if you want to limit access to your Snowflake data so that it can only be queried between 8 am and 5 pm CST, Monday through Friday, because those are your business hours, you could automate this through ALTR.

Once you set these data and rate limits, any credentialed user who queries data outside the thresholds you’ve configured triggers an anomaly. The anomaly alert will give you a heads-up so that you can investigate and take appropriate action.

Why are Rate Limits Important?

Setting rate limits is a must to control how much data a credentialed user can access. Just because they are approved to access some amount of data doesn’t mean they should be able to see all of it. Let’s think of a credentialed user as a ‘house guest’ (metaphorically speaking). If you invite someone to stay for a few nights in your home during the week and everyone in your family turns in for the night by 11 pm, does that mean you should give your houseguest free rein to roam through every room at 2 am after the household shuts down? To circle back to data security: if credentials fall into the wrong hands or a user becomes disgruntled, you want to ensure that they cannot exfiltrate all the data, but only a limited amount.

What to Consider When Establishing Rate Limits

Keep the following things in mind to help you think through the best approach for setting data thresholds as an extra layer of protection.

  • Gain a clear understanding of which columns contain sensitive data by using Data Classification (for context, see the Snowflake Data Classification: DIY vs ALTR Blog).
  • Gain a clear understanding of the amount and type of sensitive data different roles consume by using ALTR’s data usage heatmap

This insight should help your data governance team establish rate limits that, if exceeded, generate a notification or block access.

How Snowflake Rate Limiting Works if You DIY

It doesn’t really. Here's why: data is retrieved from databases like Snowflake using the SQL language.


When you issue a query, the database interprets the query and then returns all the data requested at one time. This is the way SQL is designed.


Snowflake has role-based access controls built in, but these controls are still designed to provide all of the requested data at once, so a Snowflake user gets either all of the requested data or none of it. There’s no in-between. The concept of automatically stopping the results of a query midstream simply does not exist natively. This limitation applies to most, if not all, SQL databases; it’s not something unique to Snowflake.
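To illustrate with a hypothetical table, nothing in standard SQL caps how much data a query returns once a role is allowed to see the column. A LIMIT clause only trims what a particular statement asks for, and the person writing the query controls it:

-- The user's role can see the column, so this returns every row at once
SELECT ssn, email FROM customer_pii;

-- LIMIT is a query convenience controlled by the query author, not a policy;
-- nothing stops them from simply leaving it off
SELECT ssn, email FROM customer_pii LIMIT 10;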


How Snowflake Rate Limiting Works in ALTR  

You can automate rate limiting by using ALTR in four simple steps. Because data thresholds extend the capabilities of Column Access Policies (i.e., Lock), you must create a Column Access Policy first before you can begin using data thresholds.

1. If you haven’t already done so, connect your Snowflake database to ALTR from the Data Sources page and check the Tag Data by Classification option.

This process will scan your data and tag columns that contain sensitive data with the type of data they contain.

Figure 1. The Data Sources page, where you connect your Snowflake database
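If you want to double-check the result from the Snowflake side, and assuming the classification step applies Snowflake object tags to the columns it identifies, a query like this lists the tags on a given column (the database, schema, table, and column names below are hypothetical):

-- List the tags applied to a specific column
SELECT tag_name, tag_value
FROM TABLE(
  my_db.INFORMATION_SCHEMA.TAG_REFERENCES('my_db.my_schema.customers.ssn', 'COLUMN')
);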

2. Choose and connect the columns you want to monitor from the Data Management page.

You can add columns by clicking the Connect Column button and choosing the desired column in the drop-down menus. See figure 2.

Figure 2. Connect Column button to add columns you want to monitor

3. Next, from the ‘Locks’ page, add a lock to group together the sensitive columns that you want to put data limits on.

In this example, we are creating a lock named “Threshold Lock” to enforce a policy that limits access to the ID column for Snowflake Account Admins and System Administrators. See Figure 3.

Figure 3. Lock (policy) named 'Threshold Lock' to limit access to the ID column

4. Create a data threshold that enforces the desired data limit policy.

Here we are creating a threshold that applies to the ACCOUNTADMIN and SYSADMIN roles and limits access to the ID column in the customer table to no more than 10 records per minute. See Figure 4.

You can specify the action that should occur when a data threshold is triggered.

  • Generate Anomaly: This generates a non-blocking notification in ALTR.
  • Block: This blocks the user who triggered the threshold from accessing any columns connected to ALTR; their query results are returned as NULL values.

You can also set Rules: These define what triggers the data threshold.    

  • Time-based rules: These will trigger a threshold when the indicated data is queried at a particular time.
  • Access-based rules: These will trigger when a user queries more than a set amount of data within a short time window.
Figure 4. Account Admin user group that the threshold applies to

Snowflake Limit Use Cases

The Snowflake rate limit use cases below are realistic scenarios that reiterate why this extra layer of security is a must for your business. Rate limiting can minimize data breaches and help prevent hefty fines and lawsuits that would affect your company’s bottom line and reputation.

Use Case 1.

Your accounting firm has certain users within one or more roles who have a legitimate reason to access sensitive data, such as personal mobile phone numbers or email addresses, only a certain number of times during a specific time period. For example, maybe they should only need to query 1,000 records per minute, per hour, or per day.

If a user is querying that data outside of the threshold, then it will generate an anomaly and, depending on how you’ve configured the threshold, also block their access until it’s resolved.

Use Case 2.

The business hours for your bank are Monday through Friday from 8 am to 5 pm and Saturday from 9 am to 1 pm ET. There are certain users within one or more roles that you’ve identified who have a legitimate reason to access sensitive data, such as your customers’ Social Security numbers, only within this timeframe. A time-based threshold ensures that queries made outside those hours trigger an anomaly or are blocked.

Rate Limit Violations

  • If the Threshold is only configured to generate an Anomaly, then the user who triggered the data threshold will be able to continue querying data in Snowflake.
  • If the Threshold is configured to block access, then the user who triggered the data threshold will no longer be able to query sensitive data in Snowflake. Any query they run on columns that are connected to ALTR will result in NULL values. This behavior will continue until an ALTR Admin resolves the anomaly in the ‘Anomalies’ page.
  • In addition, when there is an anomaly or block, ALTR can publish alerts you can receive through your Security Information and Event Management (SIEM) or Security Orchestration, Automation and Response (SOAR) tool for near-real-time notifications.

Automate Rate Limiting with ALTR

In today’s world, where you must protect your company’s sensitive data from being hacked by people on the outside and leaked by staffers working on the inside, a well-thought-out data governance strategy is mandatory. Because you have to remain constantly vigilant, safeguarding your data can feel like a challenging chess match. However, ALTR can make this ‘game of strategy’ easier to win by automating rate limits for you.

Do you really have the time to write SQL code for each data threshold you want to set as your business scales? By using ALTR, it’s a few simple point-and-click steps and you’re done!
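For comparison, a DIY approach means hand-writing and scheduling monitoring queries yourself. Here is a rough sketch of what that might look like against Snowflake's query history; note that it only detects heavy pulls after the fact rather than stopping them, and the 1,000-row threshold is an arbitrary example:

-- Roughly flag roles that pulled more than 1,000 rows in the last hour
SELECT role_name, SUM(rows_produced) AS rows_last_hour
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
GROUP BY role_name
HAVING SUM(rows_produced) > 1000;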

See how you can set rate limits in ALTR:
