“To rate limit with data usage thresholds or not to rate limit with data usage thresholds? That is the question.”
Even though this twist on the infamous line “To be, or not to be…” in William Shakespeare’s Hamlet is playful, protecting sensitive and regulated data from credentialed threats is very serious. We might all trust our data users, but even the most reliable employee’s credentials can be lost or stolen. The best approach is to assume that all credentials are compromised all the time.
So the question is not if you should put a limit on the amount of sensitive data that credentialed users can access, but how.
In this blog, I’ll explain what rate limiting is, how you can apply rate limiting in Snowflake, and how you’ll save time by automating data thresholds through ALTR. To reiterate how you will benefit from using ALTR to limit access and risk to data from credentialed access threats, this blog will also include a couple of use cases and a demonstration video.
What is Rate Limiting?
In a nutshell, rate limiting enables you to set a data threshold regarding specific user groups (roles) who can obtain sensitive column-level data based on an ‘access-based’ amount. For example, you might want to limit your company’s Comptroller to only query 1000 values per hour.
Another type of data threshold you could set is one that’s ‘time-based’. For example, if you want to limit access to your Snowflake data so that it can only be queried between 8-5 pm CST Monday-Friday because those are your business hours, you could automate this through ALTR.
By setting data limits and rate limits, credentialed users who query data outside of the data thresholds you’ve configured will cause an anomaly to be triggered. The anomaly alert will give you a heads-up so that you can investigate and take appropriate action.
Why are Rate Limits Important?
Setting rate limits is a must to control how much data a credentialed user can access. Just because they are approved to access some amount of data, doesn’t mean they should be able to see all of it. Let’s think of a credentialed user as a ‘house guest’ (metaphorically speaking). If you invite someone to stay for a few nights in your home during the week and everyone in your family turns in for the night by bed 11 pm, then does that mean you should give your houseguest free rein to roam through every room at 2 am after the household shuts down? And to circle back to data security, if credentials fall into the wrong hands or a user becomes disgruntled, you want to ensure that they cannot exfiltrate all the data but instead only a limited amount.
What to Consider When Establishing Rate Limits
Keep the following things in mind to help you think through the best approach for setting data thresholds as an extra layer of protection.
- Gain a clear understanding of which columns contain sensitive data by using Data Classification (for context, see the Snowflake Data Classification: DIY vs ALTR Blog).
- Gain a clear understanding of the amount and type of sensitive data different roles consume by using ALTR’s data usage heatmap
This insight should help your data governance team establish data rate limits that (if exceeded), should generate a notification or block their access.
How Snowflake Rate Limiting Works if You DIY
It doesn’t really. Here’s why: data is retrieved from databases like Snowflake using the SQL language.
When you issue a query, the database interprets the query and then returns all the data requested at one time. This is the way SQL is designed.
Snowflake has role-based access controls built in but these controls are still designed to provide all of the requested data at once so a Snowflake user gets either all of the requested data or none of it. There’s no in between. The concept of automatically stopping the results of a query midstream simply does not exist natively. This limitation applies to most if not all sequel databases. It’s not something unique to Snowflake.
How Snowflake Rate Limiting Works in ALTR
You can automate rate limiting by using ALTR in four simple steps. Because data thresholds extend the capabilities of Column Access Policies (i.e., Lock), you must create a Column Access Policy first before you can begin using data thresholds.
1. If you haven’t already done so, connect your Snowflake database to ALTR from the Data Sources page and check the Tag Data by Classification option.
This process will scan your data and tag columns that contain sensitive data with the type of data they contain.
2. Choose and connect the columns you want to monitor from the Data Management page.
You can add columns by clicking the Connect Column button and choosing the desired column in the drop-down menus. See figure 2.
3. Next, add a lock to group together sensitive columns that you want to put data limits on from the ‘Locks’ page.
In this example we are creating this lock named “Threshold Lock” to enforce a policy that limits access to the ID column for Snowflake Account Admins and System Administrators to no more than 15 records per minute. See figure 3
4. Create a data threshold that enforces the desired data limit policy.
Here we are creating a threshold that applies to ACCOUNT admins and SYSADMINS that limits access to the ID column in the customer table to no more than 10 records per minute. See figure 4.
You can specify the action that should occur when a data threshold is triggered.
- Generate Anomaly: This generates a non-blocking notification in ALTR.
- Block: This blocks all access to columns connected to ALTR to the user who triggered the threshold, replacing them with NULL.
You can also set Rules: These define what triggers the data threshold.
- Time-based rules: These will trigger a threshold when the indicated data is queried at a particular time.
- Access-based rules: These will trigger when a user queries a particular amount of data in a short time
Snowflake Limit Use Cases
The Snowflake limit use cases below are examples of realistic scenarios to reiterate why this extra layer of security is a must for your business. Rate limiting can minimize data breaches and prevent hefty fines and lawsuits that will affect your company’s bottom line and reputation.
Use Case 1.
Your accounting firm has certain users within a role or multiple roles who only have a legitimate reason to access sensitive data such as personal mobile phone numbers or email addresses a certain number of times during a specific time period. For example, maybe they should only need to query 1000 records per minute/ per hour/ per day.
If a user is querying that data outside of the threshold, then it will generate an anomaly and, depending on how you’ve configured the threshold, also block their access until it’s resolved.
Use Case 2.
The business hours for your bank are Monday through Friday from 8-5 pm and Saturday from 9-1pm ET. There are certain users within a role or multiple roles that you’ve identified who have a legitimate reason to access sensitive data such as your customer’s social security numbers only within this timeframe.
Rate Limit Violations
- If the Threshold is only configured to generate an Anomaly, then the user who triggered the data threshold will be able to continue querying data in Snowflake.
- If the Threshold is configured to block access, then the user who triggered the data threshold will no longer be able to query sensitive data in Snowflake. Any query they run on columns that are connected to ALTR will result in NULL values. This behavior will continue until an ALTR Admin resolves the anomaly in the ‘Anomalies’ page.
- In addition, when there is an anomaly or block, ALTR can publish alerts you can receive through your Security Information and Event Management (SIEM) or Security Orchestration, Automation and Response (SOAR) tool for near-real-time notifications.
Automate Rate Limiting with ALTR
In today’s world where you must protect your company’s sensitive data from being hacked by people on the outside and leaked from staffers working on the inside, a well-thought-out data governance strategy is mandatory. By having to constantly remain vigilant, trying to safeguard your data might almost seem like a challenging chess match. However, ALTR can make this ‘game of strategy’ easier to win by automating rate limits for you.
Do you really have the time to write SQL code for each data threshold you want to set as your business scales? By using ALTR, it’s a few simple point-and-click steps and you’re done!