ALTR Blog

The latest trends and best practices related to data governance, protection, and privacy.

Many of us have become more aware of the power of increased knowledge around our activities over the last few years – whether it’s a FitBit monitoring our steps or an energy audit delivering a detailed view of how everyone in your home uses lights, appliances, electronics, and other things that need power. Each month your utility company monitors your usage, and the details can help you recognize ways to lower your bill and identify current problems that are making your home less energy efficient. You can do the same with data usage observability. By capturing and monitoring who is running queries on data and when, you can make informed decisions to prevent data breaches and leaks. ALTR Heat Maps and Analytics Dashboards can make this information easily viewable and digestible so you can see where the issues might be. 

This blog provides a high-level explanation of what data observability is and why it’s important, how it works if you do it manually in Snowflake, and how it works if you use ALTR to automate the process.  We’ve also included a few use cases and a how-to demonstration video. We hope you find it helpful! 

What is Data Observability and Why is It Important?  

At a high level, data observability means presenting information about how users are accessing data in an easy-to-consume visual format. Operationalizing this through data observability tools is critical to understanding what's occurring and to spotting abnormal events. The payoff for using ALTR's data usage observability tools is that they provide the information needed to meet two key data security policy goals:

  • Ensure that you have policies in place for all roles who access sensitive data 
  • Help you understand what normal access looks like because you can't identify what is abnormal without a baseline to compare to 

Snowflake logs capture this access information, and the events shown can help your Data Security team spot issues and minimize time spent on bottlenecks, speed, or other problems. But this information is delivered as plain text, and extracting insights from it takes significant work.

How Snowflake Data Observability Works if You DIY

Snowflake provides the foundational query history data needed for data usage analytics via Snowflake logs; however, to be useful, the data must be processed to get it in a visual form that is easy to interpret. 

To do data observability manually in Snowflake, you must follow the steps below.

  1. Parse the SQL query text to extract the list of columns each query accessed, then filter it to only include columns that contain sensitive data. NOTE: This will require you to write SQL statements.
  2. Next, tabulate the count of records that each user has accessed in each column for each minute, hour, and day of the past 24 hours. NOTE: This will require even more SQL coding.
  3. Last, convert the data set into an interactive visual chart that displays the information in a more understandable format, letting you view the results and drill through them. NOTE: This will require full-stack development skills to implement.
Figure 1. Snowflake Query History on the left and unstructured SQL query strings that must be parsed on the right

As you can see from the steps that are required, data observability done manually will require more time and lots of coding. 
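To make step 2 concrete, here is a minimal sketch of the kind of SQL involved, assuming the standard SNOWFLAKE.ACCOUNT_USAGE share is available in your account (the views and their columns are real Snowflake objects; the 24-hour window is illustrative). Note that it counts queries that touched each column rather than true record counts, which take still more work:

    -- Sketch of step 2: tally column-level access per user per hour over the last 24 hours.
    SELECT
        ah.user_name,
        col.value:"columnName"::STRING           AS column_name,
        DATE_TRUNC('hour', ah.query_start_time)  AS hour_bucket,
        COUNT(*)                                 AS queries_touching_column
    FROM snowflake.account_usage.access_history ah,
         LATERAL FLATTEN(input => ah.base_objects_accessed) obj,
         LATERAL FLATTEN(input => obj.value:"columns") col
    WHERE ah.query_start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3;

And that only gets you through step 2; the sensitive-column filtering of step 1 and the interactive chart of step 3 are still on you.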

How Snowflake Data Observability Works Using ALTR to Automate  

ALTR's Dashboard provides a high-level view of everything that’s happening in ALTR. For example, it will show you how many locks and thresholds you have, open anomalies, databases that are connected, columns that are governed, and other detailed data.

Figure 2. ALTR Dashboard

The built-in ALTR Heatmap (also referred to as 'Data Analytics' or 'Query Analytics') delivers data observability: a visual representation of how your users are accessing data. It shows you the roles, and the specific users in those roles, who are querying different data sets. The analytics give insight into how your data is being used, helping you identify where you need to assign policy or, if you've already assigned policy, confirm how people are querying data within it.

Figure 3. ALTR Analytics with a drill-down option to see a breakdown of analytics per user

When you hover over the heatmap (as shown in figure 4), you can see the total number of values accessed by your assigned user groups in the columns you're governing. You can also drill down to view a more granular level of what data is accessed by your specific users, user groups, and data types.

Figure 4. Data Usage Heatmap example that shows the number of values queried by each user

If you check Add Data Usage Analytics when connecting your data source, ALTR will import the past 30 days of Snowflake's history to show you your organization's usage trends. From there, your query history will automatically sync daily on all columns in your connected database. See figure 3 for context. 

Snowflake Data Observability Use Cases

Here are a couple of use case examples where ALTR’s automated data observability capability can benefit your business as it scales with Snowflake usage. 

Use Case 1. You want to determine a typical consumption pattern and restrict access to no more data than is normal for the user’s role.

You’re an Administrator who has already created policies on your different column-level data but want to determine if you should create an additional one for your Credit Card column. You could view the data usage over the last 7 or 30 days to see what a typical consumption pattern is and then decide what to set your time- or access-based threshold to.

Use Case 2. You want to determine if anything looks strange that may require action and ensure all roles accessing data have policies that cover them.

You’re a Data Scientist and want to confirm that the right user groups are accessing the column-level data that you’ve created policies for. If anything looks strange (for example, certain roles are querying data on weekends rather than during your business hours), then you can determine if you need to block access or trigger an alert (anomaly) to protect your data.

Automate Snowflake Data Observability with ALTR

By operationalizing metrics through ALTR’s data observability tools, you can minimize data breaches and make informed decisions to stay in compliance with SLAs and certain regulations. The detailed data that the ALTR Dashboard and Analytics (Heatmap) provide is a must-have for an effective data security strategy. We’ve made viewing everything that’s going on in ALTR, and your analytics, so convenient that you don’t have to write any SQL code. It’s a simple point-and-click in ALTR and you’re done!

See how you would get data observability by doing it yourself in Snowflake vs automatically in ALTR: 

In a study performed ahead of last week’s Gartner Data and Analytics Conference, researchers found that data governance is a top initiative that business leaders and data officers plan to focus on in 2023 and into 2024. We’re glad to see companies choosing this as a top priority. Data governance is only increasing in urgency and demand, yet we’ve seen many organizations falling behind in establishing a proper data governance practice.  

Data Governance Definition

At its center, data governance is “the process of overseeing the integrity, security, usability, and availability of data within an enterprise so that business goals and regulatory requirements can be met.”

According to the Gartner Glossary, data governance is “the specification of decision rights and accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” A well-run data governance practice can improve your data quality by setting the standard for how data is received, stored, and processed within your database.

A successful data governance strategy will guarantee that your data is dependable and that data users are held accountable for their access levels. Data governance is a critical business component for all industries and will only continue to gain traction as a standard business practice as more rules and regulations surrounding data are set in motion.

Building a Successful Data Governance Strategy

Building a data governance strategy ahead of implementing a data governance solution is the best way to set your data plan up for success. Your data governance strategy serves as the base for the decisions that your organization makes. Once you have that strategy in place, you can better identify which data governance solution best fits your needs. A solid data governance strategy guarantees that you are building a strong foundation for your data.

Our partner, Alation, says that a successful data governance strategy must address four things: data availability, data consistency, data accuracy, and data security.

Data Availability: A good data governance strategy allows the correct data to be available to the correct people at the appropriate times. Your data governance solution should be structured in such a way that the people who the data is made available to can easily find and access the data. When strategizing, your team should determine what data availability you will need within a data governance solution.

Data Consistency: One key point your organization should consider when discussing your data governance strategy is the standardization of data points across your database. Determining these key data points from the beginning will help streamline the decision-making process down the road and will ensure consistent decisions are being made across your organization regarding your data.

Data Accuracy: Determining how data comes into your database, what you do with it once it’s there, and how data will exit your database will establish ground rules for the future of how your data is managed. It’s important to determine ahead of time the values, tags, and lifecycle that will be associated with data points to ensure consistency and accuracy, and guarantee that your dataset is error-free.

Data Security: Companies are responsible for protecting the sensitive data entrusted to them. Should your organization need to pass regulatory audits for any reason, a good data governance strategy and solution will ensure your data is safe and that you have the audit logs available to confirm regulatory compliance.

6 Must-Haves to Master Data Governance

A quick Google search of “Data Governance Solutions” will prove there isn’t a one-size-fits-all approach to data governance tools. By taking the time to pre-determine your data governance strategy, you are better positioned to find a data governance solution that is flexible and scalable enough to fit your needs.

ALTR’s data governance solution delivers both scalability and flexibility, and we’ve seen data governance success in organizations from a multinational retailer governing PII to a regional credit union protecting PCI to a unique healthcare organization safeguarding PHI.

Our data governance features matrix outlines 18 key points that we think are critical when evaluating whether a data governance solution will suit your needs. We’ve broken down key differences that ALTR brings to the table in each of these categories:

1) Flexible Pricing

Starting price and scalable pricing are points to consider when choosing a data governance solution. While some data governance solutions charge six figures to start, ALTR is proud to offer a free solution for one database within Snowflake and scalable pricing when you decide to upgrade to our Enterprise or Enterprise Plus plans.

2) Access Monitoring

See what data is used, by whom, when, and how frequently with ALTR’s industry-first interactive Data Usage Heatmaps and drill-down Analytics Dashboards. Access monitoring is helpful for understanding normal data usage, identifying abnormalities, and preparing your organization for an audit request.

3) Data Masking

Within ALTR, you can quickly and seamlessly apply flexible, dynamic data masking over PII like social security numbers or email addresses to keep sensitive data private.

4) Rate Limiting

ALTR’s patented, rate-limiting data access threshold technology can send real-time alerts, and slow or stop data access on out-of-normal requests. The control is then in your hands to stop individual access when it exceeds normal limits. This is helpful for mitigating the risks associated with credentialed access and data breaches.

5) Tokenization

Tokenization is the process of replacing actual sensitive data elements with random, non-sensitive data elements (tokens) that have no exploitable value, allowing for a heightened level of data security during the data migration process. ALTR’s tokenization gives you secure-yet-operable data protection with our patented data tokenization-as-a-service.
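To illustrate the general concept only (this is not ALTR’s patented implementation; the table and column names below are hypothetical), tokenization separates the working data from the real values:

    -- The working table holds only tokens with no exploitable value.
    CREATE TABLE customers (id INT, ssn_token STRING);

    -- Real values live in a separate, encrypted, access-restricted vault.
    CREATE TABLE token_vault (token STRING, ssn STRING);

    -- De-tokenization is a controlled lookup available only to approved roles.
    SELECT c.id, v.ssn
    FROM customers c
    JOIN token_vault v ON c.ssn_token = v.token;

A breach of the customers table alone yields nothing usable; the value of a vaulted design is that the tokens and the real data never sit in the same place.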

6) Open-Source Ecosystem Connectors

As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities for very simple data governance integrations to be created. We found open source to be an ideal distribution method as it allows companies more flexibility in building their integrated and secure data stacks.  

What’s Next for Data Governance

The need for data governance is guaranteed to only increase in the coming years. IDC predicts that the global datasphere will double in size from 2022 to 2026. We predict that companies without a data governance strategy in place now will soon be forced to adopt one.

From legacy on-premises providers to new cloud-based start-ups to lateral players in the data ecosystem, it seems almost everyone offers a “data governance” solution these days. But, there’s no actual data governance without data control and protection, and DIY and manual approaches can only take you so far. To ensure they don’t fall behind, companies should be evaluating data governance solutions now to find those that meet the requirements of their data governance strategy and deliver those six “must-haves” for mastering data governance.

Sometimes it seems like data governance and security is everyone’s and no one’s job. When that’s the case, there can be cracks in your data governance and security posture, and that can open the door to data risk. One way to overcome this is to ensure that the entire company is supportive of the initiative. But how?  

As part of our Expert Panel Series, we asked some experts in the modern data ecosystem how data governance and security leaders can better engage with the rest of the organization.

Here’s what we heard….

“Data governance is not a joyful exercise."

"If there is data governance needed, there is a pain, threat or a business opportunity..."

"Implementing data governance requires changes in processes, roles and responsibilities ... and by definitions humans are reluctant to change..."

"So my key advice to data governance and security leaders is to emphasize the WHY data governance is a must, why it is needed by the organization and make sure you have a strong change management plan in place, not just tools to roll out!”
- Damien Van Steenberge, Managing Partner, Codex Consulting
“I think the best way for data governance leaders to engage the rest of the organization is to show them a world where it's easy for data consumers to access data and then show them the power and value of the data they're able to access. Show them how it makes their job easier, better, faster.”
- Chris Struttmann, Founder and CTO, ALTR
“Data governance and security initiatives can fall flat if they are perceived as top-down mandates that are out of touch with the work that’s being done. The best way to get buy-in from across the organization is to tie data governance and security initiatives with use case-based deployment. When stakeholders can see the positive impact and relevance of new technology or practices, they won’t just be more engaged – they will become champions.”
- Pat Dionne, Founder and CEO, Passerelle
“From an ETL point of view, it’s ensuring that data engineers are given the freedom to automate their pipelines which ensure they are adhering to governance policies. Too much process in this area can slow them down. They want to just run and know that it’s all going to fall into place.”
- John Bagnall, Sr Product Manager, Matillion

Watch out for the next installment of our Expert Panel Series on LinkedIn!  

With our recent open-source data governance integration initiative announcement, I wanted to take this opportunity to explain in a little more detail why ALTR decided to go down the path of open-source data governance integrations. One of our principles has always been that because data governance is so critical, it has to be easy and accessible across the data ecosystem. In fact, strong governance and security are a requirement for adding many workloads to the cloud...which has led to the necessity of many of the products in our ecosystem. Other vendors in the space, though, perhaps coming from a more traditional enterprise software mindset, are focusing on their own proprietary marketplaces for connectors or charging for custom integrations to connect various data tools. While this approach may work well for short-term bookings, it doesn’t serve the long-term customer mission to get the most value out of their data.  

Furthermore, that approach just doesn’t align with ALTR’s DNA. We built our SaaS platform to be accessible via the cloud, we removed the need for SQL authoring with our point-and-click interface, we built an incredibly powerful automation interface on top of that, and we introduced the first and only completely free data access control solution in the market. Doing the status quo just because it is the status quo is antithetical to the founding mission of ALTR.

As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities for very simple open-source data governance integrations to be created. Once we built a few of them, we started to identify that open source was an ideal distribution method.  

How open-source integrations help our customers:

  • Unbind the buying process: Free, open-source data governance connectors allow customers the flexibility to choose the solutions they want to implement on their own schedule and timeline, rather than having to research, select, and onboard the full stack at once as part of a consolidated purchase process. Customers can move at their own pace, choose the tools they prioritize for their budget, and add data access control and security when ready.  
  • Flexible implementation: Open-source integrations enable customers to implement their unique use case and configure the solutions in a way that works for their infrastructure, without custom code or manual implementation, rather than being bound by the limitations of a fixed integration delivered by a partner marketplace. This also allows users with resource constraints to do more with fewer solutions, optimizing their stacks for increased efficiency.  
  • Enterprise-level features for free: ALTR’s enterprise-ready integrations work with data catalogs, ETLs (Extract, Transform, Load), and other data ecosystem tools data customers already use. This increases the data access control features available to customers while decreasing the number of tools they’re required to manage.  
  • Community development and improvement: We’ve found over the years that almost all our customers look to solve the same problems repeatedly, so like any open-source initiative, we’re enabling end users to contribute their own solutions to the ALTR GitHub library. For example, if a user wanted to send ALTR data classification information into a specific field in a data catalog, they could build that feature and submit that back to the repository for others to benefit from. Because all the customers are solving the same problems, we’ve created an environment where peers across organizations can gain from the experience of others, which makes everyone’s job easier.  

ALTR’s open-source data governance integrations are available through our GitHub open source library.  


With this initiative, ALTR offers non-proprietary connectors to extend the powerful features we provide in the ALTR Free forever plan into leading partner stacks, including Alation and Matillion. These open-source integrations enable seamless data governance, with access control and security spanning from database to data catalog to ELT to cloud data platforms. Complexity is removed by merely plumbing together the already in-market solutions in our ecosystem. Nothing proprietary or complex—just simple and thoughtful connectors which bring ALTR’s value and feature set into the adjacent tools of our ecosystem.

Our end goal is to facilitate interoperability and remove barriers so that customers can build an integrated cloud data stack that allows data to flow freely and securely, and ultimately allows the customer to get more value from more data more quickly and with fewer resources.

See how our open-source data governance integration works with Alation:  

See how our open-source data governance integration works with Matillion:  

Hear Chris and James explain why open-source data governance integrations are the best approach:

Try it yourself now with the ALTR Free plan

Determining whether a data lake or a data warehouse is the best fit for your organization’s data is likely one of the first in a long line of data-driven decisions you’ll make in your data governance journey. We’ve outlined four key differences between data lakes and data warehouses and explained factors that may impact your decision.

By definition, a data lake is a place where data can be stored in a raw and unstructured format. This data is accessible at any time and by anyone - data scientists or line-of-business execs. On the other hand, a data warehouse stores structured data that has been organized and processed, and it allows the user to view data in digestible formats based on predefined data goals. Due to their nature, there are a few key differentiators between these two data storage options.

1) Data Format

First, the format in which data can be viewed after import varies between data lakes and data warehouses.

A data warehouse requires data to be processed and formatted upon import, which requires more work on the front end, but allows for more organized and digestible data to be viewed at any point in the data’s lifecycle after defining the schema. Data typically flows into data warehouses from multiple sources, and typically on a regular and consistent cadence. Once the data is collected in the warehouse, it is sorted based on pre-determined schemas that your data team sets.

Data lakes allow you to store data in its native or raw format the entire time the data is housed within the lake. This makes for a quick and scalable import process and lets your organization store a lot of data in one place and access the raw form at any point. Data lakes are typically optimized to store massive amounts of data from multiple sources, and the data can be unstructured, semi-structured, or structured.

2) Processing

The way in which data is processed is a critical differentiator between a data lake and a data warehouse.

Data warehouses use a process called schema-on-write and data lakes use a process called schema-on-read. A schema, in this context, is a collection of objects within the database, such as tables, views, and indexes.

Schema-on-write, used in data warehouses, has the data scientist develop the schema when writing, or importing, the data, so that the database objects, including tables and indexes, can be viewed in a concise way once imported. This may mean more work on the front end writing SQL code and determining the objectives of your data warehouse, but it allows for a more digestible view of your data once imported.

On the other hand, schema-on-read lets you forgo developing the schema when importing the data into the data lake, but requires you to develop the schema when accessing the data later down the road. Schema-on-read is what allows your data to be stored in unstructured, semi-structured, or structured formats within your data lake.
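In Snowflake terms, the contrast looks roughly like this (the table and field names are hypothetical): schema-on-write declares the structure up front, while schema-on-read lands raw documents in a VARIANT column and imposes the shape in the query itself.

    -- Schema-on-write: structure is declared and enforced at load time.
    CREATE TABLE orders (order_id INT, customer STRING, total NUMBER(10,2));

    -- Schema-on-read: land the raw JSON as-is...
    CREATE TABLE raw_orders (v VARIANT);

    -- ...and apply the schema only when querying.
    SELECT v:order_id::INT          AS order_id,
           v:customer.name::STRING  AS customer
    FROM raw_orders;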

3) Flexibility

The benefit of schema-on-read is allowing the schema to be created on a case-by-case basis to benefit the data set. Many who opt to store their data in a data lake prefer the flexibility that schema-on-read allows for each unique data set.

Alternatively, schema-on-write interprets all imported data equally and does not allow for variance once imported. The benefit of flexibility in a data warehouse is the ability to immediately see the impact of your data within the warehouse after import – you’ve already done the front-end work of determining the schema, and your data will be immediately accessible and readable.

4) Users

Finally, accessibility and user control may be the deciding factor for how and where your company stores data.

A data lake is more accessible to day-to-day business execs and makes it easy to add new raw data to your lake. A data lake is traditionally less expensive due to the nature of the format, and because you likely won’t need additional manpower to import and maintain your data within the lake. The nature of a data lake is such that data can regularly be added in its original format, and the end outcome of the data can be determined down the road, at any point in the data’s lifecycle.

A data warehouse likely will only be accessible and able to be updated by data engineers within your organization. It is more complicated to update and may be more costly because of the manpower required to produce changes. When setting up your data warehouse, your data team will likely need context of what your data needs to do in order to correctly write the SQL code that will make your warehouse successful.

It's important to note that you can have a data warehouse without a data lake, but a data lake is not a direct replacement for a data warehouse and is often used to complement a data warehouse. Many companies who use a data lake will also have a data warehouse.

Regardless of where you store your data, you’ll need to set up access rules to govern and protect it. Implementing a cloud data security solution has never been easier.

“To rate limit with data usage thresholds or not to rate limit with data usage thresholds? That is the question.”

Even though this twist on the famous line “To be, or not to be…” in William Shakespeare’s Hamlet is playful, protecting sensitive and regulated data from credentialed threats is very serious. We might all trust our data users, but even the most reliable employee’s credentials can be lost or stolen. The best approach is to assume that all credentials are compromised all the time.

So the question is not if you should put a limit on the amount of sensitive data that credentialed users can access, but how.

In this blog, I’ll explain what rate limiting is, how you can apply rate limiting in Snowflake, and how you’ll save time by automating data thresholds through ALTR. To show how you’ll benefit from using ALTR to limit access to data, and the risk from credentialed access threats, this blog also includes a couple of use cases and a demonstration video.

What is Rate Limiting?

In a nutshell, rate limiting enables you to set a data threshold on how much sensitive column-level data specific user groups (roles) can obtain, based on an ‘access-based’ amount. For example, you might want to limit your company’s Comptroller to querying only 1,000 values per hour.

Another type of data threshold you could set is one that’s ‘time-based’. For example, if you want to limit access to your Snowflake data so that it can only be queried between 8 am and 5 pm CST, Monday through Friday, because those are your business hours, you could automate this through ALTR.

Once you’ve set these data and rate limits, a credentialed user who queries data outside the thresholds you’ve configured will trigger an anomaly. The anomaly alert gives you a heads-up so that you can investigate and take appropriate action.
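To make the two threshold types concrete, here is a rough sketch, with hypothetical limits, of what each rule amounts to if you expressed it as a check against Snowflake’s query history. (In ALTR this is point-and-click configuration; also note that queries like these only detect violations after the fact rather than blocking them.)

    -- Access-based rule: flag users who pulled more than 1,000 rows in the last hour.
    SELECT user_name, SUM(rows_produced) AS rows_last_hour
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
    GROUP BY user_name
    HAVING SUM(rows_produced) > 1000;

    -- Time-based rule: flag queries run outside 8 am-5 pm CST or on weekends.
    SELECT user_name, query_text, start_time
    FROM snowflake.account_usage.query_history
    WHERE HOUR(CONVERT_TIMEZONE('America/Chicago', start_time)) NOT BETWEEN 8 AND 16
       OR DAYNAME(CONVERT_TIMEZONE('America/Chicago', start_time)) IN ('Sat', 'Sun');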

Why are Rate Limits Important?

Setting rate limits is a must to control how much data a credentialed user can access. Just because they are approved to access some amount of data doesn’t mean they should be able to see all of it. Let’s think of a credentialed user as a ‘house guest’ (metaphorically speaking). If you invite someone to stay for a few nights in your home during the week, and everyone in your family turns in for the night by 11 pm, does that mean you should give your houseguest free rein to roam through every room at 2 am after the household shuts down? And to circle back to data security: if credentials fall into the wrong hands or a user becomes disgruntled, you want to ensure that they cannot exfiltrate all the data but only a limited amount.

What to Consider When Establishing Rate Limits

Keep the following things in mind to help you think through the best approach for setting data thresholds as an extra layer of protection.

  • Gain a clear understanding of which columns contain sensitive data by using Data Classification (for context, see the Snowflake Data Classification: DIY vs ALTR Blog).
  • Gain a clear understanding of the amount and type of sensitive data different roles consume by using ALTR’s data usage heatmap.

This insight should help your data governance team establish data rate limits that, if exceeded, generate a notification or block access.
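For reference, the DIY classification route in Snowflake mentioned above looks something like this (the table name is hypothetical); it returns a JSON description of the semantic and privacy categories detected for each column:

    -- Run Snowflake's built-in classification against a table.
    SELECT EXTRACT_SEMANTIC_CATEGORIES('mydb.public.customers');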

How Snowflake Rate Limiting Works if You DIY

It doesn’t really. Here's why: data is retrieved from databases like Snowflake using the SQL language.


When you issue a query, the database interprets the query and then returns all the data requested at one time. This is the way SQL is designed.


Snowflake has role-based access controls built in, but these controls are still designed to provide all of the requested data at once, so a Snowflake user gets either all of the requested data or none of it. There's no in-between. The concept of automatically stopping the results of a query midstream simply does not exist natively. This limitation applies to most, if not all, SQL databases. It's not something unique to Snowflake.
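A small illustration of the point, with a hypothetical table (the last statement is deliberately fictional):

    -- A query returns everything it asks for, in one response:
    SELECT ssn, email FROM customers;            -- all rows, delivered at once

    -- LIMIT only truncates a single result set; nothing stops the user from
    -- rerunning the query or dropping the clause entirely:
    SELECT ssn, email FROM customers LIMIT 100;

    -- There is no native construct like the line below; cumulative, per-user
    -- rate limits have to be enforced outside the database:
    -- ALTER USER analyst SET MAX_ROWS_PER_MINUTE = 1000;   -- does not exist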


How Snowflake Rate Limiting Works in ALTR  

You can automate rate limiting by using ALTR in four simple steps. Because data thresholds extend the capabilities of Column Access Policies (i.e., Locks), you must create a Column Access Policy before you can begin using data thresholds.

1. If you haven’t already done so, connect your Snowflake database to ALTR from the Data Sources page and check the Tag Data by Classification option.

This process will scan your data and tag columns that contain sensitive data with the type of data they contain.

Figure 1. Data Sources page for connecting your Snowflake database

2. Choose and connect the columns you want to monitor from the Data Management page.

You can add columns by clicking the Connect Column button and choosing the desired column in the drop-down menus. See figure 2.

Figure 2. Connect Column button to add columns you want to monitor

3. Next, from the ‘Locks’ page, add a lock to group together the sensitive columns that you want to put data limits on.

In this example, we are creating a lock named “Threshold Lock” to enforce a policy that limits access to the ID column for Snowflake Account Admins and System Administrators to no more than 15 records per minute. See figure 3.

Figure 3. Lock (policy) named ‘Threshold Lock’ to limit access to the ID column

4. Create a data threshold that enforces the desired data limit policy.

Here we are creating a threshold that applies to ACCOUNT admins and SYSADMINS that limits access to the ID column in the customer table to no more than 10 records per minute. See figure 4.

You can specify the action that should occur when a data threshold is triggered.

  • Generate Anomaly: This generates a non-blocking notification in ALTR.
  • Block: This blocks all access to columns connected to ALTR for the user who triggered the threshold, replacing values with NULL.

You can also set Rules: These define what triggers the data threshold.    

  • Time-based rules: These will trigger a threshold when the indicated data is queried at a particular time.
  • Access-based rules: These will trigger when a user queries a particular amount of data in a short time.
Figure 4. Account Admin user group that the threshold applies to

Snowflake Limit Use Cases

The Snowflake limit use cases below are examples of realistic scenarios to reiterate why this extra layer of security is a must for your business. Rate limiting can minimize data breaches and prevent hefty fines and lawsuits that will affect your company’s bottom line and reputation.

Use Case 1.

Your accounting firm has certain users within a role or multiple roles who only have a legitimate reason to access sensitive data, such as personal mobile phone numbers or email addresses, a certain number of times during a specific time period. For example, maybe they should only need to query 1,000 records per minute, hour, or day.

If a user is querying that data outside of the threshold, then it will generate an anomaly and, depending on how you’ve configured the threshold, also block their access until it’s resolved.

Use Case 2.

The business hours for your bank are Monday through Friday from 8 am to 5 pm and Saturday from 9 am to 1 pm ET. There are certain users within a role or multiple roles that you’ve identified who have a legitimate reason to access sensitive data, such as your customers’ social security numbers, only within this timeframe.

Rate Limit Violations

  • If the Threshold is only configured to generate an Anomaly, then the user who triggered the data threshold will be able to continue querying data in Snowflake.
  • If the Threshold is configured to block access, then the user who triggered the data threshold will no longer be able to query sensitive data in Snowflake. Any query they run on columns that are connected to ALTR will result in NULL values. This behavior will continue until an ALTR Admin resolves the anomaly in the ‘Anomalies’ page.
  • In addition, when there is an anomaly or block, ALTR can publish alerts you can receive through your Security Information and Event Management (SIEM) or Security Orchestration, Automation and Response (SOAR) tool for near-real-time notifications.

Automate Rate Limiting with ALTR

In today’s world, where you must protect your company’s sensitive data from being hacked by people on the outside and leaked by staffers working on the inside, a well-thought-out data governance strategy is mandatory. With the constant vigilance required, safeguarding your data can seem like a challenging chess match. However, ALTR can make this ‘game of strategy’ easier to win by automating rate limits for you.

Do you really have the time to write SQL code for each data threshold you want to set as your business scales? By using ALTR, it’s a few simple point-and-click steps and you’re done!

See how you can set rate limits in ALTR:

Organizations don’t have the time or resources to waste on technologies that don’t work or are difficult to assemble. Creating a future-proof data stack allows organizations to avoid adoption fatigue, build a data-centric culture, keep data secure and make the most of technology investments. We interviewed Pat Dionne, CEO of Passerelle, to find out the prerequisites for a successful data modernization strategy and why data governance plays a critical role in your data ecosystem.

What are the biggest challenges customers face when building a modern data architecture?

It isn’t hard to build a modern data stack – there are a dizzying variety of tools, and each comes with a compelling value proposition. The biggest frustration comes after customers have made the investment and started trying to get an ROI from their tools. While ingestion can be simple, assembling data into reusable and manageable assets is much more complex. Data modeling and data quality directly impact an organization’s ability to maximize value and agility and are critical for finding a return on the technology investment. Unfortunately, the latter is often forgotten in the decision process.

What components are vital to successful data modernization projects?

When it comes to data modernization, it is critical to have a collaborative approach to cataloging and securing data across an organization. Collaboration builds consensus on data classification terms and rules, creating a universal definition of data asset ownership and a clear understanding of what is required to access data. The more complicated the access scenarios, the more critical it is to have a transparent, cohesive implementation strategy. Similarly, it is essential to invest in tools that support collaboration. For example, we like the simplicity and elegance of ALTR’s solution enabling data governance and security teams.

What role do data governance and data security play in modern data architecture?

Data governance moves data security from a controlling function to an enabling function, while data security protects data from unauthorized access and use. Data governance cannot exist without robust data security; in turn, data security should not inhibit business agility and creativity. Managing the interplay between data governance and security requires understanding how data is used and by whom and requires the proper tooling to enable businesses while providing the appropriate level of control and observability. ALTR simplifies the process by offering clear access controls and immediate visibility into data security protocols.  


How do you foster a culture of data governance?

For data governance programs to succeed, IT and business stakeholders need to see the value in implementation and adoption. Tying data governance programs to business use is the ultimate unifier - it requires bringing together data stewards, business-line decision-makers and data engineers to a collective understanding of their roles and responsibilities. We refer to this as “Data as a Team Sport.” We are firm believers in use-case-based development – it is easier to get people on board when you have proven results and vocal champions.  

What advice would you give to a company starting its data modernization journey?

Introducing practical data governance at the onset of data modernization is easier. Most of the time, organizations will introduce tools and proficiencies throughout a data modernization initiative - the proper data governance practices and tools will apply to every step of that modernization journey and scale with use. In building terms, it is easier to provide structural support with a sturdy foundation than to rely on scaffolding once the walls start to go up.  

How do you predict the data management landscape will change in the next 3-5 years?

I see three major trends in the next three to five years:

  1. First, we will see an increase in automation and intelligence in data management tooling, fueled by AI developments and human brilliance.
  2. Organizations will demand more integrated solutions to reduce technical debt and manage leaner technology stacks.
  3. Not only will we see increased regulatory compliance requirements, but we will also enter an era of enforcement, where the government will become more aggressive at enforcing data privacy laws.

Pat Dionne, CEO of Passerelle

Passerelle offers solutions for business growth and results, and with that, a team of experienced technical contributors and managers and the innovative technologies to create the right solution for clients. Pat is at the heart of this synergy, bringing a deep understanding of the modern technologies capable of addressing today’s complex data business challenges, as well as the proven capacity to build and empower highly effective teams.

Many data governance solutions claim to solve every data privacy and protection issue, but we know that no two data governance solutions are created equal. As we launch into the New Year, we’ve listed our top 5 tips for Data Governance in 2023. These tips will help you determine what you need from your data governance solution, identify a few red flags to look out for, and point out some key differentiators that may help make your decision for you.

Tip 1: Keep tabs on your organization’s sensitive data.

The first step to ensuring your data governance solution is the right fit for you is asking the question: “Where does sensitive data exist within my organization, and is it protected?” Understanding what sensitive data you store and who has access to it are critical first steps to ensuring the data governance solution you implement will fit your needs. While only certain data requires protection by law, leaked data can cause a headache across your organization – from damaging your reputation to the loss of loyal customers. It is essential that your data be discovered and classified across your organization’s ecosystem at all times.


Tip 2: Does your Data Governance solution offer complete coverage?

Data classifiers and catalogs are valuable and are extremely necessary in context, but at the end of the day, they cannot offer you a full governance solution. For complete data governance, you must not only be able to find and classify your data, but see data consumption, utilize thresholds to detect anomalies and alert on them, respond to threats with real-time blocking, and tokenize critical data at rest. True data governance will need to address a wide spectrum of access and security issues, including Access Controls, Compliance, Automation, Scale, and Protection. ALTR simplifies these steps for you – allowing you the ease of point-and-click solutions to better secure and simplify your data.

Tip 3: More expensive doesn’t mean better.

Many data governance solutions cost anywhere from $100k to $250k per year just to get started! These large, legacy platforms require you to invest valuable time, resources, and money before you even begin. You may need an army of costly consultants and six months to implement. On the other hand, ALTR’s pricing starts at free for life. Our Free Plan isn’t a trial plan, it’s just that – Free. Our Free plan gives you the power to understand how your data is used, add controls around access, and limit your data exposure. You can see how ALTR will work in your data ecosystem without risk.

If you need more advanced governance controls, integration with your enterprise governance and security platforms, or increased data protection and dedicated support, our Enterprise and Enterprise Plus plans are available. ALTR’s tiered pricing means there’s no large up-front commitment—you can start for free and expand if or when your needs change. Or stay on our free plan forever.

Tip 4: The Who of Data Governance

Clearly defining roles around who needs access to data, and when, will set you up for success when it comes to protecting sensitive data within your organization.

When you know why each person needs the data you are protecting, you can build access control policies to fit highly specific purposes. Using ALTR you can create policies that limit access based on which data is being requested, who is requesting it, the access rate, time of day, day of week, and IP address. ALTR’s cloud-based policy engine and management console allow you to control data consumption across multiple cloud and on-premises applications from one central location.

Tip 5: Does your data governance solution allow you to scale?

Scalability may be the one thing that makes or breaks your data governance solution in 2023. As regulations and laws surrounding data privacy become more common, more of the data you own will need to be protected. And the more data you need protected, the more time your data team must allocate to processes that could easily be automated within ALTR. Governance solutions should be easy to implement and should manage access for thousands of users to match. Scaling policy thresholds as needed allows you to optimize collaboration while stopping data theft or accidental exposure.

Bonus Tip: Start for Free

We anticipate that 2023 will be a critical year for companies being held accountable for the sensitive data they own. ALTR makes getting ahead of the curve simple, easy, and achievable. With ALTR’s free data governance and security integration for Snowflake, you can automatically discover, classify, and tag sensitive data with a checkbox. Add controls like data masking from a drop-down menu. Get going in less than an hour. No SnowSQL is required.

What is PII Security?

PII security has become something just about everyone has had to think about in the last few years with the increase in personal data breaches and the passage of the GDPR regulations in Europe. But that doesn’t mean it’s well understood. What do we mean when we talk about PII data anyway? Personally Identifiable Information or PII data generally refers to information that is related to or key to identifying a person. There are broader terms such as “personal data” or “personal information,” but “PII” has become the standard acronym used to refer to private or sensitive information that can identify a specific individual. The US NIST framework defines Personally Identifiable Information as any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.

While the abbreviation “PII” is commonly used in the United States, the phrase it abbreviates is not always the same – there are common variants based on personal or personally, and identifiable or identifying. The meaning of the phrase "PII data" ends up varying depending on the jurisdiction and the purpose for which the term is being used. For example, in jurisdictions where the General Data Protection Regulation (GDPR) is the primary law regulating personal data, the term "personal data" is significantly broader than PII. Regardless of the definition used, the focus on PII security is growing quickly.

PII security consists of ensuring that only approved users have access to this most sensitive of personal data. In some cases this is required by regulation, but in the US, without a federal regulation like GDPR, it's more often a requirement for maintaining customer trust. In this blog, we'll outline PII data examples and the differences between PII, PHI, and PCI, and explain the steps you should take to identify PII and ensure it's secured.

PII Data Examples

The first step to PII security is understanding what is considered PII data. As mentioned above, it’s more complicated than it may first appear. Not all private information is PII and not all PII data is private information. In fact, much of the information considered PII data and covered by regulation is actually publicly available information, such as an individual’s name or phone number. However, some of the information, especially when combined and in the hands of bad actors, can lead to negative consequences for individuals. Here are some PII examples:

  1. Names: full name, maiden name, mother’s maiden name, or alias
  2. Individual identification numbers: social security number (SSN), patient identification number, passport number, driver’s license number, taxpayer identification number, financial account number, or credit card number
  3. Personal address: street address or email address
  4. Personal phone numbers
  5. Personal characteristics: photographic images (particularly of a face or other identifying physical characteristics), fingerprints, handwriting
  6. Biometric data: retina scans, voice signatures, facial geometry
  7. Information identifying personal property: VIN or title number
  8. Technical asset information: Internet Protocol (IP) or Media Access Control (MAC) addresses that consistently link to a particular person’s technology

What Data Does Not Require PII Security? 

PII security becomes easier if you understand what is not PII data. The examples below are not considered PII data alone as each could apply to multiple people. However, when combined with one of the above examples, the following could be used to identify a specific person:

  • Date of birth
  • Place of birth
  • Business telephone number
  • Business mailing or email address
  • Race
  • Religion
  • Geographical indicators
  • Employment information
  • Medical information
  • Education information
  • Financial information

PII vs PHI vs PCI Data  


PII data has much in common and some overlap with other forms of sensitive or regulated data such as PHI and PCI, but it is not the same. Confusion often arises around whether PII means information that is identifiable (can be associated with a person) or identifying (associated uniquely with a person, so that the PII actually identifies them). In narrow data privacy rules, such as the Health Insurance Portability and Accountability Act (HIPAA), PII items have been specifically defined. In broader data protection regulations such as the GDPR, personal data is defined in a non-prescriptive principles-based way. Information that might not count as PII under HIPAA could be considered personal data per GDPR.

PHI data is personal health information as defined by the Health Insurance Portability and Accountability Act of 1996. HIPAA provides federal protections for personal health information held by covered entities and gives patients an array of rights with respect to that information. At the same time, HIPAA permits the disclosure of personal health information needed for patient care and other important purposes. This federal law required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to effect the requirements of HIPAA. The HIPAA Security Rule protects a subgroup of information covered by the Privacy Rule. In addition to very clear health information, there is some overlap as when PII data like name, date of birth, and address are tied to personal health information, it is considered PHI as well.  

PCI stands for “payment card industry,” and PCI data is defined by a consortium of financial institutions comprising the Payment Card Industry. The definition comes from the rules for protecting data in the PCI-DSS, or payment card industry data security standard. The PCI Security Standards Council (SSC) defines “cardholder data” as the full Primary Account Number (PAN) or the full PAN along with any of the following identifiers: cardholder name, expiration date or service code. The rules were implemented to create an additional level of protection for card issuers by ensuring that merchants meet minimum levels of security when they store, process, and transmit cardholder data.

In the past, PCI data might have been considered the most valuable and most at risk because it was related to financial data and could be used to directly access money. However, as many of us have unfortunately learned from rampant credit card fraud over the last few years, credit card numbers can be easily changed. It’s not nearly as easy to move, to change your social security number, or even to change your name. Those who have dealt with identity theft understand how devastating it can be when unknown loans or other fraud show up on your credit report. And health information is simply unchangeable, as it’s part of a person’s permanent “life record.” That puts PII data and PHI data in the lead in the race for data value and data risk. PII data might be considered even more at risk due to its proliferation, so PII security should always be a priority.

PII Security and the Internet


Before 1994, very little of our PII data was easily accessible, so PII security wasn’t as critical. If you wanted someone’s phone number, you had to know their name and have a hefty copy of what we called the “white pages” (a phone book) in order to look them up. Maybe a bank or telephone company had access to thousands of phone numbers, but not the average person. All of that changed with the advent of the Internet. The concept of PII data has become prevalent as information technology and the Internet have made it easier to collect PII. Every online order requires a name and email, not to mention a physical address or phone number. This has led to a profitable market in collecting and reselling PII. PII can also be exploited by criminals in stalking or identity theft, or to aid in the planning of criminal acts. In reaction to these threats, many website privacy policies now specifically inform users about the gathering of PII, and lawmakers have enacted a series of regulations to limit the distribution and accessibility of PII, making PII security a priority for consumers and companies.

PII Security Regulations

The era of stringent PII data privacy regulations that required PII security really kicked off with the implementation of the European Union’s General Data Protection Regulation (GDPR) in May 2018. This regulation requires organizations to safeguard personal data and uphold the privacy rights of anyone in EU territory. The regulation includes seven principles of data protection that are required and eight privacy rights that must be enabled. It also gives member state-level data protection authorities the power to enforce GDPR with sanctions and fines. The GDPR replaced a country-by-country patchwork of data protection laws and unified the EU under a single data protection regime. The regulation doesn’t apply to just European companies, however. Any company holding personal data of European citizens must comply.  

The US is further behind in the PII privacy regulation game. There is as yet no federal or national privacy regulation that applies across the country. The US is still in the patchwork era, with some states like California, Utah, Colorado, Connecticut and Virginia passing state-level regulations. Five more states have introduced regulations. In 2022, a new bipartisan regulation called the American Data Privacy and Protection Act was introduced in the US House of Representatives. It follows the direction of GDPR and would apply to data controllers and accessors. It is effectively a consumer “Bill of Rights” around PII data privacy. The legislation currently sits in the House of Representatives for approval.

4 Steps to Complete PII Security

These privacy regulations have specific rules around PII security – what data should be protected and how. But in order to comply fully and reduce risk of censure, fees or fines, companies will need to take 4 key steps:

  1. Data classification: The first step to PII security is to identify sensitive information stored in your company’s databases. This can be done manually by reviewing all the databases and tagging columns or rows that contain PII, and some database solutions let you write SQL processes to do this as well. However, it’s much faster and less error-prone to use an automated solution to find and tag social security numbers, dates of birth, or other key information wherever it’s located.  
  2. Data access controls: Once PII is identified, apply controls that allow only approved individuals to access sensitive data. These controls can include data masking (changing characters to ***) and row- or column-level access policies (see the sketch after this list). A common additional requirement is auditable documentation of who has accessed what data and when.  
  3. Data rate limiting: Because it’s best to assume any credentials could be compromised at any time, you should limit the damage even authorized access can do. Instead of allowing millions of rows of data to be downloaded, apply controls that limit the amount of data by role, location, or time of access to reduce the risk of a massive breach.  
  4. Data tokenization: Finally, the most sensitive data should be secured via a data tokenization solution that ensures even if “data” is accessed by a bad actor, they only get their hands on tokens that are useless to them. The real data is stored in an encrypted token vault.  
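
To make step 2 concrete, here’s a minimal sketch of a column-level control implemented as a native Snowflake masking policy. The policy, role, table, and column names are hypothetical assumptions, not a prescribed ALTR configuration:

```sql
-- Minimal sketch: mask email values for everyone except an authorized role.
-- Policy, role, table, and column names are hypothetical.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val  -- approved role sees real data
    ELSE '***MASKED***'                         -- everyone else sees masked output
  END;

-- Attach the policy to a column that classification flagged as sensitive.
ALTER TABLE my_db.public.customers
  MODIFY COLUMN email SET MASKING POLICY email_mask;
```

Multiply this by every sensitive column and every role, and it’s clear why automating policy creation and maintenance pays off.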

Conclusion

The problem of PII security is only growing. As companies extract more insight and value from personal data on consumers, product users, and customers, they’ll continue to gather, hold, share, and utilize that data. In fact, companies aren’t just collecting data for their own use; many also monetize it by selling insights about their own customers to others. While data collection and storage are increasing, so are the laws regulating how that data can be stored and used. Companies can stay ahead of the curve with processes and solutions that help PII security scale with the growth of PII data.  

As a Snowflake Premier Partner and founding member of the Snowflake Data Governance Accelerated Program, we get a lot of questions about how ALTR is different from other Snowflake data governance solutions, including Snowflake!  

The short answer is that we automate the existing native Snowflake governance features for data masking policies and role-based and row-level access policies. Why is that important? Why is that valuable? Automating these Snowflake features allows Snowflake users to address some key challenges.  

Bridging the Snowflake Skills Gap

First, you get the opportunity to address a skills gap. Maybe some of your team members aren’t trained up on SnowSQL yet, or they haven’t taken all of the Snowflake certification training, especially if you’re early in your Snowflake journey. Maybe you and your team don’t have time to learn about data masking policies or some of the nuances that come with Snowflake row-level policies; ALTR can help you automate all of that in a very simple, easy-to-use manner. ALTR’s fast SaaS implementation, access via Snowflake Partner Connect, and no-code policy management take the burden off your team and can even allow other data owners throughout the organization to handle data access controls and enforcement.  

Snowflake Data Governance at Scale

The second thing you get to address is deploying these capabilities at scale. We’ve seen a number of customer projects where implementing these data controls at scale ties up entire teams of people when it just shouldn’t need that many resources. With one centralized tool like ALTR managing who has access to what data and how much, you take a lot of that scale overhead – the friction of growing with Snowflake – out of the equation. This matters once you’ve set up your Snowflake governance policies the way you want for one database or one account: if you’re part of a large organization, you may want to apply them across multiple databases. We recently encountered a company with nine accounts across all three cloud providers that Snowflake supports. How do you make policy portable across all of those accounts and all of those deployments? ALTR can make this easy.  

See how ALTR’s features compare to other Snowflake Data Governance solutions


A Single Source of Snowflake Data Access Truth

There’s a lot of confusion in the market around what “data governance” is exactly. When you’re considering other “data governance” solutions for Snowflake – Collibra, Alation, Immuta, or others in this space – keep in mind that data governance has many parts, and many of these tools handle processes like data classification or cataloging. ALTR’s sole focus is on delivering a single source of truth for data access. You can see how users are using data, control which users have access to which data, reduce risk by limiting the rate of data access, and put powerful tokenization-based data security over the most sensitive data, all with ALTR.  

Low Cost, Fast Implementation, Enterprise Quality

This kind of Snowflake data governance is a really hard problem for companies to solve, even with full teams and full budgets to attack it. Moving into 2023, we’re seeing companies lose headcount and resources and get very picky about selecting specific tools to accomplish specific goals. One of the other major differences between ALTR and some of the other Snowflake data governance solutions is that we’re waging a war on six-figure price points and six months of professional services to implement. If you’re being offered or sold a tool that is very expensive and will take you a long time to roll out and learn to use, that vendor doesn’t have your best interest in mind. With our pure SaaS platform (which other solutions are not), we’re making data governance easy to buy, easy to implement, and easy to use. Bottom line: ALTR provides the functionality companies need to govern data at a price point other solutions just can’t touch.  

See how ALTR’s Snowflake Data Governance solution can work for you. Try our Free Plan today.

It’s no secret that Data Governance, PII, and Data Security were among the most talked-about topics in Q4 of 2022. Security breaches were rampant and technology teams continued to feel stretched thin, while sensitive data was sometimes unwittingly left unprotected and at risk of exposure. We’ve compiled some key guidance from our partners and industry leaders to help you implement strong data governance in 2023 – from simplifying the definition of data governance, to emphasizing the importance of scalability and automation within your data governance plan.

Alation: Key Insights: Forrester’s New Data Governance Solutions Landscape

This blog, written by John Wills, Field CTO at Alation, takes a look at data governance from a holistic perspective – explaining the big picture of creating a data governance plan for your organization, while recognizing certain aspects that may vary between companies. Wills teaches us that your company’s data governance solution should exist cross-departmentally, and shares that people often miss the mark when their data privacy efforts exist in a vacuum.

Tableau: Keep Your Data Private and Secure

Sheng Zhou, Product Manager at Tableau, writes about the importance of data privacy and protection – specifically from the perspective of protecting and securing PHI to meet HIPAA requirements. Zhou shares that, regardless of the type of data you’re protecting, vigilance about securing your sensitive data is a critical business component. Zhou notes that data governance and data privacy are so important that these processes have to be part of normal, everyday business operations.  

BigID: How Strong Data Governance With BigID Drives Down Privacy Compliance Costs

Peggy Tsai, Chief Data Officer at BigID, discusses how strong data governance can help drive down your privacy compliance costs and, at the end of the day, save your company a lot of money. Tsai begins this blog by explaining certain laws around data privacy (the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA)) and what happens when your company receives a Data Subject Access Request (DSAR) under GDPR. Tsai provides an in-depth analysis outlining the importance of strong data governance, so your company can avoid the errors, legal fines, and headaches that come with a DSAR.

Alation: Becoming a Data Driven Organization in 4 Steps

In this blog post, Steve Neat, GM EMEA at Alation, walks through tangible steps you can take to ensure your organization is data-driven. Neat explains that becoming a data-driven organization isn’t just about adding new technologies to your tech stack; it truly requires full investment from all stakeholders. Neat shares how data governance plays a huge role in being a data-driven organization, by creating processes around where your data is stored and who can access it. Agility is a key factor in data governance – ensuring your organization stays in the driver’s seat of protecting your data.

As uncertainty continues to rise in numerous business sectors across the globe, we’re seeing people recognize the need for strong data governance as well. We’re here to help you get ahead by implementing and streamlining a data governance plan. ALTR’s free plan is the perfect place to start – you can automatically discover, classify, and tag sensitive data with a checkbox. It’s easy to get going in less than an hour with no SnowSQL required. 

Have you ever walked into a store and noticed that while some items are displayed freely on shelves, some are visible, yet locked behind glass? We can guess that those items are higher quality, higher value, higher risk. It's pretty clear when inventory comes into the store which items fit this category. It can be less clear when data comes into your database. That's where data classification can help.

In this blog post, we’ll explain what data classification is, why data classification is an important step in your data security strategy, and how you would classify data yourself with SQL versus doing it automatically with ALTR.

What is Data Classification?

Data classification is the process of identifying the type of information contained in each column of your database and categorizing it. This is an important first step to securing your data: once you know the type of information in each column, you can compare it against the list of information types your business considers sensitive, which in turn makes it possible to protect that data with appropriate data access policies. Before you create a column-level policy, you should classify the data in that column. By implementing data classification, you can minimize the risk of a sensitive data compromise.

Data Classification Factors

To protect your company’s sensitive data, you must first know what types of data you have. Data classification is therefore a must if you want to avoid having your data hacked by cybercriminals or leaked by individuals inside your business. When deciding how to apply data classification, consider the following factors:

  • Timing: To enforce a data policy, you must know which columns contain sensitive data, so you need to classify your data before implementing data access policies. You should also reclassify any time you add new sources of data.
  • Methods: The method you use should involve sampling actual data values found in the data. Avoid relying entirely on the name of the column.
  • Automation: Classification is tedious when done manually. A typical database has hundreds if not thousands of tables, and each table can have hundreds of columns, which leads to missed columns and copy/paste errors.
  • What Data is Sensitive: Keep a list of the information types that are sensitive in your situation – for example, what data security regulations apply to your company, what your internal data security team requires, and so on.

These factors will help to ensure that your data classification efforts are efficient and thorough.

How Snowflake Data Classification Works DIY

Read on to learn what’s required to classify data in Snowflake yourself with SQL via three different methods: good, better and best.

Who Can Do It: A software developer who can manually write SQL code AND categorize and manage data well

Downsides to manually classifying data in Snowflake:

  • Time-consuming
  • Higher risk of missing data that needs to be classified
  • You’ll have to manually store your results in a database, making it difficult for non-technical users to analyze the results 

1) “Good” Method: Column Name

This is a way to identify what type of data is in a column by looking at the column name. You can run a query that uses a conditional expression for each data type against the information schema inside of Snowflake.

The query result will display every column in your Snowflake account that matches your condition. The downsides are that you must run the query for every data type you want to identify, and you might miss columns that weren’t named clearly. For example, if you’re trying to identify all ‘email’ columns but one is abbreviated as ‘eml,’ it won’t be returned by your query.

Figure 1. Column name query
Figure 2. Query results

2) “Better” Method: Sample Rows of Data

This is better than the column name method because it grabs a sample of rows so you can see the actual content of each column. However, it’s still not the ‘best’ approach: because the query displays multiple rows and column values for you to review, it can be time-consuming and overwhelming.

Figure 3. Sample Row query
Figure 4. Query results

3) “Best” Method: Extract semantic categories

This data categorization method is the best of the three because it does the sampling for you. You run Snowflake’s semantic category extraction against a table, and the query result returns a JSON object with scored classification results. The caveats are that you must run it against each table in your database, and you must manually store and present the results in order to use them to create access policies.

Figure 5. Extract semantic categories query

Figure 6. Query results (based on a Birthdate category) in the form of a JSON file

Figure 7. Detailed view of the ‘birthdate’ query results

How Snowflake Data Classification Works in ALTR

While you could choose one of the ‘good, better, and best’ approaches above to classify your data manually in Snowflake, using ALTR to automate data classification is the ‘supreme’ approach.

Who can do it: Anyone – you don’t have to write SQL or log in to Snowflake.

Downsides to classifying data in ALTR: None

There are only four steps to ALTR Snowflake data classification.

  1. Simply choose the database that you’d like to classify (shown in Figure 8).
  2. Check the box beside Tag Data by Classification.
  3. Choose from the available tagging methods.
  4. Click on Update. This starts the process of classifying all the tables in that database. When the job is complete, you’ll receive an email to let you know it’s done.

NOTE: An object tag is metadata (such as a keyword or term) assigned to a piece of information to describe it for easier searchability. ALTR can use object tags already assigned to columns in Snowflake to classify data or, if those are not available, ALTR can assign tags to columns using Google DLP classification.
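
For reference, here’s a minimal sketch of what creating and assigning an object tag looks like in native Snowflake SQL; the tag name, tag value, and table/column names are hypothetical:

```sql
-- Sketch: create a tag and attach it to a column so downstream tools
-- (and policies) can find the column by its classification.
CREATE TAG IF NOT EXISTS pii_type;

ALTER TABLE my_db.public.customers
  MODIFY COLUMN email SET TAG pii_type = 'EMAIL';
```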

The classified data results will be integrated into a Data Classification Report.

Figure 8. ALTR User Interface

Snowflake Data Classification Use Cases

Here are a couple of use case examples where ALTR’s automated data classification capability can benefit your business as it scales with Snowflake usage.

Use Case 1. Protected health information

Your data team is integrating an employee dataset from a recently acquired hospital into your main data warehouse in Snowflake. You need to determine which database columns contain healthcare-related data (e.g., social security numbers, diagnostic codes). The original developers of the dataset are no longer available, so you use ALTR to classify the dataset and identify those sensitive columns.

Use Case 2. Financial records information from sales

You are a healthcare product manufacturer, and you have just signed a new online reseller for your products. The reseller’s sales data will be loaded into your Snowflake database every week and will contain sales transaction data including addresses, phone numbers, and payment information; however, you don’t know where this data will land in the database. ALTR’s automated classification can find and tag those sensitive columns as each weekly load arrives.  

What You Could be Doing: Automating Snowflake Data Classification with ALTR

In today’s world, implementing data classification as part of your company’s security strategy is critical. You can’t afford to put your company at risk of fines and lawsuits from data breaches that could have been prevented. Do you or your security team have hours in the day to spend manually writing SQL code every time you add data to your databases? Do you want to spend hours trying to figure out why a query didn’t return results due to unclear column names or other issues? We’ve made ALTR so convenient that you don’t have to write any SQL code or even log into Snowflake. It’s a simple point-and-click, four-step procedure in ALTR and you’re done!

Watch the ‘how-to’ comparison video below to see what it looks like to manually classify Snowflake data versus automating it with ALTR.

Ready to give it a try? Start with our Free Plan today
