ALTR Blog

The latest trends and best practices related to data governance, protection, and privacy.
BLOG SPOTLIGHT

Data Security for Generative AI: Where Do We Even Begin?

Navigating the chaos of data security in the age of GenAI—let’s break down what needs to happen next.


If we’ve learned anything over the last few years, it’s that this data space moves faster than you can imagine. Whether it’s new investments from market leaders, new acquisitions, new partnerships, or new technologies, the landscape is always changing, and those who aren’t ready for the next big shift are quickly left behind.  

James Beecham and Dave Sikora at BlackHat 2022

We anticipated this when we built the ALTR platform from the cloud up to be highly adaptable – our solution can easily scale up or down with users, with data, and with cloud data warehouse usage. While our competitors were offering legacy on-prem solutions with high barriers to entry like long-term commitments, massive up-front costs, and complicated implementations, ALTR built a cloud-native, SaaS-based integration for Snowflake that users can add directly from Snowflake Partner Connect, plus a free plan that lets companies try our solution before ever paying a cent. Our decisions have paid off in market response, demonstrated by compound annual revenue growth of over 300% since 2018 and an accelerating customer base of over 200 companies.

We couldn’t be more ready for the next phase in ALTR’s journey and it’s the perfect time to appoint a new leader to take it on: James Beecham, ALTR’s Co-founder and Chief Technology Officer has been promoted to become ALTR’s next Chief Executive Officer. As a Co-founder, James was key to identifying the data security hole ALTR could fill. As CTO, he has been the technical leader who envisioned how ALTR could best meet our customers’ needs and one of the most public faces of the company.  

James is excited to chart the course for ALTR’s future, maintaining the company’s trajectory by ensuring we continue to anticipate, act proactively, and deliver the disruptive data governance and security solutions our customers and the market didn’t even realize were possible. We believe that ALTR’s short “time-to-value” in a market fraught with complexity will deliver sustained differentiation in the coming years.  

And we’re a team here at ALTR so Dave isn’t going anywhere. He and James will work closely together during a transition period, and he will remain involved as a Board Director, CEO Advisor and ongoing financial Investor. Dave will also use this opportunity to expand his strategic advisory practice, mentor up-and-coming CEOs and explore other Board of Director opportunities.  

Please don’t hesitate to reach out to James, Dave or your Account Executive if you have any questions about the transition. And stay tuned for great things ahead…

- Dave & James

If there’s one phrase we heard over and over again at Snowflake Summit 2022 (other than “data governance”) it was "data mesh." What is data mesh, you ask? Good question!

Data Mesh definition

Data mesh is a decentralized data architecture that makes data available through distributed ownership. Various teams own, manage, and share data as a product or service they offer to other groups inside or outside the company. The idea is that distributing ownership of data (versus centralizing it in a data warehouse or data lake with a single owner, for example) makes it more easily accessible to those who need it, regardless of where the data is stored.

Data Mesh

You can imagine why this might be a hot topic in the data ecosystem. Companies are constantly looking for ways to make more data available to more users more quickly. The data mesh conversation has continued in data ecosystem leader blogs we’ve gathered in our Q3 roundup.

Alation: Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

VP Product Marketing and Analyst Relations at Alation, Mitesh Shah, interviews former Gartner Analyst Sanjeev Mohan in this Q&A-style blog. Mohan shares his definitions of data mesh, data fabric and the modern data stack and why they’re such hot topics at the moment. Mohan suggests the possibility that new terms (like data mesh) are actually history repeating itself, dives into what these new strategies and architectures bring to the table for data-first companies and identifies the pros and cons of centralizing or decentralizing data and metadata.  

Collibra: Data Observability Amidst Data Mesh

Eric Gerstner, Data Quality Principal at Collibra, leverages his background as a former Chief Product Owner managing technology for digital transformation to dive into the data mesh concept. He explains that “No amount of technology can solve for good programmatics around the people and process.” He sees data mesh as a conceptual way of tying technology to people and processes and enabling an organization to improve its data governance. This article helps shed light on the narrative of data mesh and how it fits into modern data organizations, both now and in the longer term. He sees data mesh as key to linking people and processes – the people who know how to interpret and organize data and the processes that drive and collect data into the organization itself.

Matillion: Data Mesh with Matillion

This blog by Matillion unpacks the concept of data mesh at a fundamental level. It’s about bringing data out from its usual role as a supporting player and elevating it to a product in and of itself – “productizing” data and offering it to customers both inside and outside the company. Customers have expectations about the quality of the product and service they use, and a data mesh can help data owners meet those expectations. The blog also explains the steps necessary to create a data mesh with Matillion, whose low-code/no-code platform is an ideal partner for individual data teams that include a mix of domain and technology expertise.

Data Mesh Architecture: ALTR's Take

We’re all about making data easier to access – for authorized people. As the data mesh architecture proliferates, companies need to ensure that all data owners across the company are enabled with the appropriate tools in place to keep their sensitive data from spreading recklessly – to meet both internal guidelines and government regulations on data privacy. A data mesh architecture really democratizes data ownership and access, and ALTR’s no-code, low up-front cost solution democratizes data governance to go hand in hand with it. Data owners from finance to operations to marketing do not need to know any code to implement data access controls on the sensitive data they’re responsible for.  

Snowflake harnesses the power of the cloud to help thousands of organizations explore, share, and unlock the actual value of their data. Whether your company has ten employees or 10,000, if you’re one of Snowflake’s 4,500 customers and counting, you’re either thrilled or overwhelmed by the cloud data warehouse’s combination of out-of-the-box functionality and powerful, flexible features.

Wherever you are in your journey, though, it’s never too early or too late to think about how you’re handling Snowflake data governance and security for sensitive data like PII/PHI/PCI.  

When you look at the enterprise-level security and governance capabilities Snowflake offers natively within the platform, you may wonder why you need more (see the Bonus question for this answer). And the options for Snowflake Data Governance offered by partners may sound similar, making it a challenge to know what the differences are and what you need.  

With that in mind, we’ve put together the critical questions you should ask when evaluating Snowflake Data Governance options. Going through this list should reveal the best next step for your company.

Snowflake Data Governance

1. Is the Snowflake data governance solution easy to set up and maintain? Does it use Proxy, Fake SaaS or Real SaaS?

There are several ways vendors can enable their Snowflake data governance solutions. One approach is to utilize a proxy. While proxy solutions have some advantages, they come with serious issues that make them less than ideal for cloud-based Snowflake:

  • Extra effort is required to make all applications go through the proxy, adding time, complexity, and costs to your implementation.
  • Security holes are created when applications and users can bypass the proxy to get full access to data, increasing risk and surfacing compliance issues.
  • Platform changes may break the proxy without warning, adding unnecessary downtime and delays.
  • On-premises proxies require you to deploy, maintain, and scale more infrastructure than you would with a pure-SaaS, cloud-native solution.

SaaS is a better option for Snowflake data governance, but some providers calling themselves “SaaS” are better described as “Managed Services.” In these “Fake SaaS” solutions, vendors spin up, support, and update an individual version of the software just for you, which makes it more expensive to run and maintain than true SaaS. They can also require long maintenance windows that make the service unavailable during updates.

A proper multi-tenant SaaS-based data governance solution built for the cloud - like ALTR’s - is easier to start and maintain with Snowflake. There’s no hardware deployment or maintenance downtime required, no hardware sitting between your users and the data, no risk of a platform change breaking your integration, and no difficulty scaling your Snowflake usage. Because it’s natively integrated, there are no privacy issues or security holes. A real SaaS-based solution will also have the credentials to back it up: PCI DSS Level 1, SOC 2 Type II certification, and support for HIPAA compliance.


2. Is the Snowflake data governance solution easy to use? Does it require code to implement and manage?

Snowflake provides the foundation with native data governance features like sensitive data discovery and classification, access control and history, masking, and more with every release. But for users to take advantage of these Snowflake data governance capabilities on their own, they must be able to write SQL. That can make the features difficult, time-consuming, and costly to implement and manage at scale because data governance administration is limited to DBAs and other developers who can code.
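To make that concrete, here is the kind of conditional logic a hand-written masking policy encodes, sketched in Python rather than Snowflake SQL. The role names and mask format are illustrative assumptions, not ALTR or Snowflake APIs.

```python
# Sketch of the logic a conditional masking policy encodes:
# privileged roles see the raw value, everyone else sees a masked form.
# Role names and the mask format are illustrative assumptions.

PRIVILEGED_ROLES = {"ACCOUNTADMIN", "COMPLIANCE_OFFICER"}

def mask_ssn(current_role: str, ssn: str) -> str:
    """Mimic a role-based masking policy on a Social Security number."""
    if current_role in PRIVILEGED_ROLES:
        return ssn                       # privileged roles see the raw value
    return "XXX-XX-" + ssn[-4:]          # others see only the last 4 digits

print(mask_ssn("ANALYST", "123-45-6789"))             # XXX-XX-6789
print(mask_ssn("COMPLIANCE_OFFICER", "123-45-6789"))  # 123-45-6789
```

A no-code interface like ALTR’s effectively manages this role-to-mask mapping for you, so non-developers can change it without touching SQL.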

However, the groundwork Snowflake provides allows partners to create solutions that leverage that built-in functionality but deliver an easier-to-use experience. ALTR’s solution provides native cloud integration and a user interface that doesn’t require code to get started or manage. This means your Data Governance teams or even line of business data or analytics users can take over the management of governance policies on Snowflake, freeing DBAs to focus on managing data streams and enabling data-driven insights.


2. Is it a complete Snowflake data governance solution? Does it secure all of your data and reduce your risk?

This is crucial. You may look for a Snowflake Data Governance solution in response to privacy regulations, but you’ll never be truly compliant without data security. And most "data governance" options don’t include data protection. While Snowflake offers many enterprise-level security features, it has no defense against credentialed or privileged access threats. Once someone gets in with compromised credentials, there’s no mechanism for slowing or stopping data consumption.

Some software vendors calling themselves “data governance” only provide data discovery and classification – a data card catalog – without access control. And some other vendors require the data you want to protect to be copied into a new Snowflake database managed by the solution, leaving the raw data in the original database—ungoverned and unprotected. You may never know if anyone has accessed that data, potentially violating privacy regulations that require you to understand and document who has accessed data, even if nothing leaks outside the company.

For complete Snowflake Data Governance, you must not only be able to find and classify your data, but also see data access, use consumption thresholds to detect and alert on anomalies, respond to threats with real-time blocking, and tokenize critical data at rest. ALTR combines all these features into a single data governance and security platform that lets you protect data appropriately based on governance policies, ensure all your data is secure, and minimize your risk of data loss or theft.


4. Is the data governance solution affordable and flexible? Can you start with only what you need?

Most solutions cost $100k to $250k per year to start! These large, legacy on-premises platforms were not built for today’s scalable cloud environment. They require considerable time, resources, and money to even get started, which is an odd fit for Snowflake’s cloud-based platform, where Snowflake On-Demand gives you usage-based, per-second pricing with a month-to-month contract.

ALTR’s pricing starts at “free.” Our Free plan gives you the power to understand how your data is used, add controls around access, and limit your data risk at no cost. Our Enterprise and Enterprise Plus plans are available if you need more advanced governance controls, integration with SOAR or SIEM platforms, or increased data protection and dedicated support.

ALTR’s tiered pricing means there’s no large up-front commitment—you can start Snowflake data governance for free and expand if or when your needs change. Or stay on our free plan forever.


Bonus Question: Can't I just build a solution myself? 

While a data admin can write a Snowflake masking policy in SQL to leverage Snowflake's native features, what happens next? That’s a one-time point fix – what about the long term and wide scale? Can others read and work with it? Do you have a QA team to eliminate errors? Can you ensure it scales correctly and runs quickly across thousands of databases? Do you have the time to integrate it with Okta or Matillion or Splunk? Do you have a roadmap that keeps it up to date with new private-preview Snowflake features, your changing data and regulatory landscape, and new user service needs? Basically, do you want your data team to be a software development team? You could hire 30 engineers and spend millions of dollars to build enterprise-ready Snowflake data governance software you can trust with the risky connection between users and data, but why would you when cost-effective solutions already exist from companies focused on exactly this?

Conclusion

Companies flocking to the cloud data party, and Snowflake in particular, are faced with a dizzying array of options for Snowflake Data Governance. However similar the solutions may seem, with a little digging fundamental differences become apparent. ALTR’s solution stands out for its accessible, SaaS-based, no-code setup and management and complete Snowflake data governance and security feature set. And with its reasonable user- and data-based costs, ALTR becomes the obvious next step for Snowflake users to govern and protect their sensitive data.

What is Cloud Data Security? A Definition

Why is everyone talking about cloud data security today? The first wave of digital transformation focused on moving software workloads to SaaS-based applications in the cloud that were easy to spin up, required no new hardware or maintenance, and started with low costs that scaled with use. Today, the next generation of digital transformation is focused on moving the data itself — not just from on-premises data warehouses to the cloud but from other cloud-based applications and services into a central cloud data warehouse (CDW) like Snowflake. This consolidates valuable and often very sensitive data into a single repository with the goal of creating a single source of truth for the organization.

Cloud data security is focused on protecting that sensitive data, regardless of where it’s located, where it’s shared or how it’s used. It uses role-based data access controls, privacy safeguards, encrypted cloud storage, and data tokenization among other tools to limit the data users can access in order to meet data security requirements, comply with privacy regulations and ensure data is secure.

3 Benefits of Cloud Data Security

Cloud data security confers powerful benefits with almost no downsides. In fact, the biggest risk of cloud data security is not doing it.

  • Improve business insights: When applied correctly, cloud data security enables data to be distributed securely throughout an organization, across business units and functional groups, without fear that data could be lost or stolen. That means you can share sensitive PII about customers with your finance teams, your marketing teams and even your sales teams, without worry that the data might make its way outside the company. You can gather information from various in-house and in-cloud business tools such as Salesforce or another CRM, your ERP, or your marketing automation solution into one centralized database where users can cross check and cross reference information across various data sources to uncover surprising insights.
  • Avoid regulatory fines: It’s not just credit card numbers or health information that companies need to worry about anymore – today, practically every company deals with sensitive, regulated data. Personally Identifiable Information (PII) is data that can be used to identify an individual, such as a Social Security number, date of birth or even home address. It’s regulated by GDPR in Europe and by various state regulations in the US. Although the regulatory landscape is still patchy in the US, all signs point to a federal-level statute or regulation coming soon that will lay out rules for companies across the country. For companies that want to get ahead of the issue, making sure their cloud data security meets the most stringent requirements is the easiest path. This can help a company ensure it’s meeting its obligations and reduce the risk of fines from any regulation.
  • Cultivate customer relationships: In a 2019 Pew Research Center study 81% of Americans said that the risks of data collection by companies can outweigh the benefits. This might be because 72% say they benefit very little or not at all from the data companies gather about them. A McKinsey survey showed that consumers are more likely to trust companies that only ask for information relevant to the transaction and react quickly to hacks and breaches or actively disclose incidents. These also happen to be some of the requirements of data privacy regulations – only gather the information you need and be upfront, timely and transparent about leaks. Companies can’t continue to gather data at will with no consequences – customers are awake to the risks now and demanding more accountability. This gives organizations a chance to strengthen the relationship with their customers by meeting and exceeding their expectations around privacy. If personalization creates a bond with customers, imagine how much more powerful that would be if buyers also trust you. Organizations that focus on protecting customer data privacy via a future-focused data governance program have an opportunity to take the lead in the market.
Cloud Data Security

Cloud Data Security Challenges

Although cloud data security is a new area of concern, many of the biggest challenges are already well known by companies focused on keeping data safe.

  1. Securing data in infrastructure your company doesn’t own: With so much data moving to the cloud, yesterday’s perimeter is an illusion. If you can’t lock data down behind a firewall (and you can’t), you’re forced to trust your cloud data warehouse. These facilities are extremely secure, but they only cover part of your security needs. They don’t manage or control user data access – that’s left to you. Bad actors don’t care where the data is – in fact, cloud data warehouses that consolidate data from multiple sources into a single store make a compelling target. Regulators don’t care where data is either: responsibility for keeping it safe falls on the company that collects it. Larger companies in more regulated industries face very punitive fines if there’s a leak, which can lead to severe consequences for the business.
  2. Securing data your team doesn’t own: From a security perspective, it’s difficult to protect data if you don’t know what it is or where it is. With various functional groups across companies making the leap to cloud data warehouses on their own in order to gain business insights, it’s difficult for the responsible groups such as security teams to be sure data is safe.
  3. Stopping privileged access threats: When sensitive data is loaded to a CDW, there’s often one person who doesn’t really need access but still has it: your Snowflake admin. If your company is like Redwood Logistics, uploading sensitive financial data in order to better estimate costs, you really don’t want your admin to have access – and usually, they don’t want it either! Even if you trust your admin (and you probably do), there’s no guarantee their credentials won’t get stolen, and there’s no upside to them or the business in allowing that access. This leads into our next challenge:
  4. Stopping credentialed access threats: Even the most trustworthy employees can be phished, socially engineered, or simply have their credentials stolen. Despite the training companies have done to educate users about these risks, credentialed access continues to be one of the top sources of breaches in the Verizon Data Breach Investigations Report – for the sixth year in a row! ALTR’s James Beecham asks year after year: “Why Haven’t We Stopped Credentialed Access Threats?” We know how – even when humans are fallible, there is technology that can help.
  5. Using data safely in Business Intelligence tools: One of the key goals of consolidating data into a centralized CDW is to enable business intelligence access. BI tools like Tableau, ThoughtSpot and Looker depend on access to all available data in order to provide a full 360-degree view of the business. When the data can’t be used securely in these tools, security admins often make the call to leave that data out of the equation, creating a broken view of the business.

Cloud Data Security Best Practices

There are a few best practices every organization should incorporate into their successful cloud data security program:

   1. Keep your eye on the data - wherever it is

This shift to the cloud requires a shift in the security mindset: from perimeter-centric to data-centric security. It means CISOs (Chief Information Security Officers) and security teams will have to stop thinking about hardware, data centers, and firewalls and instead focus on the end goal: protecting the data itself. Responsible teams need to embrace data governance and security policies around data throughout the organization and its data ecosystem. They need to understand who should have access to the data, understand how data is used, and place relevant controls and protections around data access. In fact, they could start with a data observability program in order to understand what normal data usage looks like, so they’re better able to identify abnormal usage.
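A data observability baseline of this kind can be sketched in a few lines of Python. The 3-sigma rule and the sample figures are illustrative assumptions, not a prescribed method.

```python
import statistics

# Sketch of a data observability baseline: learn what "normal" daily row
# consumption looks like, then flag days that deviate sharply from it.
# The 3-sigma cutoff and sample numbers are illustrative assumptions.

def is_abnormal(history, today, sigmas=3.0):
    """Flag today's usage if it sits more than `sigmas` std devs above the mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return today > mean + sigmas * stdev

normal_days = [950, 1010, 980, 1040, 995]   # rows queried per day, historically
print(is_abnormal(normal_days, 1020))       # False
print(is_abnormal(normal_days, 250_000))    # True
```

Once a baseline like this exists, the interesting security question shifts from “was this access authorized?” to “is this access pattern normal for this user?”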

   2. Empower everyone to secure cloud data

We often hear “security is everyone’s responsibility.” But how can it be when most people are left out of the process? While data is a key vulnerability for essentially every company, until recently most companies didn’t want to acknowledge the risk. Now, with a new data breach announcement every few weeks, the problem is impossible to ignore. When marketing teams are spinning up shadow cloud data warehouse resources instead of waiting for security or IT teams to vet the solution, it’s critical to make sure data owners have the means to protect the data themselves. Instead of governance technologies based on legacy infrastructure – which not only require big investments in time, money, and human resources to implement, but also expensive developers to set up and maintain – democratize data governance with tools that allow non-coders to roll out and manage the data security solution themselves in weeks or even days.

   3. Add cloud data security checks and balances to your cloud data warehouse

To protect data (and your Database Administrator!) from the risk around sensitive data, put a neutral third party in place that can keep an eye on data access – natively integrated into the cloud data platform yet outside the control of the platform admin. This separation of duties should make it impossible to access the data without key people being notified and can limit the amount of data revealed, even to admins. It can include features like real-time alerts that notify relevant stakeholders at the company whenever the admin (or any user, for that matter) tries to access the data. If none of the allowed users accessed the data, they’ll know within seconds that unauthorized access has occurred. Alert formats can include text messages, Slack or Teams notifications, emails, phone calls, SIEM integrations, etc. Data access rate limits that constrain the amount of de-tokenized data delivered to any user, including admins, also limit risk. A user may request 10 million records but only get back 10,000 – or even 10 – per hour, which can also trigger an alert to relevant stakeholders. These features ensure that no single user has the keys to the entire data store – no matter who they are.  
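The rate-limiting idea above can be sketched as a toy model. The hourly cap, class names, and interface are illustrative assumptions, not ALTR’s product.

```python
from collections import defaultdict

# Toy sketch of a per-user, per-hour de-tokenization rate limit:
# a user may request millions of records but only receives up to the cap.
# The cap value and window handling are illustrative assumptions.

HOURLY_CAP = 10_000

class RateLimiter:
    def __init__(self, cap=HOURLY_CAP):
        self.cap = cap
        self.delivered = defaultdict(int)   # user -> rows delivered this hour

    def fetch(self, user, rows_requested):
        """Return how many rows the user actually receives this hour."""
        remaining = max(0, self.cap - self.delivered[user])
        granted = min(rows_requested, remaining)
        self.delivered[user] += granted
        return granted

limiter = RateLimiter()
print(limiter.fetch("admin", 10_000_000))   # 10000 (capped)
print(limiter.fetch("admin", 5_000))        # 0 (cap already exhausted this hour)
```

Even with stolen admin credentials, an attacker in this model walks away with at most 10,000 rows per hour instead of the whole database, and every capped request is a natural trigger point for an alert.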

   4. Always assume credentials are compromised and cloud data is at risk

Knowing that the easiest and best ways to stop credentialed access threats are undermined by people being people, we’re simply better off assuming all credentials are compromised. Stolen credentials are the most dangerous if, once an account gets through the front door, it has access to the entire house including the kitchen sink. Instead of treating the network as having one front door, with one lock, require authorization to enter each room. This is actually Forrester’s “Zero Trust” security model – no single log in or identity or device is trusted enough to be given unlimited access. This is especially important as more data moves outside the traditional corporate security perimeter and into the cloud, where anyone with the right username and password can log in. While cloud vendors do deliver enterprise class security against cyber threats, credentialed access is their biggest weakness. It’s nearly impossible for a SaaS-hosted database to know if an authorized user should really have access or not. Identity access and data control are still up to the companies utilizing the cloud platform.  


Key Components of Cloud Data Security Solutions

An effective cloud data security solution includes these key components:

  • Knowing where your data is and categorizing what data is sensitive: With data often spread throughout an organization’s technology stack, it can be challenging to even know all the various places sensitive data like social security numbers are stored. Solving this issue often starts with a data discovery and data classification solution that can find data across stores, group information into types of data and apply appropriate tags.
  • Controlling access to sensitive data: In today’s data-driven enterprises, data is not just used by data scientists. Everyone from marketing to sales to product teams may need or want access to sensitive data in order to make more informed business decisions but not everyone will be authorized to have access to all the data. Making sure you have the ability to grant access to some users but not others, or allow access to some roles but not others, in an efficient, scalable and secure way is one of the most important components of cloud data security.
  • Putting extra limits on sensitive data access: Data security doesn’t have to be either/or. With data access rate limits, users can be prohibited from gaining access to more data than they should reasonably need. This can stop bad actors with credentials from downloading the whole database by setting rate limits per user or per time period, e.g., 10,000 records in one hour instead of 1 million.  
  • Securing sensitive data with encryption or tokenization: Encryption is one cloud data security approach that is highly recommended by security professionals. However, it does have weaknesses and limitations when it comes to utilizing data in the cloud. Tokenization can enable data to be stored securely yet still be available for analysis.
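The discovery-and-classification component above can be sketched as a simple pattern scan. This is illustrative Python only; real classifiers use far more robust detection than these loose regexes.

```python
import re

# Minimal sketch of data discovery and classification: match values against
# patterns and tag likely sensitive types. Patterns are illustrative and far
# looser than a production classifier's detection logic.

PATTERNS = {
    "US_SSN":      re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "CREDIT_CARD": re.compile(r"^\d{4}(-?\d{4}){3}$"),
    "EMAIL":       re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify(value: str) -> str:
    """Return the first matching sensitivity tag, or 'UNCLASSIFIED'."""
    for tag, pattern in PATTERNS.items():
        if pattern.match(value):
            return tag
    return "UNCLASSIFIED"

print(classify("123-45-6789"))        # US_SSN
print(classify("jane@example.com"))   # EMAIL
print(classify("hello world"))        # UNCLASSIFIED
```

Tags produced by a scan like this are what the later components act on: access controls, rate limits, and tokenization can all key off the sensitivity classification rather than off individual column names.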

Conclusion

There’s no chance of reversing the migration of data to the cloud and why would we want to? The benefits are so staggering, it’s well worth any challenges presented. As long as cloud data security is built in as a priority from the start, risks can be mitigated, and the full power and possibility of a consolidated Cloud Data Warehouse can come to fruition.  

See how ALTR can help automate and scale your cloud data security in Snowflake. Get a demo!  

The road to becoming one of today’s data-driven companies is full of challenges, not the least of which is finding and keeping the right people with the right skills to get you there. And it’s not always about the individual, it’s also about the team and finding the right combined characteristics that lead to success.

As part of our Expert Panel Series, we asked some experts in the modern data ecosystem what attributes a modern data team should have. Here’s what we heard…

John DaCosta, Sr. Sales Engineer, Snowflake:

“I have been referencing this McKinsey article for years now. ‘A two-speed IT architecture for the digital enterprise.’ The concept is an Enterprise IT organization (Data Platform / Networking, etc.) that manages mature assets / processes. The 2nd speed teams are smaller, agile and more focused. They focus on “shadow IT,” for example: Marketing Analytics / Marketing Technology. They are allowed to do whatever they need to get the job done. But once things are mature, they can be transitioned into Enterprise IT. In my interpretation, functional areas have their own ‘smaller technology teams’ that have all the required skill sets to deliver on projects for the business unit sponsoring it.”

Phil Warner, Director of Data Engineering, PandaDoc: 

“Hire T-shaped skillsets and people who are happy to collaborate. Nothing is worse than a team of siloed individual contributors. [People with T-shaped skills] are team members who specialize in a particular area (such as Python, or data modeling, or infrastructure, etc.), but also have all-round skills, to a lesser degree, across the board. This allows for broad coverage across the team, without having to train everyone on everything to the same level, and also gives you team members that'll never say 'that's not my job', or sit there and pout when a particular ETL process didn't get written in Python this time around. They also tend to be inquisitive and curious by nature, and so are open to new ways of doing things and new technologies to move things forward, rather than painting themselves into a box and refusing to do anything other than what they know.

The opposite of a person with a T-shaped skillset is a one-trick pony. 😁”

Louis Hassel, Account Executive, Alation: 

“A modern team should have a variety of skills but the best attribute they can have is a shared vision of the overall goal of the data project. If the Marketing manager needs hourly reports and the data engineering team is building daily extracts there is a disconnect. The data exec level would be great, but not a necessity. Just need to do a little planning to succeed rather than rebuild everything.”

James Beecham, Founder & CTO, ALTR:

“Similar to a full-stack developer, or a ‘feature team’ for software development, having team members that are cross functional is key to accelerating your data initiatives. I have seen too many projects stall because one person says, ‘I don’t know anything about the data pipeline, so I cannot tell you the answer’ or ‘I don’t have access to the data so I cannot verify that classification report.’ These types of bottlenecks always pop up at the worst time and cause delays. Cross training team members, having folks who are not afraid of using every tool you have in your stack is critical to your success.”  

Watch out for the next monthly installment of our Expert Panel Series on LinkedIn!

What is Data Tokenization? – a Definition

You may be familiar with the idea of encryption to protect sensitive data, but maybe the idea of tokenization is new. What is data tokenization? In the realm of data security, “tokenization” is the practice of replacing a piece of sensitive or regulated data (like PII or a credit card number) with a non-sensitive counterpart, called a token, that has no inherent value. The token maps back to the sensitive data through an external data tokenization system. Data can be tokenized and de-tokenized as often as needed with approved access to the tokenization system.

How Does Tokenization of Data Work?

Original data is mapped to a token using methods that make the token impractical or impossible to restore without access to the data tokenization system. Since there is no relationship between the original data and the token, there is no standard key that can unlock or reverse lists of tokenized data. The only way to undo tokenization of data is via the system that tokenized it. This requires the tokenization system to be secured and validated using the highest security levels for sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system is the only vehicle for providing data processing applications with the authority and interfaces to request tokens or de-tokenize to the original sensitive data.

Replacing original data with tokens in data processing systems and applications like business intelligence tools minimizes the exposure of sensitive data across those applications, stores, people, and processes, reducing the risk of compromise, breach, or unauthorized access to sensitive or regulated data. Except for a handful of applications or users authorized to de-tokenize when strictly necessary for a required business purpose, applications can operate using tokens instead of live data. Data tokenization systems may be operated within a secure, isolated segment of the in-house data center or as a service from a secure service provider.
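The flow described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation: the `TokenVault` class and its in-memory dictionaries stand in for a secured, audited tokenization system.

```python
import secrets

class TokenVault:
    """Minimal vaulted-tokenization sketch. Tokens are random, so nothing
    about the original value can be derived from a token alone."""

    def __init__(self):
        self._token_to_value = {}  # the "vault" (in practice, hardened storage)
        self._value_to_token = {}  # reuse tokens so repeat values map consistently

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # no mathematical link to the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # In a real system this call would be authenticated, authorized, and audited.
        return self._token_to_value[token]

vault = TokenVault()
t = vault.tokenize("4111-1111-1111-1111")
assert vault.detokenize(t) == "4111-1111-1111-1111"
```

Note that the only path from token back to data is the vault itself, which is exactly why the tokenization system must be secured to the highest levels.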

What is Data Tokenization Used For?

Tokenization may be used to safeguard sensitive data including bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Data tokenization is most often used in credit card processing, and the PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value".

The choice of tokenization as an alternative to other data security techniques such as encryption, anonymization, or hashing will depend on varying regulatory requirements, interpretation, and acceptance by auditing or assessment entities. We cover the advantages and disadvantages of tokenization versus other data security solutions below.

Benefits of Data Tokenization

When it comes to solving cloud migration challenges, tokenization of data offers all the obfuscation benefits of encryption, hashing, and anonymization while providing much greater usability. Let’s look at the advantages in more detail. 

  1. No formula or key: Tokenization replaces plain-text data with an unrelated token that has no value if breached. There’s no mathematical formula or key; a token vault holds the real data secure.    
  2. Acts just like real data: Users and applications can treat tokens the same as real data and perform high-level analysis on it, without opening the door to the risk of leaks or loss. Anonymized data, on the other hand, provides only limited analytics capability because you’re working with ranges, while hashed and encrypted data are ineligible for analytics. With the right tokenization solution, you can share tokenized data from the data warehouse with any application, without requiring data to be unencrypted and inadvertently exposed to users.   
  3. Granular analytics: Retaining the connection to the original data enables you to dig deeper into the data with more granular analytics than anonymization allows. Anonymized data is limited by the original parameters, such as age ranges or broad locations, which might not provide enough detail or flexibility for future purposes. With tokenized data, analysts can create fresh segments of data as needed, down to the original, individual street address, age, or health information. 
  4. Analytics plus protection: Tokenization delivers the advantages of analytics with the strong at-rest protection of encryption. For the strongest possible security, look for solutions that limit the amount of tokenized data that can be de-tokenized and also issue notifications and alerts when data is de-tokenized, so you can ensure only approved users get the data. 

Tokenization Vs. Encryption

1. Tokens have no mathematical relationship to the original data, which means that unlike encrypted data, tokenized data can’t be broken or returned to its original form.

While many of us might think encryption is one of the strongest ways to protect stored data, it has a few weaknesses, including this big one: the encrypted information is simply a version of the original plain text data, scrambled by math. If a hacker gets their hands on a set of encrypted data and the key, they essentially have the source data. That means breaches of sensitive PII, even of encrypted data, require reporting under state data privacy laws. Tokenizing data, on the other hand, replaces the plain text data with a completely unrelated “token” that has no value if breached. Unlike encryption, there is no mathematical formula or “key” to unlocking the data – the real data remains secure in a token vault.

2. Tokens can be made to match the relationships and distinctness of the original data so that meta-analysis can be performed on tokenized data.

When one of the main goals of moving data to the cloud is to make it available for analytics, tokenizing the data delivers a distinct advantage: actions such as counts of new users, lookups of users in specific locations, and joins of data for the same user from multiple systems can be done on the secure, tokenized data. Analysts can gain insight and find high-level trends without requiring access to the plain text sensitive data. Standard encrypted data, on the other hand, must be decrypted to operate on, and once the data is decrypted there’s no guarantee it will be deleted and not be forgotten, unsecured, in the user’s download folder. As companies seek to comply with data privacy regulations, demonstrating to auditors that access to raw PII is as limited as possible is also a huge bonus. Data tokenization allows you to feed tokenized data directly from Snowflake into whatever application needs it, without requiring data to be unencrypted and potentially inadvertently exposed to privileged users.

3. Tokens maintain a connection to the original data, so analysis can be drilled down to the individual as needed.

Anonymized data is a security alternative that removes the personally identifiable information by grouping data into ranges. It can keep sensitive data safe while still allowing for high-level analysis. For example, you may group customers by age range or general location, removing the specific birth date or address. Analysts can derive some insights from this, but if they wish to change the cut or focus in, for example looking at users aged 20 to 25 versus 20 to 30, there’s no ability to do so. Anonymized data is limited by the original parameters which might not provide enough granularity or flexibility. And once the data has been analyzed, if a user wants to send a marketing offer to the group of customers, they can’t, because there’s no relationship to the original, individual PII.
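A tiny sketch makes the limitation concrete. The bucketing function and sample ages below are hypothetical; the point is that once values are collapsed into ranges, the original values are gone and the ranges can never be re-cut.

```python
def anonymize_age(age: int, width: int = 10) -> str:
    """Bucket an exact age into a fixed range, discarding the original value."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

customers = [21, 24, 27, 33]
buckets = [anonymize_age(a) for a in customers]
# All four records now read "20-29" or "30-39". There is no way to
# re-segment them into 20-25 vs. 26-30, or to recover an individual age
# for a follow-up mailing.
```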

Three Risk-based Models for Tokenizing Data in the Cloud

Depending on the sensitivity level of your data or comfort with risk there are several spots at which you could tokenize data on its journey to the cloud. We see three main models - the best choice for your company will depend on the risks you’re facing:

Level 1: Tokenize data before it goes into a cloud data warehouse

  1. The first issue might be that you’re consolidating sensitive data from multiple databases. While having that data in one place makes it easier for authorized users, it might also make it easier for unauthorized users! Moving from multiple source databases or applications with their own siloed and segmented security and login requirements to one central repository gives bad actors, hackers, or disgruntled employees just one location to sneak into to gain access to all your sensitive data. It creates a much bigger target and bigger risk.  
  2. And this leads to the second issue: as more and more data is stored in high-profile cloud data warehouses, they have become a bigger focus for bad actors and nation states. Why should they go after Salesforce or Workday or other discrete applications separately when all the same data can be found in one giant hoard?  
  3. The third concern might be about privileged access from Snowflake employees or your own Snowflake admins who could, but really shouldn’t, have access to the sensitive data in your cloud data warehouse.  

If your company is facing any of these situations, it makes sense for you to choose “Level 1 Tokenization”: tokenize data just before it goes into the cloud. By tokenizing data that is stored in the cloud data warehouse, you ensure that only the people you authorize have access to the original, sensitive data.

Level 2: Tokenize data before moving it through the ETL process

As you’re mapping out your path to the cloud, you may want to make sure data is protected as soon as it leaves the secure walls of your datacenter. This is especially challenging for CISOs who’ve spent years hardening the security of the perimeter only to have control wrested away as sensitive data is moved to cloud data warehouses they don’t control. If you’re working with an outside ETL (extract, transform, load) provider to help you prepare, combine, and move your data, that will be the first step outside your perimeter you want to safeguard. Even though you hired them, without years of built-up trust, you may not want them to have access to sensitive data. Or it may even be out of your hands—you may have agreements or contracts with your own customers that specify you can’t let any vendor or other entity have access without written consent.  


In this case, “Level 2 Tokenization” is probably the right choice. This takes one step back in the data transfer path and tokenizes sensitive data before it even reaches the ETL. Instead of direct connection to the source database, the ETL provider connects through the data tokenization software which returns tokens. ALTR partners with SaaS-based ETL providers like Matillion to make this seamless for data teams.  

Level 3: End-to-end on-premises-to-cloud data tokenization

If you’re a very large financial institution classified as “critical vendor” by the US government, you’re familiar with the arduous security required. This includes ensuring that ultra-sensitive financial data is exceedingly secure – no unauthorized users, inside or outside the enterprise, can have access to that data, no matter where it is. You already have this nailed down in your on-premises data stores, but we’re living in the 21st century and everyone from marketing to IT operations is saying “you have to go to the cloud.” In this case, you’ll need “Level 3 Tokenization”: full end-to-end data tokenization from all your onsite databases through to your cloud data warehouse.  


As you can imagine, this can be a complex task. It requires tokenization of data across multiple on-premises systems before the data transfer journey even starts. The upside is that it can also shine a light on who’s accessing your data, wherever it is. You’ll quickly hear from people throughout the company who rely on sensitive data to do their jobs when they next run a report and all they get back is tokens. This turns into a benefit by stopping “dark access” to sensitive data.  

Conclusion

Data tokenization can provide unique data security benefits across your entire path to the cloud. ALTR’s SaaS-based approach to data tokenization-as-a-service means we can cover data wherever it’s located: on-premises, in the cloud, or even in other SaaS-based software. This also allows us to deliver innovations like new token formats or new security features more quickly, with no need for users to upgrade. Our tokenization solutions also range from flexible, scalable vaulted tokenization all the way up to PCI Level 1-compliant tokenization, allowing companies to choose the best balance of speed, security, and cost for their business. We’ve also invested heavily in IP that enables our database driver to connect transparently and keep data usable while tokenized. The drivers can, for example, perform the lookups and joins needed to keep applications that weren’t built for tokenization running.

With data tokenization from ALTR, users can bring sensitive data safely into the cloud to get full analytic value from it, while helping meet contractual security requirements or the steepest regulatory challenges.

There’s nothing worse than losing the remote to your TV. All you want to do is sit on the couch and change the channel or the volume at your leisure — but without a remote you have to get up, walk over to the TV, click the “next channel” button twenty-five times until you get to the channel you want, then walk all the way back to the couch to sit down, exhausted. Oh, then you realize it’s too loud, and now you have to do the whole thing all over again. It’s downright infuriating.  

But if you didn’t know that a remote existed, you probably wouldn’t mind it so much, right? If that’s all you ever had, it would seem normal. This is a good way to think about how ALTR works when it comes to Snowflake Masking Policy. You can do dynamic data masking in Snowflake without us, but it's a heck of a lot easier to do it with us.  

Snowflake Masking Policy

What you do now: write your Snowflake masking policy using SnowSQL

Generally, writing a Snowflake masking policy requires roughly 40 lines of SnowSQL per policy. Depending on your business, that can turn into 4,000 lines real quick. And then you have to test to make sure it works as intended. And then you have to go through QA. And then you have to update it and start the process all over. The process can feel endless. Just like going from channel 12 to channel 209 without a remote, it’s exhausting and tedious.  

If you look at Snowflake’s documentation, you’ll see that creating a Snowflake masking policy requires 5 steps:

1. Grant Snowflake masking policy privileges to custom role


2. Grant the custom role to a user


3. Create a Snowflake masking policy


4. Apply the Snowflake masking policy to a table or view column


5. Query data in Snowflake 
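The five steps above boil down to SnowSQL along these lines. This is a minimal sketch based on Snowflake's documented masking-policy flow; the role, user, schema, policy, table, and column names here are all hypothetical placeholders:

```sql
-- 1. Grant masking policy privileges to a custom role
GRANT CREATE MASKING POLICY ON SCHEMA mydb.myschema TO ROLE masking_admin;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE masking_admin;

-- 2. Grant the custom role to a user
GRANT ROLE masking_admin TO USER jsmith;

-- 3. Create a masking policy (full mask unless the querying role is ANALYST)
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '*********'
  END;

-- 4. Apply the policy to a table column
ALTER TABLE IF EXISTS mydb.myschema.customers
  MODIFY COLUMN email SET MASKING POLICY email_mask;

-- 5. Query the data; what you see depends on your current role
SELECT email FROM mydb.myschema.customers;
```

Multiply a block like this across every sensitive column, mask type, and role combination and the line count grows quickly.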

That’s just to get started with a basic Snowflake Masking Policy! If you want to apply different types, like a partial mask, time stamp, UDF, etc. then you’ll need to refer back to the documentation again. To get more advanced with Snowflake tag-based or row-level policy, you’ll need another deep dive.  

The big kicker here is the amount of time it takes to code not only the initial policies, but to update them and test them over time. No matter how good anyone is at SnowSQL, there’s always room for human error that can lead to frustration at best and at worst to dangerous levels of data access.

So, what if you could automate the Snowflake masking policy process? What if you could use a remote to do it for you to save time and keep things streamlined for your business?

What you could be doing: automating Snowflake masking policy with ALTR

Setting a sensitive data masking policy in ALTR is like clicking “2-0-9" on your remote when a commercial comes on channel 12; you log in, head to the Locks tab, and use ALTR’s interface to set a Snowflake masking policy that has already been tested for you. And when something changes in your org, you log back in and update your data masking policy or add a new one with just a few clicks.

Here’s exactly how that works:

1. Navigate to Data Policy --> Locks --> Add new


2. Fill out the lock details: name, user groups affected.

3. Choose which data to apply policy to, then choose the dynamic data masking type you’d like to use (full mask, email, show last four of SSN, no mask, or constant mask).  

               a. Column-based data masking (sensitive columns have been classified and added for ALTR to govern)

               b. Tag-based data masking (tags are defined either by Google DLP, Snowflake classification, Snowflake object tags, or tags imported from a data catalog integration).  

4. (Optional) Add another data masking policy.

5. Click “Add Lock”

That’s it; there’s no code required, and anyone in the business can set up a Snowflake masking policy if they have the right Snowflake permissions. To update or remove a lock, all you have to do is edit the existing policy using the same interface.

ALTR’s Snowflake data masking policies are not only easy to implement; they also leverage Snowflake’s native capabilities, like tag-based masking. That means ALTR is not only the most cost-effective method but also one that ensures your policy works natively with Snowflake.

Check out this video below to see what it looks like to set Snowflake Masking Policy manually versus doing it in ALTR:

SaaS platforms have exploded in the last few years for good reason: they offer unprecedented scalability, cost savings, accessibility, and flexibility. But like any explosion, it left some messes in its wake. For IT and security teams in particular, the increasing number of solutions used by teams throughout the company created a seemingly never-ending need to add users, remove users, or change permissions every time someone joined, changed roles, shifted responsibilities, or left the company altogether. As is often the case, IT and security teams took up the slack by managing and maintaining user permissions manually – going into each platform, adding each new user, setting permissions, and doing it over and over again each time a change occurred.

This led to delays, risk of error, or even users skipping the authorization process altogether. According to Gigaom research, 81% of employees admitted to using unauthorized SaaS applications, and in an IDG report, 73% of IT leaders agreed that keeping track of identity and permissions across environments is a primary challenge. If onboarding new employees was painful, off-boarding was even worse. If IT forgot a service, then a past employee could still have access they shouldn’t. Talk about a security issue!


Okta Automates User Account Management

Then in 2009, along came Okta. Built on top of the Amazon Web Services cloud, Okta’s single sign-on service allows users to log into multiple systems through one central process. Okta automatically creates all your user accounts when an employee comes on board, then automatically disables or deactivates them when an employee leaves. You can still always go into each service and make changes, but why? Okta is SaaS-based; you can start for free, and then it’s just a couple of dollars per user per month after that. Okta also expanded to integrate with other solutions to simplify the overall onboarding process. For example, with ServiceNow, hiring a new employee can trigger the building manager to generate a new badge, Okta to generate user accounts, and HR to generate payroll forms.

At a certain point, it became stupid not to use Okta, and today the service has more than 300 million users and 15k+ customers. So that takes care of the first wave of cloud migration: users moving to SaaS platforms. But what about the next migration: data moving to cloud platforms?

Why Shouldn’t We Have Okta for Cloud Data Access Control?

If the Okta model worked for software permission provisioning, why couldn’t something similar be the answer for cloud data access control and security? Setting individual or role-based user data access policies correctly is critical, but perhaps even more critical is the confidence that access is revoked when needed – all automated, all error-free. In addition, Okta’s ease of use allowed it to be utilized by groups outside IT, like marketing and sales teams who were early SaaS adopters. Since data, just like software, is often owned, controlled and migrated by groups outside IT, shouldn’t managing data access and security be just as flexible and user-friendly?

From DIY Cloud Data Access Control to D-I...Why?

Okta’s (and many automated solutions’) biggest early competitor was “do-it-yourself.” If you’ve always been able to handle users and data access control manually, it can seem like making the shift to a new process would just add more work. But it’s a little like the frog in the pot – the temperature is rising, but you don’t realize you’re boiling until it’s too late. Maybe setting up a new data user took 10 minutes just a year ago, but today you’re dealing with hundreds of requests a week, and something that was a snap to do manually at a small scale now takes up hours of your time. Once your data projects move from minimum viable product or beta stage to full production with hundreds of users across the enterprise, you may wake up one day and realize you no longer have any time to enable data projects because you’re so busy enabling data users.


ALTR Automates Cloud Data Access Control

Okta is a low lift, SaaS-delivered, zero-up pricing solution that eliminates burdensome manual provisioning of user access to software and integrates with multiple systems to automate the onboarding process. Sound familiar? We believe that ALTR is the “Okta for data.” We massively simplify provisioning data access controls at scale and integrate with the top-to-bottom modern data stack to reduce error and risk and increase efficiency.

And if you don’t think you need it today, just look back at the journey from manual software permissions to Okta. It’s only a matter of time before data access follows the same path. Wouldn’t it be great to get out of the pot BEFORE it’s boiling?

See how easy and scalable automated data access control can be in Snowflake with ALTR. Try ALTR Free!

ALTR CEO James Beecham has compared encryption to duct tape. Duct tape is great - it comes in handy when you need a quick fix for a thousand different things or even...to seal a duct. But when it comes to security, you need powerful tools that are fit for purpose.

Today, let’s compare some different methods you could use to secure data - including tokenization vs encryption - to see which is the best fit for your cloud data security.

Tokenization vs Encryption: 3 Reasons to Choose Tokenization

As a data security company, ALTR uses encryption for some things, but when we looked at encryption vs tokenization, we found tokenization far superior for two key data security needs: 

  • Defeating data thieves
  • Enabling data analysis

Companies that want to transform data into business value need both security and analytics. Tokenization delivers the best of both worlds: the strong at-rest protection of encryption and the analysis opportunity provided by similar solutions like anonymization.

3 ways tokenization is superior to encryption: 

1. Tokenization is more secure.

It actually replaces the original data with a token, so if someone successfully obtains the digital token, they have nothing of value. There’s no key and no relationship to the original data. The actual data remains secure in a separate token vault.

This is important because we now collect all kinds of information as a society. Companies want to analyze the customer data they hold, whether it’s Netflix, a hospital or a bank. If you’re using encryption to protect the data, you must first decrypt it all to make any use of it or any sense of it. And decrypting leads to data risk.

2. Tokenization enables analytics.

Because tokenization offers determinism, which maintains the same relationship between a token and the source data every time, accurate analytics can be performed on data in the cloud.

If you provide a particular set of inputs, you get the same outputs every time. Deterministic tokens represent a piece of data in an obfuscated way and give you back the same token or representation when you need it. The token can be a mashup of numbers, letters and symbols, just like an encrypted piece of data, but tokens preserve relationships. The real benefit of deterministic tokenization is allowing analysts to connect two datasets or databases securely, protecting PII privacy while allowing analysts to run their data operations.
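The join benefit can be shown in a short sketch. This is an illustration only, with hypothetical datasets; the counter-based token generator stands in for a real deterministic tokenization system backed by a secured vault.

```python
def make_tokenizer():
    """Deterministic tokenizer sketch: the same input always maps to the
    same token, so joins and counts still work on tokenized data."""
    vault = {}  # stands in for a secured token vault
    def tokenize(value: str) -> str:
        if value not in vault:
            vault[value] = f"tok_{len(vault):06d}"
        return vault[value]
    return tokenize

tokenize = make_tokenizer()

# Two hypothetical datasets keyed by the same PII (an email address)
crm = {"alice@example.com": "Gold", "bob@example.com": "Silver"}
tickets = [("alice@example.com", "refund"), ("alice@example.com", "billing")]

crm_tok = {tokenize(email): tier for email, tier in crm.items()}
tickets_tok = [(tokenize(email), issue) for email, issue in tickets]

# The analyst joins on tokens without ever seeing a plaintext email
joined = [(tok, crm_tok[tok], issue) for tok, issue in tickets_tok]
```

Because both datasets received identical tokens for identical inputs, the join succeeds on tokenized data alone, which is exactly what hashing without determinism across systems, or anonymization into ranges, cannot offer.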

3. Tokenization maintains the source data.

Because the connection is two-way – tokenization and de-tokenization – you can retrieve the original data in the event you need it.

Let’s say you’ve collected instrument readings from a personal medical device that I own. If you detect something in that data, like performance degradation, you and I both would appreciate my getting a phone call, an email, or a letter informing me I need to replace the device. One-way techniques like hashing or anonymization would not allow this: once a value such as my name or phone number is converted, it disappears from the database for good. De-tokenization makes the follow-up possible.


Tokenization vs Anonymization: Limited Analytics Today and Tomorrow

Unlike encryption, anonymization offers some ability to perform fundamental analysis, but it is limited by the design and intent of the anonymization. Anonymization removes PII by grouping data into ranges, such as an age range or ZIP code, while removing the birth date and street address. This means you can perform a level of analysis on anonymized data, say on your 18-to-25-year-old customers. But what if you want a different grouping, or to associate that age range with another data set?

Anonymization is permanent and inflexible. The process cannot be reversed to re-identify individuals, which might not give you enough options. If your team wants to follow an initial data run to invite a group of customers to an event or send them an offer, you’re stuck without the phone number or mailing address available. There’s no relationship to the original PII of the individual.

Tokenization vs Hashing: A One-Way Trip

Another data security tool is one-way hashing. This is a form of cryptographic security that uses an algorithm to convert source data into an anonymized piece of data of a fixed length. Unlike encrypted data, because the data is a fixed length and the same hash means the same data, it can be operated on with joins. But a big downside is that it’s (virtually) irreversible. So, like anonymization, once the data is converted, it cannot be turned back into plain text or source data for further analysis. Hashing is most often used to protect passwords stored in databases. You may also hear the term “salting” applied to password hashing. This is the practice of combining a random value with the password before it is hashed, so that identical passwords produce different hash values, making the password-cracking process much harder. Hashing works very well for password protection but is not ideal for PII that needs to be used.
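A minimal sketch of salted password hashing, using Python's standard library. The password and iteration count are illustrative; note that the salt is mixed in before hashing, which is why two users with the same password end up with different digests.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt=None):
    """Salted, deliberately slow one-way hash (PBKDF2-HMAC-SHA256)."""
    salt = salt if salt is not None else os.urandom(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("hunter2")
assert verify_password("hunter2", salt, digest)
assert not verify_password("wrong", salt, digest)
# There is no "unhash": recovering "hunter2" from the digest means brute force.
```

The one-way trip is the whole point for passwords, and exactly the limitation that makes hashing a poor fit for PII you may need back later.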

Encryption, anonymization and one-way hashing, therefore, can be shortsighted moves. Your organization’s success depends on allowing authorized users to access the original data now and in the future, as long as you can track and report on the usage. At the same time, you must also ensure that sensitive data is useless to everyone else.  


Tokenization: The Clear Cloud Data Security Winner

When looking at tokenization vs encryption, it's clear that tokenization overcomes the challenges other data security solutions face by preserving the connections and relationships between data columns and sets. However, tokenization isn’t just a simple mathematical scramble of the original data like encryption or a group of ranges with anonymized data. Authorized analysts can query tokenized data for insights without having access to the underlying PII. The more secure token remains meaningless to any unauthorized user or hacker.  

With modern tokenization techniques, you can apply policies and authorize access at scale for thousands of users. You can also track and report on the secure access of sensitive data to ensure compliance with privacy regulations worldwide. You can’t do this with anonymization, hashing or encryption.

When it comes to tokenization vs encryption, tokenization is the more flexible tool for secure access and privacy compliance. This is critical for organizations quickly moving from storing gigabytes to petabytes of data in the cloud. You can feed tokenized data directly from cloud data warehouses like Snowflake into any application. You can do this with complete confidence that all the data, including sensitive PII, will be protected even from the database admin while making it easy for authorized data end-users to collaborate and deliver valuable insight quickly. Isn’t that the whole point?  

See how ALTR can integrate with leading data catalog and ETL solutions to deliver automated tokenization from on-premises to the cloud. Get a demo.

Most of us know that data creation and collection has accelerated over the last few years. Along with that has come an increase in data privacy regulations and the prominence of the idea of “data governance” as something companies should be focused on and concerned with. Let’s see what’s driving the focus on data governance, define what “data governance” actually is, look at some of the challenges, and how companies can implement data governance best practices to build a modern enterprise data governance strategy.  

Data Governance History

The financial services industry was one of the first to face regulations around data privacy. The Gramm–Leach–Bliley Act (GLBA) of 1999 requires all kinds of financial institutions to protect customer data and be transparent about the sharing of customer information. This was followed by the Payment Card Industry Data Security Standard (PCI DSS) in 2006. Then the Financial Industry Regulatory Authority (FINRA), founded in 2007, established rules institutions must follow to protect customer data from breach or theft.  

Perhaps not surprisingly, healthcare was another industry to face early data regulations. The first sensitive data to be covered in the US was private health data – the Health Insurance Portability and Accountability Act of 1996 (HIPAA) required national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. More recently, data privacy regulations like the European Union’s GDPR and California’s CCPA have expanded coverage to all varieties of “personal data” or personally identifiable information (PII). These laws put specific rules around what companies can do with sensitive personal data and how it must be tracked and protected. And US data privacy guidelines have not stopped there – Colorado, Connecticut, Virginia, and Utah have all passed their own state-level privacy regulations. So today, just about every company deals with some form of sensitive or regulated data. Hence the search for data governance solutions that can help companies comply.  

What is Data Governance? - a Definition

Google searches for “data governance” have doubled over the last five years, but what is “data governance” really? There are a few different definitions depending on where you look: 

  • The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
  • The Data Management Association (DAMA) International says it is “planning, oversight, and control over the management of data and the use of data and data-related sources.”
  • According to the Gartner Glossary, it’s “the specification of decision rights and accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.”  

You could probably find a hundred more data governance definitions, but these are pretty representative. Interestingly, it’s called either a “system” or a “framework” – both very process-oriented terms.    

At a high level, “data governance” is about understanding and managing your data. Enterprise data governance projects are often led by data governance teams, security teams or even cross-functional data governance councils who map out a process and assign data stewards to be responsible for various data sets or types of data. They’re often focused on data quality and data flows – both internally and externally.  

As you can see, data governance is not a technology. Still, technologies can enable the enterprise data governance model at various stages, and due to increased regulatory pressure, more and more software companies offer “data governance” solutions. Unfortunately, many of these solutions are narrowly focused on the initial steps of a data governance strategy: data discovery, classification or lineage. But data governance can’t just be about data discovery, cataloguing or metadata management. While many regulations start with the requirement that companies “know” their data, organizations will never be fully in compliance if they stop there. Fines and fees are tied to allowing data to be misused or exfiltrated, and the only way to avoid those is by ensuring data is used securely.  

Data Governance Challenges

Companies can run into many data governance challenges – from knowing what data they have to where data is to understanding where the data comes from and if they can trust it or not. You can solve many of these challenges with the various data catalog solutions mentioned above. These data catalogs do a great job at helping companies discover, classify, organize and present a variety of data in a way that makes it understandable to data professionals and potential data users. You can think of the result as a data “card catalog” that provides a lot of context about the data but does not provide the data itself. Some catalog solutions even offer a shopping cart feature that makes it very easy for users to select the data they want to use.  

That leads to the next data governance challenge: controlling access, so that only the people who should have access to specific data can actually reach it.

This goes beyond the scope of most data catalog solutions – it’s like having a shopping cart with no ability to check out and receive your item. Managing these requests is often done manually via SQL or other database code. It can become a time-consuming and error-prone process for DBAs, data architects and data engineers as requests for access to data pile up. This happens very quickly once the data catalog is available – as soon as users within the organization can easily see what data is available, the next step is undoubtedly wanting access to it. In no time, those tasked with making data available to the company spend more time managing users and maintaining policies than they do developing new data projects.  

Data Governance Benefits

While data governance can be a challenging task, there would not be so much focus on it if the benefits didn’t outweigh the effort. With a thoughtful and effective data governance strategy, enterprises can achieve these benefits: 

1. Avoid hefty fines and stringent sanctions on leaked PII

As mentioned above, every company that deals with PII is subject to regulations regarding data handling. In the US, the regulatory landscape is still patchy but targeting the most stringent requirements is the easiest path. A robust data governance practice can ensure companies meet their obligations and avoid fines across all their spheres of operation.  

2. Leverage data-driven decisions for competitive advantage

A key reason there are growing regulations around collecting and using personal and sensitive data is that companies want to use this data to understand their customers better, gain insight into optimization opportunities, and increase their competitive advantages.

In a Splunk survey of data-focused IT and business managers, 60 percent said both the value and amount of data collected by their organizations will continue to increase. Most respondents also rated the data they’re collecting as extremely or very valuable to their organization’s overall success and innovation. In a recent Snowflake survey with the Economist, 87% of respondents said that data is the most important competitive differentiator in the business landscape today, and 86% agreed that the winners in their industry will be those organizations that can use data to create innovative products and services. A data governance strategy gives companies insight into what data is available to learn from, ensures the data is reliable, and sets a standard and a practice for maintaining that data in the future, allowing its value to grow.  

3. Improve customer trust and relationships

In a 2019 Pew Research Center study, 81% of Americans said that the potential risks they face because of data collection by companies outweigh the benefits. This might be because 72% say they personally benefit very little or not at all from the data companies gather about them. However, a recent McKinsey survey showed that consumers are more likely to trust companies that only ask for information relevant to the transaction and react quickly to hacks and breaches or actively disclose incidents. Not coincidentally, these are some of the requirements of data privacy regulations – only gather the information you need and be upfront, timely and transparent about leaks.

This gives organizations that focus on protecting customer data privacy via a future-focused data governance strategy an opportunity to lead in the market.

What is data governance in healthcare? 

Data governance in healthcare is very focused on complying with federal regulations around keeping personal health information (PHI) private. The US Health Insurance Portability and Accountability Act of 1996 (HIPAA) modernized the flow of healthcare information. It stipulates how personally identifiable information maintained by the healthcare and health insurance industries should be protected from fraud and theft, and addresses some limitations on healthcare insurance coverage. It generally prohibits healthcare providers and healthcare businesses, called covered entities, from disclosing protected information to anyone other than a patient and the patient's authorized representatives without their consent. With limited exceptions, it does not restrict patients from receiving information about themselves. It does not prohibit patients from voluntarily sharing their health information however they choose, nor does it require confidentiality where a patient discloses medical information to family members, friends, or other individuals not a part of a covered entity. Any entity that has access to or holds personal health information on an individual is required to comply with HIPAA.


Data Governance Best Practices

Today, organizations utilize massive amounts of data across the enterprise to keep up with the pace of innovation and stay ahead of the competition. But making data available to users throughout the business also increases the risk of loss and the potential costs of a breach. It seems like an impossible choice: use data or protect it. In reality, it’s not a choice; organizations must protect data before sharing it.  

This requires a solution that includes these enterprise data governance best practices:  

  • Data discovery, classification and lineage – to govern regulated data, companies must be able to identify it, locate it and trust it.  
  • Automated data access controls – as the need for data across the business grows, manual granting of access requests becomes infeasible. Manual controls slow down access to data and introduce the possibility of human error, potentially creating compliance issues instead of avoiding them. Role-based access controls are more efficient in ensuring that only authorized users get access to the data they need.  
  • Data usage visibility and tracking – once data has been cataloged and access granted, there must be visibility into who is using what data, when and how much. This helps companies prepare for an audit while ensuring appropriate data usage. It can also provide valuable insight into normal usage patterns, making it easier to identify out-of-norm activity that may be cause for concern.  
  • Automated policy enforcement - after data access has been granted, there must still be the ability to automatically alert, slow or stop any out-of-policy activity to prevent or halt credentialed access threats.  
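The role-based access control described in the second bullet can be pictured in a few lines. This is a minimal, hypothetical sketch – the role names and data sets are illustrative assumptions, not ALTR's implementation:

```python
# Minimal sketch of role-based access control (illustrative only,
# not ALTR's API). Each role maps to the data sets it may read.
ROLE_GRANTS = {
    "analyst": {"sales", "marketing"},
    "data_engineer": {"sales", "marketing", "pii_customers"},
    "intern": set(),
}

def can_access(role: str, dataset: str) -> bool:
    """Return True only if the role has been granted the dataset."""
    return dataset in ROLE_GRANTS.get(role, set())

print(can_access("analyst", "sales"))          # True
print(can_access("analyst", "pii_customers"))  # False
```

The point of centralizing grants in one structure is that access decisions become auditable and consistent, instead of being scattered across hand-written database code.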

In addition, a solution must make the implementation of data governance easy for groups across the company. It’s not just data, security or governance teams responsible for keeping data safe – it’s everyone’s job.  


Data Governance: the Future

There’s zero chance that data collection, use and regulation will decrease in the coming years. IDC predicts that the global datasphere will double in size from 2022 to 2026. Regulations also show no sign of slowing – a US federal privacy bill was making its way through approvals as of July 2022.

Both of these trends mean that if companies don’t have a data governance strategy in place now, they will soon need to. As a result, the number of data governance solutions will continue to increase rapidly. Some of these will come from legacy players seemingly offering everything from soup to nuts; some from energetic new startups fixing a single task with very little broader expertise. We expect the industry to move toward an enterprise data governance solution that helps companies meet global privacy requirements while being easy to use, manageable and scalable to keep up with growing data and regulations.  

A data catalog is a tool that puts metadata at your fingertips. Remember libraries? The card catalog puts all the information about a book in a physical or virtual index, such as its author, location, category, size (in pages), and the date published. You can find a similar search tool or index in an online music or video service. The catalog gives you all the essentials about the thing or data, but it is not the data itself.

Some catalogs do not provide any measure of protection other than passive alerts and logs. Even basic access controls and data masking can shift the burden to data owners and operators: coding access controls in a database puts more stress on the DBAs, and solutions requiring copying sensitive data into a proprietary database still expose the original data. These steps also don’t stop credentialed access threats – system admins can still access sensitive customer data, they can accidentally delete the asset, and if credentials get lost or stolen, anyone can steal the data or cause other harm to your business.

Data classifiers and catalogs are valuable, no doubt about it. But they’re not governance. They can’t fulfill requests for access, track them, or constrain them. When it comes to data catalogs and data governance, you must address a broad spectrum of access and security issues, including:

Access:

You can’t give everyone the skeleton key to your valuable data; you must limit access to sensitive data for specific users.

Compliance:

If you cannot track individual data consumption, it will be nearly impossible to maintain an audit trail and share it for compliance.

Automation:

How do you ensure that the policies you set up are implemented correctly? Do you have to hand them off to another team to execute? Or do you have to write and maintain the code-based controls yourself?

Scale:

As data grows in volume and value, you’ll see more demand from users to access it. You must also ensure the governance doesn’t impede efficiency, performance, or the user experience. Controlling access can’t grind everything to a halt.

Protection:

Sensitive data must be secure; it’s the law virtually everywhere. Governance must ensure confidential data receives the maximum security available wherever it is. Companies need visibility into who consumes the data, when, and how much. They must see both baseline activity and out-of-the-norm spikes. And they must take the next crucial step into holistic data security that limits the potential damage of credentialed access threats.  
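The baseline-versus-spike visibility described above can be pictured with a simple statistical check. This is an illustrative sketch only, not any vendor's actual detection logic; the thresholds and numbers are assumptions:

```python
# Illustrative sketch: flag out-of-norm data consumption by comparing
# today's row count against a baseline of recent daily counts
# (mean + 3 standard deviations). Not a production detection system.
from statistics import mean, stdev

def is_spike(history: list, today: int, sigmas: float = 3.0) -> bool:
    """True if today's usage exceeds baseline mean + sigmas * stdev."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    return today > mean(history) + sigmas * stdev(history)

baseline = [1000, 1100, 950, 1050, 980]   # rows pulled per day, hypothetical
print(is_spike(baseline, 1080))    # False: within the normal range
print(is_spike(baseline, 250000))  # True: the kind of spike worth halting
```

A real system would track per-user and per-dataset baselines, but the principle is the same: you can only spot the 250,000-row pull if you know that 1,000 rows a day is normal.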

Data Catalogs and Data Governance: 4 Steps to Control and Protect Sensitive Data

When all is said and done, data governance must be easy to implement and scale as part of a company’s responsibility to collect, store, and protect sensitive data. Bridging the gap between security and access can help you comply with applicable regulations worldwide while ensuring protection for your most valuable assets. When it comes to data catalogs and data governance, you can follow these four steps to control access and deliver protection over sensitive data:

1. Integrate your data governance tools with an automated policy enforcement engine with patented security.

The data governance solution should provide security that is hands-free, requires no code to implement, and focuses on the original data (not a copy) to ensure only the people who should have access do. This means consumption limits and thresholds where abnormal usage triggers an alert to halt access in real time. Tokenizing the most critical and valuable data prevents theft and misuse. These controls help admins stop insider threats while allowing continued access to sensitive data without risking it.
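Tokenization, mentioned above, swaps a sensitive value for a meaningless stand-in that only a secured vault can reverse. A minimal sketch of the idea – hypothetical, and nothing like ALTR's patented implementation:

```python
# Hedged sketch of vault-based tokenization (illustrative only):
# the original value is replaced by a random token, and only the
# vault can map the token back to the real value.
import secrets

class TokenVault:
    def __init__(self):
        self._vault = {}  # token -> original value; a real vault is secured

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random, carries no information
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
t = vault.tokenize("4111-1111-1111-1111")
# Downstream systems only ever see the token; a stolen token is worthless
# without access to the vault.
assert vault.detokenize(t) == "4111-1111-1111-1111"
```

Unlike encryption, the token has no mathematical relationship to the original value, which is why tokenized data that leaks cannot be reversed offline.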


2. Set your policies once and automate implementation to reduce manual errors and risk.  

You can eliminate tedious, manual configuration of access policies to save time and ensure consistent enforcement. Automation lets you control access by user role or database row and audit every instance. These policies restrict access and limit what users can see and analyze within the database, and the ability to track and report on every mode of access makes it easy to comply with regulatory requests.
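The "set once, enforce everywhere" idea can be pictured as one declarative policy that drives masking for every query, instead of per-database code. The field names, roles, and mask formats below are illustrative assumptions:

```python
# Illustrative sketch of policy set once, enforced automatically.
# A single declarative policy table replaces hand-written masking code.
POLICY = {
    "email":  {"allowed_roles": {"compliance"}, "mask": "****@****"},
    "ssn":    {"allowed_roles": set(),          "mask": "***-**-****"},
    "region": {"allowed_roles": {"compliance", "analyst"}, "mask": "--"},
}

def apply_policy(row: dict, role: str) -> dict:
    """Mask every field the requesting role is not entitled to see."""
    out = {}
    for field, value in row.items():
        rule = POLICY.get(field)
        if rule and role not in rule["allowed_roles"]:
            out[field] = rule["mask"]
        else:
            out[field] = value
    return out

row = {"email": "jane@example.com", "ssn": "123-45-6789", "region": "EU"}
print(apply_policy(row, "analyst"))
# {'email': '****@****', 'ssn': '***-**-****', 'region': 'EU'}
```

Because every query path goes through the same `apply_policy` step, changing the policy table changes enforcement everywhere at once – the manual-error problem the section describes simply disappears.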

3. Enable self-service data requests to speed up data access.

Automated access controls let admins provide continued access to sensitive data, apply masking policies, and stop credentialed access threats for thousands of end users without putting the data at risk. Data teams can move at the speed required by the business yet be restricted to accessing only the data sets they’re authorized to view. For instance, you can prevent an employee based in France from seeing local data meant only for users in Germany, and you can avoid commingling data that originated from multiple sources or regions. This allows you to foster collaboration and sharing with greater confidence in security and privacy measures.
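The France/Germany restriction described above is essentially row-level filtering keyed to an attribute of the requesting user. A minimal, hypothetical sketch:

```python
# Illustrative sketch of row-level security: each row carries a region
# tag, and a user only receives rows matching their own region.
def filter_rows(rows: list, user_region: str) -> list:
    """Return only the rows the requesting user's region entitles them to."""
    return [r for r in rows if r["region"] == user_region]

rows = [
    {"customer": "A", "region": "FR"},
    {"customer": "B", "region": "DE"},
]
print(filter_rows(rows, "FR"))  # [{'customer': 'A', 'region': 'FR'}]
```

In practice the filter runs inside the database or policy engine rather than in application code, so users cannot bypass it, but the effect is the same: the restricted rows never leave the data store.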

4. Scale your data access control and policy enforcement as the use and uses of data grow throughout your business.

The scope of data access requests within enterprises today has reached a level that requires advanced automation. Some enterprises may have scanned and catalogued thousands of databases or more, and data governance solutions should be able to implement and manage access for thousands of users to match. Features like rate-limiting stipulate the length or amount of access – for example, anyone who isn’t the intended consumer, such as the catalog admin, might see only a small sample for a brief period. Scaling policy thresholds as needed allows you to optimize collaboration while stopping data theft or accidental exposure, and you can limit access regardless of the user group size or data set.  

Modern and Simple Data Governance

Modern data organizations are moving to simplify data governance by bringing visibility to their data and seeking to understand what they have. However, data governance doesn’t stop once you catalog your data. That’s like indexing a vast collection of books or songs but letting no one read or listen to the greatest hits. You should grant access to sensitive data, but do so efficiently so it doesn’t interfere with your day job, and effectively so you comply with regulations and policy. Integrating a data catalog with an automated policy enforcement engine is the right strategy. You’ll gain the complete package: a governance policy that is easy to implement and enforce, access controls that focus on the original sensitive data, and detailed records of every data request and usage. Managing enterprise data governance at scale lets you use data securely to add value faster, turning the proverbial oil into jet fuel for your organization’s growth.  

If we learned anything at Snowflake Summit (and we did – a lot!), it’s that the data governance space is as confusing as it is frenzied. Nearly every company is at some stage of moving to capitalize on cloud data analytics, while the regulatory environment around data continues to increase the urgency for privacy and security. Every data governance, control, and security session we saw was completely packed, indicating that many companies are now ready to focus on protecting sensitive data. Yet some of the options in the market are misnamed, confusing and frustrating for buyers. There are many narrowly focused providers, and even adjacent software markets like data catalogs are starting to offer basic governance features. Snowflake itself also continues to roll out very powerful, albeit very manual-to-implement, governance features.

We’re hoping to not only clear up some of the FUD around data governance solutions, but also set the bar for how easy, functional and cost-effective data governance can and should be.  

A new paradigm for controlling data access and security at scale

Last week we announced our new policy automation engine, which combines governance features like access control and dynamic data masking with security controls like data usage limiting and tokenization. It leverages metadata such as Snowflake Object Tagging and can be implemented and managed without code. With this new data governance solution, we’ve maintained our commitment to cloud-native delivery that supports best-in-category time to value and zero cost of ownership beyond our very reasonable per-user, per-month subscription.

For ALTR, this is the realization of our vision for data privacy and security driven by the people who can best accomplish it – the people who know the data – by assembling disparate tools into a single engine and a single point of view across the enterprise.  

Built on a flexible, secure cloud foundation that leverages and automates Snowflake’s own features

ALTR is a true cloud-native SaaS offering that can be added to Snowflake using Partner Connect or a Snowflake Native App in just minutes. It integrates seamlessly with Snowflake without the need to install and maintain a proxy or other agent. Our microservices infrastructure takes full advantage of the scalability and resilience of the cloud, offering extremely high availability with multi-region support by default, because your data governance solution simply cannot go down.

Importantly, our service is built the same way as Snowflake itself and leverages Snowflake’s native features whenever possible. Those powerful features all require writing SnowSQL to implement; we automate them so you don’t have to scale that work yourself and can go completely no-code.  
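To picture that kind of automation: Snowflake's native dynamic masking is configured with `CREATE MASKING POLICY` statements, and a no-code layer can generate that SnowSQL from a simple policy definition. The generator below is purely an illustrative sketch – the SnowSQL syntax is Snowflake's, but the Python scaffolding and names are hypothetical, not ALTR's mechanism:

```python
# Illustrative sketch only: turn a no-code policy definition into the
# SnowSQL that Snowflake's dynamic data masking feature requires.
def masking_policy_sql(policy_name: str, allowed_role: str, mask: str) -> str:
    """Render a Snowflake CREATE MASKING POLICY statement from parameters."""
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy_name} AS (val STRING) "
        f"RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() = '{allowed_role}' THEN val "
        f"ELSE '{mask}' END;"
    )

sql = masking_policy_sql("email_mask", "COMPLIANCE", "****@****")
print(sql)
```

The value of automating this is less about the one statement than about keeping hundreds of such policies consistent as tags, roles, and columns change.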

We have also been SOC 2 Type 2 and PCI DSS Level 1 certified for years, and we maintain a highly disciplined security culture in our technical teams. Across various data sources we offer multiple integration types, including Cloud-to-Cloud Integration, Smart Database Drivers, and Proxy solutions – all of which connect to and use the same ALTR cloud service.  

A data governance solution to both scale and safeguard

All of this comes together in an easy-to-use solution that delivers combined data governance and security for thousands of users across on-premises and cloud data storage. Automating access policy can unlock months of person-hours per year otherwise spent writing and maintaining policy, in the same way that ETL/ELT providers automating data pipelines saved data teams huge amounts of time provisioning data.  

In addition, unlike all the other providers in the data governance space, the ALTR solution moves beyond traditional data access policy tools like RBAC and dynamic masking into data security functionality like data usage limits and tokenization. We feel that your policy around data should contemplate your credentialed users and also extend to use cases where credentials might have been compromised or privileged access is an issue. For us, data policy is about both control and protection, and those policies should be developed and enforced with both in mind.

The future is bright – for data governance solution buyers

We’ll continue to extend our solution by deepening our policy engine’s capabilities with new policy types, expanding our support for a greater variety of data sources and data integrations, and building out more seamless integrations with like-minded players in the data ecosystem (such as more ELT and Catalog providers). All of this is driven and directed by a growing community of customers who are innovating in data and showing us where they need us the most. As the technology space moves forward, the options available to those still searching for a data governance solution will come into focus and the best choice will be clear.  
