ALTR Blog

The latest trends and best practices related to data governance, protection, and privacy.

Data protection and data privacy continued to make front-page local and national news throughout the year, right up to the final chapters of 2022. Remote work and scattered teams persist for many organizations, pulling IT, governance, data, and security teams in disparate directions and often leaving little capacity to face data privacy and protection issues. We saw that reflected throughout the year in the many trending topics and news headlines.

Without further ado, we present our 2022 Data Wrapped:  

The 'Next Big Thing' in Cyber Security

15 industry experts from Forbes Technology Council, including ALTR CEO James Beecham, discuss cybersecurity awareness and key items every organization’s leadership team should take note of. Read More

US Federal Data Privacy Law Looks More Likely

While the United States doesn't have a federal data privacy law yet, in May of this year, legislators introduced the Data Privacy and Protection Act “which is a major step forward by Congress in its two-decade effort to develop a national data security and digital privacy framework that would establish new protections for all Americans.” In addition, five US states have data privacy legislation going into effect in 2023.

75% of the World Will Be Covered by Data Privacy Regulations

Whether your company exists in a state with data legislation or not, now is the time to think about protecting your sensitive cloud-based data. Gartner predicts that “by year-end 2024, 75% of the world’s population will have its personal data covered under modern privacy regulations.” Europe led the charge in formalizing modern privacy laws with the GDPR, a regulation that took effect in 2018 governing the handling of sensitive data. And while the United States is still catching up on state-by-state laws, Gartner believes that due to the COVID-19 pandemic and rising cases of data breaches, security and risk management (SRM) will only gain in prevalence as we move into the new year.

Data Breaches Continue to Make Headlines in 2022

Unfortunately, data theft remained a prevalent issue in 2022, with Apple, Meta, Twitter, and Uber among the companies that suffered significant data breaches. Even more than in years past, this year's breaches impacted companies across all sectors and sizes.

In September of 2022, Uber's computer network suffered a "total compromise" due to a hacker wrongfully gaining access to their data. Email, cloud storage and code repository data were all breached. The hacker, an 18-year-old, told The New York Times that Uber was an easy target for him "because the company had weak security." Read more here.

ALTR Maintained Market Momentum Throughout 2022

In March 2022, Q2 Holdings, Inc., a leading provider of digital transformation solutions for banking and lending, and ALTR announced the long-term extension and expansion of their strategic technology partnership through 2026 to deliver unrivaled data governance and security to Q2's financial institution customers. Learn more here.

In June, ALTR announced the expansion of its partnership with Snowflake with the release of its new policy automation engine for managing data access controls in Snowflake and beyond. This solution allowed data engineers and architects to set up data access policies in minutes, manage ongoing updates to data permissions, and handle data access requests through ALTR’s own no-code platform for data policy management. 

And in October, ALTR Co-founder and Chief Technology Officer James Beecham became ALTR’s newest Chief Executive Officer. Beecham leverages his technical acumen and passion for the industry and the business to lead the company's next phase of accelerated expansion. ALTR also appointed Co-founder and Vice President of Engineering Chris Struttmann to the Chief Technology Officer position. Previous CEO David Sikora remains actively involved with ALTR as a Board Director, CEO advisor, and financial investor.

Looking Forward to 2023

We anticipate 2023 will continue to prove the urgency of focusing on the protection of your sensitive data, making now the time to create your action plan. ALTR's low up-front cost, no-code data governance solution is a great place to begin. Are you ready to take the next step in controlling access to your sensitive data? We can't wait to show you how ALTR can help.

Get a Demo  

Determining how to handle home security to protect your family is critical. After all, you don’t want to take risks when it comes to their safety. It’s an easy thing to put a lock on one door. But what about every door in the house? Every window? What if you need to handle security for the whole neighborhood? That’s when manual and DIY become unmanageable.  

Snowflake database admins and data owners can run into the same issue with Snowflake Row-Level Security. While it may seem a simple task to set up one row access policy for one database using SQL in Snowflake, it quickly becomes overwhelming when you have hundreds of new users requesting access each week or thousands of rows of new data coming into your system daily.

In this blog post, we’ll explain what Snowflake Row-Level Security is and:

  • Lay out the steps to set up Snowflake Row-Level Security policies manually using SQL vs setting up and managing these policies, with no code, in ALTR
  • Provide examples of row-access policy use cases
  • Show how using ALTR’s Row Access Policy feature can help minimize errors and make managing Snowflake row level security easier for anyone responsible for data security.

What is Snowflake Row-Level Security?

Snowflake’s row-level security allows you to hide or show individual rows in a SQL data table based on the user's role. This level of security gives you greater control over who you’re permitting to access sensitive data. For example, you may want to prevent personally identifiable information (PII) held in rows of a customer table from being visible to your call center agents based on the customer's address. By using our ALTR Row Access Policy feature you will save:

  • overhead costs from having to hire multiple developers to handle the work,
  • developer time to manually write code, and
  • effort to make configurations correctly when you need to restrict access to individual rows within a table.
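For context, here is roughly what the manual alternative looks like. A Snowflake row access policy is a schema-level object attached to a table; the sketch below (with hypothetical role, table, and column names) hides customer rows from call center agents outside their assigned states:

```sql
-- Sketch only; the role, table, and column names are hypothetical.
CREATE OR REPLACE ROW ACCESS POLICY customer_visibility
  AS (customer_state VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() <> 'CALL_CENTER'     -- non-agent roles see every row
    OR customer_state IN ('TX', 'OK');  -- agents see only their assigned states

-- Attach the policy to the table on the controlling column
ALTER TABLE customers ADD ROW ACCESS POLICY customer_visibility
  ON (customer_state);
```

Every condition hard-coded into a policy like this has to be re-edited, re-reviewed, and re-deployed whenever access criteria change.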

How Creating Snowflake Row-Level Security Policies Works if you DIY

What’s involved to create a row-level security policy in Snowflake:

  • Who can do it: A software developer who knows how to manually write SQL code
  • Length of time to complete successfully: Hours or even days each week, because the developer must write and maintain every policy by hand

Each of the steps below requires code reviews, QA, validation, and ongoing maintenance, which can stretch each unique row access policy out to weeks.

1. Grant the custom role to a user.

2. Write some code to determine if a table already has a row access policy defined.


3. Write some code to get the original row policy code if it was already defined.


4. Edit the code (or write new code) to implement the row access policy.


Step 4 will require most of your time because of everything that’s involved: identifying all the criteria that could give a user access to a role, getting all department stakeholders to approve, turning those conditions into code, and having someone else review and test that code.

In addition, you’ll also need to make edits based on the code reviews and tests and constantly update the code each time the criteria changes.
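For reference, steps 2 and 3 can be done with Snowflake's built-in introspection; the database, table, and policy names below are hypothetical:

```sql
-- Step 2: check whether the table already has a row access policy attached
SELECT policy_name, policy_kind
FROM TABLE(information_schema.policy_references(
  ref_entity_name   => 'MYDB.PUBLIC.SALES',
  ref_entity_domain => 'TABLE'
));

-- Step 3: pull the existing policy's definition so it can be edited
SELECT GET_DDL('POLICY', 'MYDB.PUBLIC.SALES_REGION_POLICY');
```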

How Creating Snowflake Row-Level Security Policies Works in ALTR

What’s involved to create a row access policy in ALTR:

  • Who can do it: Anyone
  • Length of time to complete successfully: Minutes because ALTR requires no code to implement and automates the security process

1. On the Row Access Policy page of our UI, select Add New.

This will allow you to specify the table that the Row Access Policy will apply to and the reference column that will control access.

ALTR Row Access Policy page

2. Indicate which Snowflake roles can access certain rows based on the values in a column. To do this, specify the mappings between User Groups (Snowflake Roles) and column values.

Table and Reference column to apply the access policy to

3. Review your policy, give it a name, click Submit, and you’re done. The name will be displayed in ALTR to reference the Row Access Policy. In just a few seconds, ALTR converts the Row Access Policy into native Snowflake policy code and inserts the active policy into Snowflake.

Snowflake Row-Level Security Use Cases

Here are a couple of example use cases where our Row Access Policy feature in ALTR can benefit your business as it scales with your Snowflake usage.

USE CASE 1. Using Row-Level Policies to Enable Regional Sales Views

You have sales data that includes records for sales in all your sales regions. You only want your sales managers to see the data for the regions that they manage.  

USE CASE 2. Using Row-Level Policies to Enable Separate Data Sets

You run a SaaS business, and your customers each want a data set reporting their own transactions in your product; however, all the transactions live in a single table — the SaaS way.
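Hand-written in SnowSQL, use case 2 might look something like this sketch, where a mapping table ties each customer's Snowflake role to its tenant ID (all names hypothetical):

```sql
-- One shared transactions table; each customer role sees only its tenant's rows
CREATE OR REPLACE ROW ACCESS POLICY tenant_isolation
  AS (row_tenant_id NUMBER) RETURNS BOOLEAN ->
    EXISTS (
      SELECT 1
      FROM tenant_role_map m            -- maps Snowflake roles to tenant IDs
      WHERE m.role_name = CURRENT_ROLE()
        AND m.tenant_id = row_tenant_id
    );

ALTER TABLE transactions ADD ROW ACCESS POLICY tenant_isolation ON (tenant_id);
```

The mapping-table approach keeps the policy itself stable; onboarding a customer means inserting a row, not rewriting and redeploying policy code.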

What you could be doing: Automating your Snowflake Row-Level Security Policies with ALTR

Do you or your team have hours in a day to spend manually writing SQL code every time you need to create a unique row access policy for hundreds or thousands of users? Do you want to increase overhead by hiring multiple developers to manually create and manage row access policies? Do you want to spend hours figuring out why a Snowflake row access policy is throwing errors or not working correctly?

While you can still choose to go down the SnowSQL do-it-yourself route, why not work smart instead of hard? Why risk data breaches and regulatory fines? Safeguard your data to make sure that only the right people have the right access.

By now, you have a better understanding of how ALTR’s no-code platform enables users to create and manage Snowflake row-level security through a simple point-and-click UI, no SQL knowledge required.

Watch the “how-to” comparison video below to see manually setting up your own Snowflake row access policy versus doing it with ALTR.

Ready to give it a try? Start with our Free Plan today

Cloud data migration is an inevitable part of every organization’s digital transformation journey. While a big data migration project can seem like an intimidating process, it doesn’t have to be. In fact, with the right preparation and implementation strategy, your organization can use your cloud migration as an opportunity to streamline internal processes, improve data security, reduce costs and gain insights.

To help you avoid some common pitfalls in your cloud data migration, we’ve put together this comprehensive guide with everything you need to know about moving data to the cloud. This post covers everything from why you should migrate to the cloud to what types of data you should migrate and how to do it securely. Let’s get started!

What is Cloud Data Migration?

Cloud data migration is the process of migrating data from on-premises systems to cloud-based systems. When migrating data to the cloud, it’s important to keep in mind that not all data is created equal. There are different types of data that each have unique needs when it comes to migration. While some data types can be migrated easily, others require a more careful approach that takes special considerations into account.  

Why Tackle a Cloud Data Migration?

Cloud data migration is often a key step on the journey towards becoming a data-driven organization. Cloud data migration provides organizations with the opportunity to re-evaluate how they use data and make improvements to their data management processes. As part of your digital transformation journey, cloud migration allows you to transform data into a strategic asset by creating a centralized access point for all your organization’s data. This means data can be more easily retrieved, managed, and integrated across the enterprise. Moving data to the cloud not only provides access to more scalable computing resources than you may have on-premises, it also gives you access to a wide range of software-as-a-service (SaaS) apps that you can use to collaborate, process data, and collect insights. This access to a variety of business applications through a single user interface allows organizations to seamlessly integrate various functions and workflows within the organization.

How to Know Which Data to Include in Your Cloud Data Migration?

The best way to decide which data to migrate to the cloud is to start with your business objectives. Once you know what you want to achieve with your cloud data migration, you can start deciding which data to move. There are a few common objectives that most organizations have when it comes to data. These include:

  • increasing employee productivity,
  • improving customer experience, and
  • boosting operational efficiency.

You should also consider moving data that is used frequently and is accessed by various departments. If a data source is critical to business operations, it should be migrated. This includes data such as employee data, customer data, and device data.

Migrating Device and Sensor Data to the Cloud

Moving device and sensor data to the cloud makes it easier to collect and analyze this type of data, which can come from a wide variety of sources, including IoT devices and sensors. Centralizing it in the cloud lets you store it all in one location and makes it easier to integrate with other systems, such as data analytics tools and CRM systems like Salesforce. Doing so will help you generate deeper business insights and make more strategic decisions.

Migrating Customer Data to the Cloud

Moving customer data to the cloud gives you access to a wide variety of customer analytics tools, allowing you to better understand your customer base and make strategic business decisions based on customer insights. It also gives you data management tools to collect, organize, and analyze information such as customer records, purchase history, and account details, helping you provide better customer service and identify new business opportunities. Moving customer data to the cloud can also help you comply with data privacy regulations, including the GDPR.

Migrating Business-Critical Data to the Cloud

When deciding which data to migrate to the cloud, you should consider moving data that is most relevant to your business. Moving business-critical data to the cloud will give you access to more computing resources than what you may have on-premises and will allow you to scale up your data processing when needed. It will also deliver access to a wide range of data analytics tools such as Tableau, ThoughtSpot or Looker that will help you generate more useful business insights.

Migrating Employee Data to the Cloud

Moving employee data to the cloud will give you access to cloud-based HR tools that can help you manage key business functions such as hiring, onboarding, and payroll. Moving data such as employee contact information, payroll data, internal and external communications, and customer information to the cloud can improve collaboration across departments by enabling real-time access to information. This access can be particularly beneficial for customer-facing teams such as sales and customer service.


Migrating Sensitive Data to the Cloud Securely

Employee data, device and sensor data, customer data, and business-critical data can all contain sensitive information that is either regulated, like personally identifiable information (PII), or simply extremely valuable to the company, like intellectual property or payroll information. When moving that data away from company-owned on-premises systems and data centers, controlling access and ensuring security throughout the journey is required. There are several places along the cloud data migration path where a cloud data security technique like tokenization can be implemented: from the on-prem warehouse to the cloud, as soon as data leaves the on-prem warehouse but before it's handed off to an ETL tool, or as it enters the cloud data warehouse. The right approach will often depend on how sensitive your data is and how regulated your industry is.
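To illustrate the general idea (this is a toy sketch, not ALTR's implementation), tokenization swaps a sensitive value for a random stand-in and keeps the real value in a secured vault; analytics and joins can run on the tokens while the raw data stays protected:

```python
import secrets

# Toy in-memory vault: real tokenization products use a secured,
# persistent vault service, not a Python dict.
class TokenVault:
    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Deterministic: the same input always maps to the same token,
        # so joins and group-bys on tokenized columns still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, reveals nothing
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only privileged callers should ever reach this path.
        return self._token_to_value[token]

vault = TokenVault()
t1 = vault.tokenize("123-45-6789")
t2 = vault.tokenize("123-45-6789")
assert t1 == t2                                # same value, same token
assert vault.detokenize(t1) == "123-45-6789"   # vault recovers the original
```

Unlike encryption, the token has no mathematical relationship to the original value, so a breach of the tokenized data set alone exposes nothing.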

Cloud Data Migration Ecosystem

Part of moving data to the cloud is choosing the right tools for the migration itself (ETL), where to store and share the data (Cloud Data Warehouses), and how to analyze the data (Business Intelligence tools). Today's modern data ecosystem solutions are gathered and reviewed for you here. Your goal should be to build a modern data ecosystem tool stack that integrates easily and works together to deliver the data sharing and analytic goals of your cloud data migration project.

Wrapping up

If your organization has been using traditional systems, a cloud data migration may seem like an overwhelming process. To make it easier, first start by identifying the data types you want to migrate to the cloud. Moving customer data to the cloud, for example, will give you access to better customer analytics tools. Moving employee data to the cloud, on the other hand, will allow you to manage key business functions such as hiring, onboarding, and payroll. Knowing which data to move to the cloud is the first step towards successfully migrating to the cloud. Once you know what to migrate, and the security measures your data requires, you can start the process of planning and implementing your cloud data migration.

In the last year, we’ve seen the awareness of the need for data access control and security in cloud data warehouses pass an inflection point. Most companies we talk to now, especially in the FinServ and Pharma industries, know they must have it. We don’t have to convince them sensitive data needs to be protected in the cloud or show them stats about data breaches or regulatory fines. They get it. But how they decide to get to it is a different story. Some decide to go down the do-it-yourself or build-it-yourself route, but I’m here to explain why you shouldn’t.  

Automation Greases the Wheels 

Identity providers like Okta and Active Directory have done a great job of enabling companies to automatically generate as many users and roles in Snowflake as needed. Today admins can go from 0 users to 1000 in about an hour or two.  

On the other side of the equation, ETL providers like Matillion, FiveTran and Talend have made it easy for companies to transport their data into Snowflake. In an hour or two, admins can move gigabytes or even terabytes of source data and have it ready and waiting for users to access.  

These two forces come to a head at the intersection between them: connecting users with data and defining the relationships between them. How do you make sure only the right users have access to only the data they should have?  

Enter BIY Data Access Controls 

Many companies start with DIY or do-it-yourself: the trusty Snowflake admin or DBA decides to write a handful of SnowSQL Snowflake data access control policies, one at a time. This works when you have one or two new users a week requesting access. But chances are, if you’re using an identity provider to create your profiles, you’re already dealing in hundreds or even thousands of users. DIY just doesn’t cut it – doing that work can suck up hours or even days each week, bringing access for new users as well as any other data projects to a halt, not to mention the human errors that can be introduced. It simply won’t scale.   

Okay, so then our ingenious database admin thinks, “I can BIY this” or build-it-yourself. “I have a tool that puts my users in automatically. And I have a tool that puts my data in automatically. I can fix this problem if I just spend the next week writing a tool that automatically connects these two domains together. Easy-peasy.”  

But wait, let’s take a step back and think about this. Snowflake also gives admins a way to add users without an identity management tool and add their data without an ETL tool. So, what’s the advantage of using an Okta or Matillion? The answer is reliability, scale and automation – those software vendors have built solutions that save you time and just do it better. 

Risk of Crossing the Streams – User x Data  

It’s ironic that, of all the tools companies could build on their own, some choose the one that connects users with data. Obviously, they do this because they haven’t yet found the Okta or Matillion to handle it. But the irony is that this is the most dangerous spot in the process: that intersection is where all the risks are.

You can add data to Snowflake, and it’s pretty safe when users can’t get to it. And you can add users to Snowflake, but they can’t do much without access to data. Very rarely do you get in trouble for adding the wrong user or the wrong data. If users aren’t connected to the data, the risk is near zero. It’s the middle part, where the streams cross, that is fraught with risk. Connecting the wrong user with the wrong data can be very bad for a data engineer, data steward, or privacy owner.

You could BIY, but Are You Enterprise-Ready?  

So, an admin can write a quick and dirty Snowflake masking policy, but can others read and work with it? Do you have a QA team to eliminate errors? Once you get a proof-of-concept to work on one or two databases, can you ensure it scales correctly and can run quickly across thousands? Do you have the time to integrate it with Okta or Matillion or Splunk? Do you have a roadmap that ensures it’s staying in sync with new private-preview Snowflake features, keeping up with your changing data and regulatory landscape, and addressing new user service needs? Can you ensure it actually works correctly – did you build in feedback and alerting on fails and errors?  

In other words, do you want to hire 30 engineers and spend millions of dollars to build enterprise-ready software you can trust with the risky connection between users and data?   
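For scale, that "quick and dirty" masking policy really is only a few lines of SnowSQL (a sketch with hypothetical names below); it's everything around it (review, QA, integration, monitoring, keeping it in sync) that turns one policy into an engineering project:

```sql
-- Sketch with hypothetical names: mask SSNs for everyone except HR admins
CREATE OR REPLACE MASKING POLICY mask_ssn AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN val
    ELSE 'XXX-XX-' || RIGHT(val, 4)   -- keep only the last four digits visible
  END;

ALTER TABLE employees MODIFY COLUMN ssn SET MASKING POLICY mask_ssn;
```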

Automated Snowflake Data Access Control for the Win 

Wouldn’t it just be easier to grab a third leg of your stool for data access controls to go with your user role and data transfer solutions? That’s where ALTR comes in. We’ve already invested the time and resources to build a world-class, reliable solution that automates and enforces the connection between users and data. It leverages all of Snowflake’s native data governance features while adding a no-code layer that makes it easy to apply and manage. It also shows you how users are accessing data to be confident that data is shared correctly. And because it’s SaaS, it’s fast to implement, starts at a low cost and can scale with your Snowflake usage – to hundreds of users and thousands of databases. (You could even think of it as Okta for Data.)  

Want to try it today? Sign up for our Free Plan. Or get a Demo to see how ALTR’s enterprise-ready solution can handle data access control for you. And avoid the BIY headache before it starts.  

Today Christian Kleinerman (SVP Product, Snowflake) grouped his keynote announcements and discussions into three broad areas: core platform, data cloud content, and building on Snowflake.  

There was certainly a bit of excitement in the air as it relates to partners at the keynote. Christian continued to emphasize partners, and he started off by repeating a thought from the other Data Cloud World Tours: Snowflake is one product. I think this message needs to stay front of mind for partners: Snowflake will continue to encourage partners to invest in the single platform and will be unsupportive of any partner who wants to create a non-unified experience with Snowflake.

Snowflake Core Platform

Under the core platform pillar some of the big announcements centered on cross-cloud availability and replication which can make Snowflake a much safer platform to run your business on, especially at global scale. One big shout out for partners was the announcement around data listing analytics. If you are a data provider partner for Snowflake, this is a big win for you. Understanding how consumers are using your data listings will ensure you can make the best decisions as you look to add or remove data sets.

For the governance and security partners, these cross-cloud replication and failovers might cause some issues depending upon how your solution is implemented. For users of tokenization or external functions in general, we know there might still be some need for Snowflake to continue to invest in this area so governance and security features can also seamlessly failover.

Snowflake Data Cloud Content

The second pillar, data cloud content, focused on producing applications and workloads on Snowflake. Partners like EY, phData and Infosys were specifically called out. And it was clear these partners were in the crowd, as there was some unexpected cheering! If you are not thinking about building an application natively inside of Snowflake, this part of the talk would have you reconsidering. A new ecosystem startup called Samooha was brought on stage. In under six months, the company went from mockup to full working product to help build clean rooms within Snowflake. They noted it is still really new, but it showcases how quickly you can build an MVP and bring a value-added process to market directly in Snowflake.

Building on Snowflake

Snowpark for Python is now GA! This was the biggest announcement from the last segment, around building on Snowflake. They actually produced fake snow in the room to make it a true ‘Snowday’! Partners now have a secure, single place in Snowflake to share data and run Python models directly on data inside of Snowflake. This is huge for partners. Christian noted a 6x uptick in usage since the initial announcement at Snowflake Summit ‘22.

Another big deal: Streamlit can now run directly in Snowflake, which will make building and selling an application inside of Snowflake much easier. It will also make it much easier for users to consume these applications, as data will no longer need to leave Snowflake.

It was great to see Sri Chintala on the stage. I remember two years ago when we were first looking at becoming a Snowflake partner, Sri was one of the first product folks we talked with. At the time she was leading the external function group which ALTR utilized heavily. Now she is working with python use cases and got the chance to demo her latest work on stage. It’s amazing to see the product folks mature and with that maturity continue to bring partners, like Anaconda who Sri mentioned on stage, along with them. She also did a good job handling those microphone issues!

All in all, it was another exciting Snowday, and ALTR is looking forward to the next six months as a Snowflake partner. I’ll be posting some thoughts on what I hear throughout the day on LinkedIn. Catch me there until next time!

What is the Modern Data Ecosystem?

Today’s business environment is awash with data. From product development intellectual property (IP) to customer personally identifiable information (PII) to logistics and supply chain information, data is coming at us from all directions. And that data is making its way throughout the business in ways that it never did before.

In the past, your customer and prospect data may have stayed securely behind a firewall in a customer database in a company-owned datacenter. But from the moment Salesforce launched its pioneering Software-as-a-Service CRM, that data has been moving into the cloud. And the volume has only increased. Now, cloud data platforms like Snowflake and Amazon Redshift offer anyone the ability to host and analyze data with just a credit card and a spreadsheet. This has opened a Pandora’s box of data analysis possibilities that comes with attendant challenges and risks.

By now most companies understand the significant opportunities presented by living in the “Age of Data.” Recently, a data ecosystem of technologies has developed to help organizations take advantage of these new opportunities. In fact, so many new tools, solutions and technologies have appeared that choosing solutions for a modern data ecosystem can be almost as difficult as dealing with data itself.

We put together this guide to help clear the clutter and explain who does what in the modern data ecosystem and how it can help your organization become more data-driven more quickly.

Data Ecosystem

Your Data Ecosystem Guide

Data Discovery, Classification, and Catalogs

The rapid growth of data collection, security threats, and regulatory requirements has transformed what was previously an esoteric process conversation into a mainstream business challenge. Applying and enforcing data governance standards is now a strategic priority for any organization, not just traditionally regulated industries like finance and healthcare. However, data owners must tread carefully to avoid running up against privacy laws like GDPR and CCPA: Gartner believes that modern privacy regulations will cover 75% of the world's population within a couple of years.

Many vendors focus on “knowing” your data—where it is (discovery), what it is (classification), where it came from (data lineage). Industry analysts call this “metadata management,” or getting a handle on the data itself. Data discovery, classification and cataloging are the critical first steps of a big data ecosystem.

Alation

Alation is credited with creating the data catalog product category – an early building block of the modern data ecosystem. Its signature software, the Alation Data Catalog, serves enterprises in organizing and consolidating their data. Alation’s enterprise data catalog dramatically improves the productivity of analysts, increases the accuracy of analytics, and drives confident data-driven decision making while empowering everyone in your organization to find, understand, and govern data.

BigID

BigID offers software for managing sensitive and private data, completely rethinking data discovery and intelligence for the privacy era. BigID was the first company to deliver enterprises the technology to know their data to the level of detail, context and coverage they would need to meet core data privacy protection requirements. BigID’s data intelligence platform enables organizations to take action for privacy, protection, and perspective. Organizations can deploy BigID to proactively discover, manage, protect, and get more value from their regulated, sensitive, and personal data across their data landscape.

Collibra

Collibra calls itself “The Data Intelligence Company.” They aim to remove the complexity of data management to give you the perfect balance between powerful analytics and ease of use. The company’s premier offering is its data catalog – a single solution for teams to easily discover and access reliable data. It allows companies to provide users access to trusted data across all your data sources. Delivering this end-to-end visibility starts with your data catalog, and Collibra gets you up and running in days. With Collibra’s scalable platform, you can future-proof your investment, no matter where business takes you next.

Cloud Data Warehouses

While the cloud migration started with specific workloads moving to SaaS services (think Salesforce or Office 365), today the data ecosystem is focused on, well, data. The same advantages of SaaS – low up-front costs, no hardware to maintain, no datacenter to staff and service, no upgrades to track – all apply to the modern cloud data warehouse. In addition, data storage combined with compute enables companies to consolidate data from across the company and make it easily available for analysis and insight. Data-driven companies find this service invaluable.

Snowflake Data Cloud

Snowflake offers a cloud-based data storage and analytics service that allows users to store and analyze data using cloud-based hardware and software. Snowflake’s founders engineered Snowflake to power the Data Cloud, where thousands of organizations have smooth access to explore, share, and unlock the full value of their data. Today, 1,300 Snowflake customers have more than 250PB of data managed by the Data Cloud, with more than 515 million data workloads running each day.

Amazon Redshift

According to the company, tens of thousands of companies rely on Amazon Redshift to analyze exabytes of data with complex analytical queries, making it the most widely used cloud data warehouse. Users can run and scale analytics in seconds on all their data without having to manage a data warehouse infrastructure. Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. With AWS-designed hardware and machine learning, the service can deliver the best price performance at any scale. The company also offers a Free Tier.  

Databricks

The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes.

This unified approach simplifies your modern data stack by eliminating the data silos that traditionally separate and complicate data engineering, analytics, BI, data science and machine learning. It’s built on open source and open standards to maximize flexibility. And, its common approach to data management, security and governance helps you operate more efficiently and innovate faster.

ETL and ELT Providers

Another significant piece of the data ecosystem puzzle is ETL and ELT providers. Consolidating business data in cloud data warehouses like Snowflake is a smart move that can open up new doors of innovation and value. All your data in one place makes it easier to connect the dots in ways that were impossible or unimaginable before. For instance, a retail chain can optimize sales projections by analyzing weather patterns, or a logistics company can more accurately predict costs by accounting for the salaries of all the people involved in a shipment.

Getting to those insights is a process that starts with moving the data. An extract, transform, and load (ETL) migration technology partner simplifies moving or loading the data from each of your company’s locations into a cloud data warehouse to make it analytics-ready in no time. Moving data is what these companies do best.
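The extract-transform-load flow itself can be sketched in a few lines. This toy Python example uses SQLite as a stand-in for a cloud data warehouse and invented order data; the three steps it walks through are what tools like Matillion and Fivetran industrialize at scale:

```python
import csv
import io
import sqlite3

# Hypothetical sample export from one business location; in practice the
# ETL tool pulls this from a source system via a connector.
SOURCE_CSV = """order_id,amount,region
1001,250.00,south
1002,125.50,north
"""

def extract(raw_csv: str) -> list:
    """Extract: read rows out of the source export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list) -> list:
    """Transform: cast types and normalize values so data is analytics-ready."""
    return [(int(r["order_id"]), float(r["amount"]), r["region"].upper())
            for r in rows]

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INT, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The value of a managed provider is everything around this loop: scheduling, schema drift, retries, and hundreds of pre-built source connectors.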

Matillion

Matillion’s complete data integration and transformation solution is purpose-built for the cloud and cloud data warehouses. The company’s flagship tool, Matillion ETL, is built specifically for cloud database platforms including Amazon Redshift, Google BigQuery, Snowflake and Azure Synapse. It offers a modern, browser-based UI with powerful push-down ETL/ELT functionality: Matillion ETL pushes data transformations down to your data warehouse and processes millions of rows in seconds, with real-time feedback. The browser-based environment includes collaboration, version control, full-featured graphical job development, and more than 20 data read, write, join, and transform components. Users can launch and be developing ETL jobs within minutes. Matillion offers a free trial.

Fivetran

Focused on automated data integration, Fivetran delivers ready-to-use connectors that automatically adapt as schemas and APIs change, ensuring consistent, reliable access to data. In fact, the company says it offers the industry’s best selection of fully managed connectors. Their pipelines automatically and continuously update, freeing users up to focus on game-changing insights instead of ETL. They improve the accuracy of data-driven decisions by continuously synchronizing data from source applications to any destination, allowing analysts to work with the freshest possible data. To accelerate analytics, Fivetran automates in-warehouse transformations and programmatically manages ready-to-query schemas. Fivetran offers a free trial.  

Talend

According to Talend, integrating your data doesn't have to be complicated or expensive. Talend Cloud Integration Platform simplifies your ETL or ELT process, so your team can focus on other priorities. With over 900 components, you can move data from virtually any source to your data warehouse more quickly and efficiently than by hand-coding alone. Talend helps reduce spend, accelerate time to value, and deliver data you can trust.

You can download a free trial of Talend Cloud Integration.

Business Intelligence (BI) and Analytics Tools

Most business data users aren’t running database queries but accessing data and gaining insights via business intelligence (BI) tools that provide services including reporting, online analytical processing, analytics, dashboards, data mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. As the front door to data for technical and line-of-business users throughout the company, finding a friendly, flexible, accessible BI solution is key.

Tableau

Tableau is an interactive data visualization software company focused on business intelligence. Tableau products query relational databases, online analytical processing cubes, cloud databases, and spreadsheets to generate graph-type data visualizations. The software can also extract, store, and retrieve data from an in-memory data engine. Tableau allows organizations to ensure the responsible use of data and drive better business outcomes with fully-integrated data management and governance, visual analytics and data storytelling, and collaboration—all with Salesforce’s industry-leading Einstein built right in. Companies can lower the barrier to entry for users to engage and interact by building visualizations with drag and drop, employing AI-driven statistical modeling with a few clicks, and asking questions using natural language. Tableau provides efficiencies of scale to streamline governance, security, compliance, maintenance, and support with solutions for the entire lifecycle as the trusted environment for your data and analytics—from connection, preparation, and exploration to insights, decision-making, and action.  

ThoughtSpot

ThoughtSpot believes the world would be a better place if everyone had quicker, easier access to facts. Their search and AI-driven analytics platform makes it simple for anyone across the organization to ask and answer questions with data. It empowers colleagues, partners, and customers to turn data into actionable insights via the ThoughtSpot application, embedding insights into apps like Salesforce and Slack, or building entirely new data products. The consumer-grade search and AI technology delivers true self-service analytics that anyone can use, while the developer-friendly platform ThoughtSpot Everywhere makes it easy to build interactive data apps that integrate with users’ existing cloud ecosystem.

Looker

Looker is a business intelligence and big data analytics platform that helps users explore, analyze and share real-time business analytics easily. Now part of Google Cloud, it offers a wide variety of tools for relational database work, business intelligence, and other related services. Looker utilizes a simple modeling language called LookML that lets data teams define the relationships in their database so business users can explore, save, and download data with only a basic understanding of SQL. The product was the first commercially available business intelligence platform built for and aimed at scalable or massively parallel relational database management systems like Amazon Redshift, Google BigQuery and more.

Data Access Control and Data Security

ALTR is the only automated data access control and security solution that allows organizations to easily govern and protect sensitive data – enabling users to distribute more data to more end users more securely, more quickly. Hundreds of companies and thousands of users leverage ALTR’s platform to gain unparalleled visibility into data usage, automate data access controls and policy enforcement, and secure data with patented rate-limiting and tokenization-as-a-service. ALTR’s partner data ecosystem integrations with data catalogs, ETL, cloud data warehouses and BI services enable scalable on-premises-to-cloud protection. Our free integration with Snowflake allows admins to get started in minutes instead of months and scale up as you expand your data use, user base and databases.

The Evolving Data Ecosystem

ALTR continues to develop relationships with cloud data leaders across the industry. Our goal is to help our customers to get the most from their data by enabling a secure cloud data ecosystem that allows users to safely share and analyze sensitive data. Our scalable cloud platform acts as the foundation by enabling seamless integration with a wide variety of enterprise tools used to ingest, transform, store, govern, secure, and analyze data. ALTR has expanded how we interact with data ecosystem leaders via open-source integrations that allow users to freely and easily extend ALTR's data control and security to data catalogs like Alation and ETL tools like Matillion. Building a modern data ecosystem stack will set you firmly on the path to secure data-driven leadership.

If we’ve learned anything over the last few years, it’s that this data space moves faster than you can imagine. Whether it’s new investments from market leaders, new acquisitions, new partnerships, or new technologies, the landscape is always changing, and those who aren’t ready for the next big shift are quickly left behind.  

James Beecham and Dave Sikora at BlackHat 2022

We anticipated this when we built the ALTR platform from the cloud up to be highly adaptable – our solution can easily scale up or scale down with users, with data, with cloud data warehouse usage. While our competitors were offering legacy on-prem solutions with high barriers to entry like long term commitments, massive up-front costs and complicated implementations, ALTR built a cloud-native, SaaS-based integration for Snowflake that users could add directly from Snowflake Partner Connect and a free plan that lets companies try our solution before ever paying a cent. Our decisions have paid off in market response, demonstrated by compounded annual revenue growth of over 300% since 2018 and an accelerating customer base of over 200 companies.

We couldn’t be more ready for the next phase in ALTR’s journey and it’s the perfect time to appoint a new leader to take it on: James Beecham, ALTR’s Co-founder and Chief Technology Officer has been promoted to become ALTR’s next Chief Executive Officer. As a Co-founder, James was key to identifying the data security hole ALTR could fill. As CTO, he has been the technical leader who envisioned how ALTR could best meet our customers’ needs and one of the most public faces of the company.  

James is excited to chart the course for ALTR’s future, maintaining the company’s trajectory by ensuring we continue to anticipate, act proactively, and deliver the disruptive data governance and security solutions our customers and the market didn’t even realize were possible. We believe that ALTR’s short “time-to-value” in a market fraught with complexity will deliver sustaining differentiation in the coming years.

And we’re a team here at ALTR so Dave isn’t going anywhere. He and James will work closely together during a transition period, and he will remain involved as a Board Director, CEO Advisor and ongoing financial Investor. Dave will also use this opportunity to expand his strategic advisory practice, mentor up-and-coming CEOs and explore other Board of Director opportunities.  

Please don’t hesitate to reach out to James, Dave or your Account Executive if you have any questions about the transition. And stay tuned for great things ahead…

- Dave & James

If there’s one phrase we heard over and over again at Snowflake Summit 2022 (other than “data governance”) it was "data mesh." What is data mesh, you ask? Good question!

Data Mesh definition

Data mesh is a decentralized data architecture that makes data available through distributed ownership. Various teams own, manage and share data as a product or service they offer to other groups inside or outside the company. The idea is that distributing ownership of data (versus centralizing it in a data warehouse or data lake with a single owner, for example) makes it more easily accessible to those who need it, regardless of where the data is stored.


You can imagine why this might be a hot topic in the data ecosystem. Companies are constantly looking for ways to make more data available to more users more quickly. The data mesh conversation has continued in data ecosystem leader blogs we’ve gathered in our Q3 roundup.

Alation: Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

VP Product Marketing and Analyst Relations at Alation, Mitesh Shah, interviews former Gartner Analyst Sanjeev Mohan in this Q&A-style blog. Mohan shares his definitions of data mesh, data fabric and the modern data stack and why they’re such hot topics at the moment. Mohan suggests the possibility that new terms (like data mesh) are actually history repeating itself, dives into what these new strategies and architectures bring to the table for data-first companies and identifies the pros and cons of centralizing or decentralizing data and metadata.  

Collibra: Data Observability Amidst Data Mesh

Eric Gerstner, Data Quality Principal at Collibra, leverages his background as a former Chief Product Owner managing technology for digital transformation to dive into the data mesh concept. He explains that “No amount of technology can solve for good programmatics around the people and process.” He sees data mesh as a conceptual way of tying technology to people and processes and enabling an organization to improve its data governance. This article sheds light on the data mesh narrative and how it fits into modern data organizations in both the immediate and longer-term future. He sees data mesh as key to linking people and processes – people that know how to interpret and organize data and the processes that drive and collect data into the organization itself.

Matillion: Data Mesh with Matillion

This blog by Matillion really unpacks the concept of data mesh at a fundamental level. It’s really about bringing data out from its usual role as a supporting player and elevating it to a product in and of itself. It’s about “productizing” data and offering it to customers within and without the company. Customers have an expectation of the quality of the product and the service they are utilizing. A data mesh can help data owners meet those expectations. Furthermore, this blog explains the steps necessary to create a data mesh with Matillion. Matillion’s low-code/no-code platform is an ideal partner for individual data teams that include a mix of domain and technology expertise.

Data Mesh Architecture: ALTR's Take

We’re all about making data easier to access – for authorized people. As the data mesh architecture proliferates, companies need to ensure that all data owners across the company are enabled with the appropriate tools in place to keep their sensitive data from spreading recklessly – to meet both internal guidelines and government regulations on data privacy. A data mesh architecture really democratizes data ownership and access, and ALTR’s no-code, low up-front cost solution democratizes data governance to go hand in hand with it. Data owners from finance to operations to marketing do not need to know any code to implement data access controls on the sensitive data they’re responsible for.  

Snowflake harnesses the power of the cloud to help thousands of organizations explore, share, and unlock the actual value of their data. Whether your company has ten employees or 10,000, if you’re one of Snowflake’s 4,500 customers and counting, you’re either thrilled or overwhelmed by the cloud data warehouse’s combination of out-of-the-box functionality and powerful, flexible features.

Wherever you are in your journey, though, it’s never too early or too late to think about how you’re handling Snowflake data governance and security for sensitive data like PII/PHI/PCI.  

When you look at the enterprise-level security and governance capabilities Snowflake offers natively within the platform, you may wonder why you need more (see the Bonus question for this answer). And the options for Snowflake Data Governance offered by partners may sound similar, making it a challenge to know what the differences are and what you need.  

With that in mind, we’ve put together the critical questions you should ask when evaluating Snowflake Data Governance options. Going through this list should reveal the best next step for your company.

Snowflake Data Governance

1. Is the Snowflake data governance solution easy to set up and maintain? Does it use Proxy, Fake SaaS or Real SaaS?

There are several ways vendors can enable their Snowflake data governance solutions. One approach is to utilize a proxy. While proxy solutions have some advantages, they come with serious issues that make them less than ideal for cloud-based Snowflake:

  • Extra effort is required to make all applications go through the proxy, adding time, complexity, and costs to your implementation.
  • Security holes are created when applications and users can bypass the proxy to get full access to data, increasing risk and surfacing compliance issues.
  • Platform changes may break the proxy without warning, adding unnecessary downtime and delays.
  • On-premises proxies require you to deploy, maintain, and scale more infrastructure than you would with a pure-SaaS, cloud-native solution.

SaaS is a better option for Snowflake data governance, but some providers calling themselves “SaaS” are better defined as “Managed Services.” In these “Fake SaaS” solutions, vendors spin up, support and update an individual version of the software just for you. This makes the software more expensive to run and maintain than true SaaS – costs that get passed on to you. These solutions can also require long maintenance windows that make the service unavailable during updates.

A proper multi-tenant SaaS-based data governance solution built for the cloud - like ALTR’s - is easier to start and maintain with Snowflake. There’s no hardware deployment or maintenance downtime required, no hardware sitting between your users and the data, no risk of a platform change breaking your integration, and no difficulty scaling your Snowflake usage. Because it’s natively integrated, there are no privacy issues or security holes. A real SaaS-based solution will also have the credentials to back it up: PCI DSS Level 1, SOC 2 Type II certification, and support for HIPAA compliance.


2. Is the Snowflake data governance solution easy to use? Does it require code to implement and manage?

Snowflake provides the foundation, delivering native data governance features like sensitive data discovery and classification, access control and history, masking, and more with every release. But for users to take advantage of these Snowflake data governance capabilities on their own, they must be able to write SQL. That can make the features difficult, time-consuming, and costly to implement and manage at scale because data governance administration is limited to DBAs and other developers who can code.
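For a sense of what that hand-written SQL involves, here is a small sketch that renders a Snowflake-style masking policy as a string, following the documented CREATE MASKING POLICY pattern. The policy name, role name, and mask value are hypothetical, and this is only an illustration of the DDL a DBA would have to author and maintain by hand, not ALTR's implementation:

```python
# Illustrative only: builds the kind of SQL a DBA would write manually.
# The policy name ("ssn_mask") and role ("HR_ADMIN") are made up.
def masking_policy_ddl(policy: str, allowed_role: str) -> str:
    """Render a Snowflake-style masking policy that reveals a column's
    value only when the session's current role matches allowed_role."""
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val STRING) "
        f"RETURNS STRING -> CASE WHEN CURRENT_ROLE() = '{allowed_role}' "
        f"THEN val ELSE '****' END"
    )

ddl = masking_policy_ddl("ssn_mask", "HR_ADMIN")
```

One policy like this is manageable; hundreds of them, across roles, columns, and databases, is where the no-code argument in the next paragraph comes from.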

However, the groundwork Snowflake provides allows partners to create solutions that leverage that built-in functionality but deliver an easier-to-use experience. ALTR’s solution provides native cloud integration and a user interface that doesn’t require code to get started or manage. This means your Data Governance teams or even line of business data or analytics users can take over the management of governance policies on Snowflake, freeing DBAs to focus on managing data streams and enabling data-driven insights.


3. Is it a complete Snowflake data governance solution? Does it secure all of your data and reduce your risk?

This is crucial. You may look for a Snowflake Data Governance solution in response to privacy regulations, but you’ll never be truly compliant without data security. And most “data governance” options don’t include data protection. While Snowflake offers many enterprise-level security features, there’s no defense against credentialed or privileged access threats. Once someone gets access with compromised credentials, there’s no mechanism for slowing or stopping data consumption.

Some software vendors calling themselves “data governance” only provide data discovery and classification – a data card catalog – without access control. And some other vendors require the data you want to protect to be copied into a new Snowflake database managed by the solution, leaving the raw data in the original database—ungoverned and unprotected. You may never know if anyone has accessed that data, potentially violating privacy regulations that require you to understand and document who has accessed data, even if nothing leaks outside the company.

For complete Snowflake Data Governance, you must not only be able to find and classify your data, but see data access, utilize consumption thresholds to detect anomalies and alert on them, respond to threats with real-time blocking, and tokenize critical data at rest. ALTR combines all these features into a single data governance and security platform that allows you to protect data appropriately based on data governance policies, ensure all your data is secure, and minimize your risk of data loss or theft.
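As a toy illustration of how a consumption threshold can drive real-time blocking, here is a minimal Python sketch. The hourly limit, user names, and ALLOW/BLOCK responses are invented for the example; they are not ALTR's implementation, only the general idea of rate-limiting data access:

```python
from collections import defaultdict

# Hypothetical policy: no user may fetch more than 10,000 rows per hour.
ROWS_PER_HOUR_LIMIT = 10_000

class ConsumptionGovernor:
    """Tracks per-user row consumption and blocks anomalous requests."""

    def __init__(self, limit: int = ROWS_PER_HOUR_LIMIT):
        self.limit = limit
        self.consumed = defaultdict(int)  # rows fetched per user this hour

    def request(self, user: str, rows: int) -> str:
        # Block in real time when this fetch would exceed the threshold.
        if self.consumed[user] + rows > self.limit:
            return "BLOCK"
        self.consumed[user] += rows
        return "ALLOW"

gov = ConsumptionGovernor()
first = gov.request("analyst1", 9_000)   # within the hourly threshold
second = gov.request("analyst1", 5_000)  # would push the user over the limit
```

The point of the sketch is the shape of the control: even a valid, credentialed user hits a ceiling, which is exactly the defense missing when governance stops at discovery and classification.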


4. Is the data governance solution affordable and flexible? Can you start with only what you need?

Most solutions cost $100k to $250k per year to start! These large, legacy on-premises platforms were not built for today’s scalable cloud environment. They require considerable time, resources, and money to even get started, which is an odd fit for Snowflake’s cloud-based platform, where Snowflake On-Demand gives you usage-based, per-second pricing with a month-to-month contract.

ALTR’s pricing starts at “free.” Our Free plan gives you the power to understand how your data is used, add controls around access, and limit your data risk at no cost. Our Enterprise and Enterprise Plus plans are available if you need more advanced governance controls, integration with SOAR or SIEM platforms, or increased data protection and dedicated support.

ALTR’s tiered pricing means there’s no large up-front commitment—you can start Snowflake data governance for free and expand if or when your needs change. Or stay on our free plan forever.


Bonus Question: Can't I just build a solution myself? 

While a data admin can write a Snowflake masking policy using SQL to leverage Snowflake's native features, what happens next? That is a one-time point fix, but what about the long term and wide scale? Can others read and work with it? Do you have a QA team to eliminate errors? Can you ensure it scales correctly and runs quickly across thousands of databases? Do you have the time to integrate it with Okta or Matillion or Splunk? Do you have a roadmap that ensures it stays up to date with new private-preview Snowflake features, keeps up with your changing data and regulatory landscape, and addresses new user service needs? Basically, do you want your data team to be a software development team? You could hire 30 engineers and spend millions of dollars to build enterprise-ready Snowflake data governance software you can trust with the risky connection between users and data, but why should you when there are already cost-effective solutions from companies in the market focused on just this?

Conclusion

Companies flocking to the cloud data party, and Snowflake in particular, are faced with a dizzying array of options for Snowflake Data Governance. However similar the solutions may seem, with a little digging fundamental differences become apparent. ALTR’s solution stands out for its accessible, SaaS-based, no-code setup and management and complete Snowflake data governance and security feature set. And with its reasonable user- and data-based costs, ALTR becomes the obvious next step for Snowflake users to govern and protect their sensitive data.

What is Cloud Data Security? A Definition

Why is everyone talking about cloud data security today? The first wave of digital transformation focused on moving software workloads to SaaS-based applications in the cloud that were easy to spin up, required no new hardware or maintenance, and started with low costs that scaled with use. Today, the next generation of digital transformation is focused on moving the data itself — not just from on-premises data warehouses to the cloud but from other cloud-based applications and services into a central cloud data warehouse (CDW) like Snowflake. This consolidates valuable and often very sensitive data into a single repository with the goal of creating a single source of truth for the organization.

Cloud data security is focused on protecting that sensitive data, regardless of where it’s located, where it’s shared or how it’s used. It uses role-based data access controls, privacy safeguards, encrypted cloud storage, and data tokenization, among other tools, to limit the data users can access in order to meet data security requirements, comply with privacy regulations and ensure data is secure.
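To make two of those controls concrete, here is a toy Python sketch of role-based access backed by tokenization. The in-memory dict vault and the "finance_admin" role are illustrative stand-ins for a real tokenization service, which would store tokens in a hardened system separate from the warehouse:

```python
import secrets

# Hypothetical token vault: maps opaque tokens back to the real values.
# In production this lives outside the data warehouse entirely.
VAULT = {}

def tokenize(value: str) -> str:
    """Replace a sensitive value with an opaque token before it is stored."""
    token = "tok_" + secrets.token_hex(8)
    VAULT[token] = value  # the raw value never lands in the warehouse
    return token

def detokenize(token: str, role: str) -> str:
    """Role-based control: only an authorized role recovers the real value;
    everyone else sees a mask."""
    if role != "finance_admin":
        return "****"
    return VAULT[token]

stored = tokenize("123-45-6789")                  # what the warehouse holds
masked = detokenize(stored, role="analyst")       # unauthorized role
clear = detokenize(stored, role="finance_admin")  # authorized role
```

The design point: because only tokens are stored, even someone with full warehouse access sees nothing sensitive unless the access-control layer lets them exchange tokens for values.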

3 Benefits of Cloud Data Security

Cloud data security confers powerful benefits with almost no downsides. In fact, the biggest risk of cloud data security is not doing it.

  • Improve business insights: When applied correctly, cloud data security enables data to be distributed securely throughout an organization, across business units and functional groups, without fear that data could be lost or stolen. That means you can share sensitive PII about customers with your finance teams, your marketing teams and even your sales teams, without worry that the data might make its way outside the company. You can gather information from various in-house and in-cloud business tools such as Salesforce or another CRM, your ERP, or your marketing automation solution into one centralized database where users can cross check and cross reference information across various data sources to uncover surprising insights.
  • Avoid regulatory fines: It’s not just credit card numbers or health information that companies need to worry about anymore – today, practically every company deals with sensitive, regulated data. Personally Identifiable Information (PII) is data that can be used to identify an individual such as Social Security number, date of birth or even home address. It’s regulated by GDPR in Europe and by various state regulations in the US. Although the regulatory landscape is still patchy in the US, all signs point to a federal-level statute or new regulation that will lay out rules for companies across the country coming very soon. For companies that want to get ahead of the issue, making sure their cloud data security meets the most stringent requirements is the easiest path. This can help a company ensure it’s meeting its obligations and reduce the risk of fines from any regulation.
  • Cultivate customer relationships: In a 2019 Pew Research Center study, 81% of Americans said that the risks of data collection by companies can outweigh the benefits. This might be because 72% say they benefit very little or not at all from the data companies gather about them. A McKinsey survey showed that consumers are more likely to trust companies that only ask for information relevant to the transaction and react quickly to hacks and breaches or actively disclose incidents. These also happen to be some of the requirements of data privacy regulations – only gather the information you need and be upfront, timely and transparent about leaks. Companies can’t continue to gather data at will with no consequences – customers are awake to the risks now and demanding more accountability. This gives organizations a chance to strengthen the relationship with their customers by meeting and exceeding their expectations around privacy. If personalization creates a bond with customers, imagine how much more powerful that would be if buyers also trust you. Organizations that focus on protecting customer data privacy via a future-focused data governance program have an opportunity to take the lead in the market.

Cloud Data Security Challenges

Although cloud data security is a new area of concern, many of the biggest challenges are already well known by companies focused on keeping data safe.

  1. Securing data in infrastructure your company doesn’t own: With so much data moving to the cloud, yesterday’s perimeter is an illusion. If you can’t lock data down behind a firewall, and guess what, you can’t, then you’re forced to trust your cloud data warehouse. These facilities are extremely secure, but they only cover part of your security needs. They don’t manage or control user data access – that’s left to you. Bad actors don’t care where the data is – in fact, cloud data warehouses that consolidate data from multiple sources into a single store make a compelling target. Regulators don’t care where data is either when it comes to responsibility for keeping it safe: it’s on the company who collects it. Larger companies in more regulated industries face very punitive fines if there’s a leak—which can lead to severe consequences for the business.
  2. Securing data your team doesn’t own: From a security perspective, it’s difficult to protect data if you don’t know what it is or where it is. With various functional groups across companies making the leap to cloud data warehouses on their own in order to gain business insights, it’s difficult for the responsible groups such as security teams to be sure data is safe.
  3. Stopping privileged access threats: When sensitive data is loaded to a CDW there’s often one person who doesn’t really need access, but still has it: your Snowflake admin. If your company is like Redwood Logistics, uploading sensitive financial data in order to better estimate costs, you really don’t want your admin to have access – and usually, he doesn’t want it either! Even if you trust your admin, and you probably do, there’s no guarantee his credentials won’t get stolen, and there’s no upside to him or the business in allowing that access. This leads into our next challenge:
  4. Stopping credentialed access threats: Even the most trustworthy employees can be phished, socially engineered or simply have their credentials stolen. Despite the training companies have done to educate users about these risks, credentialed access continues to be one of the top sources of breaches in the Verizon Data Breach Investigations Report – for the sixth year in a row! ALTR’s James Beecham asks year after year: “Why Haven’t We Stopped Credentialed Access Threats?” We know how – even when humans are fallible, there is technology that can help.
  5. Using data safely in Business Intelligence tools: One of the key goals of consolidating data into a centralized CDW is to enable business intelligence access. BI tools like Tableau, ThoughtSpot and Looker depend on access to all available data in order to provide a full 360-degree view of the business. When that data can’t be utilized securely in these tools, security admins often make the call to leave it out of the equation, creating a broken view of the business.

Cloud Data Security Best Practices

There are a few best practices every organization should incorporate into their successful cloud data security program:

   1. Keep your eye on the data - wherever it is

This shift to the cloud requires a shift in the security mindset: from perimeter-centric to data-centric security. It means CISOs (Chief Information Security Officers) and security teams will have to stop thinking about hardware, data centers, and firewalls, and instead focus on the end goal: protecting the data itself. Responsible teams need to embrace data governance and security policies around data throughout the organization and its data ecosystem. They need to understand who should have access to the data, understand how data is used, and place relevant controls and protections around data access. In fact, they could start with a data observability program in order to understand what normal data usage looks like, so they’re better able to identify abnormal activity.

   2. Empower everyone to secure cloud data

We often hear “security is everyone’s responsibility.” But how can it be, when most people are left out of the process? While data is a key vulnerability for essentially every company, until recently most companies didn’t want to acknowledge the risk. Now, with a new data breach announcement every few weeks, the problem is impossible to ignore. When marketing teams are spinning up shadow cloud data warehouse resources instead of waiting for security or IT teams to vet the solution, the practical answer is to make sure data owners have the means to protect the data themselves. Instead of governance technologies based on legacy infrastructure that not only require big investments of time, money, and human resources to implement, but also expensive developers to set up and maintain, democratize data governance with tools that allow non-coders to roll out and manage the data security solution themselves in weeks or even days.

   3. Add cloud data security checks and balances to your cloud data warehouse

To protect data (and your Database Administrator!) from the risk of sensitive data exposure, put a neutral third party in place that can keep an eye on data access – natively integrated into the cloud data platform yet outside the control of the platform admin. This separation of duties should make it impossible to access the data without key people being notified, and it can limit the amount of data revealed, even to admins. It can include features like real-time alerts that notify relevant stakeholders at the company whenever the admin (or any user, for that matter) tries to access the data. If none of the authorized users accessed the data, they’ll know within seconds that unauthorized access has occurred. Alert formats can include text messages, Slack or Teams notifications, emails, phone calls, SIEM integrations, etc. Data access rate limits that constrain the amount of de-tokenized data delivered to any user, including admins, also limit risk. A user can request 10 million records but only get back 10,000, or 10 per hour. This can also trigger an alert to relevant stakeholders. These features ensure that no single user has the keys to the entire data store – no matter who they are.
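As a rough sketch, the rate-limit-plus-alert idea described above might look like the following. To be clear, the limit, the window, and the alert function here are illustrative assumptions, not ALTR’s actual implementation:

```python
import time
from collections import defaultdict, deque

# Hypothetical sketch of a per-user de-tokenization rate limit with alerting.
RATE_LIMIT = 10_000        # max de-tokenized records per user per window
WINDOW_SECONDS = 3600      # one-hour sliding window

_access_log = defaultdict(deque)  # user -> timestamps of records returned

def alert(user, requested):
    # In practice this would fan out to Slack, email, SMS, or a SIEM.
    print(f"ALERT: {user} requested {requested} records, exceeding the limit")

def records_allowed(user, requested, now=None):
    """Return how many of `requested` records the user may de-tokenize now."""
    now = now if now is not None else time.time()
    log = _access_log[user]
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()                       # drop entries outside the window
    remaining = max(0, RATE_LIMIT - len(log))
    granted = min(requested, remaining)
    if granted < requested:
        alert(user, requested)              # notify stakeholders in real time
    log.extend([now] * granted)
    return granted
```

A request for 10 million records would come back capped at the limit, and the over-limit attempt itself becomes the alert signal for stakeholders.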

   4. Always assume credentials are compromised and cloud data is at risk

Knowing that the easiest and best ways to stop credentialed access threats are undermined by people being people, we’re simply better off assuming all credentials are compromised. Stolen credentials are most dangerous when, once an account gets through the front door, it has access to the entire house, including the kitchen sink. Instead of treating the network as having one front door with one lock, require authorization to enter each room. This is Forrester’s “Zero Trust” security model – no single login, identity, or device is trusted enough to be given unlimited access. This is especially important as more data moves outside the traditional corporate security perimeter and into the cloud, where anyone with the right username and password can log in. While cloud vendors do deliver enterprise-class security against cyber threats, credentialed access is their biggest weakness. It’s nearly impossible for a SaaS-hosted database to know whether an authorized user should really have access or not. Identity access and data control are still up to the companies utilizing the cloud platform.  
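The “authorize each room” idea can be sketched as a policy check evaluated on every request, with deny as the default – a valid login alone grants nothing. The policy table, role names, and access levels below are hypothetical:

```python
# Illustrative zero-trust style check: every request is authorized per
# resource, and anything not explicitly allowed is denied.
POLICY = {
    ("analyst", "customer_emails"): "tokenized",   # can query, sees tokens
    ("dpo",     "customer_emails"): "plaintext",   # authorized to de-tokenize
}

def authorize(role, resource):
    """Return the access level for this request, defaulting to deny."""
    return POLICY.get((role, resource), "deny")
```

Stolen credentials for an analyst account would still only ever yield tokens, never the plain-text data.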


Key Components of Cloud Data Security Solutions

An effective cloud data security solution includes these key components:

  • Knowing where your data is and categorizing what data is sensitive: With data often spread throughout an organization’s technology stack, it can be challenging to even know all the various places sensitive data like social security numbers are stored. Solving this issue often starts with a data discovery and data classification solution that can find data across stores, group information into types of data and apply appropriate tags.
  • Controlling access to sensitive data: In today’s data-driven enterprises, data is not used only by data scientists. Everyone from marketing to sales to product teams may need or want access to sensitive data in order to make more informed business decisions, but not everyone will be authorized to access all of it. Making sure you can grant access to some users but not others, or allow access to some roles but not others, in an efficient, scalable, and secure way is one of the most important components of cloud data security.
  • Putting extra limits on sensitive data access: Data security doesn’t have to be either/or. With data access rate limits, users can be prohibited from gaining access to more data than they should reasonably need. This can stop bad actors with credentials from downloading the whole database, by setting rate limits per user or per time period – e.g., 10,000 records per hour instead of 1 million.  
  • Securing sensitive data with encryption or tokenization: Encryption is one cloud data security approach that is highly recommended by security professionals. However, it does have weaknesses and limitations when it comes to utilizing data in the cloud. Tokenization can enable data to be stored securely yet still be available for analysis.
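The discovery-and-classification component above can be sketched as a simple rule-based column scan. The patterns, tag names, and the 80% match threshold are all illustrative assumptions, not any particular product’s scanner:

```python
import re

# Hypothetical rule-based classifier: sample a column's values, tag the
# column if most of them match a known sensitive-data pattern.
PATTERNS = {
    "SSN":         re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "CREDIT_CARD": re.compile(r"^\d{13,19}$"),
    "EMAIL":       re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(values):
    """Return a sensitivity tag if >= 80% of sampled values match a pattern."""
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= 0.8:
            return tag
    return None
```

Real discovery tools add sampling across data stores, metadata hints, and confidence scoring, but the core loop – match, threshold, tag – is the same.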

Conclusion

There’s no chance of reversing the migration of data to the cloud and why would we want to? The benefits are so staggering, it’s well worth any challenges presented. As long as cloud data security is built in as a priority from the start, risks can be mitigated, and the full power and possibility of a consolidated Cloud Data Warehouse can come to fruition.  

See how ALTR can help automate and scale your cloud data security in Snowflake. Get a demo!  

The road to becoming one of today’s data-driven companies is full of challenges, not the least of which is finding and keeping the right people with the right skills to get you there. And it’s not always about the individual, it’s also about the team and finding the right combined characteristics that lead to success.

As part of our Expert Panel Series, we asked some experts in the modern data ecosystem what attributes a modern data team should have. Here’s what we heard…

John DaCosta, Sr. Sales Engineer, Snowflake:

“I have been referencing this McKinsey article for years now. ‘A two-speed IT architecture for the digital enterprise.’ The concept is an Enterprise IT organization (Data Platform / Networking, etc.) that manages mature assets / processes. The 2nd speed teams are smaller, agile and more focused. They focus on "shadow it,” for example: Marketing Analytics / Marketing Technology. They are allowed to do whatever they need to get the job done. But once things are mature, they can be transitioned into Enterprise IT. In my interpretation, functional areas have their own ‘smaller technology teams’ that have all the required skill sets to deliver on projects for the business unit sponsoring it.”

Phil Warner, Director of Data Engineering, PandaDoc: 

“Hire T-shaped skillsets and people who are happy to collaborate. Nothing is worse than a team of siloed individual contributors. [People with T-shaped skills] are team members who specialize in a particular area (such as Python, or data modeling, or infrastructure, etc.), but also have all-round skills, to a lesser degree, across the board. This allows for broad coverage across the team, without having to train everyone on everything to the same level, and also gives you team members that'll never say 'that's not my job', or sit there and pout when a particular ETL process didn't get written in Python this time around. They also tend to be inquisitive and curious by nature, and so are open to new ways of doing things and new technologies to move things forward, rather than painting themselves into a box and refusing to do anything other than what they know.

The opposite of a person with a T-shaped skillset is a one-trick pony. 😁”

Louis Hassel, Account Executive, Alation: 

“A modern team should have a variety of skills but the best attribute they can have is a shared vision of the overall goal of the data project. If the Marketing manager needs hourly reports and the data engineering team is building daily extracts there is a disconnect. The data exec level would be great, but not a necessity. Just need to do a little planning to succeed rather than rebuild everything.”

James Beecham, Founder & CTO, ALTR:

“Similar to a full-stack developer, or a ‘feature team’ for software development, having team members that are cross functional is key to accelerating your data initiatives. I have seen too many projects stall because one person says, ‘I don’t know anything about the data pipeline, so I cannot tell you the answer’ or ‘I don’t have access to the data so I cannot verify that classification report.’ These types of bottlenecks always pop up at the worst time and cause delays. Cross training team members, having folks who are not afraid of using every tool you have in your stack is critical to your success.”  

Watch out for the next monthly installment of our Expert Panel Series on LinkedIn!

What is Data Tokenization? – a Definition

You may be familiar with the idea of encryption to protect sensitive data, but maybe the idea of tokenization is new. What is data tokenization? In the realm of data security, “tokenization” is the practice of replacing a piece of sensitive or regulated data (like PII or a credit card number) with a non-sensitive counterpart, called a token, that has no inherent value. The token maps back to the sensitive data through an external data tokenization system. Data can be tokenized and de-tokenized as often as needed with approved access to the tokenization system.

How Does Tokenization of Data Work?

Original data is mapped to a token using methods that make the token impractical or impossible to restore without access to the data tokenization system. Since there is no relationship between the original data and the token, there is no standard key that can unlock or reverse lists of tokenized data. The only way to undo tokenization of data is via the system that tokenized it. This requires the tokenization system to be secured and validated using the highest security levels for sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system is the only vehicle for providing data processing applications with the authority and interfaces to request tokens or de-tokenize to the original sensitive data.

Replacing original data with tokens in data processing systems and applications like business intelligence tools minimizes the exposure of sensitive data across those applications, stores, people, and processes, reducing the risk of compromise, breach, or unauthorized access to sensitive or regulated data. Except for a handful of applications or users authorized to de-tokenize when strictly necessary for a required business purpose, applications can operate using tokens instead of live data. Data tokenization systems may be operated within a secure, isolated segment of the in-house data center, or as a service from a secure service provider.
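A minimal sketch makes the mechanics concrete: the token is random, so the vault’s mapping is the only path back to the original value. This is an illustration of the vaulted-tokenization concept, not ALTR’s implementation:

```python
import secrets

class TokenVault:
    """Toy vaulted tokenization: random tokens, reversible only via the vault."""

    def __init__(self):
        self._token_to_value = {}   # the "vault": token -> original value
        self._value_to_token = {}   # reuse tokens so equal values match

    def tokenize(self, value):
        if value in self._value_to_token:      # deterministic per value
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, no relation to value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # There is no key or formula to reverse a token; only the vault can.
        return self._token_to_value[token]
```

Because the token is generated randomly rather than derived mathematically, a breached set of tokens reveals nothing without the vault itself.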

What is Data Tokenization Used For?

Tokenization may be used to safeguard sensitive data including bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Data tokenization is most often used in credit card processing, and the PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value".

The choice of tokenization as an alternative to other data security techniques such as encryption, anonymization, or hashing will depend on varying regulatory requirements, interpretation, and acceptance by auditing or assessment entities. We cover the advantages and disadvantages of tokenization versus other data security solutions below.

Benefits of Data Tokenization

When it comes to solving these cloud migration challenges, tokenization of data has all the obfuscation benefits of encryption, hashing, and anonymization, while providing much greater usability. Let’s look at the advantages in more detail. 

  1. No formula or key: Tokenization replaces plain-text data with an unrelated token that has no value if breached. There’s no mathematical formula or key; a token vault holds the real data secure.    
  2. Acts just like real data: Users and applications can treat tokens the same as real data and perform high-level analysis on it, without opening the door to the risk of leaks or loss. Anonymized data, on the other hand, provides only limited analytics capability because you’re working with ranges, while hashed and encrypted data are ineligible for analytics. With the right tokenization solution, you can share tokenized data from the data warehouse with any application, without requiring data to be unencrypted and inadvertently exposing it to users.   
  3. Granular analytics: Retaining the connection to the original data enables you to dig deeper into the data with more granular analytics than anonymization. Anonymized data is limited by the original parameters, such as age ranges or broad locations, which might not provide enough detail or flexibility for future purposes. With tokenized data, analysts can create fresh segments of data as needed, down to the original, individual street address, age or health information. 
  4. Analytics plus protection: Tokenization delivers the advantages of analytics with the strong at-rest protection of encryption. For the strongest possible security, look for solutions that limit the amount of tokenized data that can be de-tokenized and also issue notifications and alerts when data is de-tokenized, so you can ensure only approved users get the data. 

Tokenization Vs. Encryption

1. Tokens have no mathematical relationship to the original data, which means that unlike encrypted data, tokenized data can’t be broken or returned to its original form.

While many of us might think encryption is one of the strongest ways to protect stored data, it has a few weaknesses, including this big one: the encrypted information is simply a version of the original plain text data, scrambled by math. If a hacker gets their hands on a set of encrypted data and the key, they essentially have the source data. That means breaches of sensitive PII, even of encrypted data, require reporting under state data privacy laws. Tokenizing data, on the other hand, replaces the plain text data with a completely unrelated “token” that has no value if breached. Unlike encryption, there is no mathematical formula or “key” to unlocking the data – the real data remains secure in a token vault.

2. Tokens can be made to match the relationships and distinctness of the original data so that meta-analysis can be performed on tokenized data.

When one of the main goals of moving data to the cloud is to make it available for analytics, tokenizing the data delivers a distinct advantage: actions such as counts of new users, lookups of users in specific locations, and joins of data for the same user from multiple systems can be done on the secure, tokenized data. Analysts can gain insight and find high-level trends without requiring access to the plain text sensitive data. Standard encrypted data, on the other hand, must be decrypted to operate on, and once the data is decrypted there’s no guarantee it will be deleted and not be forgotten, unsecured, in the user’s download folder. As companies seek to comply with data privacy regulations, demonstrating to auditors that access to raw PII is as limited as possible is also a huge bonus. Data tokenization allows you to feed tokenized data directly from Snowflake into whatever application needs it, without requiring data to be unencrypted and potentially inadvertently exposed to privileged users.
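A small example shows why this works: because equal values map to equal tokens, counts and joins behave on the tokenized column exactly as they would on the raw identifier. The datasets and token values below are made up for illustration:

```python
from collections import Counter

# Two tokenized datasets sharing a deterministic token for the same user.
crm_rows   = [("tok_a1", "signed up"), ("tok_b2", "signed up")]
order_rows = [("tok_a1", 120.00), ("tok_a1", 35.50), ("tok_c3", 9.99)]

# Count distinct customers across both systems -- no plain-text PII needed.
distinct_customers = len({tok for tok, _ in crm_rows + order_rows})

# Join CRM and order data on the token, exactly as on the real identifier.
crm_tokens = {tok for tok, _ in crm_rows}
orders_per_crm_user = Counter(tok for tok, _ in order_rows
                              if tok in crm_tokens)
```

The analyst gets accurate counts, lookups, and joins while the actual identifiers stay locked in the vault.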

3. Tokens maintain a connection to the original data, so analysis can be drilled down to the individual as needed.

Anonymized data is a security alternative that removes the personally identifiable information by grouping data into ranges. It can keep sensitive data safe while still allowing for high-level analysis. For example, you may group customers by age range or general location, removing the specific birth date or address. Analysts can derive some insights from this, but if they wish to change the cut or focus in, for example looking at users aged 20 to 25 versus 20 to 30, there’s no ability to do so. Anonymized data is limited by the original parameters which might not provide enough granularity or flexibility. And once the data has been analyzed, if a user wants to send a marketing offer to the group of customers, they can’t, because there’s no relationship to the original, individual PII.

Three Risk-based Models for Tokenizing Data in the Cloud

Depending on the sensitivity level of your data or comfort with risk there are several spots at which you could tokenize data on its journey to the cloud. We see three main models - the best choice for your company will depend on the risks you’re facing:

Level 1: Tokenize data before it goes into a cloud data warehouse

  1. The first issue might be that you’re consolidating sensitive data from multiple databases. While having that data in one place makes it easier for authorized users, it might also make it easier for unauthorized users! Moving from multiple source databases or applications with their own siloed and segmented security and log in requirements to one central repository gives bad actors, hackers or disgruntled employees just one location to sneak into to have access to all your sensitive data. It creates a much bigger target and bigger risk.  
  2. And this leads to the second issue: as more and more data is stored in high-profile cloud data warehouses, they have become a bigger focus for bad actors and nation states. Why should they go after Salesforce or Workday or other discrete applications separately when all the same data can be found in one giant hoard?  
  3. The third concern might be about privileged access from Snowflake employees or your own Snowflake admins who could, but really shouldn’t, have access to the sensitive data in your cloud data warehouse.  

If your company is facing any of these situations, it makes sense for you to choose “Level 1 Tokenization”: tokenize data just before it goes into the cloud. By tokenizing data that is stored in the cloud data warehouse, you ensure that only the people you authorize have access to the original, sensitive data.

Level 2: Tokenize data before moving it through the ETL process

As you’re mapping out your path to the cloud, you may want to make sure data is protected as soon as it leaves the secure walls of your datacenter. This is especially challenging for CISOs who’ve spent years hardening the security of perimeter only to have control wrested away as sensitive data is moved to cloud data warehouses they don’t control. If you’re working with an outside ETL (extract, transform, load) provider to help you prepare, combine, and move your data, that will be the first step outside your perimeter you want to safeguard. Even though you hired them, without years of built-up trust, you may not want them to have access to sensitive data. Or it may even be out of your hands—you may have agreements or contracts with your own customers that specify you can’t let any vendor or other entity have access without written consent.  


In this case, “Level 2 Tokenization” is probably the right choice. This takes one step back in the data transfer path and tokenizes sensitive data before it even reaches the ETL. Instead of direct connection to the source database, the ETL provider connects through the data tokenization software which returns tokens. ALTR partners with SaaS-based ETL providers like Matillion to make this seamless for data teams.  

Level 3: End-to-end on-premises-to-cloud data tokenization

If you’re a very large financial institution classified as “critical vendor” by the US government, you’re familiar with the arduous security required. This includes ensuring that ultra-sensitive financial data is exceedingly secure – no unauthorized users, inside or outside the enterprise, can have access to that data, no matter where it is. You already have this nailed down in your on-premises data stores, but we’re living in the 21st century and everyone from marketing to IT operations is saying “you have to go to the cloud.” In this case, you’ll need “Level 3 Tokenization”: full end-to-end data tokenization from all your onsite databases through to your cloud data warehouse.  


As you can imagine, this can be a complex task. It requires tokenization of data across multiple on-premises systems before the data transfer journey even starts. The upside is that it can also shine a light on who’s accessing your data, wherever it is. The next time people throughout the company who relied on sensitive data to do their jobs run a report and get back nothing but tokens, you’ll hear from them quickly. This turns into a benefit by stopping “dark access” to sensitive data.  

Conclusion

Data tokenization can provide unique data security benefits across your entire path to the cloud. ALTR’s SaaS-based approach to data tokenization-as-a-service means we can cover data wherever it’s located: on-premises, in the cloud, or even in other SaaS-based software. This also allows us to deliver innovations like new token formats or new security features more quickly, with no need for users to upgrade. Our tokenization solutions range from flexible, scalable vaulted tokenization all the way up to PCI Level 1 compliance, allowing companies to choose the best balance of speed, security, and cost for their business. We’ve also invested heavily in IP that enables our database driver to connect transparently and keep data usable while tokenized. The drivers can, for example, perform the lookups and joins needed to keep applications that weren’t built for tokenization running.

With data tokenization from ALTR, users can bring sensitive data safely into the cloud to get full analytic value from it, while helping meet contractual security requirements or the steepest regulatory challenges.
