BLOG SPOTLIGHT
Navigating the chaos of data security in the age of GenAI—let’s break down what needs to happen next.
Sep 20
ALTR Welcomes Laura Malins as VP of Product
ALTR continues to strengthen its leadership team, and the latest addition brings a wealth of technical expertise and a fresh perspective to our growing company. We’re thrilled to welcome Laura Malins as the newest member of the ALTR family and VP of Product. With over a decade of experience in data, Laura’s extensive background across industries and technical roles makes her an invaluable asset as we continue to push the boundaries of data security and governance.
From Matillion to ALTR: A Proven Leader in Data Innovation
Laura joins us from Matillion, where she spent the past ten years shaping the future of data transformation. As VP of Product, she ran the Matillion ETL Product and spearheaded the launch of their revolutionary SaaS offering, Data Productivity Cloud. Her ability to understand deeply technical challenges and translate them into user-friendly solutions has earned her recognition as a product leader in the data space.
“I’ve worked with ALTR for a few years now and have always admired the company and the product. Data security platforms are becoming more pertinent than ever, and ALTR’s innovative product is well-positioned to support compliance and security requirements. I’m delighted to join such a strong and ambitious team, and I look forward to taking the product to the next level,” Laura shares.
Laura’s deep technical expertise and user-focused approach will be pivotal in pushing ALTR’s product suite to new heights. Her ability to bridge the gap between complex data challenges and practical, user-friendly solutions aligns seamlessly with our vision of delivering powerful, scalable data access control. With her proven leadership, we anticipate not just product evolution but transformation—bringing enhanced capabilities to our customers while staying ahead of the ever-evolving data security landscape. Laura’s leadership will help us continue empowering businesses to protect their most valuable assets while driving innovation forward.
Sep 19
Data Security for Generative AI: Where Do We Even Begin?
If you haven’t noticed the wave of Generative AI sweeping across the enterprise hardware and software world, it certainly would have hit you within 5 minutes of attending Big Data London, one of the UK’s leading data, analytics, and AI events. Having attended last year’s show, I can confidently say AI wasn’t nearly as dominant. But now? It’s everywhere, transforming not just this event but countless others. AI has officially taken over!
As a data-security-focused person, I find all the buzz both exciting and terrifying. I’m excited because it feels like we’re on the verge of a seismic shift in technology—on par with the rise of the web or the cloud—driven by GenAI. And I get to witness it firsthand! But it’s terrifying to see all the applications, solution consultants, database vendors, and others selling happy GenAI stories to customers. I could scream into the loud buzz of the show floor, “We have seen this movie before! Don’t let the development of GenAI applications outpace the critical need for data security!” I’m thinking about the rush to web, the rush to mobile, the rush to cloud. All of these previous shifts suffered from the same thing: security is boring, and we don’t want to do it. What definitely wasn’t boring was using a groundbreaking mobile app from 1800flowers.com to buy flowers—that was cool! Let’s have more of that! Who cares about security, right? That can wait…
Cybersecurity, and data security in particular, has had the task of keeping up with the excitement of new applications for decades. The ALTR engineering office is in beautiful Melbourne, FL, just a few hours from Disney. When I see a young mother or father with a concerned look racing after a young child who couldn’t care less that they’re about to run headlong into a popcorn stand, I think, “Application users are the kids, security people are the parents, and GenAI is whichever Disney character the kid can’t wait to hug.” It’s cute, but dangerous. This is what is happening with GenAI and security.
As applications have evolved, so has data security. Below is an example of these application evolutions and how security has adapted to cover the new weaknesses of each one.
What is Making Generative AI Hard to Secure?
The simple answer is: we don’t fully know. It’s not just that we’re still figuring out how to secure GenAI (spoiler: we haven’t cracked that yet); it’s that we don’t even fully understand how these Large Language Models (LLMs) and GenAI systems truly operate. Even the developers behind these models can’t entirely explain their inner workings. How do you secure something you can’t fully comprehend? The reality is—you can’t.
So, what do we know?
We know two things:
1. Each evolution of applications and data products has been secured by building upon the principles of the previous generation. What has been working well needs to be hardened and expanded.
2. LLMs present two new and very hard problems to solve: data ownership and data access.
Let’s dive into the second part first. To get access to the hardware currently required to train and run LLMs, we must use cloud or shared resources—things like ChatGPT or NVIDIA’s DGX Cloud. Until these models require less hardware or the hardware magically becomes more available, this truth will hold.
The situation is similar to the early days of the internet, when people wanted to send and receive sensitive information over shared lines. The internet was great for transmitting public or non-sensitive information, but how could banking and healthcare use public internet lines to send and receive sensitive information? Enter TLS. This is the same problem facing LLMs today.
How can a business (or even a person, for that matter) use a public and shared LLM/GenAI system without fear of data exposure? Well, it’s very challenging—and not a problem that a traditional data security provider can solve. Luckily, there are really smart people working on this solution, like the folks at Protopia.ai.
So, data ownership is being addressed much like how TLS solved the private-information-flowing-on-public-internet-lines problem. And that’s a huge step forward. What about data access?
This one is a bit tougher. There are some schools of thought about prompt control and data classification within AI responses. But this feels a lot like CASB all over again, which didn’t exactly hit the mark for SaaS security. In my opinion, until these models can pinpoint exactly where their responses are coming from—essentially, identify the data sets they’ve learned from—and also understand who is asking the questions, we’ll continue to face risks. Only then can we prevent situations where an intern asks questions and gets answers that should only be accessible to the CEO.
Going back to the first item on our list of what we know: we will need to build upon the solid data security foundations that got us to this point in the first place. It has become clear to me that for the next few years, Retrieval-Augmented Generation (RAG) will be how enterprises globally interact with LLMs and GenAI. While this is not a silver bullet, it’s the best shot businesses have to leverage the power of public models while keeping private information safe.
With the adoption of RAG techniques, the core data security pillars that have been bearing the load of a data lake or warehouse to date will need to be braced for extra load.
Data classification and discovery needs to be cheap, fast, and accurate. Businesses must continuously ensure that any information unsuitable for RAG workloads hasn’t slipped into the database from which retrieval occurs. This constant vigilance is crucial to maintaining secure and compliant operations. This is the first step.
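As a rough illustration, here is a minimal sketch of the kind of pre-ingestion check described above: scan each document for sensitive patterns and keep unsuitable records out of the retrieval store. The pattern set and function names here are hypothetical, and a production discovery tool would use far richer classifiers than two regexes.

```python
import re

# Hypothetical patterns for two common sensitive-data types; a real
# discovery/classification tool would go far beyond simple regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set[str]:
    """Return the set of sensitive-data labels detected in a document."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

def filter_for_rag(documents: list[str]) -> list[str]:
    """Keep only documents with no detected sensitive data."""
    return [doc for doc in documents if not classify(doc)]

docs = [
    "Q3 revenue grew 12% year over year.",
    "Patient SSN: 123-45-6789, contact jane@example.com",
]
print(filter_for_rag(docs))  # only the first document survives
```

Running a check like this continuously, on every load into the retrieval database, is what turns one-time classification into the ongoing vigilance described above.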
The next step is to layer on access control and data access monitoring so that the business can easily set rules for which types of data each model and use case is allowed to consume. Just as service accounts for BI tools need access control, so too do service accounts used for RAG. On top of these access controls, near-real-time data access logging must be present. As the RAG workloads access the data, these logs inform the business if any access pattern has changed and allow it to easily comply with internal and external audits, proving that only approved data sets are used with public LLMs and GenAI models.
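To make the idea concrete, here is a hedged sketch of per-service-account access control with an audit trail. The account names, data set names, and in-memory log are all hypothetical; a real deployment would enforce policy inside the database and stream every event to an audit store.

```python
from datetime import datetime, timezone

# Hypothetical policy: which data sets each service account may read.
POLICY = {
    "rag_service": {"product_docs", "public_faq"},
    "bi_service": {"sales_summary"},
}

ACCESS_LOG = []  # stand-in for a near-real-time audit stream

def read_dataset(account: str, dataset: str) -> str:
    allowed = dataset in POLICY.get(account, set())
    # Log every attempt, allowed or not, for later audit review.
    ACCESS_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "account": account,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{account} may not read {dataset}")
    return f"<rows from {dataset}>"

read_dataset("rag_service", "product_docs")      # permitted
try:
    read_dataset("rag_service", "customer_pii")  # blocked, but still logged
except PermissionError:
    pass
print([(e["dataset"], e["allowed"]) for e in ACCESS_LOG])
# [('product_docs', True), ('customer_pii', False)]
```

Note that the denied attempt still lands in the log: proving to an auditor what was *not* accessed is as important as proving what was.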
The last step is to keep the data secure at rest. The use of LLMs and GenAI will only accelerate the migration of sensitive data into the cloud. Data elements that were once protected on-prem will have to be protected in the cloud as well. But there is a catch: the scale requirements of this data protection will be a new challenge for businesses. You will not be able to point your existing on-prem-based encryption or tokenization solution at a cloud database like Snowflake and expect to get the full value of Snowflake.
When prospects or customers ask me, “What is ALTR’s solution for securing LLMs and GenAI?” I used to joke with them and say, “Nothing!” But now I’ve learned the right response: “The same thing we’ve always done to secure your data—just with even more precision and focus for today’s challenges.” The use of LLMs and GenAI is exciting and scary at the same time. One way to reduce the anxiety is to start with a solid foundation: understand what data you have, how that data is allowed to be used, and whether you can prove that the data is safe at rest and in motion.
This does not mean you cannot use ChatGPT. It just means you must realize that you were once that careless child running with arms wide open to Mickey, but now you are the concerned parent. Your teams and company will be eager to dive headfirst into GenAI, but it’s crucial that you can articulate why this journey is complex and how you plan to guide them there safely. It begins with mastering the fundamentals and gradually tackling the tough new challenges that come with this powerful technology.
Sep 9
ALTR Expands GTM Team with Powerhouse Hires to Lead the Charge in Data Security
ALTR isn’t just keeping pace with the evolving data security landscape—we’re setting the speed limit. As businesses scramble to safeguard their data, ALTR is not just another player in the game; we’re the go-to solution for bulletproof data access control and security. And today, we’re doubling down on that promise with three strategic hires to turbocharge our Go-To-Market (GTM) strategy.
Meet the Heavy Hitters
Christy Baldassarre
Christy Baldassarre joins us as our new Director of Marketing, bringing a formidable blend of strategic vision and execution prowess. With a track record of driving brand growth and market penetration, Christy excels at crafting compelling narratives that resonate with target audiences. She’s a master at turning complex concepts into clear, impactful messaging and knows how to leverage the latest digital marketing tactics to amplify ALTR’s voice.
"I am excited to be on such a great team and to be a part of taking ALTR to the next level. I chose ALTR because of its excellence in Cloud Security and Data Protection. This is a great opportunity to collaborate with such a visionary team and contribute to groundbreaking solutions that not only push boundaries but set new standards of how to keep everyone’s data safe." - Christy
Rick McBride
Rick McBride, our new Demand Gen Manager, brings a deep expertise in go-to-market strategy. With a strong foundation in business development, Rick has honed his skills in identifying opportunities and driving pipeline growth from the ground up. He’s not just about crafting campaigns; Rick knows how to connect with decision-makers and convert interest into action.
“A successful go-to-market strategy thrives on seamless collaboration across various teams, and our GTM group is poised to be the driving force behind it. We're set to champion the Snowflake ecosystem—engaging with customers, Snowflake’s Field Sales team, and partners alike—to fuel strategic growth. By leveraging Snowflake's powerful native capabilities in Security and Governance, we aim to deliver at the speed and scale that Snowflake users expect. We're thrilled to extend this value to every organization that prioritizes and trusts Snowflake for their data management needs!” - Rick
George Policastro
Next, we've got George Policastro as our newest Account Executive. George is a seasoned sales professional with a proven track record of closing complex deals and delivering results. His strengths lie in his ability to deeply understand client needs, build lasting relationships, and strategically navigate the sales process to drive success.
"I’m thrilled to join ALTR and tackle one of the biggest challenges organizations face today: securing their sensitive data while unlocking its full potential to drive business growth." - George
ALTR: Defining the Future of Data Access Control and Security
The world of data security and governance has evolved dramatically from the days of simple perimeter defenses. Now, we’re dealing with sophisticated, multi-layered security strategies that need to keep up with cybercriminals who are more aggressive and resourceful than ever. The core principles—knowing where your data is, who can access it, and ensuring its protection—haven’t changed. However, as data moves to the cloud, the challenge is achieving these goals at an unprecedented scale and speed.
That’s where ALTR excels. We’re not just providing solutions; we’re reimagining what data access control and security can be in a cloud-first world. By cutting through the complexities and inefficiencies of traditional methods, we deliver a streamlined, scalable approach that makes data security both simple and powerful. Our intuitive automated access controls, policy automation, and real-time data observability empower organizations to protect sensitive data at rest, in transit, and in use—effortlessly and at lightning speed. With ALTR, securing your data isn’t just more accessible; it’s smarter, faster, and designed for today’s dynamic cloud environments.
With our latest GTM team expansion, we’re fortifying our foundation to evolve into a cloud data security market leader who’s not just part of the conversation but is driving it.
Sep 3
Unleashing the Power of FPE: ALTR Key Sharing Meets Snowflake Data Sharing
In a world where data breaches and privacy threats are the norm, safeguarding sensitive information is no longer optional—it's critical. As regulations tighten and privacy concerns soar, our customers are demanding cutting-edge solutions that don't just secure their data but do so with finesse. Enter Format Preserving Encryption (FPE). When paired with ALTR's capability to seamlessly share encryption keys with trusted third parties via platforms like Snowflake's data sharing, FPE becomes a game-changer.
Understanding Format Preserving Encryption (FPE)
Format Preserving Encryption (FPE) is a type of encryption that ensures the encrypted data retains the same format as the original plaintext. For example, if a credit card number is encrypted using FPE, the resulting ciphertext will still appear as a string of digits of the same length. This characteristic makes FPE particularly useful in scenarios where maintaining data format is crucial, such as legacy systems, databases, or applications requiring data in a specific format.
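For intuition, here is a minimal Feistel-style construction over digit strings, in the spirit of (but far simpler than, and not a substitute for) standardized FPE modes such as NIST FF1. The key and card number below are made up, and this sketch is illustrative rather than production cryptography.

```python
import hashlib
import hmac

def _prf(key: bytes, data: str, rnd: int, width: int) -> int:
    """Keyed round function: HMAC-SHA256 reduced to a width-digit number."""
    mac = hmac.new(key, f"{rnd}|{data}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big") % (10 ** width)

def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Encrypt a digit string to a same-length digit string (toy Feistel)."""
    a, b = digits[: len(digits) // 2], digits[len(digits) // 2 :]
    for r in range(rounds):
        w = len(a)
        c = (int(a) + _prf(key, b, r, w)) % 10 ** w
        a, b = b, f"{c:0{w}d}"
    return a + b

def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Invert fpe_encrypt by running the Feistel rounds in reverse."""
    a, b = digits[: len(digits) // 2], digits[len(digits) // 2 :]
    for r in reversed(range(rounds)):
        w = len(b)
        c = (int(b) - _prf(key, a, r, w)) % 10 ** w
        a, b = f"{c:0{w}d}", a
    return a + b

key = b"demo-key"                      # hypothetical key material
card = "4111111111111111"              # well-known test card number
token = fpe_encrypt(key, card)
print(len(token), token.isdigit())     # 16 True: same shape as the input
print(fpe_decrypt(key, token) == card) # True: fully reversible
```

Because the construction is deterministic for a given key, equal plaintexts yield equal ciphertexts, which is what makes equality searches and joins on encrypted columns possible (see “Enhanced Data Utility” below).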
Key Benefits of FPE
Seamless Integration
FPE maintains the data format, allowing easy integration into existing data pipelines without requiring significant changes. This minimizes the impact on business operations and reduces the costs associated with implementing encryption.
Compliance with Regulations
Many regulatory frameworks, such as the GDPR, PCI-DSS, and HIPAA, mandate the protection of sensitive data. FPE helps organizations comply with these regulations by ensuring that data is encrypted to preserve its usability and format, which can sometimes be a requirement in these standards.
Enhanced Data Utility
Unlike traditional encryption methods, FPE allows encrypted data to be used in its existing form for specific operations, such as searches, sorting, and indexing. This ensures organizations can continue to derive value from their data without compromising security.
The Role of Snowflake in Data Sharing
Snowflake is a cloud-based data warehousing platform that allows organizations to store, process, and analyze large volumes of data. One of its differentiating features is data sharing, which enables companies to share live, governed data with other Snowflake accounts in a secure and controlled manner, while also shifting compute costs to the share’s consumer.
Key Features of Snowflake Data Sharing
Real-Time Data Access
Snowflake's data sharing allows recipients to access shared data in real-time, ensuring they always have the most up-to-date information. This is particularly valuable in scenarios where timely access to data is critical, such as in financial services or healthcare.
Secure Data Exchange
Snowflake's platform is designed with security at its core. Data sharing is governed by robust access controls, ensuring only authorized parties can view or interact with the shared data. This is crucial for maintaining the confidentiality and integrity of sensitive information.
Scalability and Flexibility
Snowflake's architecture allows for easy scalability, enabling organizations to share large volumes of data with multiple parties without compromising performance. Additionally, the platform supports a wide range of data formats and types, making it suitable for diverse use cases.
The Power of Combining FPE with Snowflake’s Key Sharing
When FPE is combined with the ability to share encryption keys via Snowflake's data sharing, it unlocks a new level of security and flexibility for organizations. This combination addresses several critical challenges in data protection and sharing:
Controlled Access to Encrypted Data
By leveraging FPE, organizations can encrypt sensitive data while preserving its format. However, there are scenarios where this encrypted data needs to be shared with trusted third parties, such as partners, auditors, or service providers. Through Snowflake's data sharing and ALTR's FPE Key Sharing, companies can securely share encrypted data along with the corresponding encryption keys. This allows the third party to decrypt the data within the policies that they have defined and use it as needed.
Data Security Across Multiple Environments
In a multi-cloud or hybrid environment, data often needs to be moved between different systems or shared with external entities. Traditional encryption methods can be cumbersome in such scenarios, as they require extensive reconfiguration or complex key management efforts. However, with FPE and Snowflake's key sharing, organizations can seamlessly share encrypted data across different environments without compromising security. The encryption keys can be securely shared via Snowflake, ensuring only authorized parties can decrypt and access the data.
Regulatory Compliance and Auditing
Many regulations require organizations to demonstrate that they have implemented appropriate security measures to protect sensitive data. By using FPE, companies can encrypt data in a way that complies with these regulations. At the same time, the ability to share encryption keys through Snowflake ensures that data can be securely shared with auditors or regulators. Additionally, Snowflake's robust logging and auditing capabilities provide a detailed record of who accessed the data and when, which is essential for compliance reporting.
Enhanced Collaboration with Partners
In the finance, healthcare, and retail industries, collaboration with external partners is often essential. However, sharing sensitive data with these partners presents significant security risks. By combining FPE with ALTR's key sharing, organizations can securely share encrypted data with partners, ensuring that sensitive information stays protected throughout the data's lifecycle, including across shares. This enables more effective collaboration without compromising data security.
Efficient and Secure Data Processing
Specific data processing tasks, such as data analytics or AI model training, require access to large volumes of data. In scenarios where this data is sensitive, encryption is necessary. However, traditional encryption methods can hinder the efficiency of these tasks due to the need for decryption before processing. With FPE, the data can remain encrypted during processing, while ALTR's key sharing allows the consumer to decrypt data only when absolutely necessary. This ensures that data processing is both secure and efficient.
Use Cases of FPE with ALTR Key Sharing
To better understand the value of combining FPE with ALTR's key sharing, let's explore a few use cases:
Financial Services
In the financial sector, organizations handle a vast amount of sensitive data, including customer information, transaction details, and credit card numbers. FPE can encrypt this data while preserving its format, ensuring it can still be used in legacy systems and applications. Through Snowflake's data sharing, financial institutions can securely share encrypted transaction data with external auditors, partners, or regulators, along with the necessary encryption keys. This ensures compliance with regulations while maintaining the security of sensitive information.
Healthcare
Healthcare organizations often need to share patient data with external entities, such as insurance companies or research institutions. FPE can encrypt patient records, ensuring they remain secure while preserving the format required for healthcare applications. Snowflake's data sharing allows healthcare providers to securely share this encrypted data with third parties, while ALTR enables sharing of the corresponding encryption keys, so those parties can access and use the data while ensuring compliance with HIPAA and other regulations.
Retail
Retailers often need to share customer data with marketing partners, payment processors, or logistics providers. FPE can be used to encrypt customer information, such as names, addresses, and payment details, while maintaining the format required for retail systems. Snowflake's data sharing enables retailers to securely share this encrypted data with their partners; with ALTR, the encryption keys are also shared, ensuring that customer information is always protected.
The Broader Implications for Businesses
The combination of Format Preserving Encryption and ALTR's key-sharing capabilities represents a significant advancement in the field of data security. This approach addresses several critical challenges in data protection and sharing by enabling organizations to securely share encrypted data with trusted third parties.
Strengthening Trust and Collaboration
In an increasingly interconnected world, businesses must collaborate with external partners and share data to remain competitive. However, this collaboration often comes with significant security risks. By leveraging FPE and ALTR's key sharing, organizations can strengthen trust with their partners by ensuring that sensitive data is always protected, even when shared. This leads to more effective and secure collaboration, ultimately driving business success.
Reducing the Risk of Data Breaches
Data breaches can devastate businesses, bringing financial losses, reputational damage, and regulatory penalties. Organizations can significantly reduce that risk by encrypting sensitive data with FPE and securely sharing it via Snowflake. Even if the data is intercepted, it remains protected, as only authorized parties with the corresponding encryption keys can decrypt it.
Enabling Innovation While Ensuring Security
As organizations continue to innovate and leverage new technologies, such as artificial intelligence and machine learning, the need for secure data sharing will only grow. The combination of FPE and ALTR's key sharing enables businesses to securely share and process data in innovative ways without compromising security. This ensures that organizations can continue to innovate while protecting their most valuable asset – their data.
Wrapping Up
Integrating Format Preserving Encryption with ALTR's key sharing capabilities offers a powerful solution for organizations seeking to protect sensitive data while enabling secure collaboration and innovation. By preserving the format of encrypted data and allowing for secure key sharing, this approach addresses critical challenges in data protection, regulatory compliance, and data sharing across multiple environments. As businesses navigate the complexities of the digital age, the value of this combined solution will only become more apparent, making it a vital component of any robust data security strategy.
ALTR's Format-preserving Encryption is now available on Snowflake Marketplace.
Oct 26
Modern Data Ecosystem Tools - A Complete Guide
What is the Modern Data Ecosystem?
Today’s business environment is awash with data. From product development intellectual property (IP) to customer personally identifiable information (PII) to logistics and supply chain information, data is coming at us from all directions. And that data is making its way throughout the business in ways that it never did before.
In the past, your customer and prospect data may have stayed securely behind a firewall in a customer database in a company-owned datacenter. But from the moment Salesforce launched its pioneering Software-as-a-Service CRM, that data has been moving into the cloud. And the volume has only increased. Now, cloud data platforms like Snowflake and Amazon Redshift offer anyone the ability to host and analyze data with just a credit card and a spreadsheet. This has opened a Pandora’s box of data analysis possibilities that comes with attendant challenges and risks.
By now most companies understand the significant opportunities presented by living in the “Age of Data.” Recently, a data ecosystem of technologies has developed to help organizations take advantage of these new opportunities. In fact, so many new tools, solutions and technologies have appeared that choosing solutions for a modern data ecosystem can be almost as difficult as dealing with data itself.
We put together this guide to help clear the clutter and explain who does what in the modern data ecosystem and how it can help your organization become more data-driven more quickly.
Your Data Ecosystem Guide
Data Discovery, Classification, and Catalogs
The rapid growth of data collection, security threats, and regulatory requirements has transformed what was previously an esoteric back-office conversation into a mainstream business challenge. Applying and enforcing data governance standards is now a strategic priority for any organization, not just traditionally regulated industries like finance and healthcare. However, data owners must tread carefully to avoid running up against privacy laws like GDPR and CCPA: Gartner predicts that modern privacy regulations will cover the personal data of 75% of the world’s population within a couple of years.
Many vendors focus on “knowing” your data—where it is (discovery), what it is (classification), and where it came from (data lineage). Industry analysts call this “metadata management,” or getting a handle on the data itself. Data discovery, classification, and cataloging are the critical first steps of a big data ecosystem.
Alation
Alation is credited with creating the data catalog product category – an early building block of the modern data ecosystem. Its signature software, the Alation Data Catalog, serves enterprises in organizing and consolidating their data. Alation’s enterprise data catalog dramatically improves the productivity of analysts, increases the accuracy of analytics, and drives confident data-driven decision making while empowering everyone in your organization to find, understand, and govern data.
BigID
BigID offers software for managing sensitive and private data, completely rethinking data discovery and intelligence for the privacy era. BigID was the first company to deliver enterprises the technology to know their data to the level of detail, context and coverage they would need to meet core data privacy protection requirements. BigID’s data intelligence platform enables organizations to take action for privacy, protection, and perspective. Organizations can deploy BigID to proactively discover, manage, protect, and get more value from their regulated, sensitive, and personal data across their data landscape.
Collibra
Collibra calls itself “The Data Intelligence Company.” They aim to remove the complexity of data management to give you the perfect balance between powerful analytics and ease of use. The company’s premier offering is its data catalog – a single solution for teams to easily discover and access reliable data. It allows companies to provide users access to trusted data across all your data sources. Delivering this end-to-end visibility starts with your data catalog, and Collibra gets you up and running in days. With Collibra’s scalable platform, you can future-proof your investment, no matter where business takes you next.
Cloud Data Warehouses
While the cloud migration started with specific workloads moving to SaaS services (think Salesforce or Office 365), today the data ecosystem is focused on, well, data. The same advantages of SaaS – low up-front costs, no hardware to maintain, no datacenter to staff and service, no upgrades to track – all apply to the modern cloud data warehouse. In addition, data storage combined with compute enables companies to consolidate data from across the company and make it easily available for analysis and insight. Data-driven companies find this service invaluable.
Snowflake Data Cloud
Snowflake offers a cloud-based data storage and analytics service that allows users to store and analyze data using cloud-based hardware and software. Snowflake’s founders engineered the platform to power the Data Cloud, where thousands of organizations have seamless access to explore, share, and unlock the full value of their data. Today, 1,300 Snowflake customers have more than 250PB of data managed by the Data Cloud, with more than 515 million data workloads running each day.
Amazon Redshift
According to the company, tens of thousands of companies rely on Amazon Redshift to analyze exabytes of data with complex analytical queries, making it the most widely used cloud data warehouse. Users can run and scale analytics in seconds on all their data without having to manage a data warehouse infrastructure. Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. With AWS-designed hardware and machine learning, the service can deliver the best price performance at any scale. The company also offers a Free Tier.
Databricks
The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes.
This unified approach simplifies your modern data stack by eliminating the data silos that traditionally separate and complicate data engineering, analytics, BI, data science and machine learning. It’s built on open source and open standards to maximize flexibility. And, its common approach to data management, security and governance helps you operate more efficiently and innovate faster.
ETL and ELT Providers
Another significant piece of the data ecosystem puzzle is ETL and ELT providers. Consolidating business data in cloud data warehouses like Snowflake is a smart move that can open up new doors of innovation and value. All your data in one place makes it easier to connect the dots in ways that were impossible or unimaginable before. For instance, a retail chain can optimize sales projections by analyzing weather patterns, or a logistics company can more accurately predict costs by accounting for the salaries of all the people involved in a shipment.
Getting to those insights is a process that starts with moving the data. An extract, transform, and load (ETL) migration technology partner simplifies moving or loading the data from each of your company’s locations into a cloud data warehouse to make it analytics-ready in no time. Moving data is what these companies do best.
Matillion
Matillion’s complete data integration and transformation solution is purpose-built for the cloud and cloud data warehouses. The company’s flagship tool, Matillion ETL, is designed specifically for cloud database platforms including Amazon Redshift, Google BigQuery, Snowflake and Azure Synapse. It offers a modern, browser-based UI with powerful push-down ETL/ELT functionality: Matillion ETL pushes data transformations down to your data warehouse, processing millions of rows in seconds with real-time feedback. The browser-based environment includes collaboration, version control, full-featured graphical job development, and more than 20 data read, write, join, and transform components. Users can launch the product and be developing ETL jobs within minutes. Matillion offers a free trial.
Fivetran
Focused on automated data integration, Fivetran delivers ready-to-use connectors that automatically adapt as schemas and APIs change, ensuring consistent, reliable access to data. In fact, the company says it offers the industry’s best selection of fully managed connectors. Their pipelines automatically and continuously update, freeing users up to focus on game-changing insights instead of ETL. They improve the accuracy of data-driven decisions by continuously synchronizing data from source applications to any destination, allowing analysts to work with the freshest possible data. To accelerate analytics, Fivetran automates in-warehouse transformations and programmatically manages ready-to-query schemas. Fivetran offers a free trial.
Talend
According to Talend, integrating your data doesn't have to be complicated or expensive. Talend Cloud Integration Platform simplifies your ETL or ELT process, so your team can focus on other priorities. With over 900 components, you can move data from virtually any source to your data warehouse more quickly and efficiently than by hand-coding alone. Talend helps reduce spend, accelerate time to value, and deliver data you can trust.
You can download a free trial of Talend Cloud Integration.
Business Intelligence (BI) and Analytics Tools
Most business data users aren’t running database queries but accessing data and gaining insights via business intelligence (BI) tools that provide services including reporting, online analytical processing, analytics, dashboards, data mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. Because BI tools are the front door to data for technical and line-of-business users throughout the company, finding a friendly, flexible, accessible solution is key.
Tableau
Tableau is an interactive data visualization software company focused on business intelligence. Tableau products query relational databases, online analytical processing cubes, cloud databases, and spreadsheets to generate graph-type data visualizations. The software can also extract, store, and retrieve data from an in-memory data engine. Tableau allows organizations to ensure the responsible use of data and drive better business outcomes with fully-integrated data management and governance, visual analytics and data storytelling, and collaboration—all with Salesforce’s industry-leading Einstein built right in. Companies can lower the barrier to entry for users to engage and interact by building visualizations with drag and drop, employing AI-driven statistical modeling with a few clicks, and asking questions using natural language. Tableau provides efficiencies of scale to streamline governance, security, compliance, maintenance, and support with solutions for the entire lifecycle as the trusted environment for your data and analytics—from connection, preparation, and exploration to insights, decision-making, and action.
ThoughtSpot
ThoughtSpot believes the world would be a better place if everyone had quicker, easier access to facts. Their search and AI-driven analytics platform makes it simple for anyone across the organization to ask and answer questions with data. It empowers colleagues, partners, and customers to turn data into actionable insights via the ThoughtSpot application, embedding insights into apps like Salesforce and Slack, or building entirely new data products. The consumer-grade search and AI technology delivers true self-service analytics that anyone can use, while the developer-friendly platform ThoughtSpot Everywhere makes it easy to build interactive data apps that integrate with users’ existing cloud ecosystem.
Looker
Looker Data & Analytics is a business intelligence and big data analytics platform that helps users explore, analyze and share real-time business analytics easily. Now part of Google Cloud, it offers a wide variety of tools for relational database work, business intelligence, and other related services. Looker utilizes a simple modeling language called LookML that lets data teams define the relationships in their database so business users can explore, save, and download data with only a basic understanding of SQL. The product was the first commercially available business intelligence platform built for and aimed at scalable or massively parallel relational database management systems like Amazon Redshift, Google BigQuery and more.
Data Access Control and Data Security
ALTR is the only automated data access control and security solution that allows organizations to easily govern and protect sensitive data – enabling users to distribute more data to more end users more securely, more quickly. Hundreds of companies and thousands of users leverage ALTR’s platform to gain unparalleled visibility into data usage, automate data access controls and policy enforcement, and secure data with patented rate-limiting and tokenization-as-a-service. ALTR’s partner data ecosystem integrations with data catalogs, ETL, cloud data warehouses and BI services enable scalable on-premises-to-cloud protection. Our free integration with Snowflake allows admins to get started in minutes instead of months and scale up as they expand their data use, user base and databases.
The Evolving Data Ecosystem
ALTR continues to develop relationships with cloud data leaders across the industry. Our goal is to help our customers to get the most from their data by enabling a secure cloud data ecosystem that allows users to safely share and analyze sensitive data. Our scalable cloud platform acts as the foundation by enabling seamless integration with a wide variety of enterprise tools used to ingest, transform, store, govern, secure, and analyze data. ALTR has expanded how we interact with data ecosystem leaders via open-source integrations that allow users to freely and easily extend ALTR's data control and security to data catalogs like Alation and ETL tools like Matillion. Building a modern data ecosystem stack will set you firmly on the path to secure data-driven leadership.
Oct 10
0
min
CTO James Beecham Promoted to ALTR Chief Executive Officer
ALTR Blog
If we’ve learned anything over the last few years, it’s that this data space moves faster than you can imagine. Whether it’s new investments from market leaders, new acquisitions, new partnerships, or new technologies, the landscape is always changing, and those who aren’t ready for the next big shift are quickly left behind.
We anticipated this when we built the ALTR platform from the cloud up to be highly adaptable – our solution can easily scale up or scale down with users, with data, with cloud data warehouse usage. While our competitors were offering legacy on-prem solutions with high barriers to entry like long-term commitments, massive up-front costs and complicated implementations, ALTR built a cloud-native, SaaS-based integration for Snowflake that users could add directly from Snowflake Partner Connect and a free plan that lets companies try our solution before ever paying a cent. Our decisions have paid off in market response, demonstrated by compounded annual revenue growth of over 300% since 2018 and an accelerating customer base of over 200 companies.
We couldn’t be more ready for the next phase in ALTR’s journey and it’s the perfect time to appoint a new leader to take it on: James Beecham, ALTR’s Co-founder and Chief Technology Officer has been promoted to become ALTR’s next Chief Executive Officer. As a Co-founder, James was key to identifying the data security hole ALTR could fill. As CTO, he has been the technical leader who envisioned how ALTR could best meet our customers’ needs and one of the most public faces of the company.
James is excited to chart the course for ALTR’s future, maintaining the company’s trajectory by ensuring we continue to anticipate, act proactively, and deliver the disruptive data governance and security solutions our customers and the market didn’t even realize were possible. We believe that ALTR’s short time-to-value, in a market fraught with complexity, will deliver sustained differentiation in the coming years.
And we’re a team here at ALTR, so Dave isn’t going anywhere. He and James will work closely together during a transition period, and Dave will remain involved as a Board Director, CEO advisor and ongoing financial investor. He will also use this opportunity to expand his strategic advisory practice, mentor up-and-coming CEOs and explore other Board of Director opportunities.
Please don’t hesitate to reach out to James, Dave or your Account Executive if you have any questions about the transition. And stay tuned for great things ahead…
- Dave & James
Oct 5
0
min
Data Mesh Has Ensnared the Data Ecosystem (and That’s Not a Bad Thing)
ALTR Blog
If there’s one phrase we heard over and over again at Snowflake Summit 2022 (other than “data governance”) it was "data mesh." What is data mesh, you ask? Good question!
Data Mesh definition
Data mesh is a decentralized data architecture that makes data available through distributed ownership. Various teams own, manage and share data as a product or service they offer to other groups inside or outside the company. The idea is that distributing ownership of data (versus centralizing it in a data warehouse or data lake with a single owner, for example) makes it more easily accessible to those who need it, regardless of where the data is stored.
You can imagine why this might be a hot topic in the data ecosystem. Companies are constantly looking for ways to make more data available to more users more quickly. The data mesh conversation has continued in data ecosystem leader blogs we’ve gathered in our Q3 roundup.
Alation: Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan
VP Product Marketing and Analyst Relations at Alation, Mitesh Shah, interviews former Gartner Analyst Sanjeev Mohan in this Q&A-style blog. Mohan shares his definitions of data mesh, data fabric and the modern data stack and why they’re such hot topics at the moment. Mohan suggests the possibility that new terms (like data mesh) are actually history repeating itself, dives into what these new strategies and architectures bring to the table for data-first companies and identifies the pros and cons of centralizing or decentralizing data and metadata.
Collibra: Data Observability Amidst Data Mesh
Eric Gerstner, Data Quality Principal, Collibra leverages his background as a former Chief Product Owner managing technology for digital transformation to dive into the data mesh concept. He explains that “No amount of technology can solve for good programmatics around the people and process.” He sees data mesh as a conceptual way of tying technology to people and processes and enabling an organization to improve its data governance. This article helps to shed light on the narrative of data mesh and how it fits into modern data organizations in both the immediate and further-out futures. He sees data mesh as key to linking people and processes – people that know how to interpret and organize data and the processes that drive and collect data into the organization itself.
Matillion: Data Mesh with Matillion
This blog by Matillion really unpacks the concept of data mesh at a fundamental level. It’s really about bringing data out from its usual role as a supporting player and elevating it to a product in and of itself. It’s about “productizing” data and offering it to customers within and without the company. Customers have an expectation of the quality of the product and the service they are utilizing. A data mesh can help data owners meet those expectations. Furthermore, this blog explains the steps necessary to create a data mesh with Matillion. Matillion’s low-code/no-code platform is an ideal partner for individual data teams that include a mix of domain and technology expertise.
Data Mesh Architecture: ALTR's Take
We’re all about making data easier to access – for authorized people. As the data mesh architecture proliferates, companies need to ensure that all data owners across the company are enabled with the appropriate tools in place to keep their sensitive data from spreading recklessly – to meet both internal guidelines and government regulations on data privacy. A data mesh architecture really democratizes data ownership and access, and ALTR’s no-code, low up-front cost solution democratizes data governance to go hand in hand with it. Data owners from finance to operations to marketing do not need to know any code to implement data access controls on the sensitive data they’re responsible for.
Oct 12
0
min
Snowflake Data Governance: Which Solution is Best?
ALTR Blog
Snowflake harnesses the power of the cloud to help thousands of organizations explore, share, and unlock the actual value of their data. Whether your company has ten employees or 10,000, if you’re one of Snowflake’s 4,500 customers and counting, you’re either thrilled or overwhelmed by the cloud data warehouse’s combination of out-of-the-box functionality and powerful, flexible features.
Wherever you are in your journey, though, it’s never too early or too late to think about how you’re handling Snowflake data governance and security for sensitive data like PII/PHI/PCI.
When you look at the enterprise-level security and governance capabilities Snowflake offers natively within the platform, you may wonder why you need more (see the Bonus question for this answer). And the options for Snowflake Data Governance offered by partners may sound similar, making it a challenge to know what the differences are and what you need.
With that in mind, we’ve put together the critical questions you should ask when evaluating Snowflake Data Governance options. Going through this list should reveal the best next step for your company.
1. Is the Snowflake data governance solution easy to set up and maintain? Does it use Proxy, Fake SaaS or Real SaaS?
There are several ways vendors can enable their Snowflake data governance solutions. One approach is to utilize a proxy. While proxy solutions have some advantages, they come with serious issues that make them less than ideal for cloud-based Snowflake:
- Extra effort is required to make all applications go through the proxy, adding time, complexity, and costs to your implementation.
- Security holes are created when applications and users can bypass the proxy to get full access to data, increasing risk and surfacing compliance issues.
- Platform changes may break the proxy without warning, adding unnecessary downtime and delays.
- On-premises proxies require you to deploy, maintain, and scale more infrastructure than you would with a pure-SaaS, cloud-native solution.
SaaS is a better option for Snowflake data governance, but some providers calling themselves “SaaS” are better defined as "Managed Services." In these “Fake SaaS” solutions, vendors spin up, support and update an individual version of the software just for you. This makes it more expensive to run and maintain than true SaaS, costing you more. They can also require long maintenance windows that make the service unavailable during updates.
A proper multi-tenant SaaS-based data governance solution built for the cloud - like ALTR’s - is easier to start and maintain with Snowflake. There’s no hardware deployment or maintenance downtime required, no hardware sitting between your users and the data, no risk of a platform change breaking your integration, and no difficulty scaling your Snowflake usage. Because it’s natively integrated, there are no privacy issues or security holes. A real SaaS-based solution will also have the credentials to back it up: PCI DSS Level 1, SOC 2 Type II certification, and support for HIPAA compliance.
2. Is the Snowflake data governance solution easy to use? Does it require code to implement and manage?
Snowflake provides the foundation with native data governance features like sensitive data discovery and classification, access control and history, masking, and more with every release. But for users to take advantage of these Snowflake data governance capabilities on their own, they must be able to write SQL. That can make the features difficult, time-consuming, and costly to implement and manage at scale because data governance administration is limited to DBAs and other developers who can code.
However, the groundwork Snowflake provides allows partners to create solutions that leverage that built-in functionality but deliver an easier-to-use experience. ALTR’s solution provides native cloud integration and a user interface that doesn’t require code to get started or manage. This means your Data Governance teams or even line of business data or analytics users can take over the management of governance policies on Snowflake, freeing DBAs to focus on managing data streams and enabling data-driven insights.
3. Is it a complete Snowflake data governance solution? Does it secure all of your data and reduce your risk?
This is crucial. You may look for a Snowflake Data Governance solution in response to privacy regulations, but you’ll never be truly compliant without data security. And most "data governance" options don’t include data protection. While Snowflake offers many enterprise-level security features, there’s no defense against credentialed or privileged access threats. Once someone gets access with compromised credentials, there’s no mechanism for slowing or stopping data consumption.
Some software vendors calling themselves “data governance” only provide data discovery and classification – a data card catalog – without access control. And some other vendors require the data you want to protect to be copied into a new Snowflake database managed by the solution, leaving the raw data in the original database—ungoverned and unprotected. You may never know if anyone has accessed that data, potentially violating privacy regulations that require you to understand and document who has accessed data, even if nothing leaks outside the company.
For complete Snowflake Data Governance, you must not only be able to find and classify your data, but see data access, utilize consumption thresholds to detect anomalies and alert on them, respond to threats with real-time blocking, and tokenize critical data at rest. ALTR combines all these features into a single data governance and security platform that allows you to protect data appropriately based on data governance policies, ensure all your data is secure, and minimize your risk of data loss or theft.
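To make the "tokenize critical data at rest" idea concrete, here is a minimal, toy Python sketch of vaulted tokenization. The class, its methods, and the token format are invented for illustration; ALTR's actual tokenization-as-a-service works through its platform, not like this in-memory example.

```python
import secrets


class TokenVault:
    """Toy sketch of tokenization at rest: sensitive values are swapped
    for random tokens, and the token-to-value mapping lives in a
    separate vault. Illustrative only, not ALTR's actual service."""

    def __init__(self):
        self._vault = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Tokens carry no information about the original value.
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the value.
        return self._vault[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")
# The database stores only the token; a breach of the database alone
# reveals nothing about the underlying value.
assert token != "123-45-6789"
assert vault.detokenize(token) == "123-45-6789"
```

The key design point is the separation: the data store holds tokens, while the vault (a separate system with its own access controls) holds the mapping, so compromising one without the other yields nothing usable.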
4. Is the data governance solution affordable and flexible? Can you start with only what you need?
Most solutions cost $100k to $250k per year to start! These large, legacy on-premises platforms were not built for today’s scalable cloud environment. They require considerable time, resources, and money to even get started, which is an odd fit for Snowflake’s cloud-based platform, where Snowflake On-Demand gives you usage-based, per-second pricing with a month-to-month contract.
ALTR’s pricing starts at “free.” Our Free plan gives you the power to understand how your data is used, add controls around access, and limit your data risk at no cost. Our Enterprise and Enterprise Plus plans are available if you need more advanced governance controls, integration with SOAR or SIEM platforms, or increased data protection and dedicated support.
ALTR’s tiered pricing means there’s no large up-front commitment—you can start Snowflake data governance for free and expand if or when your needs change. Or stay on our free plan forever.
Bonus Question: Can't I just build a solution myself?
While a data admin can write a Snowflake masking policy in SQL to leverage Snowflake's native features, what happens next? That’s a one-time point fix, but what about the long term and wide scale? Can others read and work with it? Do you have a QA team to eliminate errors? Can you ensure it scales correctly and can run quickly across thousands of databases? Do you have the time to integrate it with Okta or Matillion or Splunk? Do you have a roadmap that ensures it stays up to date with new private-preview Snowflake features, keeps up with your changing data and regulatory landscape, and addresses new user service needs? Basically, do you want your data team to be a software development team? You could hire 30 engineers and spend millions of dollars to build enterprise-ready Snowflake data governance software you can trust with the risky connection between users and data, but why should you when there are already cost-effective solutions from companies focused on just this?
Conclusion
Companies flocking to the cloud data party, and Snowflake in particular, are faced with a dizzying array of options for Snowflake Data Governance. However similar the solutions may seem, with a little digging, fundamental differences become apparent. ALTR’s solution stands out for its accessible, SaaS-based, no-code setup and management and its complete Snowflake data governance and security feature set. And with its reasonable user- and data-based costs, ALTR becomes the obvious next step for Snowflake users to govern and protect their sensitive data.
Sep 28
0
min
What is Cloud Data Security? Your Complete Guide
ALTR Blog
What is Cloud Data Security? A Definition
Why is everyone talking about cloud data security today? The first wave of digital transformation focused on moving software workloads to SaaS-based applications in the cloud that were easy to spin up, required no new hardware or maintenance, and started with low costs that scaled with use. Today, the next generation of digital transformation is focused on moving the data itself — not just from on-premises data warehouses to the cloud but from other cloud-based applications and services into a central cloud data warehouse (CDW) like Snowflake. This consolidates valuable and often very sensitive data into a single repository with the goal of creating a single source of truth for the organization.
Cloud data security is focused on protecting that sensitive data, regardless of where it’s located, where it’s shared or how it’s used. It uses role-based data access controls, privacy safeguards, encrypted cloud storage, and data tokenization among other tools to limit the data users can access in order to meet data security requirements, comply with privacy regulations and ensure data is secure.
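The role-based data access controls mentioned above can be sketched in a few lines of Python. The role names and masking rules here are hypothetical examples chosen for illustration, not any vendor's actual policy engine.

```python
# Minimal sketch of role-based access control over a sensitive column:
# the same stored value returns a different view per requester role.
# Role names and masking rules are invented for this example.

def mask_ssn(value: str, role: str) -> str:
    """Return the view of a Social Security number allowed for a role."""
    if role == "compliance_admin":       # fully trusted, audited role
        return value
    if role == "analyst":                # partial mask: last four digits
        return "***-**-" + value[-4:]
    return "***-**-****"                 # default: fully masked


print(mask_ssn("123-45-6789", "analyst"))   # ***-**-6789
```

In a real deployment this decision would be enforced at query time by the platform (for example, via masking policies in the data warehouse) rather than in application code, so there is no path to the raw value that bypasses the policy.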
3 Benefits of Cloud Data Security
Cloud data security confers powerful benefits with almost no downsides. In fact, the biggest risk of cloud data security is not doing it.
- Improve business insights: When applied correctly, cloud data security enables data to be distributed securely throughout an organization, across business units and functional groups, without fear that data could be lost or stolen. That means you can share sensitive PII about customers with your finance teams, your marketing teams and even your sales teams, without worry that the data might make its way outside the company. You can gather information from various in-house and in-cloud business tools such as Salesforce or another CRM, your ERP, or your marketing automation solution into one centralized database where users can cross check and cross reference information across various data sources to uncover surprising insights.
- Avoid regulatory fines: It’s not just credit card numbers or health information that companies need to worry about anymore – today, practically every company deals with sensitive, regulated data. Personally Identifiable Information (PII) is data that can be used to identify an individual, such as a Social Security number, date of birth or even home address. It’s regulated by GDPR in Europe and by various state regulations in the US. Although the regulatory landscape is still patchy in the US, all signs point to a federal-level statute or new regulation that will lay out rules for companies across the country coming very soon. For companies that want to get ahead of the issue, making sure their cloud data security meets the most stringent requirements is the easiest path. This can help a company ensure it’s meeting its obligations and reduce the risk of fines from any regulation.
- Cultivate customer relationships: In a 2019 Pew Research Center study, 81% of Americans said that the risks of data collection by companies can outweigh the benefits. This might be because 72% say they benefit very little or not at all from the data companies gather about them. A McKinsey survey showed that consumers are more likely to trust companies that only ask for information relevant to the transaction and that react quickly to hacks and breaches or actively disclose incidents. These also happen to be some of the requirements of data privacy regulations – only gather the information you need and be upfront, timely and transparent about leaks. Companies can’t continue to gather data at will with no consequences – customers are now awake to the risks and demanding more accountability. This gives organizations a chance to strengthen the relationship with their customers by meeting and exceeding their expectations around privacy. If personalization creates a bond with customers, imagine how much more powerful that would be if buyers also trust you. Organizations that focus on protecting customer data privacy via a future-focused data governance program have an opportunity to take the lead in the market.
Cloud Data Security Challenges
Although cloud data security is a new area of concern, many of the biggest challenges are already well known by companies focused on keeping data safe.
- Securing data in infrastructure your company doesn’t own: With so much data moving to the cloud, yesterday’s perimeter is an illusion. If you can’t lock data down behind a firewall – and guess what, you can’t – then you’re forced to trust your cloud data warehouse. These facilities are extremely secure, but they only cover part of your security needs. They don’t manage or control user data access – that’s left to you. Bad actors don’t care where the data is – in fact, cloud data warehouses that consolidate data from multiple sources into a single store make a compelling target. Regulators don’t care where data is either when it comes to responsibility for keeping it safe: it’s on the company that collects it. Larger companies in more regulated industries face very punitive fines if there’s a leak – which can lead to severe consequences for the business.
- Securing data your team doesn’t own: From a security perspective, it’s difficult to protect data if you don’t know what it is or where it is. With various functional groups across companies making the leap to cloud data warehouses on their own in order to gain business insights, it’s difficult for the responsible groups such as security teams to be sure data is safe.
- Stopping privileged access threats: When sensitive data is loaded to a CDW, there’s often one person who doesn’t really need access but still has it: your Snowflake admin. If your company is like Redwood Logistics, uploading sensitive financial data in order to better estimate costs, you really don’t want your admin to have access – and usually, he doesn’t want it either! Even if you trust your admin, and you probably do, there’s no guarantee his credentials won’t get stolen, and there’s no upside for him or the business in allowing that access. This leads into our next challenge:
- Stopping credentialed access threats: Even the most trustworthy employees can be phished, socially engineered or simply have their credentials stolen. Despite the training companies have done to educate users about these risks, the credentialed access threat continues to be one of the top sources of breaches in the Verizon Data Breach Investigations Report – for the sixth year in a row! ALTR’s James Beecham asks year after year: “Why Haven’t We Stopped Credentialed Access Threats?” We know how – even when humans are fallible, there is technology that can help.
- Using data safely in Business Intelligence tools: One of the key goals of consolidating data into a centralized CDW is to enable business intelligence access. BI tools like Tableau, ThoughtSpot and Looker depend on access to all available data in order to provide a full 360-degree view of the business. When the data can’t be utilized securely in these tools, it often results in security admins making the call to leave that data out of the equation, creating a broken view of the business.
Cloud Data Security Best Practices
There are a few best practices every organization should incorporate into their successful cloud data security program:
1. Keep your eye on the data - wherever it is
This shift to the cloud requires a shift in the security mindset: from perimeter-centric to data-centric security. It means CISOs (Chief Information Security Officers) and security teams will have to stop thinking about hardware, data centers, and firewalls, and instead focus on the end goal: protecting the data itself. Responsible teams need to embrace data governance and security policies around data throughout the organization and its data ecosystem. They need to understand who should have access to the data, understand how data is used, and place relevant controls and protections around data access. In fact, they could start with a data observability program in order to understand what normal data usage looks like, so they’re better able to identify the abnormal.
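The "know normal to spot abnormal" idea behind a data observability baseline can be sketched with a simple statistical check. The baseline figures and the three-sigma threshold below are illustrative assumptions, not a production anomaly detector.

```python
from statistics import mean, stdev


def is_abnormal(history: list, todays_rows: int, z: float = 3.0) -> bool:
    """Flag data consumption that strays more than z standard deviations
    from the historical baseline of daily row counts."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # No historical variation: any deviation is abnormal.
        return todays_rows != mu
    return abs(todays_rows - mu) / sigma > z


# A user who normally reads about 1,000 rows a day suddenly pulls 50,000:
baseline = [1000, 1100, 950, 1050, 1020]
print(is_abnormal(baseline, 50_000))   # True
print(is_abnormal(baseline, 1_030))    # False
```

Real observability programs track many more signals (queries per user, columns touched, time of day), but the principle is the same: establish what normal looks like first, then alert on deviations.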
2. Empower everyone to secure cloud data
We often hear that “security is everyone’s responsibility.” But how can it be, when most people are left out of the process? While data is a key vulnerability for essentially every company, until recently most companies didn’t want to acknowledge the risk. Now, with a new data breach announcement every few weeks, the problem is impossible to ignore. And when marketing teams are spinning up shadow cloud data warehouse resources rather than waiting for security or IT teams to vet a solution, it’s safer to make sure data owners have the means to protect the data themselves. Instead of governance technologies based on legacy infrastructure – which require big investments in time, money, and human resources, plus expensive developers to set up and maintain – democratize data governance with tools that allow non-coders to roll out and manage the data security solution themselves in weeks or even days.
3. Add cloud data security checks and balances to your cloud data warehouse
To protect data (and your Database Administrator!) from the risk of sensitive data exposure, put a neutral third party in place that can keep an eye on data access – natively integrated into the cloud data platform yet outside the control of the platform admin. This separation of duties should make it impossible to access the data without key people being notified and can limit the amount of data revealed, even to admins. It can include features like real-time alerts that notify relevant stakeholders whenever the admin (or any user, for that matter) tries to access the data. If none of the allowed users accessed the data, stakeholders will know within seconds that unauthorized access has occurred. Alert formats can include text messages, Slack or Teams notifications, emails, phone calls, SIEM integrations, etc. Data access rate limits that constrain the amount of de-tokenized data delivered to any user, including admins, also limit risk: a user can request 10 million records but only get back 10,000, or 10 per hour. Hitting the limit can also trigger an alert to relevant stakeholders. These features ensure that no single user has the keys to the entire data store – no matter who they are.
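To make the rate-limit idea concrete, here is a minimal sketch of a per-user data access rate limit with alerting. It assumes a one-hour sliding window and a 10,000-record cap, and the `alert` function is a hypothetical stand-in for the Slack/Teams/SIEM integrations described above; real products implement this inside the data platform, not in application code.

```python
import time
from collections import defaultdict, deque

# Illustrative policy values only; real limits are a business decision.
WINDOW_SECONDS = 3600
MAX_RECORDS_PER_WINDOW = 10_000

_access_log = defaultdict(deque)  # user -> deque of (timestamp, records granted)

def alert(user, requested):
    # Stand-in for a text/Slack/Teams/SIEM notification integration.
    print(f"ALERT: {user} exceeded the rate limit requesting {requested} records")

def records_allowed(user, requested, now=None):
    """Return how many of `requested` records the user may actually receive."""
    now = now if now is not None else time.time()
    log = _access_log[user]
    # Drop entries older than the sliding window.
    while log and now - log[0][0] > WINDOW_SECONDS:
        log.popleft()
    used = sum(count for _, count in log)
    remaining = max(0, MAX_RECORDS_PER_WINDOW - used)
    granted = min(requested, remaining)
    if granted < requested:
        alert(user, requested)  # notify stakeholders of the blocked overage
    log.append((now, granted))
    return granted
```

A request for 10 million records would come back capped at 10,000, and every subsequent request inside the window would be throttled further while stakeholders are notified.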
4. Always assume credentials are compromised and cloud data is at risk
Knowing that the easiest and best ways to stop credentialed access threats are undermined by people being people, we’re simply better off assuming all credentials are compromised. Stolen credentials are most dangerous when, once an account gets through the front door, it has access to the entire house, including the kitchen sink. Instead of treating the network as having one front door with one lock, require authorization to enter each room. This is Forrester’s “Zero Trust” security model – no single login, identity, or device is trusted enough to be given unlimited access. This is especially important as more data moves outside the traditional corporate security perimeter and into the cloud, where anyone with the right username and password can log in. While cloud vendors do deliver enterprise-class security against cyber threats, credentialed access is their biggest weakness. It’s nearly impossible for a SaaS-hosted database to know whether an authorized user should really have access or not. Identity access and data control are still up to the companies utilizing the cloud platform.
Key Components of Cloud Data Security Solutions
An effective cloud data security solution includes these key components:
- Knowing where your data is and categorizing what data is sensitive: With data often spread throughout an organization’s technology stack, it can be challenging to even know all the various places sensitive data like social security numbers are stored. Solving this issue often starts with a data discovery and data classification solution that can find data across stores, group information into types of data and apply appropriate tags.
- Controlling access to sensitive data: In today’s data-driven enterprises, data is not just used by data scientists. Everyone from marketing to sales to product teams may need or want access to sensitive data in order to make more informed business decisions, but not everyone will be authorized to access all the data. Making sure you can grant access to some users but not others, or allow access to some roles but not others, in an efficient, scalable and secure way is one of the most important components of cloud data security.
- Putting extra limits on sensitive data access: Data security doesn’t have to be either/or. With data access rate limits, users can be prevented from gaining access to more data than they should reasonably need. Setting rate limits per user or per time period – e.g., 10,000 records per hour instead of 1 million – can stop bad actors with valid credentials from downloading the whole database.
- Securing sensitive data with encryption or tokenization: Encryption is one cloud data security approach that is highly recommended by security professionals. However, it does have weaknesses and limitations when it comes to utilizing data in the cloud. Tokenization can enable data to be stored securely yet still be available for analysis.
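The discovery-and-classification component above can be sketched very roughly as pattern matching over sampled column values. Real discovery tools use far richer techniques (ML models, context, metadata); the patterns, tag names, and threshold below are illustrative only.

```python
import re

# Hypothetical sensitive-data patterns; production classifiers are far richer.
PATTERNS = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(values, threshold=0.8):
    """Tag a column when most sampled values match a sensitive-data pattern."""
    values = [v for v in values if v]  # ignore empty cells
    for tag, pattern in PATTERNS.items():
        if values and sum(bool(pattern.match(v)) for v in values) / len(values) >= threshold:
            return tag
    return None
```

A tag returned here would then drive the access controls and masking policies discussed in the other bullets.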
Conclusion
There’s no chance of reversing the migration of data to the cloud and why would we want to? The benefits are so staggering, it’s well worth any challenges presented. As long as cloud data security is built in as a priority from the start, risks can be mitigated, and the full power and possibility of a consolidated Cloud Data Warehouse can come to fruition.
See how ALTR can help automate and scale your cloud data security in Snowflake. Get a demo!
Sep 7
0
min
What Attributes Should a Modern Data Team Have?
ALTR Blog
The road to becoming one of today’s data-driven companies is full of challenges, not the least of which is finding and keeping the right people with the right skills to get you there. And it’s not always about the individual, it’s also about the team and finding the right combined characteristics that lead to success.
As part of our Expert Panel Series, we asked some experts in the modern data ecosystem what attributes a modern data team should have. Here’s what we heard…
John DaCosta, Sr. Sales Engineer, Snowflake:
“I have been referencing this McKinsey article for years now. ‘A two-speed IT architecture for the digital enterprise.’ The concept is an Enterprise IT organization (Data Platform / Networking, etc.) that manages mature assets / processes. The 2nd speed teams are smaller, agile and more focused. They focus on "shadow it,” for example: Marketing Analytics / Marketing Technology. They are allowed to do whatever they need to get the job done. But once things are mature, they can be transitioned into Enterprise IT. In my interpretation, functional areas have their own ‘smaller technology teams’ that have all the required skill sets to deliver on projects for the business unit sponsoring it.”
Phil Warner, Director of Data Engineering, PandaDoc:
“Hire T-shaped skillsets and people who are happy to collaborate. Nothing is worse than a team of siloed individual contributors. [People with T-shaped skills] are team members who specialize in a particular area (such as Python, or data modeling, or infrastructure, etc.), but also have all-round skills, to a lesser degree, across the board. This allows for broad coverage across the team, without having to train everyone on everything to the same level, and also gives you team members that'll never say 'that's not my job', or sit there and pout when a particular ETL process didn't get written in Python this time around. They also tend to be inquisitive and curious by nature, and so are open to new ways of doing things and new technologies to move things forward, rather than painting themselves into a box and refusing to do anything other than what they know.
The opposite of a person with a T-shaped skillset is a one-trick pony. 😁”
Louis Hassel, Account Executive, Alation:
“A modern team should have a variety of skills but the best attribute they can have is a shared vision of the overall goal of the data project. If the Marketing manager needs hourly reports and the data engineering team is building daily extracts there is a disconnect. The data exec level would be great, but not a necessity. Just need to do a little planning to succeed rather than rebuild everything.”
James Beecham, Founder & CTO, ALTR:
“Similar to a full-stack developer, or a ‘feature team’ for software development, having team members that are cross functional is key to accelerating your data initiatives. I have seen too many projects stall because one person says, ‘I don’t know anything about the data pipeline, so I cannot tell you the answer’ or ‘I don’t have access to the data so I cannot verify that classification report.’ These types of bottlenecks always pop up at the worst time and cause delays. Cross training team members, having folks who are not afraid of using every tool you have in your stack is critical to your success.”
Watch out for the next monthly installment of our Expert Panel Series on LinkedIn!
Sep 21
0
min
What is Data Tokenization – A Complete Guide
ALTR Blog
What is Data Tokenization? – a Definition
You may be familiar with the idea of encryption to protect sensitive data, but maybe the idea of tokenization is new. What is data tokenization? In the realm of data security, “tokenization” is the practice of replacing a piece of sensitive or regulated data (like PII or a credit card number) with a non-sensitive counterpart, called a token, that has no inherent value. The token maps back to the sensitive data through an external data tokenization system. Data can be tokenized and de-tokenized as often as needed with approved access to the tokenization system.
How Does Tokenization of Data Work?
Original data is mapped to a token using methods that make the token impractical or impossible to restore without access to the data tokenization system. Since there is no relationship between the original data and the token, there is no standard key that can unlock or reverse lists of tokenized data. The only way to undo tokenization of data is via the system that tokenized it. This requires the tokenization system to be secured and validated using the highest security levels for sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system is the only vehicle for providing data processing applications with the authority and interfaces to request tokens or de-tokenize to the original sensitive data.
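The mechanics described above can be illustrated with a toy vault-based tokenizer. This is a sketch only: a real tokenization system adds secure storage, authentication, authorization and audit, and the class and token format here are invented for illustration.

```python
import secrets

class TokenVault:
    """Toy illustration of vault-based tokenization (not production-grade)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Deterministic per vault: the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it has no mathematical link to the value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only the vault can reverse a token; there is no key to brute-force.
        return self._token_to_value[token]
```

Note the contrast with encryption: there is no key that could decrypt a stolen list of tokens, because the mapping exists only inside the vault.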
Replacing original data with tokens in data processing systems and applications like business intelligence tools minimizes exposure of sensitive data across those applications, stores, people and processes, reducing the risk of compromise, breach or unauthorized access to sensitive or regulated data. Except for a handful of applications or users authorized to de-tokenize when strictly necessary for a required business purpose, applications can operate using tokens instead of live data. Data tokenization systems may be operated within a secure, isolated segment of the in-house data center, or as a service from a secure service provider.
What is Data Tokenization Used For?
Tokenization may be used to safeguard sensitive data including bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Data tokenization is most often used in credit card processing, and the PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value".
The choice of tokenization as an alternative to other data security techniques such as encryption, anonymization, or hashing will depend on varying regulatory requirements, interpretation, and acceptance by auditing or assessment entities. We cover the advantages and disadvantages of tokenization versus other data security solutions below.
Benefits of Data Tokenization
When it comes to solving cloud migration challenges, tokenization of data has all the obfuscation benefits of encryption, hashing, and anonymization, while providing much greater usability. Let’s look at the advantages in more detail.
- No formula or key: Tokenization replaces plain-text data with an unrelated token that has no value if breached. There’s no mathematical formula or key; a token vault holds the real data secure.
- Acts just like real data: Users and applications can treat tokens the same as real data and perform high-level analysis on them, without opening the door to risk of leaks or loss. Anonymized data, on the other hand, provides only limited analytics capability because you’re working with ranges, while hashed and encrypted data can’t be analyzed at all. With the right tokenization solution, you can share tokenized data from the data warehouse with any application, without requiring the data to be de-tokenized and inadvertently exposed to users.
- Granular analytics: Retaining the connection to the original data enables you to dig deeper into the data with more granular analytics than anonymization. Anonymized data is limited by the original parameters, such as age ranges or broad locations, which might not provide enough details or flexibility for future purposes. With tokenized data, analysts can create fresh segments of data as needed, down to the original, individual street address, age or health information.
- Analytics plus protection: Tokenization delivers the advantages of analytics with the strong at-rest protection of encryption. For the strongest possible security, look for solutions that limit the amount of tokenized data that can be de-tokenized and that issue notifications and alerts when data is de-tokenized, so you can ensure only approved users get the data.
Tokenization Vs. Encryption
1. Tokens have no mathematical relationship to the original data, which means that unlike encrypted data, tokenized data can’t be broken or returned to its original form.
While many of us might think encryption is one of the strongest ways to protect stored data, it has a few weaknesses, including this big one: the encrypted information is simply a version of the original plain text data, scrambled by math. If a hacker gets their hands on a set of encrypted data and the key, they essentially have the source data. That means breaches of sensitive PII, even of encrypted data, require reporting under state data privacy laws. Tokenizing data, on the other hand, replaces the plain text data with a completely unrelated “token” that has no value if breached. Unlike encryption, there is no mathematical formula or “key” to unlocking the data – the real data remains secure in a token vault.
2. Tokens can be made to match the relationships and distinctness of the original data so that meta-analysis can be performed on tokenized data.
When one of the main goals of moving data to the cloud is to make it available for analytics, tokenizing the data delivers a distinct advantage: actions such as counts of new users, lookups of users in specific locations, and joins of data for the same user from multiple systems can be done on the secure, tokenized data. Analysts can gain insight and find high-level trends without requiring access to the plain text sensitive data. Standard encrypted data, on the other hand, must be decrypted to operate on, and once the data is decrypted there’s no guarantee it will be deleted and not be forgotten, unsecured, in the user’s download folder. As companies seek to comply with data privacy regulations, demonstrating to auditors that access to raw PII is as limited as possible is also a huge bonus. Data tokenization allows you to feed tokenized data directly from Snowflake into whatever application needs it, without requiring data to be unencrypted and potentially inadvertently exposed to privileged users.
3. Tokens maintain a connection to the original data, so analysis can be drilled down to the individual as needed.
Anonymized data is a security alternative that removes the personally identifiable information by grouping data into ranges. It can keep sensitive data safe while still allowing for high-level analysis. For example, you may group customers by age range or general location, removing the specific birth date or address. Analysts can derive some insights from this, but if they wish to change the cut or focus in, for example looking at users aged 20 to 25 versus 20 to 30, there’s no ability to do so. Anonymized data is limited by the original parameters which might not provide enough granularity or flexibility. And once the data has been analyzed, if a user wants to send a marketing offer to the group of customers, they can’t, because there’s no relationship to the original, individual PII.
Three Risk-based Models for Tokenizing Data in the Cloud
Depending on the sensitivity level of your data or comfort with risk there are several spots at which you could tokenize data on its journey to the cloud. We see three main models - the best choice for your company will depend on the risks you’re facing:
Level 1: Tokenize data before it goes into a cloud data warehouse
- The first issue might be that you’re consolidating sensitive data from multiple databases. While having that data in one place makes it easier for authorized users, it might also make it easier for unauthorized users! Moving from multiple source databases or applications with their own siloed and segmented security and log in requirements to one central repository gives bad actors, hackers or disgruntled employees just one location to sneak into to have access to all your sensitive data. It creates a much bigger target and bigger risk.
- And this leads to the second issue: as more and more data is stored in high-profile cloud data warehouses, they have become a bigger focus for bad actors and nation states. Why should they go after Salesforce or Workday or other discrete applications separately when all the same data can be found in one giant hoard?
- The third concern might be about privileged access from Snowflake employees or your own Snowflake admins who could, but really shouldn’t, have access to the sensitive data in your cloud data warehouse.
If your company is facing any of these situations, it makes sense for you to choose “Level 1 Tokenization”: tokenize data just before it goes into the cloud. By tokenizing data that is stored in the cloud data warehouse, you ensure that only the people you authorize have access to the original, sensitive data.
Level 2: Tokenize data before moving it through the ETL process
As you’re mapping out your path to the cloud, you may want to make sure data is protected as soon as it leaves the secure walls of your data center. This is especially challenging for CISOs who’ve spent years hardening the security of the perimeter only to have control wrested away as sensitive data moves to cloud data warehouses they don’t control. If you’re working with an outside ETL (extract, transform, load) provider to help you prepare, combine, and move your data, that will be the first step outside your perimeter you want to safeguard. Even though you hired them, without years of built-up trust, you may not want them to have access to sensitive data. Or it may be out of your hands—you may have agreements or contracts with your own customers that specify you can’t let any vendor or other entity have access without written consent.
In this case, “Level 2 Tokenization” is probably the right choice. This takes one step back in the data transfer path and tokenizes sensitive data before it even reaches the ETL. Instead of direct connection to the source database, the ETL provider connects through the data tokenization software which returns tokens. ALTR partners with SaaS-based ETL providers like Matillion to make this seamless for data teams.
Level 3: End-to-end on-premises-to-cloud data tokenization
If you’re a very large financial institution classified as a “critical vendor” by the US government, you’re familiar with the arduous security requirements. These include ensuring that ultra-sensitive financial data is exceedingly secure – no unauthorized users, inside or outside the enterprise, can have access to that data, no matter where it is. You already have this nailed down in your on-premises data stores, but we’re living in the 21st century, and everyone from marketing to IT operations is saying “you have to go to the cloud.” In this case, you’ll need “Level 3 Tokenization”: full end-to-end data tokenization from all your onsite databases through to your cloud data warehouse.
As you can imagine, this can be a complex task. It requires tokenizing data across multiple on-premises systems before the data transfer journey even starts. The upside is that it can also shine a light on who’s accessing your data, wherever it is: the next time someone who relied on sensitive data runs a report and gets back nothing but tokens, you’ll hear about it quickly. This turns into a benefit by surfacing and stopping “dark access” to sensitive data.
Conclusion
Data tokenization can provide unique data security benefits across your entire path to the cloud. ALTR’s SaaS-based approach to data tokenization-as-a-service means we can cover data wherever it’s located: on-premises, in the cloud or even in other SaaS-based software. This also allows us to deliver innovations like new token formats or new security features more quickly, with no need for users to upgrade. Our tokenization solutions also range from flexible and scalable vaulted tokenization all the way up to PCI Level 1 compliant, allowing companies to choose the best balance of speed, security, and cost for their business. We’ve also invested heavily in IP that enables our database driver to connect transparently and keep data usable while tokenized. The drivers can, for example, perform the lookups and joins needed to keep applications that weren’t built with tokenization in mind running.
With data tokenization from ALTR, users can bring sensitive data safely into the cloud to get full analytic value from it, while helping meet contractual security requirements or the steepest regulatory challenges.
Aug 24
0
min
Snowflake Masking Policy: DIY vs ALTR
ALTR Blog
There’s nothing worse than when you lose the remote to your TV. All you want to do is sit on the couch and change the channel or the volume at your leisure — but when you don’t have a remote, you have to get up, walk over to the TV, click the “next channel” button twenty-five times until you get to the channel you want, then walk all the way back to the couch to sit down, exhausted. Oh, then you realize it’s too loud, and now you have to do the whole thing all over again. It’s downright infuriating.
But if you didn’t know that a remote existed, you probably wouldn’t mind it so much, right? If that’s all you ever had, it would seem normal. This is a good way to think about how ALTR works when it comes to Snowflake Masking Policy. You can do dynamic data masking in Snowflake without us, but it's a heck of a lot easier to do it with us.
What you do now: write your Snowflake masking policy using SnowSQL
Generally, writing a Snowflake masking policy requires roughly 40 lines of SnowSQL per policy. Depending on your business, that can turn into 4,000 lines real quick. And then you have to test to make sure it works as intended. And then you have to go through QA. And then you have to update it and start the process all over. The process can feel endless. Just like going from channel 12 to channel 209 without a remote, it’s exhausting and tedious.
If you look at Snowflake’s documentation, you’ll see that creating a Snowflake masking policy requires 5 steps:
1. Grant Snowflake masking policy privileges to custom role
2. Grant the custom role to a user
3. Create a Snowflake masking policy
4. Apply the Snowflake masking policy to a table or view column
5. Query data in Snowflake
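For a sense of what that SnowSQL involves, here is a compressed sketch of steps 1, 3 and 4, based on Snowflake’s documented masking policy syntax. The role, schema, table, column and mask values are hypothetical; a real policy set runs to many more lines per policy.

```sql
-- Step 1: grant masking policy privileges to a custom role (names are illustrative)
GRANT CREATE MASKING POLICY ON SCHEMA mydb.myschema TO ROLE masking_admin;

-- Step 3: create a masking policy that reveals email only to a specific role
CREATE OR REPLACE MASKING POLICY mydb.myschema.email_mask AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'ANALYST' THEN val ELSE '*********' END;

-- Step 4: apply the policy to a table column
ALTER TABLE mydb.myschema.users
  MODIFY COLUMN email SET MASKING POLICY mydb.myschema.email_mask;
```

Multiply this by every sensitive column, role combination, and masking variant, and the line counts quoted above add up quickly.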
That’s just to get started with a basic Snowflake Masking Policy! If you want to apply different types, like a partial mask, time stamp, UDF, etc. then you’ll need to refer back to the documentation again. To get more advanced with Snowflake tag-based or row-level policy, you’ll need another deep dive.
The big kicker here is the amount of time it takes to code not only the initial policies, but to update them and test them over time. No matter how good anyone is at SnowSQL, there’s always room for human error that can lead to frustration at best and at worst to dangerous levels of data access.
So, what if you could automate the Snowflake masking policy process? What if you could use a remote to do it for you to save time and keep things streamlined for your business?
What you could be doing: automating Snowflake masking policy with ALTR
Setting a sensitive data masking policy in ALTR is like clicking “2-0-9" on your remote when a commercial comes on channel 12; you log in, head to the Locks tab, and use ALTR’s interface to set a Snowflake masking policy that has already been tested for you. And when something changes in your org, you log back in and update your data masking policy or add a new one with just a few clicks.
Here’s exactly how that works:
1. Navigate to Data Policy --> Locks --> Add new
2. Fill out the lock details: name, user groups affected.
3. Choose which data to apply policy to, then choose the dynamic data masking type you’d like to use (full mask, email, show last four of SSN, no mask, or constant mask).
a. Column-based data masking (sensitive columns have been classified and added for ALTR to govern)
b. Tag-based data masking (tags are defined either by Google DLP, Snowflake classification, Snowflake object tags, or tags imported from a data catalog integration).
4. (Optional) Add another data masking policy.
5. Click “Add Lock”
That’s it; there’s no code required, and anyone in the business can set up a Snowflake masking policy if they have the right Snowflake permissions. To update or remove a lock, all you have to do is edit the existing policy using the same interface.
ALTR’s Snowflake data masking policies are not only easy to implement, they also leverage Snowflake’s native capabilities, like Snowflake tag-based masking. That means ALTR is not only the most cost-effective method, but it also ensures that your policy works best with Snowflake.
Check out this video below to see what it looks like to set Snowflake Masking Policy manually versus doing it in ALTR:
Sep 14
0
min
Why You Need an “Okta” for Cloud Data Access
ALTR Blog
SaaS platforms have exploded in the last few years for good reason: they offer unprecedented scalability, cost savings, accessibility, and flexibility. But like any explosion, it left some messes in its wake. For IT and security teams in particular, the increasing number of solutions used by teams throughout the company created a seemingly never-ending need to add users, remove users, or change permissions every time someone joined, changed roles, shifted responsibilities, or left the company altogether. As is often the case, IT and security teams took up the slack, managing and maintaining user permissions manually – going into each platform, adding each new user, setting permissions, and doing it over and over again each time a change occurred.
This led to delays, risk of error or even users skipping the authorization process altogether. According to Gigaom research, 81% of employees admitted to using unauthorized SaaS applications, and in an IDG report 73% of IT leaders agreed that keeping track of identity and permissions across environments is a primary challenge. If onboarding new employees was painful, off-boarding was even worse. If IT forgot a service, then a past employee could still have access they shouldn’t. Talk about a security issue!
Okta Automates User Account Management
Then in 2009, along came Okta. Built on top of the Amazon Web Services cloud, Okta’s single sign-on service allows users to log into multiple systems through one central process. Okta automatically creates all your user accounts when an employee comes on board, then automatically disables or deactivates them when an employee leaves. You can still go into each service and make changes, but why would you? Okta is SaaS-based: you can start for free, and then it’s just a couple of dollars per user per month after that. Okta also expanded to integrate with other solutions to simplify the overall onboarding process. For example, when a new employee is hired in ServiceNow, it can trigger the building manager to generate a new badge, Okta to generate user accounts, and HR to generate payroll forms.
At a certain point, it became stupid not to use Okta, and today the service has more than 300 million users and 15k+ customers. So that takes care of the first wave of cloud migration: users moving to SaaS platforms. But what about the next migration: data moving to cloud platforms?
Why Shouldn’t We Have Okta for Cloud Data Access Control?
If the Okta model worked for software permission provisioning, why couldn’t something similar be the answer for cloud data access control and security? Setting individual or role-based user data access policies correctly is critical, but perhaps even more critical is the confidence that access is revoked when needed – all automated, all error-free. In addition, Okta’s ease of use allowed it to be utilized by groups outside IT, like marketing and sales teams who were early SaaS adopters. Since data, just like software, is often owned, controlled and migrated by groups outside IT, shouldn’t managing data access and security be just as flexible and user-friendly?
From DIY Cloud Data Access Control to D-I...Why?
Okta’s (and many automated solutions’) biggest early competitor was “do-it-yourself.” If you’ve always been able to handle users and data access control manually, it can seem like shifting to a new process would just add more work. But it’s a little like the frog in the pot – the temperature is rising, but you don’t realize you’re boiling until it’s too late. Maybe setting up a new data user took 10 minutes just a year ago, but today you’re dealing with hundreds of requests a week, and something that was a snap to do manually on a small scale is now taking up hours of your time. When your data projects have moved from minimum viable product/beta stage to full production with hundreds of users across the enterprise, you may wake up one day and realize you no longer have any time to enable data projects because you’re so busy enabling data users.
ALTR Automates Cloud Data Access Control
Okta is a low lift, SaaS-delivered, zero-up pricing solution that eliminates burdensome manual provisioning of user access to software and integrates with multiple systems to automate the onboarding process. Sound familiar? We believe that ALTR is the “Okta for data.” We massively simplify provisioning data access controls at scale and integrate with the top-to-bottom modern data stack to reduce error and risk and increase efficiency.
And if you don’t think you need it today, just look back at the journey from manual software permissions to Okta. It’s only a matter of time before data access follows the same path. Wouldn’t it be great to get out of the pot BEFORE it’s boiling?
See how easy and scalable automated data access control can be in Snowflake with ALTR. Try ALTR Free!
Aug 17
0
min
Tokenization vs Encryption: Which is Best for Cloud Data Security?
ALTR Blog
ALTR CEO James Beecham has compared encryption to duct tape. Duct tape is great - it comes in handy when you need a quick fix for a thousand different things or even...to seal a duct. But when it comes to security, you need powerful tools that are fit for purpose.
Today, let’s compare some different methods you could use to secure data - including tokenization vs encryption - to see which is the best fit for your cloud data security.
Tokenization vs Encryption: 3 Reasons to Choose Tokenization
As a data security company, ALTR uses encryption for some things, but when we looked at encryption vs tokenization, we found tokenization far superior for two key data security needs:
- Defeating data thieves
- Enabling data analysis
Companies that want to transform data into business value need both security and analytics. Tokenization delivers the best of both worlds: the strong at-rest protection of encryption and the analysis opportunity provided by similar solutions like anonymization.
3 ways tokenization is superior to encryption:
1. Tokenization is more secure.
It actually replaces the original data with a token, so if someone successfully obtains the digital token, they have nothing of value. There’s no key and no relationship to the original data. The actual data remains secure in a separate token vault.
This is important because we now collect all kinds of information as a society. Companies want to analyze the customer data they hold, whether it’s Netflix, a hospital or a bank. If you’re using encryption to protect the data, you must first decrypt it all to make any use of it or any sense of it. And decrypting leads to data risk.
2. Tokenization enables analytics.
Because tokenization offers determinism, which which maintains the same relationship between a token and the source data every time, accurate analytics can be performed on data in the cloud.
If you provide a particular set of inputs, you get the same outputs every time. Deterministic tokens represent a piece of data in an obfuscated way and give you back the same token or representation when you need it. The token can be a mashup of numbers, letters and symbols, just like an encrypted piece of data, but tokens preserve relationships. The real benefit of deterministic tokenization is allowing analysts to connect two datasets or databases securely, protecting PII privacy while allowing analysts to run their data operations.
3. Tokenization maintains the source data.
Because the connection is two way – tokenization and de-tokenization - you can retrieve the original data in the event if you need it.
Let’s say you’ve collected instrument readings from a personal medical device that I own. If you detect something in that data, like performance degradation, you and I both would appreciate my getting a phone call, an email or a letter informing me I need to replace the device. Encryption would not allow this because once data is encrypted, such as my name or phone number, it disappears forever from the database.
Tokenization vs Anonymization: Limited Analytics Today and Tomorrow
Unlike encryption, anonymization offers some ability to perform fundamental analysis, but is limited by the anonymization data design and intent. Anonymization removes all the PII by grouping data into ranges, like age range or zip code while removing their birthdate and street address. This means you can perform a level of analysis on anonymized data, say on your 18 to 25 years old customers. But what if you wanted a different group or associate that age range with another data set?
Anonymization is permanent and inflexible. The process cannot be reversed to re-identify individuals, which might not give you enough options. If your team wants to follow an initial data run to invite a group of customers to an event or send them an offer, you’re stuck without the phone number or mailing address available. There’s no relationship to the original PII of the individual.
Tokenization vs Hashing: A One-Way Trip
Another data security tool is one-way hashing. This is a form of cryptographic security that uses an algorithm to convert source data into an anonymized piece of data of a specific length. Unlike encryption, because the data is a fixed length and the same hash means the same data, it can be operated on with joins. But a big downside is that it’s (virtually) irreversible. So, like anonymization, once the data is converted, it cannot be turned back into plain text or source data for further analysis. Hashing is most often used to protect passwords stored in databases. You may also hear the term “salting” applied to password hashing. This is the practice of adding additional values to the end of the hashed password to differentiate the value, making the password cracking process much harder. Hashing works very well for password protection but is not ideal for PII that needs to be used.
Encryption, anonymization and one-way hashing, therefore, can be shortsighted moves. Your organization’s success depends on allowing authorized users to access the original data now and in the future, as long as you can track and report on the usage. At the same time, you must also ensure that sensitive data is useless to everyone else.
Tokenization: The Clear Cloud Data Security Winner
When looking at tokenization vs encryption, it's clear that tokenization overcomes the challenges other data security solutions face by preserving the connections and relationships between data columns and sets. However, tokenization isn’t just a simple mathematical scramble of the original data like encryption or a group of ranges with anonymized data. Authorized analysts can query tokenized data for insights without having access to the underlying PII. The more secure token remains meaningless to any unauthorized user or hacker.
With modern tokenization techniques, you can apply policies and authorize access at scale for thousands of users. You can also track and report on the secure access of sensitive data to ensure compliance with privacy regulations worldwide. You can’t do this with anonymization, hashing or encryption.
When it comes to tokenization vs encryption, tokenization is the more flexible tool for secure access and privacy compliance. This is critical for organizations quickly moving from storing gigabytes to petabytes of data in the cloud. You can feed tokenized data directly from cloud data warehouses like Snowflake into any application. You can do this with complete confidence that all the data, including sensitive PII, will be protected even from the database admin while making it easy for authorized data end-users to collaborate and deliver valuable insight quickly. Isn’t that the whole point?
See how ALTR can integrate with leading data catalog and ETL solutions to deliver automated tokenization from on-premises to the cloud. Get a demo.
Jul 27
0
min
What is Data Governance - a Complete Guide
ALTR Blog
Most of us know that data creation and collection has accelerated over the last few years. Along with that has come an increase in data privacy regulations and the prominence of the idea of “data governance” as something companies should be focused on and concerned with. Let’s see what’s driving the focus on data governance, define what “data governance” actually is, look at some of the challenges, and how companies can implement data governance best practices to build a modern enterprise data governance strategy.
Data Governance History
The financial services industry was one of the first to face regulations around data privacy. The Gramm–Leach–Bliley Act (GLBA) of 1996 requires all kinds of financial institutions to protect customer data and be transparent about data sharing of customer information. This was followed by the Payment Card Industry Data Security Standard (PCI DSS) in 2006. Then the Financial Industry Regulatory Authority (FINRA), founded in 2007, established rules institutions must follow to protect customer data from breach or theft.
Perhaps not surprisingly, healthcare was another industry to face early data regulations. The first sensitive data to be covered in the US was private health data – the Health Insurance Portability and Accountability Act of 1996 (HIPAA) required national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. More recently, data privacy regulations like the European Union’s GDPR and California’s CCPA privacy regulation have expanded coverage to all variety of “personal data” or Personal Identifiable Information (PII). These laws put specific rules around what companies can do with sensitive personal data, how it must be tracked and protected. And US data privacy guidelines have not stopped there – Colorado, Connecticut, Virginia and Utah have all followed their own state-level privacy regulations. So today, just about every company deals with some form of sensitive or regulated data. Hence the search for data governance solutions that can help companies comply.
What is Data Governance? - a Definition
Google searches for “data governance” have doubled over the last five years, but what is "data governance” really? There are a few different definitions depending on where you look:
- The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
- The Data Management Association (DAMA) International says it is “planning, oversight, and control over the management of data and the use of data and data-related sources.”
- According to the Gartner Glossary, it’s “the specification of decision rights and accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.”
You could probably find a hundred more data governance definitions, but these are pretty representative. Interestingly, it’s either called a “system” or a “framework,” – which are very process-oriented terms.
At a high level, “data governance” is about understanding and managing your data. Enterprise data governance projects are often led by data governance teams, security teams or even cross-functional data governance councils who map out a process and assign data stewards to be responsible for various data sets or types of data. They’re often focused on data quality and data flows – both internally and externally.
As you can see, data governance is not technology. Still, technologies can enable the enterprise data governance model at various stages. And due to increased regulatory pressures, more and more software companies offer “data governance” solutions. Unfortunately, many of these solutions are narrowly focused on the initial steps of the data governance strategy—data discovery, classification or lineage. However, data governance can’t just be about data discovery, cataloguing or metadata management. While many regulations start with the requirement that companies “know” their data, they’ll never be fully in compliance if organizations stop there. In addition, fines and fees are associated with allowing data to be misused or exfiltrated, and the only way to avoid those is by ensuring data is used securely.
Data Governance Challenges
Companies can run into many data governance challenges – from knowing what data they have to where data is to understanding where the data comes from and if they can trust it or not. You can solve many of these challenges with the various data catalog solutions mentioned above. These data catalogs do a great job at helping companies discover, classify, organize and present a variety of data in a way that makes it understandable to data professionals and potential data users. You can think of the result as a data “card catalog” that provides a lot of context about the data but does not provide the data itself. Some catalog solutions even offer a shopping cart feature that makes it very easy for users to select the data they want to use.
That leads to the following data governance challenge: controlling access to data to ensure that only the people who should have access to specific data have access to that data.
This goes beyond the scope of most data catalog solutions – it’s like having a shopping cart with no ability to check out and receive your item. Managing these requests is often done manually via SQL or other database code. It can become a time-consuming and error-prone process for DBAs, data architects and data engineers as requests for access to data pile up. This happens very quickly once the data catalog is available – as soon as users within the organization can easily see what data is available, the next step is undoubtedly wanting access to it. In no time, those tasked with making data available to the company spend more time managing users and maintaining policies than they do developing new data projects.
Data Governance Benefits
While data governance can be a challenging task, there would not be so much focus on it if the benefits didn’t outweigh the effort. With a thoughtful and effective data governance strategy, enterprises can achieve these benefits:
1. Avoid hefty fines and stringent sanctions on leaked PII
As mentioned above, every company that deals with PII is subject to regulations regarding data handling. In the US, the regulatory landscape is still patchy but targeting the most stringent requirements is the easiest path. A robust data governance practice can ensure companies meet their obligations and avoid fines across all their spheres of operation.
2. Leverage data-driven decisions for competitive advantage
A key reason there are growing regulations around collecting and using personal and sensitive data is that companies would like to use this data to understand their customers better gain insight into optimization opportunities, and increase their competitive advantages.
In a Splunk survey of data-focused IT and business managers, 60 percent said both the value and amount of data collected by their organizations will continue to increase. Most respondents also rate the data they’re collecting as extremely or very valuable to their organization’s overall success and innovation. In a recent Snowflake survey with the Economist, 87% say that data is the most important competitive differentiator in the business landscape today, and 86% agree that the winners in their industry will be those organizations that can use data to create innovative products and services. A data governance strategy gives companies insight into what data is available to gather insight from, ensures the data is reliable and sets a standard and a practice for maintaining that data in the future, allowing the value of the data to grow.
3. Improve customer trust and relationships
In a 2019 Pew Research Center study, 81% of Americans said that the potential risks they face because of data collection by companies outweigh the benefits. This might be because 72% say they personally benefit very little or not at all from the data companies gather about them. However, a recent McKinsey survey showed that consumers are more likely to trust companies that only ask for information relevant to the transaction and react quickly to hacks and breaches or actively disclose incidents. Coincidentally, these are some of the requirements of data privacy regulations – only gather the information you need and be upfront, timely and transparent about leaks.
What is data governance in healthcare?
Data governance in healthcare is very focused on complying with federal regulations around keeping personal health information (PHI) private. The US Health Insurance Portability and Accountability Act of 1996 (HIPAA) modernized the flow of healthcare information. It stipulates how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft, and addressed some limitations on healthcare insurance coverage. It generally prohibits healthcare providers and healthcare businesses, called covered entities, from disclosing protected information to anyone other than a patient and the patient's authorized representatives without their consent. With limited exceptions, it does not restrict patients from receiving information about themselves. It does not prohibit patients from voluntarily sharing their health information however they choose, nor does it require confidentiality where a patient discloses medical information to family members, friends, or other individuals not a part of a covered entity. Any entity that has access to or holds personal health information on an individual is required to comply with HIPAA.
Data Governance Best Practices
Today, organizations utilize massive amounts of data across the enterprise to keep up with the pace of innovation and stay ahead of the competition. But making data available to users throughout the business also increases the risk of loss and the potential costs of a breach. It seems like an impossible choice: use data or protect it. But unfortunately, it’s not a choice; organizations must protect data before sharing it.
This requires a solution that includes these enterprise data governance best practices:
- Data discovery, classification and lineage – to ensure regulated data governance, companies must be able to identify, locate and trust it.
- Automated data access controls – as the need for data across the business grows, manual granting of access requests becomes infeasible. Manual controls slow down access to data and introduce the possibility of human error, potentially creating compliance issues instead of avoiding them. Role-based access controls are more efficient in ensuring that only authorized users get access to the data they need.
- Data usage visibility and tracking – once data has been logged and access granted, there must be visibility into who is using what data, when and how much. This helps companies prepare for an audit while ensuring appropriate data usage. It can also provide valuable insight into normal usage patterns to identify out-of-normal areas for concern more easily
- Automated policy enforcement - after data access has been granted, there must still be the ability to automatically alert, slow or stop any out-of-policy activity to prevent or halt credentialed access threats.
In addition, a solution must make the implementation of data governance easy for groups across the company. It’s not just data, security or governance teams responsible for keeping data safe – it’s everyone’s job.
Data Governance: the Future
There’s zero chance that data collection, use and regulation will decrease in the coming years. IDC predicts that the global datasphere will double in size from 2022 to 2026. Regulations also show no sign of slowing – a US federal privacy bill was making its way through approvals as of July 2022.
Both of these trends mean that if companies don’t have a data governance strategy in place now, they will soon need to. As a result, the number of data governance solutions will continue to increase rapidly. Some of these will come from legacy players seemingly offering soup to nuts; some from energetic new startups providing a fix for a single task with very little expertise. We expect the industry to move toward an enterprise data governance solution that helps companies meet global privacy requirements while being easy to use, manageable and scalable to keep up with growing data and regulations.
Aug 3
0
min
Data Catalogs and Data Governance: 4 Steps to Control and Protect Sensitive Data
ALTR Blog
A data catalog is a tool that puts metadata at your fingertips. Remember libraries? The card catalog puts all the information about a book in a physical or virtual index, such as its author, location, category, size (in pages), and the date published. You can find a similar search tool or index in an online music or video service. The catalog gives you all the essentials about the thing or data, but it is not the data itself. Some catalogs do not provide any measure of protection other than passive alerts and logs. Even basic access controls and data masking can shift the burden to data owners and operators. Coding access controls in a database puts more stress on the DBAs. Solutions requiring copying sensitive data into a proprietary database still expose the original data. These steps also don’t stop credentialed access threats: system admins can still access sensitive customer data. They can accidentally delete the asset. If credentials get lost or stolen, anyone can steal the data or cause other harm to your business. Data classifiers and catalogs are valuable, no doubt about it. But they’re not governance. They can’t fulfill requests for access, track, or constrain them. When it comes to data catalogs and data governance, you must address a broad spectrum of access and security issues, including:
Access:
You can’t give everyone the skeleton key to your valuable data; you must limit access to sensitive data for specific users.
Compliance:
If you cannot track individual data consumption, it will be nearly impossible to maintain an audit trail and share it for compliance.
Automation:
How do you ensure that the policies you set up are implemented correctly? Do you have to hand them off to another team to execute? Or do you have to write and maintain the code-based controls yourself?
Scale:
As data grows in volume and value, you’ll see more demand from users to access it. You must also ensure the governance doesn’t impede efficiency, performance, or the user experience. Controlling access can’t grind everything to a halt.
Protection:
Sensitive data must be secure; it’s the law virtually everywhere. Governance must ensure confidential data receives the maximum security available wherever it is. Companies need visibility into who consumes the data, when, and how much. They must see both baseline activity and out-of-the-norm spikes. And they must take the next crucial step into holistic data security that limits the potential damage of credentialed access threats.
Data Catalogs and Data Governance: 4 Steps to Control and Protect Sensitive Data
When it’s all said and done, data governance must be easy to implement and scale for companies as part of their responsibility to collect, store, and protect sensitive data. Bridging the gap in security and access can help you comply with applicable regulations worldwide while ensuring protection for the most valuable assets. When it comes to data catalogs and data governance you can follow these four steps to control access and deliver protection over sensitive data:
1. Integrate your data governance tools with an automated policy enforcement engine with patented security.
The data governance solution should provide security that can be hands-free, require no code to implement, and focus on the original data (not a copy) to ensure only the people who should have access do. This means consumption limits and thresholds where abnormal usage triggers an alert to halt access in real-time. Tokenizing the most critical and valuable data prevents theft and misuse. These controls help admins stop insider threats and allow continued access to sensitive data without risking it.
2. Set your policies once and automate implementation to reduce manual errors and risk.
You can eliminate tedious and manual configuration of access policies to save time and ensure consistent enforcement. Automation lets you control access by user role or database row and audit every instance. These policies restrict access and limit what users can see and analyze within the database. The ability to track and report reporting on every model of access makes it easy to comply with regulatory requests.
3. Enable self-service data requests to speed up data access.
Automated access controls let admins provide continued access to sensitive data, apply masking policies, and stop credentialed access threats for thousands of end users without putting the data at risk. Data teams can move at speed required by the business yet be restricted to accessing only the data sets they’re authorized to view. For instance, you can prevent an employee based in France from seeing local data meant only for Germans. You can also avoid commingling data that originated from multiple sources or regions. This allows you to foster collaboration and sharing with greater confidence in security and privacy measures.
4. Scale your data access control and policy enforcement as the use and uses of data grow throughout your business.
The scope of data access requests today within enterprises has reached a level that requires advanced automation. Some enterprises may have scanned and catalogued thousands of databases, even more. Data governance solutions should quickly implement and manage access for thousands of users to match. Features like rate-limiting stipulate the length or amount of access, such as seeing a small sample for a brief period for anyone who isn’t the intended consumer, like the catalog admin—scaling policy thresholds as needed allows you to optimize collaboration while stopping data theft or accidental exposure. You can limit access regardless of the user group size or data set.
Modern and Simple Data Governance
Modern data organizations are moving to simplify data governance by bringing visibility to their data and seeking to understand what they have. However, data governance doesn’t stop once you catalog your data. That’s like indexing a vast collection of books or songs but letting no one read or listen to the greatest hits. You should grant access to sensitive data but do so efficiently to not interfere with your day job and effectively comply with regulations and policy. Integrating a data catalog with an automated policy enforcement engine is the right strategy. You’ll gain the complete package, with a governance policy that is easy to implement and enforce, access controls that focus on the original sensitive data, and detailed records of every data request and usage. Managing enterprise data governance at scale lets, you use data securely to add value faster, turning the proverbial oil into jet fuel for your organization’s growth.
Get the latest from ALTR
Subscribe below to stay up to date with our team, upcoming events, new feature releases, and more.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.