BLOG SPOTLIGHT
Navigating the chaos of data security in the age of GenAI—let’s break down what needs to happen next.
Sep 19
Data Security for Generative AI: Where Do We Even Begin?
If you haven’t noticed the wave of Generative AI sweeping across the enterprise hardware and software world, it certainly would have hit you within 5 minutes of attending Big Data London, one of the UK’s leading data, analytics, and AI events. Having attended last year’s show, I can confidently say AI wasn’t nearly as dominant. But now? It’s everywhere, transforming not just this event but countless others. AI has officially taken over!
As a data security-focused person, I find all the buzz both exciting and terrifying. I’m excited because it feels like we’re on the verge of a seismic shift in technology—on par with the rise of the web or the cloud—driven by GenAI. And I get to witness it firsthand! But it is terrifying to see all the applications, solution consultants, database vendors and others selling happy GenAI stories to customers. I could scream into the loud buzz of the show floor, “We have seen this movie before! Don’t let the development of GenAI applications outpace the critical need for data security!” I’m thinking about the rush to web, the rush to mobile, the rush to cloud. All of these previous shifts suffered from the same thing: security is boring and we don’t want to do it. What definitely wasn’t boring was using a groundbreaking mobile app from 1800flowers.com to buy flowers—that was cool! Let’s have more of that! Who cares about security, right? That can wait…
Cybersecurity, and data security in particular, has spent decades trying to keep up with the excitement of new applications. The ALTR engineering office is in beautiful Melbourne, FL, just a few hours away from Disney. When I see a young mother or father with a concerned look racing after a child who couldn’t care less that they’re about to plow into a popcorn stand, I think, “Application users are the kids, security people are the parents, and GenAI is whichever Disney character the kid can’t wait to hug.” It’s cute, but dangerous. This is what is happening with GenAI and security.
As applications have evolved, so has data security. Below is an example of these application evolutions and how security has adapted to cover the new weaknesses introduced by each one.
What is Making Generative AI Hard to Secure?
The simple answer is: we don’t fully know. It’s not just that we’re still figuring out how to secure GenAI (spoiler: we haven’t cracked that yet); it’s that we don’t even fully understand how these Large Language Models (LLMs) and GenAI systems truly operate. Even the developers behind these models can’t entirely explain their inner workings. How do you secure something you can’t fully comprehend? The reality is—you can’t.
So, what do we know?
We know two things:
1. Each evolution of applications and data products has been secured by building upon the principles of the previous generation. What has been working well needs to be hardened and expanded.
2. LLMs present two new and very hard problems to solve: data ownership and data access.
Let’s dive into the second part first. To get access to the hardware currently required to train and run LLMs, we must use cloud or shared resources, such as ChatGPT or NVIDIA’s DGX Cloud. Until these models require less hardware or the hardware magically becomes more available, this truth will hold.
The situation resembles the early days of the internet, when people wanted to send and receive sensitive information over shared lines. The internet was great for transmitting public or non-sensitive information, but how could banking and healthcare use public internet lines to send and receive sensitive information? Enter TLS. LLMs face the same problem today.
How can a business (or even a person, for that matter) use a public and shared LLM/GenAI system without fear of data exposure? Well, it’s very challenging, and not a problem that a traditional data security provider can solve. Luckily, there are really smart people working on it, like the folks at Protopia.ai.
So, data ownership is being addressed much like how TLS solved the problem of private information flowing over public internet lines. And that’s a huge step forward. What about data access?
This one is a bit tougher. There are some schools of thought about prompt control and data classification within AI responses. But this feels a lot like CASB all over again, which didn’t exactly hit the mark for SaaS security. In my opinion, until these models can pinpoint exactly where their responses are coming from—essentially, identify the data sets they’ve learned from—and also understand who is asking the questions, we’ll continue to face risks. Only then can we prevent situations where an intern asks questions and gets answers that should only be accessible to the CEO.
Going back to the first item on our list of what we know: we will need to build upon the solid data security foundations that got us to this point in the first place. It has become clear to me that for the next few years, Retrieval-Augmented Generation (RAG) will be how enterprises globally interact with LLMs and GenAI. While this is not a silver bullet, it’s the best shot businesses have to leverage the power of public models while keeping private information safe.
With the adoption of RAG techniques, the core data security pillars that have been bearing the load of a data lake or warehouse to date will need to be braced for extra load.
Data classification and discovery need to be cheap, fast, and accurate. Businesses must continuously ensure that any information unsuitable for RAG workloads hasn’t slipped into the database from which retrieval occurs. This constant vigilance is crucial to maintaining secure and compliant operations. This is the first step.
The next step is to layer on access control and data access monitoring so the business can easily set the rules for which types of data are allowed to be used by different models and use cases. Just as service accounts for BI tools need access control, so too do service accounts used for RAG. On top of these access controls, near-real-time data access logging must be present. As the RAG workloads access the data, these logs inform the business if any access patterns have changed and allow it to easily comply with internal and external audits, proving that only approved data sets are used with public LLMs and GenAI models.
Last step, keep the data secure at rest. The use of LLMs and GenAI will only accelerate the migration of sensitive data into the cloud. These data elements that were once protected on-prem will have to be protected in the cloud as well. But there is a catch. The scale requirements of this data protection will be a new challenge for businesses. You will not be able to point your existing on-prem-based encryption or tokenization solution to a cloud database like Snowflake and expect to get the full value of Snowflake.
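As a rough illustration of the classification-and-screening step described above, here is a minimal Python sketch that gates documents before they reach a RAG retrieval store. The regex patterns and the allowed-tag policy are simplified stand-ins invented for this example; a real pipeline would rely on a dedicated discovery and classification service rather than hand-rolled patterns.

```python
import re

# Hypothetical classifiers: a real discovery service would supply these tags.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{16}\b"),
}
ALLOWED_TAGS = {"PUBLIC", "INTERNAL"}  # policy: what the RAG index may contain


def classify(doc: str) -> set:
    """Tag a document; documents with no sensitive matches default to PUBLIC."""
    tags = {name for name, pat in PATTERNS.items() if pat.search(doc)}
    return tags or {"PUBLIC"}


def screen_for_rag(docs):
    """Admit only documents whose every tag is policy-approved."""
    return [d for d in docs if classify(d) <= ALLOWED_TAGS]


docs = [
    "Quarterly roadmap overview for all staff.",
    "Employee record: SSN 123-45-6789, do not distribute.",
]
safe = screen_for_rag(docs)
assert safe == ["Quarterly roadmap overview for all staff."]
```

The same gate would run continuously as new documents flow into the retrieval database, so sensitive records never become candidates for an LLM response.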
When prospects or customers asked me, “What is ALTR’s solution for securing LLMs and GenAI?” I used to joke with them and say, “Nothing!” But now I’ve learned the right response: “The same thing we’ve always done to secure your data—just with even more precision and focus for today’s challenges.” The use of LLMs and GenAI is exciting and scary at the same time. One way to reduce the anxiety is to start with a solid foundation: understand what data you have, how that data is allowed to be used, and whether you can prove that the data is safe at rest and in motion.
This does not mean you cannot use ChatGPT. It just means you must realize that you were once that careless child running with arms wide open to Mickey, but now you are the concerned parent. Your teams and company will be eager to dive headfirst into GenAI, but it’s crucial that you can articulate why this journey is complex and how you plan to guide them there safely. It begins with mastering the fundamentals and gradually tackling the tough new challenges that come with this powerful technology.
Sep 9
ALTR Expands GTM Team with Powerhouse Hires to Lead the Charge in Data Security
ALTR isn’t just keeping pace with the evolving data security landscape—we’re setting the speed limit. As businesses scramble to safeguard their data, ALTR is not just another player in the game; we’re the go-to solution for bulletproof data access control and security. And today, we’re doubling down on that promise with three strategic hires to turbocharge our Go-To-Market (GTM) strategy.
Meet the Heavy Hitters
Christy Baldassarre
Christy Baldassarre joins us as our new Director of Marketing, bringing a formidable blend of strategic vision and execution prowess. With a track record of driving brand growth and market penetration, Christy excels at crafting compelling narratives that resonate with target audiences. She’s a master at turning complex concepts into clear, impactful messaging and knows how to leverage the latest digital marketing tactics to amplify ALTR’s voice.
"I am excited to be on such a great team and to be a part of taking ALTR to the next level. I chose ALTR because of its excellence in Cloud Security and Data Protection. This is a great opportunity to collaborate with such a visionary team and contribute to groundbreaking solutions that not only push boundaries but set new standards of how to keep everyone’s data safe." - Christy
Rick McBride
Rick McBride, our new Demand Gen Manager, brings a deep expertise in go-to-market strategy. With a strong foundation in business development, Rick has honed his skills in identifying opportunities and driving pipeline growth from the ground up. He’s not just about crafting campaigns; Rick knows how to connect with decision-makers and convert interest into action.
“A successful go-to-market strategy thrives on seamless collaboration across various teams, and our GTM group is poised to be the driving force behind it. We're set to champion the Snowflake ecosystem—engaging with customers, Snowflake’s Field Sales team, and partners alike—to fuel strategic growth. By leveraging Snowflake's powerful native capabilities in Security and Governance, we aim to deliver at the speed and scale that Snowflake users expect. We're thrilled to extend this value to every organization that prioritizes and trusts Snowflake for their data management needs!” - Rick
George Policastro
Next, we've got George Policastro as our newest Account Executive. George is a seasoned sales professional with a proven track record of closing complex deals and delivering results. His strengths lie in his ability to deeply understand client needs, build lasting relationships, and strategically navigate the sales process to drive success.
"I’m thrilled to join ALTR and tackle one of the biggest challenges organizations face today: securing their sensitive data while unlocking its full potential to drive business growth." - George
ALTR: Defining the Future of Data Access Control and Security
The world of data security and governance has evolved dramatically from the days of simple perimeter defenses. Now, we’re dealing with sophisticated, multi-layered security strategies that need to keep up with cybercriminals who are more aggressive and resourceful than ever. The core principles—knowing where your data is, who can access it, and ensuring its protection—haven’t changed. However, as data moves to the cloud, the challenge is achieving these goals at an unprecedented scale and speed.
That’s where ALTR excels. We’re not just providing solutions; we’re reimagining what data access control and security can be in a cloud-first world. By cutting through the complexities and inefficiencies of traditional methods, we deliver a streamlined, scalable approach that makes data security both simple and powerful. Our intuitive automated access controls, policy automation, and real-time data observability empower organizations to protect sensitive data at rest, in transit, and in use—effortlessly and at lightning speed. With ALTR, securing your data isn’t just more accessible; it’s smarter, faster, and designed for today’s dynamic cloud environments.
With our latest GTM team expansion, we’re fortifying our foundation to evolve into a cloud data security market leader who’s not just part of the conversation but is driving it.
Sep 3
Unleashing the Power of FPE: ALTR Key Sharing Meets Snowflake Data Sharing
In a world where data breaches and privacy threats are the norm, safeguarding sensitive information is no longer optional—it's critical. As regulations tighten and privacy concerns soar, our customers are demanding cutting-edge solutions that don't just secure their data but do so with finesse. Enter Format Preserving Encryption (FPE). When paired with ALTR's capability to seamlessly share encryption keys with trusted third parties via platforms like Snowflake's data sharing, FPE becomes a game-changer.
Understanding Format Preserving Encryption (FPE)
Format Preserving Encryption (FPE) is a type of encryption that ensures the encrypted data retains the same format as the original plaintext. For example, if a credit card number is encrypted using FPE, the resulting ciphertext will still appear as a string of digits of the same length. This characteristic makes FPE particularly useful in scenarios where maintaining data format is crucial, such as legacy systems, databases, or applications requiring data in a specific format.
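To make the format-preserving property concrete, here is a toy Python sketch. It is emphatically not a secure cipher (production FPE uses vetted constructions such as NIST’s FF1 mode); it only demonstrates that ciphertext can keep the same length and character class as the plaintext while remaining reversible with the key.

```python
import hmac
import hashlib


def _keystream(key: bytes, tweak: bytes, n: int):
    """Derive n pseudo-random digits from the key and a tweak (e.g. a field name)."""
    stream, counter = [], 0
    while len(stream) < n:
        block = hmac.new(key, tweak + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        stream.extend(b % 10 for b in block)  # toy: slight bias, fine for illustration
        counter += 1
    return stream[:n]


def toy_fpe_encrypt(key: bytes, tweak: bytes, digits: str) -> str:
    ks = _keystream(key, tweak, len(digits))
    return "".join(str((int(d) + k) % 10) for d, k in zip(digits, ks))


def toy_fpe_decrypt(key: bytes, tweak: bytes, digits: str) -> str:
    ks = _keystream(key, tweak, len(digits))
    return "".join(str((int(d) - k) % 10) for d, k in zip(digits, ks))


card = "4111111111111111"
ct = toy_fpe_encrypt(b"secret-key", b"card_number", card)
assert len(ct) == len(card) and ct.isdigit()  # same format, same length
assert toy_fpe_decrypt(b"secret-key", b"card_number", ct) == card
```

Because the ciphertext is still a 16-digit string, it can flow through schemas, validators, and legacy systems that expect a card-number-shaped value.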
Key Benefits of FPE
Seamless Integration
FPE maintains the data format, allowing easy integration into existing data pipelines without requiring significant changes. This minimizes the impact on business operations and reduces the costs associated with implementing encryption.
Compliance with Regulations
Many regulatory frameworks, such as the GDPR, PCI-DSS, and HIPAA, mandate the protection of sensitive data. FPE helps organizations comply with these regulations by encrypting data while preserving its usability and format, which some of these standards require.
Enhanced Data Utility
Unlike traditional encryption methods, FPE allows encrypted data to be used in its existing form for certain operations, such as exact-match searches, joins, and indexing, since identical plaintexts produce identical ciphertexts. This ensures organizations can continue to derive value from their data without compromising security.
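One way to see why this works: deterministic protection maps equal plaintexts to equal ciphertexts, so equality-based operations survive encryption. The sketch below uses a keyed HMAC token as a simplified stand-in for deterministic FPE; the key name and data are invented for the example.

```python
import hmac
import hashlib
from collections import Counter


def det_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: the same plaintext always yields the same
    ciphertext, so equality comparisons still work after protection."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


key = b"demo-key"
orders = [("alice@example.com", 30), ("bob@example.com", 15), ("alice@example.com", 25)]
protected = [(det_token(key, email), amount) for email, amount in orders]

# Group-by and count-distinct run directly on ciphertext, no decryption needed.
per_customer = Counter(tok for tok, _ in protected)
assert max(per_customer.values()) == 2  # one customer placed two orders
assert len(per_customer) == 2           # two distinct customers
```

Note that only equality is preserved: sorting or range queries on protected values would not return meaningful results without decryption.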
The Role of Snowflake in Data Sharing
Snowflake is a cloud-based data warehousing platform that allows organizations to store, process, and analyze large volumes of data. One of its differentiating features is data sharing, which enables companies to share live, governed data with other Snowflake accounts in a secure and controlled manner while also shifting the compute costs of consuming the data to the share’s consumer.
Key Features of Snowflake Data Sharing
Real-Time Data Access
Snowflake's data sharing allows recipients to access shared data in real-time, ensuring they always have the most up-to-date information. This is particularly valuable in scenarios where timely access to data is critical, such as in financial services or healthcare.
Secure Data Exchange
Snowflake's platform is designed with security at its core. Data sharing is governed by robust access controls, ensuring only authorized parties can view or interact with the shared data. This is crucial for maintaining the confidentiality and integrity of sensitive information.
Scalability and Flexibility
Snowflake's architecture allows for easy scalability, enabling organizations to share large volumes of data with multiple parties without compromising performance. Additionally, the platform supports a wide range of data formats and types, making it suitable for diverse use cases.
The Power of Combining FPE with Snowflake’s Key Sharing
When FPE is combined with the ability to share encryption keys via Snowflake's data sharing, it unlocks a new level of security and flexibility for organizations. This combination addresses several critical challenges in data protection and sharing:
Controlled Access to Encrypted Data
By leveraging FPE, organizations can encrypt sensitive data while preserving its format. However, there are scenarios where this encrypted data needs to be shared with trusted third parties, such as partners, auditors, or service providers. Through Snowflake's data sharing and ALTR's FPE Key Sharing, companies can securely share encrypted data along with the corresponding encryption keys. This allows the third party to decrypt the data within the policies that they have defined and use it as needed.
Data Security Across Multiple Environments
In a multi-cloud or hybrid environment, data often needs to be moved between different systems or shared with external entities. Traditional encryption methods can be cumbersome in such scenarios, as they require extensive reconfiguration or key management effort. However, with FPE and Snowflake's key sharing, organizations can seamlessly share encrypted data across different environments without compromising security. The encryption keys can be securely shared via Snowflake, ensuring only authorized parties can decrypt and access the data.
Regulatory Compliance and Auditing
Many regulations require organizations to demonstrate that they have implemented appropriate security measures to protect sensitive data. By using FPE, companies can encrypt data in a way that complies with these regulations. At the same time, the ability to share encryption keys through Snowflake ensures that data can be securely shared with auditors or regulators. Additionally, Snowflake's robust logging and auditing capabilities provide a detailed record of who accessed the data and when, which is essential for compliance reporting.
Enhanced Collaboration with Partners
In the finance, healthcare, and retail industries, collaboration with external partners is often essential. However, sharing sensitive data with these partners presents significant security risks. By combining FPE with ALTR's key sharing, organizations can securely share encrypted data with partners, ensuring that sensitive information remains protected throughout the data's lifecycle, including across shares. This enables more effective collaboration without compromising data security.
Efficient and Secure Data Processing
Certain data processing tasks, such as data analytics or AI model training, require access to large volumes of data. In scenarios where this data is sensitive, encryption is necessary. However, traditional encryption methods can hinder the efficiency of these tasks due to the need for decryption before processing. With FPE, the data can remain encrypted during processing, while ALTR's key sharing allows the consumer to decrypt data only when absolutely necessary. This ensures that data processing is both secure and efficient.
Use Cases of FPE with ALTR Key Sharing
To better understand the value of combining FPE with ALTR's key sharing, let's explore a few use cases:
Financial Services
In the financial sector, organizations handle a vast amount of sensitive data, including customer information, transaction details, and credit card numbers. FPE can encrypt this data while preserving its format, ensuring it can still be used in legacy systems and applications. Through Snowflake's data sharing, financial institutions can securely share encrypted transaction data with external auditors, partners, or regulators, along with the necessary encryption keys. This ensures compliance with regulations while maintaining the security of sensitive information.
Healthcare
Healthcare organizations often need to share patient data with external entities, such as insurance companies or research institutions. FPE can encrypt patient records, ensuring they remain secure while preserving the format required for healthcare applications. Snowflake's data sharing allows healthcare providers to securely share this encrypted data with third parties. At the same time, ALTR enables the sharing of the corresponding encryption keys, enabling them to access and use the data while ensuring compliance with HIPAA and other regulations.
Retail
Retailers often need to share customer data with marketing partners, payment processors, or logistics providers. FPE can be used to encrypt customer information, such as names, addresses, and payment details, while maintaining the format required for retail systems. Snowflake's data sharing enables retailers to securely share this encrypted data with their partners; with ALTR, the encryption keys are also shared, ensuring that customer information is always protected.
The Broader Implications for Businesses
The combination of Format Preserving Encryption and ALTR's key-sharing capabilities represents a significant advancement in the field of data security. This approach addresses several critical challenges in data protection and sharing by enabling organizations to securely share encrypted data with trusted third parties.
Strengthening Trust and Collaboration
In an increasingly interconnected world, businesses must collaborate with external partners and share data to remain competitive. However, this collaboration often comes with significant security risks. By leveraging FPE and ALTR's key sharing, organizations can strengthen trust with their partners by ensuring that sensitive data is always protected, even when shared. This leads to more effective and secure collaboration, ultimately driving business success.
Reducing the Risk of Data Breaches
Data breaches can devastate businesses, bringing financial losses, reputational damage, and regulatory penalties. Organizations can significantly reduce that risk by encrypting sensitive data with FPE and securely sharing it via Snowflake. Even if the data is intercepted, it remains protected, as only authorized parties with the corresponding encryption keys can decrypt it.
Enabling Innovation While Ensuring Security
As organizations continue to innovate and leverage new technologies, such as artificial intelligence and machine learning, the need for secure data sharing will only grow. The combination of FPE and ALTR's key sharing enables businesses to securely share and process data in innovative ways without compromising security. This ensures that organizations can continue to innovate while protecting their most valuable asset – their data.
Wrapping Up
Integrating Format Preserving Encryption with ALTR's key sharing capabilities offers a powerful solution for organizations seeking to protect sensitive data while enabling secure collaboration and innovation. By preserving the format of encrypted data and allowing for secure key sharing, this approach addresses critical challenges in data protection, regulatory compliance, and data sharing across multiple environments. As businesses navigate the complexities of the digital age, the value of this combined solution will only become more apparent, making it a vital component of any robust data security strategy.
ALTR's Format-preserving Encryption is now available on Snowflake Marketplace.
Aug 21
Data Protection at Snowflake Scale
“Today is the day!” you exclaim to yourself as you settle into your desk on Monday morning. After months of meticulous planning, the migration from Teradata to Snowflake begins now. You have been through all the back-and-forth with leadership on why this migration is needed: Teradata is expensive, Teradata is not agile, Snowflake creates a single source of data truth, and Snowflake is instantly on and scales when you need it. It’s perfect for you and your business.
As you follow your meticulously planned checklist for the migration, you're utilizing cutting-edge tools like DBT, Okta, and Sigma. These tools are not just cool, they're the future. You're moving your database structure, loading the initial non-sensitive data, repointing your ETL pipelines, and witnessing the power of modern technology in action. Everything is working like a charm.
A few weeks or months of testing go by, your downstream consumers of data are still using Teradata but are starting to give thumbs up on the Snowflake workloads that you have already migrated. Things are going well. You have not thought about CPU or disk space for the Teradata box in a while, which was the point of the migration. You finally get word from all stakeholders that this trial migration was a success! You call your Snowflake team, and tell them to back up the truck, you are clear to move the remaining workloads. Life is good. But then, comes a knock at the door.
It’s Pat from Security & Risk. You know Pat well and enjoy Pat’s company, but you also do as much as possible to avoid Pat because you are in data and, well, we all know the feeling. Pat tells you, “Heard we are finally getting off Teradata; that’s awesome! Do you have a plan for the PII and SSNs that are kept in that one Teradata database that we require using Protegrity for audit and compliance reasons?” You nod, “I do, but I couldn't do it without your expertise. I’ve been reading the Snowflake documentation, and I'm in the process of writing a few small AWS Lambdas to interface with Protegrity. Your input is crucial to this process.” Pat smiles, gives you a less-than-reassuring pat on the back, and walks out. Phew, no more Pat.
Four weeks later, you're utterly exhausted. You've logged over 50 hours in Snowflake with fellow data engineers, and tapped into the expertise of one of the cloud ops team members who knows Lambda inside out. You have escalated to Snowflake support, but your external function calls from Snowflake to AWS keep timing out. AWS support is unable to help. Now, you have memory limits being hit with AWS Lambda. Suddenly, the internal network team does not want to keep the ports open to hit Protegrity from AWS, and you need to use a Private Link connection with additional security controls. You are behind on the Teradata migrations. There is no end in sight to the scale problems. Shoot, this is not working.
Don’t worry, you are not alone. This is the same experience felt by hundreds of Snowflake customers, and it stems from the same problem: everything about your Snowflake migration was planned for the new architecture of Snowflake except for one thing: data protection. You followed all the blogs and user guides, and your stateless data pipeline feeding Snowflake with a Kafka bus is perfect. Sigma is running without limits. The team is happy, but they want that customer data now. Except, you can’t use it until you solve this security problem.
Snowflake, and OLAP workloads generally, turned data protection on its head. OLTP workloads are easy to secure. You know the access points and the typical pattern of user behavior, so you can easily plan for scale and up-time. OLAP is wildly unpredictable. Large queries, small queries, ten rows, 10M rows, it’s a nightmare for security. There is only one path forward: you must get purpose-built data protection for Snowflake.
You need a data protection solution that matches Snowflake’s architecture, just like when you matched Protegrity to Teradata. If Snowflake is going to be elastic, your data protection needs to be elastic. If Snowflake is going to be accessed by many downstream consumers, you need to be able to integrate data protection into the access policies in Snowflake. Who is going to do that work? Who will maintain this code? How can you control costs? The answer to all those questions is ALTR.
ALTR’s purpose-built native app for data protection is an easy solution for Snowflake. You can install it on your own. You can use your Snowflake committed dollars to pay for the service. ALTR’s data protection scale is controlled by Snowflake and nothing else. It’s the easiest way to get back on track. Call your Snowflake team, ask them about ALTR. It will feel good walking back into Pat’s office with your head held high and your data migration back on track.
Whether your team currently has Protegrity or Voltage, you will face the same problems. Do not waste your time trying to get these solutions to scale, just call ALTR.
Don’t just take my word for it…
Aug 18
Shift Left™: The Importance of Governing and Securing Data From Source Systems to the Cloud
In today's fast-paced world, businesses are generating and accumulating data at an unprecedented rate. To maximize the value of data to the enterprise, any modern data architecture must contemplate how sensitive data is governed and protected across the entire data journey: from source, to cloud, to users.
The focus for many companies is instilling effective data access governance and data security in their cloud destination, like Snowflake. However, risk, governance, compliance, and security stakeholders now recognize that sensitive workloads should be subject to full data governance and protection before they land in Snowflake. It’s no longer enough to rely on securing your data after it lands in a cloud data warehouse; data owners must protect data from the instant it migrates from a source system and throughout its entire journey to the cloud. This applies to data in ETL and ELT pipelines and transient storage mechanisms like GCS and Amazon S3 buckets.
ALTR’s unique architectural advantages allow any enterprise to easily extend robust data governance and security features on Snowflake upstream into data pipelines and data catalogs, guaranteeing the security of sensitive data throughout the entire data journey – something competitors cannot offer.
With data coming from many sources and the critical importance of securing that data upstream, the ability to shift your data governance and data security implementation left has become a necessary capability for the modern data enterprise.
What Does it Mean to “Shift Left™”?
The modern data ecosystem faces a major issue with the complexity of moving highly sensitive data from on-premise systems, where it’s likely been held for years, to cloud data warehouses. Data teams are so hyper-focused on where the data lands in Snowflake that often they don’t realize that the data in motion, while traversing data pipelines, is visible in plain text. Failing to protect and secure data in motion before landing it in the cloud data warehouse represents a significant compliance risk in many highly regulated environments like Healthcare and Financial Services institutions.
“Shifting Left™” means initiating robust data governance and data security capabilities available in Snowflake and extending them back to data as it leaves source systems. Doing so ensures the policies are attached to, and remain with, the workload throughout the data journey to the cloud.
As soon as data leaves a source system and enters an ETL/ELT pipeline, that solution can call directly to ALTR through existing open-source connectors or via our Rest APIs to instrument data classification, data tagging, and data tokenization, directly in the ETL/ELT solution.
The same holds true for Data Catalogs. That means sensitive data is governed and protected from the instant it begins its journey from source to cloud. And when those data land in Snowflake, they land with everything tagged, with active data access governance policies in place, and with any highly sensitive values tokenized. Only ALTR can accomplish this because of the architectural advantages made possible through our unique integration with Snowflake. We have a growing library of open-source connectors for best-in-class ETL/ELT providers and Data Catalogs, and some providers are even building ALTR directly into their offerings (more on these exciting developments soon…).
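A simplified sketch of what such a pipeline step might look like. The regex detection and the local HMAC-based tokenization are stand-ins invented for this example; in an actual deployment, the ETL/ELT connector would delegate classification and tokenization to the governance service (for example, via ALTR's REST APIs) rather than doing it locally.

```python
import hmac
import hashlib
import re

SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def tokenize(key: bytes, value: str) -> str:
    """Stand-in for a vaulted tokenization call made by the pipeline connector."""
    return "tok_" + hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]


def shift_left_transform(key: bytes, row: dict) -> dict:
    """Classify and protect sensitive columns before the row leaves the pipeline,
    so plaintext SSNs never land in the cloud warehouse."""
    out = {}
    for col, val in row.items():
        if isinstance(val, str) and SSN_RE.match(val):
            out[col] = tokenize(key, val)  # sensitive value replaced in flight
        else:
            out[col] = val
    return out


row = {"name": "Pat", "ssn": "123-45-6789", "city": "Melbourne"}
safe_row = shift_left_transform(b"pipeline-key", row)
assert safe_row["ssn"].startswith("tok_")
assert safe_row["name"] == "Pat"
```

The point of the sketch is the placement, not the mechanics: protection is applied at the pipeline stage, to the left of the warehouse, so the policy travels with the data.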
Why is Shifting Left™ Critical for Your Data Governance Solution?
For many organizations, significant levels of compliance, governance, security, and privacy risks have yet to be rationalized for data in transit to the cloud. These gaps between Source Systems and Cloud represent major security threats and significant compliance issues for organizations operating in highly regulated environments like Healthcare and Financial Services.
ALTR can deliver immediate time to value, closing these compliance and security gaps from source, to Snowflake, to your data consumers. No other solution on the market today can make that same claim. ALTR's SaaS-based approach to data governance and data security is unique, and it's why we're the only Data Access Governance solution that can take the same powerful capabilities over Snowflake and shift them left to orchestrate further upstream in your data architecture.
Our esteemed competitors typically require a 6-month implementation cycle for their offerings, and they often only apply to data that already exists in Snowflake. Because of their legacy architectures and proxy-based approaches, they cannot be instrumented as highly-available, cloud-native services elsewhere in the data journey. These organizations cannot shift left™ and cannot help your organization close any compliance, security, or privacy gaps that exist before data hits Snowflake.
How Can ALTR Offer a Shift Left™ Approach?
ALTR is the first and only data governance solution to build a cloud-native integration with Snowflake using its external function capabilities to bring data governance and data access into the Snowflake environment. Snowflake has incredibly powerful native capabilities for data governance, yet at scale, these can be extremely complex, time-consuming, and require hours of manual SQL coding.
ALTR’s architectural advantages allow for classification, data governance, and access controls to occur seamlessly with our point and click user interface. ALTR orchestrates data governance in Snowflake because we’ve capitalized on their powerful native capabilities, making these features infinitely easier to use at scale. ALTR removes the complexity of leveraging Snowflake to its full capacity and increases the utility of Snowflake to all customers by making it safe for highly sensitive workloads and opening it up for entirely new use cases.
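To make the "manual SQL coding" concrete, here is a rough sketch of what protecting even a single column natively in Snowflake involves (the table, column, and role names are hypothetical, not from any real deployment):

```sql
-- Hand-write a masking policy that reveals email addresses only to
-- analysts; every other role sees a redacted value.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to one column of one table. At scale, this step
-- must be repeated for every sensitive column in every table.
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```

Multiply that across hundreds of tables, thousands of columns, and a changing roster of roles, and the maintenance burden becomes clear; this is the SQL that a point-and-click policy layer abstracts away.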
ALTR is uniquely positioned to offer shift left™ capabilities because we allow you to implement data governance policy into ETL pipelines, into data catalogs, into streaming buses: anywhere in your architecture that sits to the left of your cloud data warehouse.
Conclusion
Leaving your sensitive data unsecured and out of compliance until it reaches the cloud means it’s at significant risk of exposure. The design principles of ALTR’s highly available, cloud-native, SaaS-based offering for Snowflake make ALTR the only Data Access Governance and Security solution that can ensure the protection of your sensitive data from source system, to cloud, to data consumer.
See it in Action: Automate Data Control Protection with ALTR
Let us show you:
- How we integrate with industry-leading data platforms like Snowflake
- How you can protect data with your ETL throughout your cloud data migration with best-in-class providers like Matillion
- How easy it is to automate data governance and security at scale directly from best-in-breed data catalogs like Alation
Aug 16
0
min
Snowflake Views: DIY vs ALTR
ALTR Blog
It goes without saying that in today’s environment, governing and protecting sensitive data requires different tactics working together in an effective security strategy. Here at ALTR we offer numerous methods to choose from for your business needs; one worth considering is the capability to govern Snowflake data views in situations where you want to see data combined or separated.
This blog provides a high-level explanation of what a ‘view’ is, the benefits it offers, how it works to manually govern views in Snowflake, and how to use ALTR to automate the governing of views by taking advantage of Snowflake’s native capabilities without needing to write SQL code. A couple of use case examples and a how-to demonstration video are also included that we hope you’ll find helpful.
What are Views and What Benefits Do They Offer?
A ‘view’ is a Snowflake object that allows a query result to be accessed as if it were a table. Think of it as a saved, named query that Snowflake users can then query just like a table.
Since the data within the view is the result of a query, data engineers can create separate views that meet the needs of different types of employees, such as accountants and HR administrators at a hospital.
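As a rough sketch of the hospital example (the schema and column names here are illustrative only):

```sql
-- A view for accountants: billing fields only, no clinical details.
CREATE VIEW accounting_patients AS
  SELECT patient_id, billing_address, outstanding_balance
  FROM patients;

-- A view for HR administrators: staffing-related fields only.
CREATE VIEW hr_patients AS
  SELECT patient_id, admitting_department, attending_staff_id
  FROM patients;

-- Either view can now be queried exactly like a table.
SELECT * FROM accounting_patients;
```

Each audience sees only the slice of the underlying table its view exposes.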
There are several different types of views in Snowflake that all have different behaviors such as ‘Regular Views’, ‘Materialized Views’, and ‘Secure Views’; however, for the sake of brevity, this blog will only explain views in general terms. For details on how the types of views in Snowflake differ, visit Snowflake Overview of Views.
Benefits that Views Offer
Using ALTR to govern views enables you to extract only the data you want to see, which makes working with large amounts of data much easier.
You will also benefit by being able to grant privileges on a particular view to a specific role, without the people in that role having privileges on the table(s) underlying the view. For example, you can have one view for the HR staff and one view for the accounting staff, so that each of those roles in the hospital can only see the information needed to perform their jobs.
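In Snowflake, that separation of privileges looks roughly like the following sketch (object and role names are hypothetical, and it assumes the role already has USAGE on the database and schema):

```sql
-- The HR role can query the view but receives no privilege on the
-- underlying table, so it cannot reach columns outside the view.
CREATE VIEW hr_staff_view AS
  SELECT employee_id, department, hire_date
  FROM employees;

GRANT SELECT ON VIEW hr_staff_view TO ROLE HR_ADMIN;
-- Note: there is deliberately no GRANT on the employees table itself.
```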
How Snowflake Views Work if You DIY
As stated earlier, Snowflake supports several types of views. Each requires you to write SQL code and to define every ‘view’ based on the type you choose to implement. This can be time-consuming and must be maintained as your business scales.
How ALTR’s Policy Automation Works with Snowflake Views
Our policy automation on Snowflake views supports column access and masking. It enables you to identify and connect columns that exist in Snowflake views and apply column access policies and masking rules to those columns, all without writing SQL code.
Like tables, columns in views must be connected to ALTR before they can be included in governance policies. To govern a column in a Snowflake view, follow the steps below.
- From the Data Management page, click the Add New button.
- In the resulting form, select a Snowflake database.
- Next, click the View tab. This will enable you to identify a specific column from the view to connect by selecting the schema and view for that column.
- Click Connect. Once a column in a Snowflake view is connected to ALTR, then it can be included in column access policies just like columns from tables.
NOTE: Columns in views can also be governed through our Management API. For more details, see our Swagger documentation.
ALTR Use Cases for Snowflake Views
Good to Know: Views in Snowflake inherit the governance policies of their base tables; if you query data in a view, Snowflake will still apply any Dynamic Data Masking Policies and/or Row Access Policies assigned to the view’s base table(s). Because of this, it's usually much simpler to apply governance rules once to the data in tables and leverage this inheritance to prevent an explosion of masking policies. However, there are some cases where you may want to apply and manage policies at the view level. As seen in the previous section, ALTR makes adding and updating data access policies on views very simple.
Here are a couple of use case examples where using ALTR to govern sensitive data from Snowflake Views can benefit your business as it scales up with Snowflake usage.
Use Case 1. Your organization has a database that’s shared across different Snowflake accounts that you don’t want others to query directly. In addition, Snowflake limits the application of masking policies on the share.
To govern data within a share, you can create a separate database with views that select from the shared database. You can then govern access to columns in these views from the ALTR UI without writing SQL code. This means that you can delegate this administrative task to members of your infosec team instead of DBAs.
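A rough sketch of this pattern in Snowflake SQL (the share, database, and object names are hypothetical):

```sql
-- Mount the inbound share as a read-only database.
CREATE DATABASE partner_data FROM SHARE provider_account.partner_share;

-- Create a separate, locally owned database of views over it.
CREATE DATABASE governed_views;
CREATE VIEW governed_views.public.customers_v AS
  SELECT customer_id, email
  FROM partner_data.public.customers;

-- Consumers query customers_v; column access on the view can then be
-- governed (e.g., from the ALTR UI) rather than on the share itself.
```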
Use Case 2. Your Snowflake configuration primarily relies on users, BI tools, etc., querying Views instead of Tables.
Similar to the use case above, if your organization only presents views to end users and never exposes the databases directly, then you can control access to columns in these views from the ALTR UI.
Automate Snowflake Views with ALTR
By using ALTR to govern Snowflake Views, you can minimize data breaches and make informed decisions to execute an effective data security strategy. We’ve made it so simple to use that it’s just a point-and-click in ALTR and you’re done!
See it in Action
Nov 9
0
min
Automated Data Access Governance at Scale with ALTR
ALTR Blog
Data Access Governance is critical to any organization's data strategy. It ensures that the right people have access to the right data at the right time, identifies where sensitive data is being stored, and protects that sensitive information from unauthorized access. With effective Data Access Governance, organizations can strategically improve their compliance with regulatory requirements, reduce the risk of data breaches, and ensure that their data is being used for its intended purpose. It involves understanding who has access to what data, why they have access, and how that access is being managed and monitored. By implementing robust Data Access Governance, businesses and organizations can achieve greater control over their data and minimize the risks associated with data misuse and abuse.
What is Data Access Governance?
Data Access Governance is the process of managing and controlling access to data within an organization’s greater data protection strategy. It encompasses defining policies and procedures that govern who can access certain data, when they can access it, and how they can use it. The goal of Data Access Governance is to ensure that sensitive data is protected from unauthorized access, while also ensuring that the correct people have access to the information they need to do their jobs effectively.
ALTR sits at the intersection of Data Access Governance and Data Security, allowing DBAs, Data Engineers, Data Architects, or any day-to-day businesspeople to govern data access easily and without code. Many companies claim to provide solutions for data security but leave you with gaps in your data security pipeline, opening your organization up to breach. ALTR’s Data Access Governance solution puts the keys in your hands to understand what data you have, create policy around who can access what data and at what frequency, and stay on top of regulations and compliance with near real-time query audits.
Key Principles of Data Access Governance
While there are many ways to administer an effective data access governance program, strong data access governance generally revolves around the following five fundamental principles:
Transparency
As an organization, it's essential to be transparent about what data you're collecting and why you're collecting it. Clarifying what data assets you have and spreading this knowledge across your organization and customers is of utmost importance for your data governance framework. Transparency ensures that all internal and external stakeholders understand the purpose and scope of data collection efforts, fostering trust and compliance within your data governance practices.
Integrity
Data integrity is paramount in data access governance. It ensures that data remains accurate, consistent, and trustworthy throughout its lifecycle. In governance, integrity involves safeguarding data against unauthorized alterations or tampering. Robust access controls, encryption, and regular data quality checks are essential to maintain data integrity. Data users can trust that the information they access has not been compromised or altered inappropriately.
Accountability
Accountability is critical in data access governance, as it assigns responsibility for data-related actions and decisions. Every user, whether an individual or a system, should be accountable for their actions regarding data access. This includes tracking who accessed data, what changes were made, and when these actions occurred. Establishing clear roles and responsibilities ensures that individuals are answerable for their data-related activities, reducing the risk of unauthorized access or misuse.
Consistency
Consistency in data access governance ensures that access policies and practices are uniformly applied across the organization. Access controls, permissions, and policies are consistently enforced regardless of the data source, department, or user. Consistency reduces confusion and the potential for security gaps. Standardized practices simplify management, auditing, and compliance, leading to more effective data governance.
Collaboration
Collaboration is essential for effective data access governance. It encourages cross-functional teamwork among departments, including IT, data stewards, compliance teams, and business units. Collaboration ensures data access policies and decisions align with business objectives and regulatory requirements. It also helps identify and mitigate potential data access risks through collective expertise and knowledge sharing. In a collaborative environment, stakeholders work together to balance data security, compliance, and the organization's need for data access to drive innovation and productivity.
What are Steps of Data Access Governance?
Data Access Governance involves establishing policies and procedures that govern who has access to what data and under what circumstances. The principles of data access governance include:
- Defining the scope of access: This involves internal standardization of access levels around the data your organization holds. A successful Data Access Governance strategy must start with clearly defining the scope of access for all parties. Data classification can simplify this process tremendously by giving data owners the visibility to see exactly what data exists that needs to be protected. ALTR lets you classify data for free on Snowflake! Learn how here.
- Establishing roles and responsibilities: Once you understand what data you have and determine which of it is sensitive, you must establish roles and responsibilities around who is in charge of maintaining that data’s health. Clear, well-defined responsibilities ensure data is never left unmonitored and greatly reduce the risk of breach.
- Implementing appropriate access controls: After defining what data is sensitive and establishing roles and responsibilities, the next step is implementing the appropriate access controls. This involves creating policy around who is allowed access to what data and at what frequency. ALTR’s point-and-click UI gives data users full flexibility to set the correct access controls simply and scale quickly.
- Continuously monitoring and auditing access: Once the work has been done to establish rules and create policy, it may be tempting to think your sensitive data will run by itself. In a study by Stanford Professor Jeff Hancock, it was determined that “85 percent of data breaches are caused by human error,” meaning your data needs to be continuously monitored to protect against the human errors that can lead to breach. ALTR automates this process, further reducing the risk of human error by providing real-time alerting capabilities and access to audit logs.
What are the Key Benefits of Data Access Governance?
Both obvious and not, there are numerous benefits to implementing a strong data access governance policy in your organization.
- It helps to ensure that sensitive data is protected from unauthorized access, reducing the risk of data breaches and other security incidents. “In 2022, the number of data compromises in the United States stood at 1802 cases,” Statista reports, a number up 63% since 2020. Security breaches will only continue to rise as hackers become savvier and human error remains. Implementing strong data governance with a tool like ALTR that has a proven track record of securing data is critical.
- Data Access Governance can also help to ensure that employees have access to the data they need to do their jobs, while preventing them from accessing data that is not relevant to their roles. This can help to improve productivity and collaboration while minimizing the risk of data misuse or exposure.
- Data Access Governance can help organizations comply with relevant regulations and industry standards, reducing the risk of penalties and legal action. Whether your organization must be PCI compliant, or you fall under an industry data regulation, choosing a data access governance tool that will secure your sensitive data, give your data users transparency and scalability, and offer real-time alerting is a critical priority.
What are the Challenges of Data Access Governance?
While ALTR’s automated, real-time features take the stress out of implementing, scaling, and monitoring a data access governance strategy, some organizations may face challenges when it comes to defining roles for policy management.
- Ensuring all stakeholders are on the same page: Before any policy can be created, data can be governed, or access can be monitored, all stakeholders must be on the same page.
- Determining access levels: Determining which roles or departments should have access to what data, and how much access they should have involves initial legwork of enforcing a hierarchy of status when it comes to data access. Prior to setting the parameters of role-based access or tag-based access, there needs to be clearly defined guidelines and agreement on access levels.
- Setting clear expectations: After the initial leg work is done to ensure a successful data access governance implementation, it’s critical to continue ongoing conversation to minimize the risk of responsibilities slipping through the cracks. We recommend pre-determining who will lead the charge in maintaining good data hygiene.
Once all parties are on the same page prior to initial implementation, ALTR makes creating, enforcing, and monitoring policy simple and effective.
What Industries are Deploying Data Access Governance?
Data Access Governance is a crucial aspect of data safeguarding across all organizations and all industries. Industries such as finance, healthcare, and retail are just a few examples of those who should be implementing Data Access Governance.
- Financial Services – By controlling who has access to what data, financial institutions can prevent data breaches and unauthorized use of customer information. Additionally, implementing data access governance can help financial services organizations meet regulatory requirements such as PCI-DSS and GDPR, while emphasizing protecting their members’ data. ALTR allows FinServ organizations to quickly classify data, set policy around data, and see real-time audits of their protected information.
“Helping people navigate their financial journeys is the mission of TDECU, a Texas-based credit union with more than 366,000 members and $4.7 billion in assets. TDECU relies on large amounts of data to understand its members, ensure excellence across banking and operations, and improve the member experience.
Leveraging ALTR for automated policy enforcement, in tandem with Snowflake’s integrated security features, aligned with TDECU’s need for transparency, compliance, and control. Tokenization-as-a-service, data masking, thresholding, and integration with enterprise data governance solutions, including Collibra, were a few reasons why TDECU chose ALTR.”
Read more about why financial service organizations are choosing ALTR over others: https://www.altr.com/resource/tdecu-takes-data-driven-approach-supporting-members-financial-journeys
- Healthcare – It is crucial for healthcare companies to take essential measures like data access governance to ensure the privacy and security of their patients' personally identifiable information (PII). ALTR enables healthcare institutions to control data access and prevent data breaches and unauthorized use of patient information in real time. By utilizing Data Access Governance, healthcare companies can easily meet regulatory requirements such as HIPAA and GDPR and ensure their patients’ information remains secure.
- Retail – Retail corporations are in charge of storing and securing the sensitive information of their customers: from shipping addresses to email addresses and occasionally credit card numbers. For retailers to ensure their customers’ PII is secure, they must implement a complete Data Access Governance solution. ALTR’s ability to set masking policies easily and with no code allows retail corporations to maintain a high level of security and quickly scale policy as needed.
“One of ALTR’s Enterprise customers, a multinational privately owned fast-fashion retail corporation with a direct-to-consumer presence, recognized the need to correctly store and protect the sensitive data entrusted to them. This corporation is responsible for over 60 million customer email addresses, mailing addresses and names.
After discussing the customer’s business goals, ALTR rolled out a two-step plan to accomplish the retailer’s data governance needs, starting with a custom masking policy on customer PII and following that with access controls.”
Read more about how ALTR helps retail organizations secure their sensitive data: https://www.altr.com/resource/case-study-multinational-retailer-secure-customer-pii.
What are the components of a successful Data Access Governance Strategy?
Understanding What Data You Have
Classifying your data is one of the most critical first steps in protecting sensitive data. The process of classifying your data allows you to understand what data you have access to and identify which of it is highly sensitive. Understanding what data is sitting in your database and identifying the columns that contain sensitive data puts you in a healthy position to begin setting policy and creating access controls.
Creating Policy Around Who Can Access What Data, at What Frequency
- Locks: Once you have a grasp on what data exists in your database, you can begin setting policy to ensure your data is secure and protected from breach. ALTR’s Locks allow you to configure which roles are allowed access to data and how they are permitted to consume it. These locks function on a least-privileged-access model, ensuring that even if a manual error is made, your data remains secure. When data in your database is queried, depending on the lock set and the person running the query, the data can be returned with no mask, a partial mask, or a full mask.
- Thresholds: Just because a certain user group should be able to access data doesn’t always mean they should have unlimited access to it. ALTR’s patented rate-limiting capabilities are key to a successful Data Access Governance strategy. Threshold alerting allows you to create policy around how many data values are being queried and at what frequency or time of day. Thresholds allow the data owner to take the sensitive data, combined with the lock, and prescribe how that data can be consumed. ALTR’s real-time alerting can log that a threshold has been crossed or block the query altogether, giving you real-time knowledge of what is happening with your data at scale.
Data Usage Heatmaps & Query Audits
Once your key protection measures are put into place, continuously monitoring and managing the way data is being used is critical to your Data Access Governance plan. A quick and accurate way to view data access and data usage will ensure your organization is ahead of the curve in securing sensitive data.
ALTR’s Data Usage Heatmaps show a simple view of the relationship between the roles that access data and how much of the data is being consumed. The heatmap offers drill-down capabilities, giving you the flexibility to see the activity that makes up the aggregation of data usage. By understanding who is accessing what data and at what frequency, you can baseline normal data usage for your organization and create policy around that.
Data has become the most valuable asset for businesses and organizations. Because of this, it is essential to have proper data security measures in place to protect sensitive information from unauthorized access and misuse. Whether for PCI compliance, GDPR regulations, or the many other reasons people choose to begin securing their data, Data Access Governance is crucial to your organization’s strategy to protect sensitive data.
Aug 2
0
min
3 Dimensions to Determining the Right Role-Based Access Controls
ALTR Blog
While there are lots of rules around how data should be protected at rest to prevent theft or breaches, the real threat to data comes from letting people use it. In fact, the easiest way to make data safe is to put it in a vault and throw away the key. But that defeats the purpose of storing data in the first place: using it to gain insight. Your highest risk and highest reward come at the intersection of users and data. That’s why role-based access controls are so critical to both data usage and data security.
What is Role-Based Access Control?
Role-based access controls restrict access to sensitive data with policies associated with various roles. These roles could be job function, job level, department, region or more. Rather than setting access by individual user or tool, these roles are set up and assigned specific permissions based on what someone in that role needs to access to do their job. A marketing person may have one set of permissions while a finance team member may have another, or Admins might have higher-level access than line-of-business users. When new users come on board or people change jobs, the only change required is which roles they’re assigned to. This makes data access easier to manage and less error-prone.
3 Strategies for RBAC
While an RBAC approach makes a lot of sense, one of the biggest challenges we see our customers facing is determining the framework they should use. Should their roles be set by department? Should it be by job level? Based on what we’ve seen work at our customers, we recommend starting with understanding how volatile your company’s business environment is: specifically, how much and how often the data, the rules around it, and which users need it will change over time.
With this strategy in mind, here are three frameworks we’ve seen successfully help companies manage the data and user intersection via role-based access controls.
RBAC Case One: Your Data, Rules and Users Rarely Change – Set It and Forget It
For smaller companies in a more static industry, all three of the variables might not be very variable. For example, a regional bank might be looking at the same kinds of data consistently over time: who logged into the banking portal, how many payments went out, and how many ATM withdrawals there were, from which ATMs? Because they’re not rolling out new product lines or other drivers of new data very often, the types of data they analyze to run their business don't change often.
And because it’s the financial services industry, the banking rules around data security and governance are rigidly structured, specific, and slow to change. It's rare that a new regulation around the care of personal financial data rolls out in the US. Finally, in some part because of the size of the company and focused use of the data, the data users don’t change – it’s the same 5 to 10 data analysts running the same numbers daily or weekly.
In this scenario, a company can have a pretty straightforward RBAC configuration that doesn't require advanced data classification or tagging. The company can focus on well-defined “role-to-data” relationships.
For example, all PII data could be controlled, but the way that it is masked and the amount shared is determined by the role of the person accessing it. Minimal and straightforward policies could be set for how each specific role can access data:
- Marketing role has access to all data but it’s masked
- Data scientists have access to unmasked data but only 2,000 records per day
- Administrative users could have access to unmasked data as well, but only 20 records per day
Active Directory can be integrated with Snowflake to share the role data.
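A minimal sketch of those role-to-data rules as a single Snowflake masking policy (names are hypothetical; note that the per-day record limits are not expressible in a masking policy alone and would come from a rate-limiting layer such as ALTR's thresholds):

```sql
-- One policy covers all three roles: marketing always sees masked
-- values, while data scientists and admins see cleartext.
CREATE MASKING POLICY pii_by_role AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('DATA_SCIENTIST', 'ADMIN') THEN val
    WHEN CURRENT_ROLE() = 'MARKETING' THEN '*** MASKED ***'
    ELSE '*** MASKED ***'  -- default deny for any other role
  END;
```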
RBAC Case Two: Your Data and Users Change Often but the Rules Don’t – This is Manageable
In a more dynamic industry or in a company more mature in its data lifecycle journey, there can be more variation in data and in the users needing the data, while the rules themselves don't change much. For example, a company may be bringing in different types of data from across the company, like payroll or shipping costs. Or they might be moving into new lines of business that require different kinds of data like the most popular product color or busiest intersections. They may have a decentralized data process such that various product teams can classify, tag, and add data to the data warehouse, then request access.
In this scenario, a company can make the rules specific to the type of data and the type of access that should occur. For example, they could set up data access policies and then assign the policies appropriate to the roles:
- PII SSN – No Access
- PII SSN – Last Four
- PII SSN – Full Access
- PII Phone – No Access
- PII Phone – Last Four
- PII Phone – Full Access
A specific role could then be granted one or more of these policies. Sales may get PII Phone – Full Access + PII SSN No Access.
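One way to sketch these graduated access levels in Snowflake SQL, using a single policy per data type that evaluates the querying role (policy and role names are illustrative):

```sql
-- Three access levels for SSNs, resolved by the querying role:
-- Full Access, Last Four, and No Access.
CREATE MASKING POLICY pii_ssn AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'COMPLIANCE' THEN val                       -- Full Access
    WHEN CURRENT_ROLE() = 'SALES'
      THEN CONCAT('***-**-', RIGHT(val, 4))                           -- Last Four
    ELSE '*********'                                                  -- No Access
  END;
```

An analogous `pii_phone` policy would cover the phone-number tiers, and each policy is then attached to columns directly or via classification tags.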
As data is loaded into Snowflake, it is classified in real time and brought in with that classification so that, via its tags, it falls under one of these policies. Companies can then use Okta or Active Directory to assign these policies to roles.
This means that as the data changes, it's classified in a variable way, and as the users change, whether they're new users or existing users gaining or shedding roles and responsibilities, they're added to new roles and policies in a variable way. The policies, however, are set just once because the rules around which kinds of data are sensitive and how it should be controlled don't change.
This is the most scalable approach to access control.
RBAC Case Three: Everything is Changing – Try to Keep Up
Unfortunately, not every company can fit into the previous scenarios. The third situation is the most challenging: The data changes constantly, the rules change constantly, and the users change constantly. We see this more often than you might think in specific types of companies: very large enterprises acquiring new companies in new markets or moving into new locations with new regulatory environments that are all very data-driven and data-focused throughout the entire business. These enterprises must deal with a trifecta of variability: new types of data coming in, new rules based on the industry and location, and new users across the company wanting and needing access. Because they're out in front at the leading edge, they're all still just figuring out how to manage all these moving parts.
In this case, a user may need to switch out their role multiple times throughout the day and hence access depending on what team they're working with and the hat they're wearing. A data engineer, for example, might be helping the sales team with something, and then the next fire to put out is with the data science team. Their functional role might be data quality engineer, and within that function, the user may be an admin for some data sets but just a data consumer for others; for example, the user could be an account admin for marketing because they’re GDPR-certified but a read-only user for finance because they don’t have a Series 7 and can’t see customer income statements.
Because it's challenging to set up static rules in this scenario, a role hierarchy allows RBAC to scale by placing policy over both functional roles and technical roles. Instead of creating (and updating) a ridiculous number of separate roles, a data team can use custom logic that evaluates the user at query time (what hat are they wearing when running the query?) together with the classification of the data they’re trying to access. Roughly 8 or 10 lines of code can evaluate this dynamically and apply the correct access level for the role the user is playing at the time.
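A hedged Python sketch of that dynamic evaluation, with hypothetical role names, tags, and access levels:

```python
# Hedged sketch of the dynamic check described above: evaluate the "hat"
# the user is wearing at query time against the classification of the data.
# Role names, tags, and access levels here are hypothetical.
def access_level(functional_role: str, dataset_role: str, tag: str) -> str:
    # Policy sits above both the functional role and the per-dataset role,
    # roughly the "8 or 10 lines" of dynamic evaluation described above.
    if dataset_role == "admin":
        return "full"
    if tag == "PII" and functional_role != "gdpr_certified":
        return "masked"
    return "read"

# The same person, wearing two different hats during the day:
print(access_level("gdpr_certified", "admin", "PII"))    # full
print(access_level("data_engineer", "consumer", "PII"))  # masked
```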
Role Hierarchy:
Conclusion:
The key to an effective role-based access control structure is understanding the fundamental forces affecting your data. Every business is different in the way that it consumes, stores, and processes data; in the way in which it follows regulations or defines internal policies; and in how it onboards, offboards, and categorizes its users. Those three dimensions can be unique to every organization but will generally fall into one of the above categories.
Starting from one of these as a foundation will help ensure your access controls are scalable and manageable for your business environment, and more than anything else, secure.
Jul 19
What is Data Masking: An Expert Guide to Safeguarding Your Sensitive Data
ALTR Blog
In the realm of modern enterprises, safeguarding sensitive data is paramount. Data breaches and regulatory compliance challenges loom large, demanding robust solutions. Fear not, for data masking emerges as the agile and versatile knight in shining armor, equipping organizations with the power to shield their precious data assets while maintaining usability and compliance.
This guide delves into the dynamic world of data masking, exploring the factors driving its adoption, the different types of masking available and critical technique selection considerations for successful implementation.
But First, What is Data Masking?
Data masking is a data protection technique involving transforming or obfuscating sensitive information within an organization's databases or systems. It aims to conceal or alter the original data to render it unreadable while maintaining its functional and logical integrity.
By replacing sensitive data with fictitious or anonymized values, data masking safeguards individuals' privacy, mitigates the risk of unauthorized access or data breaches and ensures compliance with data protection regulations. This process enables you to maintain data usability for various purposes, such as testing, development, analytics, and collaboration, while minimizing the exposure of sensitive information.
Why is Data Masking Important?
Compliance with Data Protection Regulations
Companies are often required to comply with data protection regulations such as the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and Payment Card Industry Data Security Standard (PCI DSS). Data masking helps you meet these regulatory requirements by protecting sensitive data and ensuring privacy.
Safeguarding Sensitive Information
Companies possess a vast amount of sensitive data, including personally identifiable information (PII), personal health information (PHI), financial records like credit card numbers (PCI), intellectual property, and trade secrets. Data masking allows you to protect this information from internal and external unauthorized access. By masking sensitive data, you can limit exposure and prevent data breaches.
Mitigating Insider Threats
Insider threats are a significant concern for companies. Employees, contractors, or partners with legitimate access to sensitive data may intentionally or accidentally misuse or disclose it. Data masking restricts the visibility of sensitive data, ensuring that only authorized individuals can view authentic information. This reduces the risk of insider threats and unauthorized data leaks.
Minimizing Data Breach Risks
Data breaches can lead to significant financial and reputational damage for companies. When sensitive data is masked, anything stolen or leaked is useless, or at least significantly less valuable, to attackers, even if a breach occurs. Masked data does not reveal original values, reducing the impact of a breach and protecting the privacy of individuals.
Creating Secure Non-Production Environments
Companies often use non-production environments for development, testing, or training purposes. These environments may contain sensitive data copies from production systems. Data masking ensures that sensitive information is replaced with realistic but fictional data, eliminating the risk of exposing real customer or employee information in non-production environments.
Enabling Data Sharing and Collaboration
Data masking allows you to securely share sensitive data with third parties, partners, or researchers. By masking the data, you can maintain privacy while still allowing data analysis, research, or collaborative efforts without compromising the confidentiality of the information.
Preserving Data Utility
Data masking techniques aim to balance data privacy and data utility. Companies must ensure that masked data remains usable for various purposes, including application development, testing, data analytics, and reporting. You can protect data using appropriate masking techniques while retaining its value and usefulness.
Types of Data that Require Data Masking
Various types of data can benefit from data masking to ensure privacy and security, including:
- Personally Identifiable Information (PII): This includes names, addresses, social security numbers, passport numbers, and driver's license numbers.
- Financial Data: Credit card numbers (or PCI), bank account details, and financial transaction records are sensitive data that warrant data masking.
- Healthcare Information: Protected Health Information (PHI) like medical records, patient diagnoses, treatment details, and health insurance information must be masked to comply with regulations like HIPAA.
- Human Resources (HR) Data: Employee records, salary information, and employee identification numbers may require data masking to protect privacy and prevent identity theft.
- Customer Data: Customer names, contact information, purchase history, and loyalty program details should be masked to safeguard customer privacy.
- Intellectual Property: Trade secrets, patents, research and development data, and proprietary information should be masked to prevent unauthorized access and maintain competitiveness.
Types of Data Masking
Static Data Masking
Static data masking permanently replaces sensitive data with fictitious or anonymized values in non-production environments. It aims to provide realistic but de-identified data that can be used for testing, development, training, or sharing purposes while preserving privacy and security.
Static data masking typically operates on a copy of the original dataset, where sensitive information, such as personally identifiable information (PII) or financial details, is masked with fictional equivalents. This process ensures that the masked data retains the same structure, format, and relationships as the original data while rendering the sensitive information unreadable and meaningless to unauthorized individuals.
Dynamic Data Masking
Dynamic data masking allows real-time masking of sensitive data at the point of access based on user roles and permissions. With dynamic data masking, the sensitive data remains stored in its original form. Still, it is dynamically masked or obfuscated when queried or accessed by users who do not have the necessary privileges.
This technique provides fine-grained control over data exposure, ensuring that individuals only see the masked data they are authorized to access. Dynamic data masking helps prevent unauthorized users from viewing or accessing sensitive information while allowing authorized users to interact with the data in its unmasked form.
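A minimal Python sketch of masking at the point of access (role names and mask format are illustrative, not any vendor's API):

```python
# Sketch of dynamic masking at the point of access: the value is stored in
# the clear and masked on read for under-privileged roles. The role names
# and masking format are illustrative.
STORED = {"ssn": "123-45-6789"}

def read_ssn(role: str) -> str:
    if role == "hr_admin":                    # authorized: real value
        return STORED["ssn"]
    return "XXX-XX-" + STORED["ssn"][-4:]     # everyone else: masked view

print(read_ssn("hr_admin"))  # 123-45-6789
print(read_ssn("analyst"))   # XXX-XX-6789
```

The stored value never changes; only what each role sees at query time does.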
Deterministic Data Masking
Deterministic data masking is a data protection technique where sensitive data is consistently transformed into the same masked output value using a predefined algorithm or function. Unlike other masking methods that introduce randomness or variability, deterministic data masking ensures that the same input value will always result in the same masked value.
This approach is instrumental when data relationships, referential integrity, or consistency must be maintained across different systems or environments. However, it is essential to consider potential privacy and security risks associated with deterministic data masking, as the consistent masking pattern could potentially be exploited through reverse engineering or pattern recognition techniques, necessitating additional safeguards to protect sensitive information.
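One common way to get this consistency is a keyed hash; here is a hedged Python sketch (the key and output prefix are illustrative):

```python
import hmac
import hashlib

# Deterministic masking sketch: the same input always yields the same
# masked value, so joins and referential integrity survive masking.
# A keyed HMAC is one common way to get this property; the key is illustrative.
SECRET_KEY = b"rotate-this-key"

def det_mask(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return "MASK_" + digest[:12]

assert det_mask("123-45-6789") == det_mask("123-45-6789")  # consistent
assert det_mask("123-45-6789") != det_mask("987-65-4321")  # inputs differ
```

The consistency that makes joins work is exactly what enables the pattern-recognition risk noted above, so the key must be protected and rotated carefully.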
Data Masking Techniques
When we use the term “data masking,” by default we’re often referring to the practice of replacing some characters in a string with asterisks – such as an email address like ****@altr.com. However, data masking can actually refer to a wide range of techniques for obfuscating and anonymizing data. Here are a few data masking techniques.
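That default asterisk-style mask can be sketched in a few lines of Python (the helper name is made up for illustration):

```python
def mask_email(email: str) -> str:
    """Replace the local part of an email with asterisks, as in ****@altr.com."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

print(mask_email("jdoe@altr.com"))  # ****@altr.com
```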
Format-Preserving Encryption
Format Preserving Encryption (FPE) allows data to be encrypted while retaining its original format, such as length or data type. It ensures compatibility with existing systems and processes, making it useful for protecting sensitive data without extensive modifications. FPE can be deterministic or randomized, providing consistent or variable ciphertext for the same input. It is commonly used when preserving data format is crucial, such as encrypting credit card numbers or identification codes while maintaining their structure.
Data Tokenization
Data tokenization replaces sensitive data with unique tokens or surrogate values. Unlike encryption, where data is transformed into ciphertext, tokenization generates a token with no mathematical relationship to the original data. The token serves as a reference or placeholder for the sensitive information, while the actual data is securely stored in a separate location called a token vault. Tokenization ensures that sensitive data is never exposed, even within the organization's systems or databases.
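A toy Python sketch of the vault idea, with an in-memory dict standing in for a real, secured token vault:

```python
import secrets

# Tokenization sketch: the token has no mathematical relationship to the
# original value, which lives only in the vault (an in-memory dict here;
# real systems use a separately secured token vault).
_vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)   # random, unrelated to the input
    _vault[token] = value
    return token

def detokenize(token: str) -> str:
    return _vault[token]                    # only callers with vault access

card = "4111-1111-1111-1111"
token = tokenize(card)
assert token.startswith("tok_") and token != card
assert detokenize(token) == card
```

Because the token is random rather than derived, nothing about the original value can be recovered without the vault.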
Data Scrambling
Scrambling involves shuffling or rearranging the characters or values within a data field, rendering it unreadable without affecting its overall structure. This technique is commonly used for preserving data integrity while masking sensitive information.
For example, consider a dataset containing employee salary information. With data scrambling, the characters of each value in the "Salary" field are shuffled into a random order. An employee's salary of $50,000 might be scrambled to $05,000, while another employee's salary of $75,000 could become $00,570. The scrambled values retain the structure of the data but no longer reveal the real figures.
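Character-level scrambling of a single field, as defined above, can be sketched in Python (the seed is only so the example is repeatable):

```python
import random

def scramble(value: str, seed: int = 0) -> str:
    """Shuffle the characters of a single field: length and character set
    survive, but the value itself becomes unreadable."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

masked = scramble("50000")
# Same characters, same length, different order.
assert sorted(masked) == sorted("50000") and len(masked) == 5
```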
Data Substitution
Substitution replaces sensitive data with fictitious values, ensuring that the overall format and characteristics of the data remain intact. Examples include replacing names, addresses, or phone numbers with random or fictional counterparts.
Data Shuffling
Data Shuffling rearranges sensitive information randomly, breaking the relationship between values while preserving data structure. For example, imagine a dataset containing customer information, including names and addresses. With data shuffling, the original values within each field are scrambled, resulting in a randomized order. For instance, the name "John Smith" might become "Smith John," and the address "123 Main Street" could transform into "Street Main 123."
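A minimal Python sketch of shuffling a column across rows (the sample names are made up):

```python
import random

# Shuffling sketch: values are swapped across rows, so each column still
# looks realistic but the link between rows is broken.
names = ["John Smith", "Ada Lovelace", "Alan Turing", "Grace Hopper"]
shuffled = names[:]
random.Random(7).shuffle(shuffled)

# The set of values is preserved; the original row association is not.
assert sorted(shuffled) == sorted(names)
```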
Value Variance
Value variance adds an element of unpredictability to the masking process. It ensures that the resulting masked value varies across instances even when the same original value is encountered. For example, a social security number "123-45-6789" might be masked as "XXX-XX-XXXX" in one instance and "555-55-5555" in another. By introducing this variability, value variance thwarts attempts to correlate masked data, making it significantly more challenging for unauthorized individuals to uncover sensitive information.
Nulling Out
Nulling out replaces sensitive information with null or empty values, removing any trace of the original data. This technique is beneficial when sensitive information is not required for specific use cases, such as non-production environments or scenarios where privacy is a top concern. Nulling out eliminates sensitive data, minimizing the risk of accidental exposure or unauthorized access.
Pseudonymization
Pseudonymization replaces sensitive data with pseudonyms or surrogate values. The pseudonyms used in the process are typically unique and unrelated to the original data, making it challenging to link the pseudonymized data back to the original individuals or sensitive information.
For example, healthcare data might contain a patient's name, "John Smith," address, "123 Main Street," and medical record, "PatientID: 56789." Through pseudonymization, the organization replaces these values with unique and unrelated pseudonyms. For instance, the patient's name could be pseudonymized as "Pseudonym1," the address as "Pseudonym2," and the medical record as "Pseudonym3." These pseudonyms are consistent for a particular individual across different records but are not directly linked to their original data.
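A hedged Python sketch of consistent pseudonym assignment (the naming scheme mirrors the example above):

```python
from itertools import count

# Pseudonymization sketch: each distinct value gets a unique, unrelated
# pseudonym, applied consistently across records.
_counter = count(1)
_pseudonyms: dict[str, str] = {}

def pseudonymize(value: str) -> str:
    if value not in _pseudonyms:
        _pseudonyms[value] = f"Pseudonym{next(_counter)}"
    return _pseudonyms[value]

assert pseudonymize("John Smith") == pseudonymize("John Smith")  # consistent
assert pseudonymize("John Smith") != pseudonymize("123 Main Street")
```

In practice the mapping table itself is sensitive and must be stored and access-controlled like the original data.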
How to Determine Which Data Masking Technique is Right for You
When determining which data masking technique to apply, several factors should be considered:
Data Sensitivity
First things first, you must understand the sensitivity of the data being masked. Identify the specific data elements that need protection, such as personally identifiable information (PII), financial data, or healthcare records. This assessment helps determine the level of masking required and guides the selection of appropriate techniques.
Regulatory and Compliance Requirements
Consider the relevant data protection regulations and compliance standards that govern the data. Different regulations may have specific requirements for data masking or anonymization. Ensure that the chosen technique aligns with the regulatory obligations applicable to the data.
Data Usage and Usability
Evaluate how the data will be used and the level of functionality required. Consider the intended application, such as testing, development, analytics, or research. The selected technique should preserve the usability and integrity of the data while protecting sensitive information.
Data Relationships and Dependencies
Assess the data relationships and dependencies within the dataset. Determine if any referential integrity constraints, foreign key dependencies, or relational dependencies need to be maintained. The chosen technique should preserve these relationships while masking sensitive data.
Performance and Scalability
Consider the performance impact and scalability of the chosen technique. Some masking techniques may introduce additional processing overhead, impacting system performance or response times. Evaluate the system's capacity to handle the masking process effectively and efficiently, especially for large datasets or complex queries.
Security and Access Controls
Evaluate the security requirements and access controls associated with the data. Consider the level of granularity needed to control access to masked data. Some techniques, such as dynamic data masking, provide fine-grained control over data exposure based on user roles and permissions.
Data Retention and Data Lifecycle
Assess the data retention policies and the lifecycle of the data. Determine if the masked data needs to be retained for a specific period and if there are any data destruction or archival requirements. Consider how the chosen technique aligns with the data retention and lifecycle requirements.
Cost and Resources
Evaluate the cost and resource implications of implementing the chosen masking technique. Some techniques may require specialized tools or resources for implementation and maintenance. Consider the budgetary constraints and resource availability within the organization.
Wrapping Up
In a world where data is king and privacy is paramount, data masking emerges as the unsung hero in data security. It's the guardian of sensitive information, the gatekeeper against breaches, and the enabler of trust in an interconnected landscape. With a careful blend of innovation and best practices, data masking allows organizations to dance the delicate tango of privacy and usability, ensuring data remains safe while retaining its functionality.
Jul 11
ALTR’s Integration Tokenizes Data Automatically in the Matillion Data Pipeline
ALTR Blog
Today’s business environment has no time for silos or lack of collaboration. This challenge is coming to a head at the intersection of data and security. Data teams focus on terms like “quality, accuracy, and availability,” while security teams care about “confidentiality, integrity, and risk reduction.” This often means Data teams want “real-time access” at the same time that Security teams require “real-time security.” But the truth is that both actually have the same goal: extracting maximum business value from the data.
Integrated Security = Streamlined Value
After teams realize they have the same goal, the next step is to converge around shared tools and processes. In many companies, data moves at the speed of the business and increasingly this means at the speed of the cloud. In order to keep up, both data and security teams need tools that have been built for that speed and delivery. The data productivity cloud from Matillion combined with ALTR’s SaaS data access governance and data security platform can be the shared tool set needed to deliver this streamlined value.
Integrated Data Stack
Figure 2 below might look like a complicated data ecosystem, but the point is that it’s actually not. Think about your own stack - this probably looks pretty familiar. That’s because you undoubtedly have a lot going on with the data in your business. What you should notice in this diagram is that Snowflake remains at the center, and everything else works around it. ALTR integrates and works with existing tools to deliver security so that it doesn’t disrupt or interfere with your existing stack – BI tools, Data apps, custom code.
Matillion is a platform that helps you get your data business-ready faster. It does this by moving data, transforming data, and orchestrating data pipelines.
ALTR Tokenization + Access Control + Real Time Alerting
The killer integration to solve for the shared responsibility of CDO and CISO is tokenization + access control + alerting. The ability to ensure privacy, security, and governance are all addressed with a single technology is key. This can be achieved with tokenization plus access control for integrated policy enforcement.
• Data is classified and tagged from ingestion point
• Data is automatically protected at ingestion point based on tag-based policy
• RBAC and other Governance Access items configured once
• Data lands in Snowflake ready to be queried according to policy. No SnowSQL. No SDLC. It just works.
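The steps above can be sketched end-to-end in Python; the classifier rule, tag names, and stand-in tokenizer are simplified illustrations, not ALTR's implementation:

```python
import re

# Illustrative sketch of the steps above: classify at the ingestion point,
# tag, and tokenize per tag-based policy before the data lands.
TOKENIZE_TAGS = {"SSN"}   # tag-based policy: which tags get tokenized

def classify(value: str) -> str:
    return "SSN" if re.fullmatch(r"\d{3}-\d{2}-\d{4}", value) else "NONE"

def ingest(value: str) -> tuple[str, str]:
    tag = classify(value)
    if tag in TOKENIZE_TAGS:
        value = "tok_" + str(abs(hash(value)))   # stand-in for real tokenization
    return value, tag

landed, tag = ingest("123-45-6789")
assert tag == "SSN" and landed.startswith("tok_")
assert ingest("hello") == ("hello", "NONE")
```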
Operationalize Tokenization + Access Control + Real Time Alerting on Matillion Data Productivity Cloud
With ALTR’s new integration for the Matillion Data Productivity Cloud, setup is easy – ALTR is natively integrated with Matillion. This native integration is currently a proof of concept but will be live on the new Matillion Data Productivity Cloud very soon!
Data classification done by ALTR is set up in Matillion. Then that data is tokenized based on those classification tags automatically when it lands in Snowflake - on the fly. For example, if Social Security numbers are found during the classification process, columns are tagged with SSN, and if the policy requires that data be tokenized, it will be done automatically. This helps to satisfy data security requirements natively in your data pipeline. De-tokenization rules are based on the user, and it doesn’t matter where the user accesses the data from – from Snowflake UI or in Matillion – ALTR’s data access governance policy is applied because data is sitting in Snowflake in its tokenized form. Data teams appreciate this as they want to access the data as soon as it's available in Snowflake. With tokenization + access control, both teams are getting what they need from the already invested tool sets.
It also doesn’t matter which data source the data originates from – RDS, Workday, SAP, Salesforce. Wherever you’re pulling data from, new data flowing into the pipeline is categorized, tagged, and tokenized based on the pre-set policies for those data types. That means whenever data teams want to add another data source, it will be secured.
As this data is accessed, security teams continue to receive customized access history logs which can be configured to alert when certain types of access occur. This access might be outside working hours, across many different data types, or a larger than normal request of sensitive data from a user or a role. Security teams can be certain that only appropriate access is occurring, and data teams know what guardrails they need to operate within.
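A toy Python sketch of that kind of threshold alerting (the working hours, limit, and event fields are illustrative):

```python
# Sketch of threshold-style alerting over access history: flag off-hours
# access and larger-than-normal pulls of sensitive data.
ROWS_PER_DAY_LIMIT = 1000

def should_alert(event: dict) -> bool:
    off_hours = not (9 <= event["hour"] < 18)
    too_many = event["sensitive_rows"] > ROWS_PER_DAY_LIMIT
    return off_hours or too_many

assert should_alert({"hour": 2, "sensitive_rows": 10})      # off hours
assert should_alert({"hour": 11, "sensitive_rows": 5000})   # unusual volume
assert not should_alert({"hour": 11, "sensitive_rows": 10})
```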
This solution removes all the bottlenecks of migrating data by doing the hard security stuff required automatically. It also means data teams can stop thinking about security and focus on other data issues like quality and continued migration while knowing they’re meeting the requirements of their partners on the security team.
Full tokenization plus policy integrated directly into your ETL pipeline regardless of the source – no one else makes securing your data migration this easy.
See it in action: try Matillion free and ALTR free today!
Jun 23
Bad Habits Data Governance and Data Security Teams Need to Break
ALTR Blog
As part of our Expert Panel Series on LinkedIn, we asked experts in the modern data ecosystem what they think is one bad habit data governance teams and data security teams should break? Here’s what we heard…
James Beecham, Founder & CEO @ ALTR
Believing there is a wall between the two teams. More and more, governance and security are becoming the Business Prevention Team (trademark) because they refuse to, or cannot, work together. The winners going forward will have these two teams working hand-in-hand with data pipeline engineers to put active security and meaningful metadata collection to use directly in the pipeline. This means classifying data as soon as you pull it from the source, having automated rules to encrypt or tokenize based on classification, and leveraging tags and metadata to land data in the cloud data warehouse with all the necessary information to plug into the RBAC model. The Spider-Man finger-pointing memes have to end internally...
Ethan Aaron, CEO @ Portable
I think security and governance have to be ingrained in data teams from day one as non-negotiable. Otherwise, there will always be a back-and-forth argument over priorities.
Fred Bliss, Head of Data and AI @ 2nd Watch
Organizing efforts in large groups - especially with governance councils and people/process improvement. Start these things with a small group of people, make everyone accountable for something important, and expand it over time. There's nothing more ineffective in a steering committee than one that's so big that nobody is accountable for making the changes needed to push the organization forward.
Ethan Pack, President @ The Pack Group, Inc.
Stop taking an ivory tower approach. Similar to enterprise architecture, these practice areas affect a firm's DNA and ability to change. These things should be treated as a team sport. I really appreciate what Fred Bliss shared - there should be a small core team serving as the central pole for the big tent of data governance and security. Starting with everything and everyone is a recipe for ineffectiveness or outright failure, but the intersections and dependencies with other enterprise-shaping areas must be covered to mitigate silos and finger-pointing, to James Beecham's point.
Nick Popov, Manager Architecture and Integrations @ TDECU
Perhaps, stop treating data as a commodity and start treating it as a service.
Pat Dionne, Co-Founder, CEO @ Passerelle
Instead of starting with “no,” data governance teams should work from an enablement perspective – taking time to understand the data use and put the proper safeguards in place for governance, security, and access.
Damien Van Steenberge, Managing Partner @ Codex Consulting
Manage the rule… not the exceptions.
Be on the lookout for the next installment of our Expert Panel Series on LinkedIn this month!
Jun 21
Snowflake Summit Preview: What Happens in Vegas... Transforms the World of Data Collaboration
ALTR Blog
The ALTR Team is just a few days out from taking off to Las Vegas for Snowflake Summit 2023. Members of our team – from Product Designers, to Engineers, to Business Development Execs – are anticipating an exciting event ahead. In preparation for Summit 2023, we asked a few members of our team what they are most looking forward to.
A Consolidated Data Ecosystem
I am looking forward to seeing our whole “data” ecosystem around Snowflake in one place at Snowflake Summit. It will be a great opportunity to connect with individuals from other companies in the ecosystem. At the same time, I look forward to meeting Snowflake customers and prospects to discover their needs.
- Ami Ikanovic, Application Engineer
SnowSQL, Open-Source Drivers, and the API Ecosystem
As a software engineer dealing a lot with our Snowflake integrations, I'm looking forward to several of the Snowflake Summit speaking sessions that feature Snowflake engineers and team members. Specifically, the ones around SnowSQL improvements over the past year, what's new in data governance and privacy, open-source drivers and the Snowflake API ecosystem.
- Ryan DeBerardino, Senior Software Engineer
Advancing Partnerships
I am eagerly anticipating attending this year's Snowflake Summit for several reasons. Firstly, it offers a fantastic opportunity for collaboration with my colleagues, enabling us to exchange ideas and insights that will enhance our work. Secondly, I am excited about the prospect of meeting and engaging with peer executives from Alation, Analytics8, Collibra, Matillion, Passerelle, and Snowflake as their expertise and experiences can provide valuable perspectives for our own business strategies. Additionally, I am eager to learn about the latest technologies showcased at the Summit, as incorporating these advancements into our partner strategy will undoubtedly accelerate our growth and success.
- Stephen Campbell, VP Partnerships and Alliances
The Speed of Innovation
I am most excited to see what new features and offerings each of the different providers have produced in the last year. We work in such a dynamic and fast-moving industry, and conferences like Snowflake Summit are an amazing, concentrated way to showcase the speed of innovation.
- Ted Sandler, Technical Project Manager
ALTR + Matillion = <3
We’re anticipating Snowflake Summit 2023 to be a huge success. With the waves that ALTR has made in the data ecosystem this year, I’m excited to get to connect with our customers and partners in the greater data community. I get to share a speaking session with Laura Malins, VP of Product at Matillion on Thursday, June 29 at 11am about the convergence of Analytics, Governance and Security in West Theatre B. I am looking forward to collaborating with Laura in this session and connecting with attendees throughout the event.
- James Beecham, Founder, CEO
The Best Event of the Year
I’m looking forward to seeing, in person, the countless Snowflake customers ALTR works with day in and day out. Having the opportunity for face-to-face discussions about new challenges they are tackling and how ALTR may be able to help is what makes Snowflake Summit the best event of the year for us.
- Paul Franz, VP Business Development
ALTR-itas and After-Hours Events
I couldn’t be more excited about ALTR’s 2nd year attending Snowflake Summit and our 1st year sponsoring after-hours events. We’ll be going from dusk till dawn on Tuesday, June 27 with our sponsorship of Passerelle’s Data Oasis and the big Matillion Fiesta in the Clouds. It’s such a great opportunity to meet up with some of our favorite partners and customers as we dance the night away. I hope we see you there!
- Kim Cook, VP Marketing
Customers Drive Our Roadmap
With data governance being such a hurdle for many companies, there are many speaking sessions covering this topic at Snowflake Summit. These governance sessions themselves, the questions asked in them, and the follow-up conversations that we participate in are the most interesting. We have great current and future customers that help drive what we build, but listening to a broad range of voices and seeing what everyone else is doing is what makes me very excited for Summit 2023.
- Kevin Rose, Director of Engineering
Data Governance and Accessibility
This will be my first time attending Snowflake Summit and I’m so excited to meet and mingle with folks in person! As a product designer, it’ll be so valuable to speak with some of our users firsthand, get feedback on how ALTR’s platform has made data governance more accessible, and catch up on all the exciting developments in the Snowflake community.
-Geneva Boyett, Product Designer
We’re looking forward to all that Snowflake Summit 2023 will teach us, and we hope to connect with you there. Below are all of the places you’ll be able to find us in Vegas next week! See you soon!
Booth #2242
James Beecham + Laura Malins Speaking Session: June 29 @ 11am – West Theatre B
Data Oasis with Passerelle: June 27 @ 6pm – The Loft NV above Cabo Wabo
Fiesta in the Clouds with Matillion: June 27 @ 8pm – Chayo Mexican Kitchen and Tequila Bar
Jun 13
Govern Snowflake Data Shares with Views
ALTR Blog
ALTR is constantly looking to solve problems for our customers. A recent challenge some of our customers were running into is governing and securing a Snowflake data share. ALTR’s creative solution enables this through governing views on Snowflake data shares. Here’s how:
Challenge: Extending Governance Across Snowflake Data Shares
In Snowflake, companies may have a primary production database or even a general data source that multiple groups need. This data often sits in a Snowflake account that may be accessible to only a handful of people or none at all. Snowflake admins can “share” read-only portions of that database to different groups on different Snowflake accounts internally across the company. This enables data consumers to utilize the data without risking changes or corruption to the source data or inconsistencies across the data in different accounts. If Snowflake admins are leveraging the native Snowflake governance features to apply governing policies such as data masking or column-level access controls with SQL on the primary database, those controls will be maintained in the shared database.
However, admins offering a Snowflake data share run into the same issues every Snowflake customer faces: managing those SQL policies manually at scale. It’s time-consuming, error-prone and risky. Generally, companies going down the primary/shared database path are large enough to have hundreds of databases and thousands of users. ALTR is the best solution to implement, maintain and enforce these policies at scale in Snowflake. But because these are “shares” (i.e. read-only), Snowflake does not allow data-level protections to be applied. So, we wanted to provide our customers with the ability to extend ALTR’s enforcement across Snowflake data shares in different accounts.
Solution: Create and Govern Views on Snowflake Data Shares
Let’s start with: what is a “view” in Snowflake? You can think of it as a faux table defined by a query against other tables. In other words, it’s a query saved, named, and displayed as a table. For example, you could write “SELECT * FROM customer_data WHERE country_code = 'US'” and run that each time you need to pull a list of US customers. Or you could create and save a view in Snowflake based on that query and access it each time you need the same info. Creating a view is especially useful for complex queries pulling various fields from multiple databases: it allows you to run a query against the saved view rather than writing out everything that goes into the underlying query. Finally, it has the advantage of limiting the data available to those querying the view. Some companies even standardize on offering data only via views.
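As a minimal sketch (table and column names here are hypothetical), that query can be saved as a view once and then queried like any table:

```sql
-- Save the filter as a named view (one-time setup).
CREATE VIEW us_customers AS
    SELECT * FROM customer_data WHERE country_code = 'US';

-- From then on, pull the same slice without rewriting the filter.
SELECT * FROM us_customers;
```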
What this means is that Snowflake admins, or even data-consuming or data-owning line-of-business users (depending on your company’s approach to data management), can create a second database in their own account and create Snowflake views in that database that reference the unmodifiable data share. The view limits what data the saved query pulls, but ALTR can then also apply governance and data security policy on those views, so that only approved data is accessible, in specific formats, according to the existing rules. You can set up tag-based policies, column-based access controls, dynamic data masking, and daily rate limits or thresholds, as well as de-tokenize data. In other words, by governing the view, you can govern Snowflake data shares the way you would the primary database.
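A sketch of the pattern in the consuming account (the account, share, table, and column names below are illustrative, not from the original post):

```sql
-- Mount the read-only share as a database in the consuming account.
CREATE DATABASE shared_prod FROM SHARE provider_account.prod_share;

-- Create a local, modifiable database and a view that references the share.
CREATE DATABASE analytics;
CREATE VIEW analytics.public.customers_v AS
    SELECT customer_id, email, country_code
    FROM shared_prod.public.customers;

-- Governance policy (masking, access controls, thresholds) is then
-- applied to the view rather than to the read-only share.
```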
Share to Data Consumer – Govern on Both
Results: Secure Snowflake Data Shares
Common Snowflake data share use cases include a data source that many groups across the company need to access with different permissions: employee database, sales records, customer PII. Or sometimes, companies have a development database, a testing database, and a production database in separate accounts where the data needs to be up-to-date and consistent across them but modifiable only in one. This also makes it possible for companies that have standardized on views for all data delivery to govern those views across their Snowflake usage.
With ALTR, Snowflake admins or various account owners can leverage sensitive data shares in Snowflake and secure them from their single ALTR account. They can set up row- and column-level controls and then apply the applicable policies to the appropriate views to ensure end data share users only see the data they should, automatically.
Jun 15
Building a Modern Credit Union Data Stack on the Cloud
ALTR Blog
Just like every other industry today, credit unions are working to figure out how they can best utilize member data to optimize their experiences. But credit unions do face some unique challenges around privacy regulations, member service expectations and an often traditional hardware and software data infrastructure. We interviewed Adam Roderick, CEO @ Datateer, to learn how credit unions can make data more useful and valuable to their organizations.
How can a modern cloud-focused data architecture enable credit unions to better serve members or optimize their businesses?
Visibility and speed.
Credit unions have many different types of products and stakeholders. This necessarily means they have many processes in place, with applications and databases to support them. Each of these systems contains a partial view of the organization. A siloed slice of the whole, a glimpse.
But none of them provides a complete picture, an ability to answer questions using data from multiple sources.
Efforts like Member 360 bring data together from multiple places into a single, centralized location. This creates visibility across the entire organization! Questions that previously could not be answered, or took days (or weeks!), can now be answered in real time, on demand.
It is a magic moment when something like that comes to life. When you can see trends and compare metrics. When you can explore and get a clear picture of the credit union’s members and operations.
So much of the sluggishness in any organization is due to lack of information, or slow information. And most of that is because data is scattered across so many places.
What are the biggest challenges credit unions face when building a modern data architecture?
Focus and traction.
Regarding focus, the biggest challenges have nothing to do with technology or data. There is a tendency to want to boil the ocean–to make a large, encompassing effort. The organization buys into the big vision, and then tries to execute a huge, complicated project. You can imagine the results. I don’t think this is unique to credit unions.
On the other hand, the sheer size of potential impact and number of potential applications of data analytics can create paralysis.
The need to focus is critical to getting early wins, building momentum, and growing in maturity and capability. Think big, but take small steps, learn, and iterate.
At Datateer, we follow a framework we call Simpler Analytics. A core tenet is to treat data sources, metrics, and data products as assets–trackable things with lifecycles and measurements of their own. And it ensures each data asset is aligned with a particular audience and purpose it is intended to serve.
Regarding traction, the challenge is how to get moving and stay organized. With so many moving parts, any data architecture is at risk of becoming bloated, cumbersome, and difficult to maintain.
How many potential questions could data answer in a single credit union? How many KPIs and metrics might be part of a mature system? How many reports, dashboards, embedded analytics, or other data products?
At Datateer, we address this complexity in two ways.
To get going, we treat each new effort with a crawl-walk-run approach. Anyone surveying the modern data marketplace will quickly become overwhelmed with all the tools. But the basic modern data stack is proven and not complicated. I described an approach for this, including product recommendations, in a recent article.
As things mature, the number of reports, metrics, etc. can get out of control. This can happen relatively early. We rely again on the data asset inventory I mentioned earlier. This inventory keeps things manageable.
It can get overwhelming as reports and dashboards proliferate, more tooling gets implemented, and more processes and procedures are put in place. All of these artifacts and procedures can reference the data asset inventory to stay aligned with what matters and provide a point of reference.
What data governance and security implications or requirements are especially critical to credit unions’ data modernization projects?
Security and governance boil down to ensuring data is available to only the right people, for the right reasons, at the right time.
The biggest challenge here is the balance between privacy and making good use of data. While governance and policy are essential, they slow things down. It’s necessary to balance compliance and risk mitigation with moving forward and making an impact.
Good security practices can only go so far. Up to 88% of data breaches are caused by human error. With the personal and financial data credit unions must protect, a culture and training around data protection are critical.
A solid data governance plan provides a backstop against human error.
What advice would you give on how or where credit unions should start on their data modernization journey?
Your credit union will have unique requirements, but not at the foundational level. Embrace the basic modern data stack and get moving. Don’t get hung up on defining everything up front.
Stakeholders no longer have the patience to wait months for a big project to show results. And, you don’t have to go it alone. Datateer pioneered the concept of managed analytics and fractional analytics teams. Our model allows companies to get going quickly and confidently, creating business value immediately.
Embrace the cloud. It’s where all the product and tool innovation will continue to happen. Many people get hung up on two risks: potential cost overruns and security concerns.
What I see for mid-sized credit unions is actually the opposite.
- First, scalable cloud pricing allows organizations to get into the game at a lower price point than any alternative. As you derive value from your data modernization efforts, you can gradually scale up your initiatives and expand their impact. Horror stories related to spiraling costs are not the norm.
- Second, cloud security has matured so much that in my opinion it’s better than what the typical mid-sized organization is going to be able to do on their own. Following best practices around cloud security is actually more secure than a home-grown infrastructure strategy.
What are some of the tools, solutions or partners credit unions can leverage to make their path to the cloud easier, smoother, or more secure?
I conceptualize modern data architecture as a set of components. Each has a responsibility and a set of tools and processes.
You may come across the phrase “modern data stack” which treats these components as layers that stack or build on each other.
Here is how I define the core components:
- Replication is the first piece. This is extracting data from operational systems to centralize it. Good commercial vendors are Matillion, Fivetran, and Portable, and we often use Meltano or Airbyte if we need something custom.
- The warehouse is the central place to store and analyze data. Datateer supports Snowflake and Google BigQuery (after a lot of trial and error with some others).
- Transformation is combining data from multiple sources into a single data model, shaping it for your needs, and calculating metrics. Matillion is interesting because it can do replication and transformation, simplifying a lot of your setup. dbt Labs is another solid choice.
- Orchestration is coordination, scheduling, triggering, and failure notifications. Matillion handles this, and Prefect and Dagster are good options too.
- Governance ensures data is usable by only the right people, for the right reasons. ALTR makes this a breeze.
- Business Intelligence is the reporting and dashboarding. We recommend Sigma Computing and Hex to cover just about every scenario. We have an evaluation matrix of almost 50 tools if you really want to get into details.
How do you predict the data landscape will change for credit unions in the next 3-5 years?
I hope that the credit union cloud adoption trend will continue. The potential benefits are huge for speed and visibility into members and product performance.
Credit unions that focus on Member 360 efforts will have such an advantage over those that don’t invest there. Member 360 is just a buzzword, but it encompasses efforts to truly understand each member by bringing together data from multiple areas of the organization.
Credit unions that do invest here will try to balance the insights data analytics can provide with the personalized service and relationships they already have with members.
Data replication will become easier. Some large vendors are already sharing data into your warehouse automatically–no replication needed.
Last, we will see generative AI become a natural language interface on top of data, allowing easier and faster access to information. This will be exciting because it opens up data to more people. But it will only be useful to credit unions who have a solid data foundation already in place.
--
Adam Roderick is CEO of Datateer, where he helps companies make their data useful and valuable. He is the creator of the Simpler Analytics framework and the founder of the Open Metrics Project. Adam lives in Colorado with his wife and five amazing daughters, raising them to live life to the fullest in one of the most beautiful places on earth.
May 30
Codex, ALTR, Matillion for Better Data Protection
ALTR Blog
Step by Step Approach to Secure Your Data
Protecting sensitive data is becoming a critical aspect of any organization’s data processes. Sensitive information, such as financial data, personal information, and confidential business information, must be kept secure to prevent unauthorized access, theft or misuse.
Of course, by implementing robust security measures and technologies, such as data loss prevention tools, network protection, and strong access controls, companies can significantly reduce the risk of a breach and protect sensitive data.
Tokenization can come on top of ‘traditional’ security measures to protect sensitive data by physically replacing the original data at the database level with a unique identifier, or token. This token can be used to reverse the process and reveal the original data on the fly.
Sounds like masking data? Yes and no… With a mask, the underlying data remains in the clear and is only obscured at query time; tokenization physically alters the underlying data. So it goes one step further than simply masking data.
Detokenization is the process of reversing tokenization by taking the token and returning the original data. This process is typically only done in secure systems where the data is needed for legitimate purposes, such as for a financial transaction.
Codex Consulting prioritizes protection of sensitive data and is dedicated to implementing tokenization and detokenization techniques in a straightforward manner, without the need for complex protocols.
In this blog, ALTR is the go-to solution for data security, data governance and monitoring. Matillion is the data integration and productivity tool for streamlining data pipelines and delivering promised protection to organizations. Snowflake is the Data Cloud platform on which we want to add another layer of security and protection.
Therefore, our goal is to convey our expertise on seamlessly incorporating tokenization and detokenization to secure sensitive data within your Snowflake environment.
Let’s take the example where customer emails require protection and only specific roles have access to the clear data.
These are the steps of tokenization & detokenization:
- Create an API integration.
- Create a tokenization external function and grant the USAGE permission on the function to the PUBLIC role.
- Create a detokenization external function and grant the USAGE permission on the function to the PUBLIC role.
- Create stored procedures for the masking policy.
- Create an orchestration pipeline in Matillion to invoke the tokenization and detokenization functions.
- Finally, check the data with specific roles.
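The first few steps above can be sketched in SQL. The integration and function names follow the blog; the role ARN and endpoint URLs are placeholders that would come from your ALTR configuration:

```sql
-- 1. API integration that lets Snowflake reach ALTR's endpoints.
CREATE API INTEGRATION ALTR_TOKENIZATION
    API_PROVIDER = aws_api_gateway
    API_AWS_ROLE_ARN = '<role-arn-provided-by-altr>'
    API_ALLOWED_PREFIXES = ('<altr-endpoint-prefix>')
    ENABLED = TRUE;

-- 2. Tokenization external function, usable by any role.
CREATE EXTERNAL FUNCTION ALTR_PROTECT_TOKENIZE(val VARCHAR)
    RETURNS VARCHAR
    API_INTEGRATION = ALTR_TOKENIZATION
    AS '<altr-tokenize-endpoint-url>';
GRANT USAGE ON FUNCTION ALTR_PROTECT_TOKENIZE(VARCHAR) TO ROLE PUBLIC;

-- 3. Detokenization external function, usable by any role.
CREATE EXTERNAL FUNCTION ALTR_PROTECT_DETOKENIZE(val VARCHAR)
    RETURNS VARCHAR
    API_INTEGRATION = ALTR_TOKENIZATION
    AS '<altr-detokenize-endpoint-url>';
GRANT USAGE ON FUNCTION ALTR_PROTECT_DETOKENIZE(VARCHAR) TO ROLE PUBLIC;
```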
Tokenization:
- For initial setup, we create an API integration "ALTR_TOKENIZATION" in the Snowflake environment.
- We create an external function called "ALTR_PROTECT_TOKENIZE", and we grant the USAGE permission on the function to the PUBLIC role, allowing any user or role to use the function.
- We also create an external function called "ALTR_PROTECT_DETOKENIZE", and we grant the USAGE permission on the function to the PUBLIC role, allowing any user or role to use the function.
The purpose of this function is to detokenize sensitive data that has been previously tokenized using the ALTR_PROTECT_TOKENIZE function.
- We create a stored procedure SP_MODIFY_MASKING, which creates a masking policy for a specific column in a table and applies different types of masking based on the value of the column. (Script in Appendix.)
- Once that’s done, we can then secure the data early in the pipeline. Let’s learn how to do it through a simple Matillion job.
Let’s create a script (SQL component in Matillion) to call the Snowflake Function we just created.
We run the script below in the Snowflake environment via the SQL Script component, choosing the email data we want to protect.
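The original script isn’t reproduced here; a hypothetical SQL Script component body that tokenizes the email column in place might look like this:

```sql
-- Replace each email with its ALTR token.
UPDATE "PRIVACY"."STAGING"."CUSTOMER_DETAILS"
SET EMAIL = ALTR_PROTECT_TOKENIZE(EMAIL);
```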
- Now let’s check the email column in Snowflake. We can see that the email is now protected.
But we also want to make sure that only authorized groups can see the data in clear.
Detokenization:
As per our observations, the email data has been physically modified in Snowflake so that only specific groups can access the clear data, while the data remains tokenized for others. Is it possible to reverse this process and restore the data to its original state? Yes, definitely!
These are the steps in Snowflake and ALTR:
- For the initial setup of detokenization, we run this script first in the Snowflake environment with the Matillion SQL component to call the detokenization function.
The script in the component SQL:
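The original script isn’t reproduced here; a hypothetical equivalent that restores the original values by calling the detokenization function would be:

```sql
-- Reverse the earlier tokenization: swap each token back for its value.
UPDATE "PRIVACY"."STAGING"."CUSTOMER_DETAILS"
SET EMAIL = ALTR_PROTECT_DETOKENIZE(EMAIL);
```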
- After, we open ALTR and open the “Data Management” page under the “Data Configuration” section.
- Click the column that is added to ALTR.
- Remove this column from ALTR with the “Disconnect Column” button.
- Add your new Column to ALTR with the “Add New” button on this page.
- This column is the "PRIVACY"."STAGING"."CUSTOMER_DETAILS"."EMAIL".
- Run the following command in Snowflake to configure your masking policy for automatic detokenization.
- In ALTR, open the “Locks” page under the “Data Policy” section.
- Create a lock called “Allow Detokenization.”
- Pick the “Snowflake” Application.
- Pick the “SYSADMIN” role (to allow that role to see plain text values).
- Switch the “Tag” to “Column”.
- Pick your new email address field.
- Set your masking policy to “No Mask”.
- Now let’s check the email column in Snowflake with different roles:
- With CodexAdmin role, we see scrambled data:
- With the SysAdmin role, we see plain text values.
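The masking-policy command referenced in the steps above isn’t shown in the original; a sketch of how automatic detokenization can be wired with a standard Snowflake masking policy (the policy name and hard-coded role check are illustrative; in practice ALTR’s locks govern which roles resolve to plain text):

```sql
-- Illustrative masking policy: detokenize on the fly for approved roles.
CREATE OR REPLACE MASKING POLICY EMAIL_DETOKENIZE AS (val VARCHAR)
RETURNS VARCHAR ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN ALTR_PROTECT_DETOKENIZE(val)
        ELSE val  -- other roles continue to see the token
    END;

ALTER TABLE "PRIVACY"."STAGING"."CUSTOMER_DETAILS"
    MODIFY COLUMN EMAIL SET MASKING POLICY EMAIL_DETOKENIZE;
```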
Ultimately, tokenization and detokenization are effective and effortless using Matillion and ALTR.
The automation offered by these tools is remarkable and saves a lot of time for data engineers, allowing them to access and utilize cloud data in a matter of minutes.
May 25
Improving Your Data Governance Posture
ALTR Blog
Data governance doesn’t have a magic bullet or even a well-defined goal or end date. It’s a never-ending, ongoing process of trial, error and optimization as your business, your data and your data users change. If you’re like most practitioners responsible for governing and securing data, you’re always looking for better ways to overcome data governance challenges or ensure your systems are compliant.
As part of our Expert Panel Series, we asked experts in the modern data ecosystem what advice they’d give to companies looking to improve their data governance posture (which should be all of them). Here’s what we heard….
Ethan Pack - VP Enterprise Architecture, TDECU
"A company's data governance posture really starts with inspecting the organization's culture and identifying how data is important to the firm's short- and long-term objectives. Companies need to be honest in assessing their data literacy, talent, and data-related needs, pain points, and opportunities. These inputs are vital to establishing or updating a data governance framework that aligns to its desired posture and outcomes.
Then, it's about reinforcing the overall message of the operational and strategic value of data, addressing talent needs, and providing ongoing examples of how data governance is enabling both quick wins and more time-intensive efforts for data-led value creation and realization.
Ongoing monitoring and auditing is a must, and I'm excited to continue partnering with ALTR to help us transform static data governance policies into active, observable, and kinetic aspects of our business."
Pat Dionne - Data Entrepreneur, CEO of Passerelle
“Ethan Pack’s comment about how ALTR helps 'transform static data governance policies into active, observable, and kinetic aspects of our business' is right on.
Additionally, identify all data sources, classify data based on sensitivity and importance, and define a glossary of terms and ownership. To promote sustainability & efficiency, organizations should look at automating aspects of data governance wherever possible.”
James Beecham - Founder & CEO of ALTR, Member Forbes Technology Council
“Ensure the entire business is on board. At the Gartner Data & Analytics conference earlier this year, I heard someone from a very large company say 'Data governance has become a third rail because not everyone at the table cares' and then went on to explain why she thought other stakeholders didn’t care. It all came back to leadership. Ensure your leadership is empowering and communicating the needs of the business to everyone at the same time, in the same room.”
John Bagnall - Senior Product Manager, Matillion
"Automate data controls as much as possible!"
Watch out for the next monthly installment of our Expert Panel Series on LinkedIn!