BLOG SPOTLIGHT
ALTR’s Format-Preserving Encryption, powered by the FF3-1 algorithm and ALTR’s trusted policies, offers a comprehensive solution for securing sensitive data.
Oct 30
Format-Preserving Encryption: A Deep Dive into FF3-1 Encryption Algorithm
In the ever-evolving landscape of data security, protecting sensitive information while maintaining its usability is crucial. ALTR’s Format Preserving Encryption (FPE) is an industry-disrupting solution designed to address this need. FPE ensures that encrypted data retains the same format as the original plaintext, which is vital for maintaining compatibility with existing systems and applications. This post explores ALTR's FPE, the technical details of the FF3-1 encryption algorithm, and the benefits and challenges associated with using padding in FPE.
What is Format Preserving Encryption?
Format Preserving Encryption is a cryptographic technique that encrypts data while preserving its original format. This means that if the plaintext data is a 16-digit credit card number, the ciphertext will also be a 16-digit number. This property is essential for systems where data format consistency is critical, such as databases, legacy applications, and regulatory compliance scenarios.
Technical Overview of the FF3-1 Encryption Algorithm
The FF3-1 encryption algorithm is a format-preserving encryption method that follows guidelines established by the National Institute of Standards and Technology (NIST). It is specified in NIST Special Publication 800-38G Revision 1 and is a variant of the Feistel network, a structure widely used in cryptographic applications. Here’s a technical breakdown of how FF3-1 works:
Structure of FF3-1
1. Feistel Network: FF3-1 is based on a Feistel network, a symmetric structure used in many block cipher designs. A Feistel network divides the plaintext into two halves and processes them through multiple rounds of encryption, using a subkey derived from the main key in each round.
2. Rounds: FF3-1 uses 8 rounds of encryption. Each round applies a round function to one half of the data and combines the result with the other half using modular addition (addition modulo the size of the half's digit space, rather than XOR, so intermediate values remain valid digit strings). This process is repeated, alternating between the halves.
3. Key Scheduling: Rather than expanding the key into per-round subkeys, FF3-1 uses a single AES key in every round; per-round variation comes from mixing the tweak and the round number into the round-function input.
4. Tweakable Block Cipher: FF3-1 includes a tweakable block cipher mechanism, where a tweak (an additional input parameter) is used along with the key to add an extra layer of security. This makes it resistant to certain types of cryptographic attacks.
5. Format Preservation: The algorithm ensures that the ciphertext retains the same format as the plaintext. For example, if the input is a numeric string like a phone number, the output will also be a numeric string of the same length, also appearing like a phone number.
How FF3-1 Works
1. Initialization: The plaintext is divided into two halves, and the tweak is split into two sub-tweaks that feed into alternating rounds. The tweak is often derived from additional data, such as the position of the data within a larger dataset, to ensure uniqueness.
2. Round Function: In each round, the round function takes one half of the data, the tweak, and the round number as inputs. In FF3-1 this function is built on AES: the inputs are packed into a block, encrypted under the key, and the result is reduced to a number in the target radix to produce a pseudorandom output.
3. Combining Halves: The output of the round function is added to the other half of the data, modulo the size of that half's numeral space (for an m-digit decimal half, modulo 10^m). The halves are then swapped, and the process repeats for the specified number of rounds.
4. Finalization: After the final round, the halves are recombined to form the final ciphertext, which maintains the same format as the original plaintext.
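The round structure described above can be sketched in a few lines of Python. This is a deliberately simplified toy: HMAC-SHA256 stands in for the AES-based round function, the halves are balanced, and there is no tweak, so it is not FF3-1 and must never be used to protect real data. What it does show is why modular addition keeps every intermediate value inside the original digit alphabet:

```python
import hashlib
import hmac

def _round_output(key: bytes, half: str, rnd: int, width: int) -> int:
    # Pseudorandom round function keyed by `key`; HMAC-SHA256 is a toy
    # stand-in for FF3-1's AES-based round function.
    mac = hmac.new(key, f"{rnd}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % (10 ** width)

def toy_encrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    # Balanced Feistel over an even-length decimal string. Halves are
    # combined with addition modulo 10**m (not XOR), so every
    # intermediate value stays a valid digit string: the property that
    # makes the scheme format-preserving. Illustrative only, NOT FF3-1.
    assert len(digits) % 2 == 0 and digits.isdigit()
    m = len(digits) // 2
    left, right = digits[:m], digits[m:]
    for rnd in range(rounds):
        f = _round_output(key, right, rnd, m)
        left, right = right, str((int(left) + f) % 10 ** m).zfill(m)
    return left + right

def toy_decrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    # Run the rounds in reverse, subtracting instead of adding.
    m = len(digits) // 2
    left, right = digits[:m], digits[m:]
    for rnd in reversed(range(rounds)):
        f = _round_output(key, left, rnd, m)
        left, right = str((int(right) - f) % 10 ** m).zfill(m), left
    return left + right
```

Encrypting a 16-digit value yields another 16-digit value, and decryption walks the same rounds backwards to recover the plaintext exactly.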
Benefits of Format Preserving Encryption
Implementing FPE provides numerous benefits to organizations:
1. Compatibility with Existing Systems: Since FPE maintains the original data format, it can be integrated into existing systems without requiring significant changes. This reduces the risk of errors and system disruptions.
2. Improved Performance: FPE algorithms like FF3-1 are designed to be efficient, ensuring minimal impact on system performance. This is crucial for applications where speed and responsiveness are critical.
3. Simplified Data Migration: FPE allows for the secure migration of data between systems while preserving its format, simplifying the process and ensuring compatibility and functionality.
4. Enhanced Data Security: By encrypting sensitive data, FPE protects it from unauthorized access, reducing the risk of data breaches and ensuring compliance with data protection regulations.
5. Creation of production-like data for lower-trust environments: Using a product like ALTR’s FPE, data engineers can use the ciphertext of production data to create useful mock datasets for developers in lower-trust development and test environments.
Security Challenges and Benefits of Using Padding in FPE
Padding is a technique used in encryption to ensure that the plaintext data meets the required minimum length for the encryption algorithm. While padding is beneficial in maintaining data structure, it presents both advantages and challenges in the context of FPE:
Benefits of Padding
1. Consistency in Data Length: Padding ensures that the data conforms to the required minimum length, which is necessary for the encryption algorithm to function correctly.
2. Preservation of Data Format: Padding helps maintain the original data format, which is crucial for systems that rely on specific data structures.
3. Enhanced Security: By adding extra data, padding can make it more difficult for attackers to infer information about the original data from the ciphertext.
Security Challenges of Padding
1. Increased Complexity: The use of padding adds complexity to the encryption and decryption processes, which can increase the risk of implementation errors.
2. Potential Information Leakage: If not implemented correctly, padding schemes can potentially leak information about the original data, compromising security.
3. Handling of Padding in Decryption: Ensuring that the padding is correctly handled during decryption is crucial to avoid errors and data corruption.
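As an illustration of these trade-offs, here is one hypothetical padding scheme for decimal values shorter than FF3-1's minimum domain (for radix 10, the standard requires a domain of at least one million values, i.e. at least six digits). The single-digit length prefix is an assumption of this sketch, not ALTR's design; a real deployment would need a vetted scheme precisely because of the leakage and decryption-handling concerns listed above:

```python
MIN_LEN = 6  # for radix 10, FF3-1 needs 10**n >= 1_000_000, so n >= 6

def pad(digits: str) -> str:
    # Prefix a single digit recording how many filler zeros follow,
    # then zero-fill up to the minimum length. Hypothetical scheme,
    # shown only to illustrate the padding discussion.
    filler = max(0, MIN_LEN - 1 - len(digits))
    assert filler <= 9, "single-digit prefix only handles small pads"
    return str(filler) + "0" * filler + digits

def unpad(padded: str) -> str:
    # The prefix tells decryption exactly how much to strip, which is
    # what guards against the corruption risk noted above.
    filler = int(padded[0])
    return padded[1 + filler:]
```

A three-digit value such as "123" becomes the six-digit "200123" before encryption, and the prefix lets the original be recovered unambiguously after decryption.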
Wrapping Up
ALTR's Format Preserving Encryption, powered by the technically robust FF3-1 algorithm and combined with ALTR's trusted policy engine, offers a comprehensive solution for encrypting sensitive data while maintaining its usability and format. This approach ensures compatibility with existing systems, enhances data security, and supports regulatory compliance. However, the use of padding in FPE, while beneficial in preserving data structure, introduces additional complexity and potential security challenges that must be carefully managed. By leveraging ALTR’s FPE, organizations can effectively protect their sensitive data without sacrificing functionality or performance.
For more information about ALTR’s Format Preserving Encryption and other data security solutions, visit the ALTR documentation.
Oct 22
The CISO's Dilemma: Securing Snowflake's ACCOUNTADMIN Role
For years (even decades) sensitive information has lived in transactional and analytical databases in the data center. Firewalls, VPNs, Database Activity Monitors, Encryption solutions, Access Control solutions, Privileged Access Management and Data Loss Prevention tools were all purchased and assembled to sit in front of, and around, the databases housing this sensitive information.
Even with all of the above solutions in place, CISOs and security teams were still nervous wrecks. The goal of delivering data to the business was met, but that does not mean the teams were happy with their solutions. But we got by.
The advent of Big Data and now Generative AI are causing businesses to come to terms with the limitations of these on-prem analytical data stores. It’s hard to scale these systems when the compute and storage are tightly coupled. Sharing data with trusted parties outside the walls of the data center securely is clunky at best, downright dangerous in most cases. And forget running your own GenAI models in your datacenter unless you can outbid Larry, Sam, Satya, and Elon at the Nvidia store. These limits have brought on the era of cloud data platforms. These cloud platforms address the business needs and operational challenges, but they also present whole new security and compliance challenges.
ALTR’s platform has been purpose-built to recreate and enhance, for Snowflake, the protections organizations relied on around platforms like Teradata. Our cutting-edge SaaS architecture is revolutionizing data migrations from Teradata to Snowflake, making it seamless for organizations of all sizes, across industries, to unlock the full potential of their data.
What spurred this blog is that a company reached out to ALTR to help them with data security on Snowflake. Cool! A member of the Data & Analytics team tried our product, and it was love at first sight. The features were exactly what was needed to control access to sensitive data. Our Format-Preserving Encryption sets the standard for securing data at rest, offering unmatched protection with pricing that's accessible for businesses of any size. Win-win, which is the way it should be.
Our team collaborated closely with this person on use cases, identifying time and cost savings, and mapping out a plan to prove the solution’s value to their organization. Typically, we engage with the CISO at this stage, and those conversations are highly successful. However, this was not the case this time. The CISO did not want to meet with our team and practically stalled our progress.
The CISO’s point of view was that ALTR’s security solution could be completely disabled, removed, and would not be helpful in the case of a compromised ACCOUNTADMIN account in Snowflake. I agree with the CISO, all of those things are possible. Here is what I wanted to say to the CISO if they had given me the chance to meet with them!
The ACCOUNTADMIN role has a very simple definition, yet its use carries powerful and long-reaching implications.
One of the main points I would have liked to make to the CISO is that, as a user of Snowflake, the responsibility to secure the ACCOUNTADMIN role is squarely in their court. By now I’m sure you have all seen the news and responses to the compromised Snowflake customer accounts earlier this year. Investigations found that customer accounts left unsecured, not a breach of Snowflake itself, caused the data theft. There have been dozens of articles and recommendations on how to secure your Snowflake accounts, and even a mandate of minimum authentication standards going forward. You can read more here about securing the ACCOUNTADMIN role in Snowflake.
I felt the CISO was missing the point of the ALTR solution, and I wanted the chance to explain my perspective.
ALTR is not meant to secure the ACCOUNTADMIN account in Snowflake. That’s not where the real risk lies when using Snowflake (and yes, I know—“tell that to Ticketmaster.” Well, I did. Check out my write-up on how ALTR could have mitigated or even reduced the data theft, even with compromised accounts). The risk to data in Snowflake comes from all the OTHER accounts that are created and given access to data.
The ACCOUNTADMIN role is limited to one or two people in an organization. These are trusted folks who are smart and don’t want to get in trouble (99% of the time). On the other hand, you will have potentially thousands of non-ACCOUNTADMIN users accessing data, sharing data, screensharing dashboards, re-using passwords, etc. This is the purpose of ALTR’s Data Security Platform: to help you get a handle on a problem so large it can cause companies to abandon the benefits of Snowflake entirely.
There are three major issues outside of the ACCOUNTADMIN role that companies have to address when using Snowflake:
1. You must understand where your sensitive data is inside of Snowflake. Data changes rapidly. You must keep up.
2. You must be able to prove to the business that you have a least privileged access mechanism. Data is accessed only when there is a valid business purpose.
3. You must be able to protect data at rest and in motion within Snowflake. This means cell level encryption using a BYOK approach, near-real-time data activity monitoring, and data theft prevention in the form of DLP.
The three issues mentioned above are incredibly difficult for 95% of businesses to solve, largely due to the sheer scale and complexity of these challenges. Terabytes of data growing daily, more users with more applications, and trusted third parties who want to collaborate with your data. All of this leads to an unmanageable set of internal processes that slow down the business and introduce risk.
ALTR’s easy-to-use solution allows Virgin Pulse Data, Reporting, and Analytics teams to automatically apply data masking to thousands of tagged columns across multiple Snowflake databases. We’re able to store PII/PHI data securely and privately with a complete audit trail. Our internal users gain insight from this masked data and change lives for good.
- Andrew Bartley, Director of Data Governance
I believed the CISO at this company was either too focused on the ACCOUNTADMIN problem to understand their other risks, or felt he had control over the other non-admin accounts. In either case I would have liked to learn more!
There was a reason someone from the Data & Analytics team sought out a product like ALTR. Data teams are afraid of screwing up. People are scared to store and use sensitive data in Snowflake. That is what ALTR solves for, not the task of ACCOUNTADMIN security. I wanted to be able to walk the CISO through the risks and how others have solved for them using ALTR.
The tools that Snowflake provides to secure and lock down the ACCOUNTADMIN role are robust and simple to use. Ensure network policies are in place. Ensure MFA is enabled. Ensure you have logging of ACCOUNTADMIN activity to watch all access.
I wish I could have been in the conversation with the CISO to ask a simple question: “If I show you how to control the ACCOUNTADMIN role on your own, would that change your tone on your team’s use of ALTR?” I don’t know the answer they would have given, but I know the answer most CISOs would give.
Nothing will ever be 100% secure and I am by no means saying ALTR can protect your Snowflake data 100% by using our platform. Data security is all about reducing risk. Control the things you can, monitor closely and respond to the things you cannot control. That is what ALTR provides day in and day out to our customers. You can control your ACCOUNTADMIN on your own. Let us control and monitor the things you cannot do on your own.
Oct 14
How Cloud Data Security Enables Business Outcomes without Sacrificing Compliance
Since 2015 the migration of corporate data to the cloud has rapidly accelerated. In 2015, an estimated 30% of corporate data lived in the cloud; by 2022 that figure had doubled to 60% in a mere seven years. Here we are in 2024, and this trend has not slowed down.
Over time, as more and more data moved to the cloud, new challenges presented themselves to organizations: new vendor onboarding, spend analysis, and new units of measure for billing. This brought different cloud compute-related cost structures and new skillsets with new job titles. Vendor lock-in, skill gaps, performance and latency, and data governance all became more intricate with the move to the cloud. Both operational and transactional data were in scope to reap the benefits promised by cloud computing: organizational cost savings, data analytics and, of course, AI.
The most critical of these new challenges revolve around Data Security and Privacy. The migration of on-premises data workloads to cloud data warehouses included sensitive, confidential, and personal information. Corporations like Microsoft, Google, Meta, Apple, and Amazon were capturing every movement, purchase, keystroke, conversation and, it sometimes feels, every thought we ever had. These same cloud service providers made it easier for their enterprise customers to do the same. Along came Big Data and the need for it to be cataloged, analyzed, and used, with the promise of making our personal lives better, for a cost. The world's population readily sacrificed privacy for convenience.
The moral and ethical conversation would then begin, and world governments responded with regulations such as GDPR, CCPA and, most recently, the European Union’s AI Act. The risk and fines have been in the billions. This is a story we already know well. Thus, Data Security and Privacy became a critical function, primarily for the obvious use case: compliance and regulation. Yet only 11% of organizations have encrypted over 80% of their sensitive data.
With new challenges also came new capabilities and business opportunities. Real-time analytics across distributed data sources (IoT, social media, transactional systems) enabled real-time supply chain visibility, dynamic pricing strategies, and faster product launches than ever. On-premises applications could not handle the volume of data that exists in today’s economy.
Data sharing between partners and customers became a strategic capability. Without having to copy or move data, organizations were enabled to build data monetization strategies leading to new business models. Now building and training Machine Learning models on demand is faster and easier than ever before.
To reap the benefits of the new data world, while remaining compliant, effective organizations have been prioritizing Data Security as a business enabler. Format Preserving Encryption (FPE) has become an accepted encryption option to enforce security and privacy policies. It is increasingly popular as it can address many of the challenges of the cloud while enabling new business capabilities. Let’s look at a few examples now:
Real-Time Analytics – Because FPE returns data in the original format, the data retains its original length and structure, so more data engineers, scientists, and analysts can work with it without being exposed to sensitive information.
Data Sharing – FPE enables the sharing of sensitive information, both personal and confidential, enabling secure information exchange, collaboration, and innovation alike.
Proactive Data Security – FPE allows sensitive information to be de-identified, proactively protecting against data breaches and bad actors. Good luck holding to ransom a company that takes a proactive approach, using FPE and other Data Security Platform features in combination.
Empowered Data Engineering – With FPE, data engineers can still build, test, and deploy data transformations: user-defined functions and logic in stored procedures or compiled code run without failure. Data validations and data quality checks for formats, lengths, and more can be written and tested without exposing sensitive information. Federated, aggregation, and range queries can still run without the need for decryption. Dynamic ABAC and RBAC controls can be combined to decrypt at runtime for users with the proper rights to see the original values.
Cost Management – While FPE does not come close to solving Cost Management in its entirety, it can definitely contribute. We are seeing a need for FPE as an option instead of replicating data in the cloud to development, test, and production support environments. With data transfer, storage and compute costs, moving data across regions and environments can be really expensive. With FPE, data can be encrypted and decrypted with compute that is a less expensive option than organizations' current antiquated data replication jobs. Thus, making FPE a viable cost savings option for producing production ready data in non-production environments. Look for a future blog on this topic and all the benefits that come along.
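To make the data-engineering point above concrete: a format check written against plaintext keeps passing on format-preserved ciphertext. A minimal Python sketch, where the FPE ciphertext digits are invented for illustration and the conventional ciphertext is just a representative base64-like string:

```python
import re

# A format check a data pipeline might already run in production.
CARD_RE = re.compile(r"^\d{16}$")

def looks_like_card(value: str) -> bool:
    # True when the value is exactly 16 decimal digits.
    return bool(CARD_RE.match(value))

plaintext_card = "4111111111111111"
fpe_ciphertext = "7302918465012346"    # invented digits, illustration only
aes_style_ct = "nYkD8p2vQx0matr4yJ=="  # shape of conventional ciphertext

assert looks_like_card(plaintext_card)
assert looks_like_card(fpe_ciphertext)    # FPE output still validates
assert not looks_like_card(aes_style_ct)  # conventional output does not
```

This is why validation and quality checks keep working on encrypted columns: the check constrains shape, and FPE preserves shape.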
FPE is not a silver bullet for protecting sensitive information or enabling these business use cases. There are well-documented challenges in the FF1 and FF3-1 algorithms (another blog on that to come). A blend of features, including data discovery, dynamic data masking, tokenization, role- and attribute-based access controls, and data activity monitoring, will be needed for a proactive approach to security within your modern data stack. This is why Gartner considers a Data Security Platform, like ALTR, one of the most advanced and proactive solutions for data security leaders.
Oct 2
What is Format-Preserving Encryption & Why It’s the Missing Piece in Your Security Strategy
Securing sensitive information is now more critical than ever for all types of organizations, as the steady stream of high-profile data breaches shows. There are several ways to secure data, including restricting access, masking, encryption, and tokenization, but each can pose challenges when the data is used downstream. This is where Format Preserving Encryption (FPE) helps.
This blog will cover what Format Preserving Encryption is, how it works and where it is useful.
What is Format Preserving Encryption?
Whereas traditional encryption methods generate ciphertext that doesn't look like the original data, Format Preserving Encryption (FPE) encrypts data whilst maintaining the original data format. Changing the format can be an issue for systems or humans that expect data in a specific format. Let's look at an example of encrypting a 16-digit credit card number:
With standard encryption, the result is a completely different output, typically a longer string of arbitrary bytes or hex characters, which may be incompatible with systems that require or expect a 16-digit numerical format. With FPE, the encrypted data still looks like a valid 16-digit number. This is extremely useful where data must stay in a specific format for compatibility, compliance, or usability reasons.
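The contrast can be shown in a couple of lines of Python. SHA-256 stands in for a conventional cipher here only because the standard library ships no block cipher; the point is the shape of the output, not the algorithm, and the FPE-style digits are invented for illustration:

```python
import hashlib

card = "4111111111111111"

# Conventional ciphertext is opaque: here, 64 hex characters that no
# longer resemble a card number at all.
conventional = hashlib.sha256(card.encode()).hexdigest()

# An FPE ciphertext keeps the shape: 16 characters, all digits.
# (Invented digits for illustration, not real FF3-1 output.)
fpe_style = "8302749166148350"

print(len(conventional), len(fpe_style))  # 64 vs 16
```

A downstream system validating "16 numeric digits" rejects the first output and accepts the second without any code changes.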
>>>You Might Also Like: FPE vs Tokenization vs TSS
How does Format Preserving Encryption work?
Format Preserving Encryption in ALTR works by first analyzing the column to understand the input format and length. Next, the NIST-specified algorithm is applied to encrypt the data with the given key and tweak. ALTR applies regular key rotation to maximize security, and we also support customers bringing their own keys (BYOK). Data can then be selectively decrypted using ALTR’s access policies.
Why use Format Preserving Encryption
FPE offers several benefits for organizations that deal with structured data:
1. Adds an Extra Layer of Protection: Even if a system or database is breached, encryption makes sensitive data far harder to exploit.
2. Original Data Format Maintained: FPE preserves the original data structure. This is critical when the data format cannot be changed due to system limitations or compliance regulations.
3. Improves Usability: Encrypted data in an expected format is easier to use, display and transform.
4. Simplifies Compliance: Many regulations, such as PCI-DSS, HIPAA, and GDPR, mandate safeguards like encryption for sensitive data. FPE allows you to apply encryption without disrupting data flows or reporting, all while still meeting regulatory requirements.
When to use Format Preserving Encryption?
FPE is widely adopted in industries that regularly handle sensitive data. Here are a few common use cases:
- Healthcare: Hospitals and healthcare providers could use FPE to protect Social Security numbers, patient IDs, and medical records. It ensures sensitive information is encrypted while retaining the format needed for billing and reporting.
- Telecoms: Telecom companies can encrypt phone numbers and IMSI (International Mobile Subscriber Identity) numbers with FPE. This allows the data to be securely transmitted and processed in real-time without decryption.
- Government and Defense: Government agencies can use FPE to safeguard data like passport numbers and classified information. Preserving the format ensures seamless data exchange across systems without breaking functionality.
- Data Sharing: In this blog we talk about how FPE can help with Snowflake Data Sharing use cases.
Wrapping Up
ALTR offers various masking, tokenization and encryption options to keep all your Snowflake data secure. Our customers are seeing the benefit of Format Preserving Encryption to enhance their data protection efforts while maintaining operational efficiency and compliance. For more information, schedule a product tour or visit the Snowflake Marketplace.
Apr 5
Snowflake Data Observability: DIY vs ALTR
ALTR Blog
Many of us have become more aware of the power of increased knowledge around our activities over the last few years – whether it’s a FitBit monitoring our steps or an energy audit delivering a detailed view of how everyone in your home uses lights, appliances, electronics, and other things that need power. Each month your utility company monitors your usage, and the details can help you recognize ways to lower your bill and identify current problems that are making your home less energy efficient. You can do the same with data usage observability. By capturing and monitoring who is running queries on data and when, you can make informed decisions to prevent data breaches and leaks. ALTR Heat Maps and Analytics Dashboards can make this information easily viewable and digestible so you can see where the issues might be.
This blog provides a high-level explanation of what data observability is and why it’s important, how it works if you do it manually in Snowflake, and how it works if you use ALTR to automate the process. We’ve also included a few use cases and a how-to demonstration video. We hope you find it helpful!
What is Data Observability and Why is It Important?
At a high level, data observability is presenting information about how users are accessing data in an easy-to-consume visual format. Operationalizing this through data observability tools is critical to helping you understand what’s occurring and to gauging abnormal events. The payoff for using ALTR's data usage observability tools is that they provide the information needed to meet two key data security policy goals:
- Ensure that you have policies in place for all roles who access sensitive data
- Help you understand what normal access looks like because you can't identify what is abnormal without a baseline to compare to
Snowflake logs capture this access information, and the events shown can help your Data Security team spot issues and minimize time spent on bottlenecks, speed, or other problems. But this information is delivered as plain text, and extracting those insights from it takes significant work.
How Snowflake Data Observability Works if you DIY
Snowflake provides the foundational query history data needed for data usage analytics via Snowflake logs; however, to be useful, the data must be processed to get it in a visual form that is easy to interpret.
To do data observability manually in Snowflake, you must follow the steps below.
- Parse the SQL query text to extract a list of columns that you’d like to request and then filter it to only include columns that contain sensitive data. NOTE: This will require you to write SQL statements.
- Next, tabulate the count of records that each user has accessed each column for each minute, hour, and day of the past 24 hours. NOTE: This will require even more SQL coding.
- Last, convert the data set into an interactive visual chart that will display the information in a more understandable format to view the results and drill through them. NOTE: This will require full stack development skills to implement.
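As a rough sketch of the first two steps, assuming query-history rows shaped like Snowflake's QUERY_HISTORY view (user name plus query text), the tabulation might look like this in Python. The regex-based column extraction is deliberately naive; a real pipeline would use a proper SQL parser:

```python
import re
from collections import Counter

# Mock rows shaped like Snowflake's QUERY_HISTORY: (user_name, query_text).
history = [
    ("ANALYST_1", "SELECT ssn, name FROM customers"),
    ("ANALYST_1", "SELECT name FROM customers"),
    ("ETL_SVC",   "SELECT ssn FROM customers WHERE id = 7"),
]

# Columns your team has flagged as sensitive.
SENSITIVE = {"ssn"}

def columns_in(query: str) -> set:
    # Naive select-list extraction; real SQL needs a real parser.
    m = re.search(r"select\s+(.+?)\s+from", query, re.IGNORECASE)
    if not m:
        return set()
    return {c.strip().lower() for c in m.group(1).split(",")}

# Step 2: count sensitive-column accesses per (user, column).
access_counts = Counter(
    (user, col)
    for user, q in history
    for col in columns_in(q) & SENSITIVE
)
```

Even this toy version hints at the effort involved; turning the counts into an interactive chart (step 3) is where the full-stack work begins.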
As you can see from the steps that are required, data observability done manually will require more time and lots of coding.
How Snowflake Data Observability Works Using ALTR to Automate
ALTR's Dashboard provides a high-level view of everything that’s happening in ALTR. For example, it will show you how many locks and thresholds you have, open anomalies, databases that are connected, columns that are governed, and other detailed data.
The built-in ALTR Heatmap (also what we refer to as ‘Data Analytics’ or ‘Query Analytics’) delivers data observability in a visual representation of how your users are accessing data. It shows you the roles and specific users in those roles who are querying different data sets. The analytics will give insight about how your data is being used to help you identify where you need to assign policy or if you’ve already assigned policy to confirm how people are querying data within those policies.
When you hover over the heatmap (as shown in figure 4) you can see the total number of values accessed by your assigned user groups in the columns you’re governing. You also can drill down and view a more granular level of what data is accessed by your specific users, user groups, and data types.
If you check Add Data Usage Analytics when connecting your data source, ALTR will import the past 30 days of Snowflake's history to show you your organization's usage trends. From there, your query history will automatically sync daily on all columns in your connected database. See figure 3 for context.
Snowflake Data Observability Use Cases
Here are a couple of use case examples where ALTR’s automated data observability capability can benefit your business as it scales with Snowflake usage.
Use Case 1. You want to determine a typical consumption pattern and restrict access to no more data than is normal for the user’s role.
You’re an Administrator who has already created policies on your different column-level data but wants to determine whether to create an additional one for your Credit Card column. You could view the data usage over the last 7 or 30 days to see what a typical consumption pattern is and then decide what to set your time- or access-based threshold to.
Use Case 2. You want to determine if anything looks strange that may require action and ensure all roles accessing data have policies that cover them.
You’re a Data Scientist and want to confirm that the right user groups are accessing column-level data that you’ve created policies for. If anything looks strange (for example, certain roles are querying data on the weekends instead of your business hours) then you can determine if you need to block access or trigger an alert (anomaly) to protect your data.
Automate Snowflake Data Observability with ALTR
By operationalizing metrics through ALTR’s data observability tools, you can minimize data breaches and make informed decisions to stay in compliance with SLAs and regulations. The detailed data that the ALTR Dashboard and Analytics (Heatmap) provide is a must-have for an effective data security strategy. ALTR makes viewing everything going on within it, and within your analytics, so convenient that you don’t have to write any SQL code. It’s a simple point-and-click in ALTR and you’re done!
See how you would get data observability by doing it yourself in Snowflake vs automatically in ALTR:
Mar 29
6 Must-Haves for Mastering Data Governance
In a study performed ahead of last week’s Gartner Data and Analytics Conference, researchers found that data governance is a top initiative that business leaders and data officers plan to focus on in 2023 and into 2024. We’re glad to see companies choosing this as a top priority. Data governance is only increasing in urgency and demand, yet we’ve seen many organizations falling behind in establishing a proper data governance practice.
Data Governance Definition
At its core, data governance is “the process of overseeing the integrity, security, usability, and availability of data within an enterprise so that business goals and regulatory requirements can be met.”
According to the Gartner Glossary, data governance is, “the specification of decision rights and accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” A well-run data governance practice can improve your data quality by setting the standard for how data is received, stored, and processed within your database.
A successful data governance strategy guarantees that your data is dependable and that data users are held accountable for their access levels. Data governance is a critical business component for all industries and will only gain traction as a standard business practice as more rules and regulations surrounding it are set in motion.
Building a Successful Data Governance Strategy
Building a data governance strategy ahead of implementing a data governance solution is the best way to ensure your data plan is set up in such a way that will offer success for the data you store. Your data governance strategy serves as the base for the decisions that your organization makes. Once you have that strategy in place you can better identify which data governance solution best fits your needs. A solid data governance strategy will guarantee that you are building a solid foundation for your data.
Our partner, Alation, says that a successful data governance strategy must address four things: data availability, data consistency, data accuracy, and data security.
Data Availability: A good data governance strategy allows the correct data to be available to the correct people at the appropriate times. Your data governance solution should be structured in such a way that the people who the data is made available to can easily find and access the data. When strategizing, your team should determine what data availability you will need within a data governance solution.
Data Consistency: One key point your organization should consider when discussing your data governance strategy is the standardization of data points across your database. Determining these key data points from the beginning will help streamline the decision-making process down the road and will ensure consistent decisions are being made across your organization regarding your data.
Data Accuracy: Determining how data comes into your database, what you do with it once it’s there, and how data will exit your database will establish ground rules for the future of how your data is managed. It’s important to determine ahead of time the values, tags, and lifecycle that will be associated with data points to ensure consistency and accuracy, and guarantee that your dataset is error-free.
Data Security: Companies are responsible for protecting the sensitive data entrusted to them. Should your organization need to pass regulatory audits for any reason, a good data governance strategy and solution will ensure your data is safe and that you have the audit logs available to confirm regulatory compliance.
6 Must-Haves to Master Data Governance
A quick Google search of “Data Governance Solutions” will prove there isn’t a one-size-fits-all approach to data governance tools. By taking the time to pre-determine your data governance strategy, you are better positioned to find a data governance solution that is flexible and scalable enough to fit your needs.
ALTR’s data governance solution delivers both scalability and flexibility, and we’ve seen data governance success in organizations from a multinational retailer governing PII to a regional credit union protecting PCI to a unique healthcare organization safeguarding PHI.
Our data governance features matrix outlines 18 key points that we think are critical when evaluating whether a data governance solution will suit your needs. We’ve broken down key differences that ALTR brings to the table in each of these categories:
1) Flexible Pricing
Starting price and scalable pricing are points to consider when choosing a data governance solution. While some data governance solutions charge six figures to start, ALTR is proud to offer a free solution for one database within Snowflake and scalable pricing when you decide to upgrade to our Enterprise or Enterprise Plus plans.
2) Access Monitoring
See what data is used, by whom, when, and how frequently with ALTR’s industry-first interactive Data Usage Heatmaps and drill-down Analytics Dashboards. Access monitoring is helpful for understanding normal data usage, identifying abnormalities, and preparing your organization for an audit request.
3) Data Masking
Within ALTR, you can quickly and seamlessly apply flexible, dynamic data masking over PII like social security numbers or email addresses to keep sensitive data private.
4) Rate Limiting
ALTR’s patented, rate-limiting data access threshold technology can send real-time alerts, and slow or stop data access on out-of-normal requests. The control is then in your hands to stop individual access when it exceeds normal limits. This is helpful for mitigating the risks associated with credentialed access and data breaches.
5) Tokenization
Tokenization is the process of replacing actual sensitive data elements with random, non-sensitive data elements (tokens) that have no exploitable value, allowing for a heightened level of data security during data migration. ALTR’s patented tokenization-as-a-service gives you secure-yet-operable data protection.
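A minimal sketch of the vault-based tokenization idea described above, assuming a simple in-memory store (this is an illustration of the concept, not ALTR's actual tokenization-as-a-service implementation):

```python
import secrets

class TokenVault:
    """Toy token vault: swaps sensitive values for random tokens."""

    def __init__(self):
        self._token_to_value = {}   # the "vault": token -> original value
        self._value_to_token = {}   # reuse the same token for repeat values

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Random token with no mathematical relationship to the original,
        # so it has no exploitable value if leaked.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only a caller with vault access can recover the original value.
        return self._token_to_value[token]

vault = TokenVault()
t = vault.tokenize("4111-1111-1111-1111")
assert t != "4111-1111-1111-1111"                    # token reveals nothing
assert vault.detokenize(t) == "4111-1111-1111-1111"  # vault can reverse it
```

The key design point is that, unlike encryption, the token cannot be reversed mathematically; the mapping exists only in the vault.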
6) Open-Source Ecosystem Connectors
As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities for very simple data governance integrations to be created. We found open source to be an ideal distribution method as it allows companies more flexibility in building their integrated and secure data stacks.
What’s Next for Data Governance
The need for data governance is guaranteed to only increase in the coming years. IDC predicts that the global datasphere will double in size from 2022 to 2026. We predict that companies without a data governance strategy in place now will soon need one.
From legacy on-premises providers to new cloud-based start-ups to lateral players in the data ecosystem, it seems almost everyone offers a “data governance” solution these days. But, there’s no actual data governance without data control and protection, and DIY and manual approaches can only take you so far. To ensure they don’t fall behind, companies should be evaluating data governance solutions now to find those that meet the requirements of their data governance strategy and deliver those six “must-haves” for mastering data governance.
Mar 17
0
min
How Can Data Governance and Security Leaders Better Engage the Rest of the Organization?
ALTR Blog
Sometimes it seems like data governance and security is everyone’s and no one’s job. When that’s the case, there can be cracks in your data governance and security posture, and that can open the door to data risk. One way to overcome this is to ensure that the entire company is supportive of the initiative. But how?
As part of our Expert Panel Series, we asked some experts in the modern data ecosystem how data governance and security leaders can better engage with the rest of the organization.
Here’s what we heard….
“Data governance is not a joyful exercise."
"If there is data governance needed, there is a pain, threat or a business opportunity..."
"Implementing data governance requires changes in processes, roles and responsibilities ... and by definition, humans are reluctant to change..."
"So my key advice to data governance and security leaders is to emphasize the WHY data governance is a must, why it is needed by the organization and make sure you have a strong change management plan in place, not just tools to roll out!”
- Damien Van Steenberge, Managing Partner, Codex Consulting
“I think the best way for data governance leaders to engage the rest of the organization is to show them a world where it's easy for data consumers to access data and then show them the power and value of the data they're able to access. Show them how it makes their job easier, better, faster.”
- Chris Struttmann, Founder and CTO, ALTR
“Data governance and security initiatives can fall flat if they are perceived as top-down mandates that are out of touch with the work that’s being done. The best way to get buy-in from across the organization is to tie data governance and security initiatives with use case-based deployment. When stakeholders can see the positive impact and relevance of new technology or practices, they won’t just be more engaged – they will become champions.”
- Pat Dionne, Founder and CEO, Passerelle
“From an ETL point of view, it’s ensuring that data engineers are given the freedom to automate their pipelines which ensure they are adhering to governance policies. Too much process in this area can slow them down. They want to just run and know that it’s all going to fall into place.”
- John Bagnall, Sr Product Manager, Matillion
Watch out for the next installment of our Expert Panel Series on LinkedIn!
Mar 15
0
min
Why Open-Source Data Governance Integrations?
ALTR Blog
With our recent open-source data governance integration initiative announcement, I wanted to take this opportunity to explain in a little more detail why ALTR decided to go down the path of open-source data governance integrations. One of our principles has always been that because data governance is so critical, it has to be easy and accessible across the data ecosystem. In fact, strong governance and security are a requirement for adding many workloads to the cloud...which has led to the necessity of many of the products in our ecosystem. Other vendors in the space, though, perhaps coming from a more traditional enterprise software mindset, are focusing on their own proprietary marketplaces for connectors or charging for custom integrations to connect various data tools. While this approach may work well for short-term bookings, it doesn’t serve the long-term customer mission to get the most value out of their data.
Furthermore, that approach just doesn’t align with ALTR’s DNA. We built our SaaS platform to be accessible via the cloud, we removed the need for SQL authoring with our point-and-click interface, we built an incredibly powerful automation interface on top of that, and we introduced the first and only completely free data access control solution in the market. Doing the status quo just because it is the status quo is antithetical to the founding mission of ALTR.
As we worked to integrate our solution with data discovery, catalog, and ETL tools, we found opportunities for very simple open-source data governance integrations to be created. Once we built a few of them, we started to identify that open source was an ideal distribution method.
How open-source integrations help our customers:
- Unbind the buying process: Free, open-source data governance connectors allow customers the flexibility to choose the solutions they want to implement on their own schedule and timeline, rather than having to research, select, and onboard the full stack at once as part of a consolidated purchase process. Customers can move at their own pace, choose the tools they prioritize for their budget, and add data access control and security when ready.
- Flexible implementation: Open-source integrations enable customers to implement their unique use case and configure the solutions in a way that works for their infrastructure, without custom code or manual implementation, rather than being bound by the limitations of a fixed integration delivered by a partner marketplace. This also allows users with resource constraints to do more with fewer solutions, optimizing their stacks for increased efficiency.
- Enterprise-level features for free: ALTR’s enterprise-ready integrations work with data catalogs, ETLs (Extract, Transform, Load), and other data ecosystem tools data customers already use. This increases the data access control features available to customers while decreasing the number of tools they’re required to manage.
- Community development and improvement: We’ve found over the years that almost all our customers look to solve the same problems repeatedly, so like any open-source initiative, we’re enabling end users to contribute their own solutions to the ALTR GitHub library. For example, if a user wanted to send ALTR data classification information into a specific field in a data catalog, they could build that feature and submit that back to the repository for others to benefit from. Because all the customers are solving the same problems, we’ve created an environment where peers across organizations can gain from the experience of others, which makes everyone’s job easier.
ALTR’s open-source data governance integrations are available through our GitHub open source library.
With this initiative, ALTR offers non-proprietary connectors to extend the powerful features we provide in the ALTR Free forever plan into leading partner stacks, including Alation and Matillion. These open-source integrations enable seamless data governance, with access control and security spanning from database to data catalog to ELT to cloud data platforms. Complexity is removed by merely plumbing together the already in-market solutions in our ecosystem. Nothing proprietary or complex—just simple and thoughtful connectors which bring ALTR’s value and feature set into the adjacent tools of our ecosystem.
Our end goal is to facilitate interoperability and remove barriers so that customers can build an integrated cloud data stack that allows data to flow freely and securely, and ultimately lets the customer get more value from more data more quickly and with fewer resources.
See how our open-source data governance integration works with Alation:
See how our open-source data governance integration works with Matillion:
Hear Chris and James explain why open-source data governance integrations are the best approach:
Try it yourself now with the ALTR Free plan
Mar 1
0
min
Data Lake vs. Data Warehouse: 4 Key Differences
ALTR Blog
Determining whether a data lake or a data warehouse is the best fit for your organization’s data is likely one of the first in a long line of data-driven decisions you’ll make in your data governance journey. We’ve outlined four key differences between data lakes and data warehouses and explained factors that may impact your decision.
By definition, a data lake is a place where data can be stored in a raw and unstructured format. This data is accessible whenever and by whomever it’s needed, from data scientists to line-of-business execs. On the other hand, a data warehouse stores structured data that has been organized and processed and allows the user to view data in digestible formats based on predefined data goals. Due to their nature, there are a few key differentiators between these two data storage options.
1) Data Format
First, the format in which data can be viewed after import varies between data lakes and data warehouses.
A data warehouse requires data to be processed and formatted upon import, which requires more work on the front end, but allows for more organized and digestible data to be viewed at any point in the data’s lifecycle after defining the schema. Data typically flows into data warehouses from multiple sources, and typically on a regular and consistent cadence. Once the data is collected in the warehouse, it is sorted based on pre-determined schemas that your data team sets.
Data lakes allow you to store data in its native or raw format for the entire time the data is housed within the lake. This allows for a quick, scalable import process and lets your organization store large amounts of data in one place and access the raw form at any point. Data lakes are typically optimized to store massive amounts of data from multiple sources, allowing your data to be unstructured, semi-structured, or structured.
2) Processing
The way in which data is processed is a critical differentiator between a data lake and a data warehouse.
Data warehouses use a process called schema-on-write, and data lakes use a process called schema-on-read. A schema, within data governance, is a collection of objects within the database, such as tables, views, and indexes.
Schema-on-write, used in data warehouses, has the data scientist develop the schema when writing, or importing, the data, so that database objects, including tables and indexes, can be viewed concisely once imported. This may mean more front-end work writing SQL code and determining the objectives of your data warehouse, but it allows for a more digestible view of your data once imported.
On the other hand, schema-on-read allows execs to forego developing the schema when importing the data into the data lake but will require you to develop the schema when accessing the data later down the road. Schema-on-read is what allows your data to be stored in unstructured, semi-structured, or structured formats within your data lake.
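The schema-on-write vs. schema-on-read distinction can be sketched in a few lines of Python. The record and field names here are hypothetical, and no specific database engine is assumed:

```python
import json

RAW_EVENT = '{"user": "alice", "amount": "42.50", "extra": "unused"}'

def write_to_warehouse(raw: str) -> dict:
    """Schema-on-write (warehouse-style): validate and shape on import."""
    record = json.loads(raw)
    # The schema is enforced now; malformed records would fail here,
    # before anything lands in the warehouse.
    return {"user": str(record["user"]), "amount": float(record["amount"])}

def store_in_lake(raw: str) -> str:
    """Schema-on-read (lake-style): keep the raw, native format as-is."""
    return raw

def read_from_lake(raw: str) -> dict:
    """The schema is applied only at query time, per use case."""
    record = json.loads(raw)
    return {"user": record["user"], "amount": float(record["amount"])}

warehouse_row = write_to_warehouse(RAW_EVENT)            # shaped at import
lake_row = read_from_lake(store_in_lake(RAW_EVENT))      # shaped at query time
assert warehouse_row["amount"] == lake_row["amount"] == 42.5
```

Both paths end at the same structured row; the difference is *when* the shaping work happens, which is exactly the flexibility trade-off discussed next.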
3) Flexibility
The benefit of schema-on-read is allowing the schema to be created on a case-by-case basis to benefit the data set. Many who opt to store their data in a data lake prefer the flexibility that schema-on-read allows for each unique data set.
Alternatively, schema-on-write interprets all imported data equally and does not allow for variance once imported. The benefit of flexibility in a data warehouse is the ability to immediately see the impact of your data within the warehouse after import – you’ve already done the front-end work of determining the schema, and your data will be immediately accessible and readable.
4) Users
Finally, accessibility and user control may be the deciding factor for how and where your company stores data.
A data lake is more accessible to day-to-day business execs and makes it easy to add new raw data. A data lake is traditionally less expensive due to the nature of the format, and because you likely won’t need additional manpower to import and maintain your data within the lake. The nature of a data lake is such that data can regularly be added in its original format and the end outcome of the data can be determined down the road, at any point in the data’s lifecycle.
A data warehouse will likely only be accessible to, and updatable by, data engineers within your organization. It is more complicated to update and may be more costly because of the manpower required to produce changes. When setting up your data warehouse, your data team will likely need context on what your data needs to do in order to correctly write the SQL code that will make your warehouse successful.
It's important to note that you can have a data warehouse without a data lake, but a data lake is not a direct replacement for a data warehouse and is often used to complement a data warehouse. Many companies who use a data lake will also have a data warehouse.
Regardless of where you store your data, you’ll need to set up access rules to govern and protect it. Implementing a cloud data security solution has never been easier.
Feb 22
0
min
Snowflake Rate Limiting: DIY vs ALTR
ALTR Blog
“To rate limit with data usage thresholds or not to rate limit with data usage thresholds? That is the question.”
Even though this twist on the famous line “To be, or not to be…” in William Shakespeare’s Hamlet is playful, protecting sensitive and regulated data from credentialed threats is very serious. We might all trust our data users, but even the most reliable employee’s credentials can be lost or stolen. The best approach is to assume that all credentials are compromised all the time.
So the question is not if you should put a limit on the amount of sensitive data that credentialed users can access, but how.
In this blog, I’ll explain what rate limiting is, how you can apply rate limiting in Snowflake, and how you’ll save time by automating data thresholds through ALTR. To illustrate how you can benefit from using ALTR to limit access and risk from credentialed access threats, this blog also includes a couple of use cases and a demonstration video.
What is Rate Limiting?
In a nutshell, rate limiting lets you set a data threshold for specific user groups (roles), capping the amount of sensitive column-level data they can obtain with an ‘access-based’ limit. For example, you might want to limit your company’s Comptroller to querying only 1,000 values per hour.
Another type of data threshold is ‘time-based’. For example, if you want to limit access to your Snowflake data so it can only be queried between 8 am and 5 pm CST, Monday through Friday, because those are your business hours, you can automate this through ALTR.
Once these data thresholds are set, a credentialed user who queries data outside of them will trigger an anomaly. The anomaly alert gives you a heads-up so that you can investigate and take appropriate action.
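The access-based threshold described above can be sketched as a sliding-window counter. This is an illustrative model only; in ALTR you configure thresholds point-and-click rather than writing code, and the class and parameter names here are invented for the example:

```python
from collections import deque

class AccessThreshold:
    """Toy access-based threshold: flag an anomaly when a role retrieves
    more than `limit` values within a sliding window of `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events = deque()  # (timestamp, values_returned)

    def record_query(self, values_returned: int, now: float) -> bool:
        """Return True if this query trips the threshold (an anomaly)."""
        self.events.append((now, values_returned))
        # Drop events that have aged out of the sliding window
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(n for _, n in self.events) > self.limit

# 1,000 values per hour, like the Comptroller example above
threshold = AccessThreshold(limit=1000, window=3600)
assert threshold.record_query(600, now=0.0) is False     # 600 in window: OK
assert threshold.record_query(600, now=10.0) is True     # 1,200 in window: anomaly
assert threshold.record_query(600, now=4000.0) is False  # old events expired
```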
Why are Rate Limits Important?
Setting rate limits is a must to control how much data a credentialed user can access. Just because they are approved to access some amount of data doesn’t mean they should be able to see all of it. Let’s think of a credentialed user as a ‘house guest’ (metaphorically speaking). If you invite someone to stay for a few nights in your home during the week and everyone in your family turns in for the night by 11 pm, does that mean you should give your houseguest free rein to roam through every room at 2 am after the household shuts down? And to circle back to data security: if credentials fall into the wrong hands or a user becomes disgruntled, you want to ensure that they cannot exfiltrate all the data, only a limited amount.
What to Consider When Establishing Rate Limits
Keep the following things in mind to help you think through the best approach for setting data thresholds as an extra layer of protection.
- Gain a clear understanding of which columns contain sensitive data by using Data Classification (for context, see the Snowflake Data Classification: DIY vs ALTR Blog).
- Gain a clear understanding of the amount and type of sensitive data different roles consume by using ALTR’s data usage heatmap
This insight should help your data governance team establish rate limits that, if exceeded, generate a notification or block access.
How Snowflake Rate Limiting Works if You DIY
It doesn’t really. Here's why: data is retrieved from databases like Snowflake using the SQL language.
When you issue a query, the database interprets the query and then returns all the data requested at one time. This is the way SQL is designed.
Snowflake has role-based access controls built in, but these controls are still designed to provide all of the requested data at once, so a Snowflake user gets either all of the requested data or none of it. There’s no in-between. The concept of automatically stopping the results of a query midstream simply does not exist natively. This limitation applies to most, if not all, SQL databases; it’s not something unique to Snowflake.
How Snowflake Rate Limiting Works in ALTR
You can automate rate limiting by using ALTR in four simple steps. Because data thresholds extend the capabilities of Column Access Policies (i.e., Lock), you must create a Column Access Policy first before you can begin using data thresholds.
1. If you haven’t already done so, connect your Snowflake database to ALTR from the Data Sources page and check the Tag Data by Classification option.
This process will scan your data and tag columns that contain sensitive data with the type of data they contain.
2. Choose and connect the columns you want to monitor from the Data Management page.
You can add columns by clicking the Connect Column button and choosing the desired column in the drop-down menus. See figure 2.
3. Next, add a lock to group together sensitive columns that you want to put data limits on from the ‘Locks’ page.
In this example we create a lock named “Threshold Lock” that groups the sensitive ID column so a data threshold can be applied for Snowflake Account Admins and System Administrators. See figure 3.
4. Create a data threshold that enforces the desired data limit policy.
Here we are creating a threshold that applies to ACCOUNT admins and SYSADMINS that limits access to the ID column in the customer table to no more than 10 records per minute. See figure 4.
You can specify the action that should occur when a data threshold is triggered.
- Generate Anomaly: This generates a non-blocking notification in ALTR.
- Block: This blocks the triggering user’s access to all columns connected to ALTR, replacing their values with NULL.
You can also set Rules, which define what triggers the data threshold:
- Time-based rules: These will trigger a threshold when the indicated data is queried at a particular time.
- Access-based rules: These will trigger when a user queries a particular amount of data in a short time.
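A time-based rule like the one in the business-hours example can be sketched as a simple predicate. The helper below is hypothetical; in ALTR this is a configured rule, not code you write:

```python
from datetime import datetime

def outside_business_hours(ts: datetime) -> bool:
    """Return True when a query timestamp falls outside Mon-Fri, 8 am-5 pm,
    i.e. when a time-based rule should trigger the threshold."""
    is_weekday = ts.weekday() < 5      # Monday=0 .. Friday=4
    in_hours = 8 <= ts.hour < 17       # 8:00 through 16:59
    return not (is_weekday and in_hours)

assert outside_business_hours(datetime(2023, 2, 18, 10, 0)) is True   # Saturday
assert outside_business_hours(datetime(2023, 2, 20, 2, 0)) is True    # Monday, 2 am
assert outside_business_hours(datetime(2023, 2, 20, 9, 30)) is False  # Monday, 9:30 am
```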
Snowflake Limit Use Cases
The Snowflake limit use cases below are examples of realistic scenarios to reiterate why this extra layer of security is a must for your business. Rate limiting can minimize data breaches and prevent hefty fines and lawsuits that will affect your company’s bottom line and reputation.
Use Case 1.
Your accounting firm has certain users within a role or multiple roles who only have a legitimate reason to access sensitive data, such as personal mobile phone numbers or email addresses, a certain number of times during a specific time period. For example, maybe they should only need to query 1,000 records per minute, hour, or day.
If a user is querying that data outside of the threshold, then it will generate an anomaly and, depending on how you’ve configured the threshold, also block their access until it’s resolved.
Use Case 2.
The business hours for your bank are Monday through Friday from 8 am to 5 pm and Saturday from 9 am to 1 pm ET. There are certain users within a role or multiple roles whom you’ve identified as having a legitimate reason to access sensitive data, such as your customers’ social security numbers, only within this timeframe.
Rate Limit Violations
- If the Threshold is only configured to generate an Anomaly, then the user who triggered the data threshold will be able to continue querying data in Snowflake.
- If the Threshold is configured to block access, then the user who triggered the data threshold will no longer be able to query sensitive data in Snowflake. Any query they run on columns that are connected to ALTR will result in NULL values. This behavior will continue until an ALTR Admin resolves the anomaly in the ‘Anomalies’ page.
- In addition, when there is an anomaly or block, ALTR can publish alerts you can receive through your Security Information and Event Management (SIEM) or Security Orchestration, Automation and Response (SOAR) tool for near-real-time notifications.
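The blocking behavior described above can be modeled in a few lines. This is a sketch of the observable effect (NULL values on connected columns), not ALTR's implementation, and the function and column names are invented for the example:

```python
def apply_block(rows: list[dict], connected_columns: set[str],
                user_blocked: bool) -> list[dict]:
    """Once a user has tripped a blocking threshold, every column connected
    to ALTR returns NULL (None) for that user until the anomaly is resolved."""
    if not user_blocked:
        return rows
    return [
        {col: (None if col in connected_columns else val)
         for col, val in row.items()}
        for row in rows
    ]

rows = [{"id": 1, "ssn": "123-45-6789"}, {"id": 2, "ssn": "987-65-4321"}]

# Blocked user: connected 'ssn' column comes back as NULL
masked = apply_block(rows, connected_columns={"ssn"}, user_blocked=True)
assert masked[0]["ssn"] is None and masked[0]["id"] == 1

# Unblocked user: results are untouched
assert apply_block(rows, connected_columns={"ssn"}, user_blocked=False) == rows
```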
Automate Rate Limiting with ALTR
In today’s world, where you must protect your company’s sensitive data from being hacked by outsiders and leaked by staffers on the inside, a well-thought-out data governance strategy is mandatory. Constantly remaining vigilant to safeguard your data can feel like a challenging chess match. However, ALTR can make this ‘game of strategy’ easier to win by automating rate limits for you.
Do you really have the time to write SQL code for each data threshold you want to set as your business scales? By using ALTR, it’s a few simple point-and-click steps and you’re done!
See how you can set rate limits in ALTR:
Feb 8
0
min
Data Modernization: Challenges, Best Practices & the Importance of Data Governance
ALTR Blog
Organizations don’t have the time or resources to waste on technologies that don’t work or are difficult to assemble. Creating a future-proof data stack allows organizations to avoid adoption fatigue, build a data-centric culture, keep data secure and make the most of technology investments. We interviewed Pat Dionne, CEO of Passerelle, to find out the prerequisites for a successful data modernization strategy and why data governance plays a critical role in your data ecosystem.
What are the biggest challenges customers face when building a modern data architecture?
It isn’t hard to build a modern data stack – there are a dizzying variety of tools, and each comes with a compelling value proposition. The biggest frustration comes after customers have made the investment and start trying to get an ROI from their tools. While ingestion can be simple, assembling data into reusable and manageable assets is much more complex. Data modeling and data quality directly impact an organization’s ability to maximize value and agility and are critical for finding a return on the technology investment. Unfortunately, the latter is often forgotten in the decision process.
What components are vital to successful data modernization projects?
When it comes to data modernization, it is critical to have a collaborative approach to cataloging and securing data across an organization. Collaboration builds consensus on data classification terms and rules, creating a universal definition of data asset ownership and a clear understanding of what is required to access data. The more complicated the access scenarios, the more critical it is to have a transparent, cohesive implementation strategy. Similarly, it is essential to invest in tools that support collaboration. For example, we like the simplicity and elegance of ALTR’s solution enabling data governance and security teams.
What role do data governance and data security play in modern data architecture?
Data governance moves data security from a controlling function to an enabling function, while data security protects data from unauthorized access and use. Data governance cannot exist without robust data security; in turn, data security should not inhibit business agility and creativity. Managing the interplay between data governance and security requires understanding how data is used and by whom and requires the proper tooling to enable businesses while providing the appropriate level of control and observability. ALTR simplifies the process by offering clear access controls and immediate visibility into data security protocols.
How do you foster a culture of data governance?
For data governance programs to succeed, IT and business stakeholders need to see the value in implementation and adoption. Tying data governance programs to business use is the ultimate unifier - it requires bringing together data stewards, business-line decision-makers and data engineers to a collective understanding of their roles and responsibilities. We refer to this as “Data as a Team Sport.” We are firm believers in use-case-based development – it is easier to get people on board when you have proven results and vocal champions.
What advice would you give to a company starting its data modernization journey?
Introducing practical data governance at the onset of data modernization is easier. Most of the time, organizations will introduce tools and proficiencies throughout a data modernization initiative - the proper data governance practices and tools will apply to every step of that modernization journey and scale with use. In building terms, it is easier to provide structural support with a sturdy foundation than to rely on scaffolding once the walls start to go up.
How do you predict the data management landscape will change in the next 3-5 years?
I see three major trends in the next three to five years:
- First, we will see an increase in automation and intelligence in data management tooling, fueled by AI developments and human brilliance.
- Organizations will demand more integrated solutions to reduce technical debt and manage leaner technology stacks.
- Not only will we see increased regulatory compliance requirements, but we will also enter an era of enforcement, where the government will become more aggressive at enforcing data privacy laws.
Pat Dionne, CEO of Passerelle
Passerelle offers solutions for business growth and results, and with that, a team of experienced technical contributors and managers and the innovative technologies to create the right solution for clients. Pat is at the heart of this synergy, bringing a deep understanding of the modern technologies capable of addressing today’s complex data business challenges, as well as the proven capacity to build and empower highly effective teams.
Feb 1
5 Tips for Data Governance in 2023
ALTR Blog
Many data governance solutions claim to solve every data privacy and protection issue, but we know that no two data governance solutions are created equal. As we launch into the New Year, we’ve listed our top 5 tips for Data Governance in 2023. These tips will help you determine what you need from your data governance solution, identify a few red flags to look out for, and point out some key differentiators that may help make your decision for you.
Tip 1: Keep tabs on your organization’s sensitive data.
The first step to ensuring your data governance solution is the right fit for you is asking the question: “Where does sensitive data exist within my organization, and is it protected?” Understanding what sensitive data you store and who has access to it are critical first steps to ensuring the data governance solution you implement will fit your needs. While only certain data requires protection by law, leaked data can cause a headache across your organization – from damaging your reputation to losing loyal customers. It is essential that your data be discovered and classified across your organization’s ecosystem at all times.
Tip 2: Does your Data Governance solution offer complete coverage?
Data classifiers and catalogs are valuable and extremely necessary in context, but at the end of the day, they cannot offer you a full governance solution. For complete data governance, you must not only be able to find and classify your data, but also see data consumption, utilize thresholds to detect anomalies and alert on them, respond to threats with real-time blocking, and tokenize critical data at rest. True data governance needs to address a wide spectrum of access and security issues, including Access Controls, Compliance, Automation, Scale, and Protection. ALTR simplifies these steps for you – giving you point-and-click controls to secure and simplify your data.
Tip 3: More expensive doesn’t mean better.
Many data governance solutions cost anywhere from $100k to $250k per year just to get started! These large, legacy platforms require you to invest valuable time, resources and money before you see any value. You may need an army of costly consultants and six months to implement. On the other hand, ALTR’s pricing starts at free for life. Our Free Plan isn’t a trial plan, it’s just that – Free. Our Free plan gives you the power to understand how your data is used, add controls around access, and limit your data exposure. You can see how ALTR will work in your data ecosystem without risk.
If you need more advanced governance controls, integration with your enterprise governance and security platforms, or increased data protection and dedicated support, our Enterprise and Enterprise Plus plans are available. ALTR’s tiered pricing means there’s no large up-front commitment—you can start for free and expand if or when your needs change. Or stay on our free plan forever.
Tip 4: The Who of Data Governance
Clearly defining roles within your organization surrounding who needs access to data and when will set you up for success when it comes to protecting sensitive data within your organization.
When you know why each person needs the data you are protecting, you can build access control policies to fit highly specific purposes. Using ALTR you can create policies that limit access based on which data is being requested, who is requesting it, the access rate, time of day, day of week, and IP address. ALTR’s cloud-based policy engine and management console allow you to control data consumption across multiple cloud and on-premises applications from one central location.
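To make the policy dimensions above concrete, here is a minimal Python sketch of evaluating a data request against role, column, rate, time-of-day, day-of-week, and IP restrictions. This is an illustration only — ALTR policies are configured in its management console, not written in code, and all names and values here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network

# Hypothetical policy record modeling the dimensions described above.
@dataclass
class AccessPolicy:
    allowed_roles: set
    allowed_columns: set
    max_rows_per_hour: int
    allowed_hours: range   # e.g. range(8, 18) for business hours
    allowed_weekdays: set  # 0 = Monday .. 6 = Sunday
    allowed_networks: list # CIDR strings

def is_allowed(policy, role, column, rows_this_hour, when, source_ip):
    """Evaluate one data request against every policy dimension."""
    return (
        role in policy.allowed_roles
        and column in policy.allowed_columns
        and rows_this_hour < policy.max_rows_per_hour
        and when.hour in policy.allowed_hours
        and when.weekday() in policy.allowed_weekdays
        and any(ip_address(source_ip) in ip_network(n)
                for n in policy.allowed_networks)
    )

policy = AccessPolicy(
    allowed_roles={"analyst"},
    allowed_columns={"email"},
    max_rows_per_hour=10_000,
    allowed_hours=range(8, 18),
    allowed_weekdays={0, 1, 2, 3, 4},
    allowed_networks=["10.0.0.0/8"],
)

# A weekday, business-hours request from an internal address passes.
ok = is_allowed(policy, "analyst", "email", 500,
                datetime(2023, 1, 16, 10, 0), "10.1.2.3")
```

The same request made on a weekend, after hours, or from an outside network would be denied — which is the point of evaluating every dimension together rather than role alone.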
Tip 5: Does your data governance solution allow you to scale?
Scalability may be the one thing that makes or breaks your data governance solution in 2023. As regulations and laws surrounding data privacy become more common, more of the data you own will need to be protected. And the more data you need to protect, the more time your data team must allocate to processes that could easily be automated with ALTR. Governance solutions should make it easy to implement and manage access for thousands of users. Scaling policy thresholds as needed allows you to optimize collaboration while stopping data theft or accidental exposure.
Bonus Tip: Start for Free
We anticipate that 2023 will be a critical year for companies being held accountable for the sensitive data they own. ALTR makes getting ahead of the curve simple, easy, and achievable. With ALTR’s free data governance and security integration for Snowflake, you can automatically discover, classify, and tag sensitive data with a checkbox. Add controls like data masking from a drop-down menu. Get going in less than an hour. No SnowSQL is required.
Jan 25
PII Security - Your Complete Guide to Protecting Personally Identifiable Data
ALTR Blog
What is PII Security?
PII security has become something just about everyone has had to think about in the last few years with the increase in personal data breaches and the passage of the GDPR regulations in Europe. But that doesn’t mean it’s well understood. What do we mean when we talk about PII data anyway? Personally Identifiable Information or PII data generally refers to information that is related to or key to identifying a person. There are broader terms such as “personal data” or “personal information,” but “PII” has become the standard acronym used to refer to private or sensitive information that can identify a specific individual. The US NIST framework defines Personally Identifiable Information as any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.
While the abbreviation “PII” is commonly used in the United States, the phrase it abbreviates is not always the same – there are common variants based on personal or personally, and identifiable or identifying. The meaning of the phrase "PII data" ends up varying depending on the jurisdiction and the purpose for which the term is being used. For example, where the General Data Protection Regulation (GDPR) is the primary law regulating PII data, the term "personal data" is significantly broader. Regardless of the definition used, the focus on PII security is also growing quickly.
PII security consists of ensuring that only approved users have access to this most sensitive of personal data. In some cases this is required by regulation, but in the US, without a federal regulation like GDPR, it's more often a requirement to maintain customer trust. In this blog, we'll outline PII data examples, the differences between PII, PHI and PCI and explain the steps you should take to identify PII and ensure it's secured.
PII Data Examples
The first step to PII security is understanding what is considered PII data. As mentioned above, it’s more complicated than it may first appear. Not all private information is PII and not all PII data is private information. In fact, much of the information considered PII data and covered by regulation is actually publicly available information, such as an individual’s name or phone number. However, some of the information, especially when combined and in the hands of bad actors, can lead to negative consequences for individuals. Here are some PII examples:
- Names: full name, maiden name, mother’s maiden name, or alias
- Individual identification numbers: social security number (SSN), patient identification number, passport number, driver’s license number, taxpayer identification number, financial account number, or credit card number
- Personal address: street address, or email address
- Personal phone numbers
- Personal characteristics: photographic images (particularly of a face or other identifying physical characteristics), fingerprints, handwriting
- Biometric data: retina scans, voice signatures, facial geometry
- Information identifying personal property: VIN or title number
- Technical Asset information: Internet Protocol (IP) or Media Access Control (MAC) addresses that consistently link to a particular person’s technology
What Data Does Not Require PII Security?
PII security becomes easier if you understand what is not PII data. The examples below are not considered PII data alone as each could apply to multiple people. However, when combined with one of the above examples, the following could be used to identify a specific person:
- Date of birth
- Place of birth
- Business telephone number
- Business mailing or email address
- Race
- Religion
- Geographical indicators
- Employment information
- Medical information
- Education information
- Financial information
PII vs PHI vs PCI Data
PII data has much in common and some overlap with other forms of sensitive or regulated data such as PHI and PCI, but it is not the same. Confusion often arises around whether PII means information that is identifiable (can be associated with a person) or identifying (associated uniquely with a person, so that the PII actually identifies them). In narrow data privacy rules, such as the Health Insurance Portability and Accountability Act (HIPAA), PII items have been specifically defined. In broader data protection regulations such as the GDPR, personal data is defined in a non-prescriptive principles-based way. Information that might not count as PII under HIPAA could be considered personal data per GDPR.
PHI data is personal health information as defined by the Health Insurance Portability and Accountability Act of 1996. HIPAA provides federal protections for personal health information held by covered entities and gives patients an array of rights with respect to that information. At the same time, HIPAA permits the disclosure of personal health information needed for patient care and other important purposes. This federal law required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to effect the requirements of HIPAA. The HIPAA Security Rule protects a subgroup of information covered by the Privacy Rule. In addition to very clear health information, there is some overlap as when PII data like name, date of birth, and address are tied to personal health information, it is considered PHI as well.
PCI data stands for “payment card industry” and is defined by a consortium of financial institutions comprising the Payment Card Industry. The definition comes from the rules for protecting data in the PCI-DSS or payment card industry data security standard. The PCI Security Standards Council (SSC) defines “cardholder data” as the full Primary Account Number (PAN) or the full PAN along with any of the following identifiers: cardholder name, expiration date or service code. The rules were implemented to create an additional level of protection for card issuers by ensuring that merchants meet minimum levels of security when they store, process, and transmit cardholder data.
In the past, PCI data might have been considered the most valuable and most at risk because it was related to financial data and could be used to directly access money. However, as many of us have unfortunately learned due to rampant credit card fraud over the last few years, credit card numbers can be easily changed. It’s not nearly as easy to change your social security number, or even your name. Those who have dealt with identity theft can understand how devastating it can be when unknown loans or other fraud show up on your credit report. And health information is simply unchangeable, as it’s part of a person’s permanent “life record.” That puts PII data and PHI data in the lead in the race for data value and data risk. PII data might be considered more at risk due to its proliferation, so PII security should always be a priority.
PII Security and the Internet
Before 1994, very little of our PII data was easily accessible, so PII security wasn't as critical. If you wanted someone’s phone number, you had to know their name and have a hefty copy of what we called the “white pages” (a phone book) in order to look them up. Maybe a bank or telephone company had access to thousands of phone numbers, but not the average person. All of that changed with the advent of the Internet. The concept of PII data has become prevalent as information technology and the Internet have made it easier to collect PII. Every online order requires a name and email, not to mention a physical address or phone number. This has led to a profitable market in collecting and reselling PII. PII can also be exploited by criminals in stalking or identity theft, or to aid in the planning of criminal acts. In reaction to these threats, many website privacy policies now specifically inform users about the gathering of PII, and lawmakers have enacted a series of regulations to limit the distribution and accessibility of PII, making PII security a priority for consumers and companies.
PII Security Regulations
The era of stringent PII data privacy regulations that required PII security really kicked off with the implementation of the European Union’s General Data Protection Regulation (GDPR) in May 2018. This regulation requires organizations to safeguard personal data and uphold the privacy rights of anyone in EU territory. The regulation includes seven principles of data protection that are required and eight privacy rights that must be enabled. It also gives member state-level data protection authorities the power to enforce GDPR with sanctions and fines. The GDPR replaced a country-by-country patchwork of data protection laws and unified the EU under a single data protection regime. The regulation doesn’t apply to just European companies, however. Any company holding personal data of European citizens must comply.
The US is further behind in the PII privacy regulation game. There is as yet no federal or national privacy regulation that applies across the country. The US is still in the patchwork era, with some states like California, Utah, Colorado, Connecticut and Virginia passing state-level regulations. Five more states have introduced regulations. In 2022, a new bipartisan regulation called the American Data Privacy and Protection Act was introduced in the US House of Representatives. It follows the direction of GDPR and would apply to data controllers and accessors. It is effectively a consumer “Bill of Rights” around PII data privacy. The legislation currently sits in the House of Representatives for approval.
4 Steps to Complete PII Security
These privacy regulations have specific rules around PII security – what data should be protected and how. But in order to comply fully and reduce risk of censure, fees or fines, companies will need to take 4 key steps:
- Data classification: The first step to PII security is to identify sensitive information stored in your company’s databases. This can be done manually by reviewing all the databases and tagging columns or rows that contain PII. Some database solutions allow you to write SQL processes to do this also. However, it’s much faster and less error-prone to utilize an automated solution to find and tag social security numbers, date of birth or other key information wherever it’s located.
- Data access controls: Once PII data is identified, controls that allow only approved individuals to access sensitive data should be applied. These controls can include data masking (changing characters to ***) and row or column-level access policies. A common additional requirement is auditable documentation of who has accessed what data and when.
- Data rate limiting: Because it’s best to assume any credentials could be compromised at any time, you should limit the damage even authorized access can do. Instead of allowing millions of rows of data to be downloaded, apply controls that limit the amount of data by role, location, and time of access to reduce the risk of a massive breach.
- Data tokenization: Finally, the most sensitive data should be secured via a data tokenization solution that ensures even if “data” is accessed by a bad actor, they will only get their hands on tokens that are useless to them. The real data is stored in an encrypted token vault.
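The four steps above can be sketched in miniature. This is a toy illustration, not ALTR's implementation: it assumes SSNs are the only sensitive type, uses a plain dictionary as a stand-in for an encrypted token vault, and uses a hypothetical row budget for rate limiting.

```python
import re
import secrets

# Step 1 -- classification: flag a column as sensitive if its sampled
# values look like social security numbers.
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def classify(column_values):
    return all(SSN_RE.match(v) for v in column_values)

# Step 2 -- access control via masking: show only the last four digits.
def mask(value):
    return "***-**-" + value[-4:]

# Step 3 -- rate limiting: a hypothetical per-user row budget.
RATE_LIMIT = 1000

def within_rate_limit(rows_requested):
    return rows_requested <= RATE_LIMIT

# Step 4 -- tokenization: swap the real value for a random token and
# keep the value in a vault (real systems encrypt the vault contents).
vault = {}

def tokenize(value):
    token = secrets.token_hex(8)
    vault[token] = value
    return token

def detokenize(token):
    return vault[token]

ssns = ["123-45-6789", "987-65-4321"]
sensitive = classify(ssns)          # column flagged as sensitive
masked = mask(ssns[0])              # what an unapproved user sees
token = tokenize(ssns[0])           # what a breached database yields
```

A bad actor who exfiltrates `token` gets a random hex string; only a lookup through the (secured) vault recovers the real value.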
Conclusion
The problem of PII security is only on the upswing. As companies extract more insight and value from personal data on consumers, product users and customers, they’ll continue to gather, hold, share and utilize data. In fact, companies are not just collecting data for their own use, but also monetizing it by selling insights about their own customers to others. While data collection and storage are increasing, laws regulating how this data can be stored and used are also increasing. Companies can stay ahead of the curve with processes and solutions that help scale PII security with the growth of PII data.
Jan 12
ALTR vs Other Snowflake Data Governance Solutions
ALTR Blog
As a Snowflake Premier Partner and founding member of the Snowflake Data Governance Accelerated Program, we get a lot of questions about how ALTR is different from other Snowflake data governance solutions, including Snowflake!
The short answer is that we automate the existing native Snowflake governance features for data masking policies and role-based and row-level access policies. Why is that important? Why is that valuable? When you automate these Snowflake features, it allows Snowflake users to address some key challenges.
Bridging the Snowflake Skills Gap
First, you get the opportunity to address a skills gap. Maybe some of your team members are not as trained up on SnowSQL yet, or they haven't taken all of the Snowflake certification training, especially if you're early in your Snowflake journey. Maybe you and your team don't have time to learn about data masking policies or some of the nuances that come with Snowflake row-level policies, and so ALTR can help you automate that in a very simple and easy-to-use manner. ALTR’s fast SaaS implementation, access via Snowflake Partner Connect, and no-code policy management take the burden off your team and can even allow other data owners throughout the organization to handle data access controls and enforcement.
Snowflake Data Governance at Scale
The second thing you get to address is deploying these capabilities at scale. We've seen a number of customer projects where implementing these data controls at scale is taking up entire teams of people when it just shouldn’t need that many resources. If you have one centralized tool like ALTR to manage who has access to what data and how much, you take a lot of that kind of scale overhead, and that friction of growing with Snowflake, out of the equation. This comes into play if you set up your Snowflake governance policies the way you want for one database or one account. If you're part of a large organization, you may want to apply that across multiple databases. We recently encountered a company that had nine accounts across all three different cloud providers that Snowflake offers. How do you make that portable across all of those accounts, and all of those deployments? ALTR can make this easy.
See how ALTR’s features compare to other Snowflake Data Governance solutions
A Single Source of Snowflake Data Access Truth
There’s a lot of confusion out there in the market around what “data governance” is exactly. When you’re thinking of other “data governance” solutions for Snowflake like a Collibra or Alation or Immuta or others who are in this space, keep in mind that there are other parts of data governance and many handle some processes like data classification or cataloging, but ALTR’s sole focus is on delivering that single source of truth for data access. You can see how users are using data, control which users have access to which data, reduce risk by limiting the rate of data access, and put powerful tokenization data security over the most sensitive data, all with ALTR.
Low Cost, Fast Implementation, Enterprise Quality
This kind of Snowflake data governance is a really hard problem for companies to solve, even when they had full teams and full budgets to attack it. But moving into 2023, we’re seeing companies lose headcount and resources and getting very picky about selecting the specific tools to help accomplish specific goals. One of the other major differences between ALTR and some of the other Snowflake data governance solutions is that we’re waging a war on six-figure price points for solutions and six months of professional services to implement. If you are being offered or sold a tool that is very expensive and is going to take you a long time to roll out and learn to use, those vendors don't have your best interest in mind. With our pure SaaS platform (which other solutions are not), we’re making data governance easy to buy, easy to implement, and easy to use. Bottom line: ALTR provides the functionality companies need to govern data at a price point other solutions just can’t touch.
See how ALTR’s Snowflake Data Governance solution can work for you. Try our Free Plan today.
Jan 10
The 2023 Data Governance Landscape: Setting Data Privacy Goals
ALTR Blog
It’s no secret that Data Governance, PII, and Data Security were among the most talked about topics in Q4 of 2022. Security breaches were rampant and technology teams continued to feel stretched thin, while sensitive data was sometimes unwittingly left unprotected and at risk for exposure. We’ve compiled some key guidance from our partners and industry leaders to help you implement strong data governance in 2023 – from simplifying the definition of data governance, to emphasizing the importance of scalability and automation within your data governance plan.
Alation: Key Insights: Forrester’s New Data Governance Solutions Landscape
This blog, written by John Wills, Field CTO at Alation, takes a look at data governance from a holistic perspective – explaining the big picture of creating a data governance plan for your organization, while recognizing certain aspects that may vary between companies. Wills teaches us that your company’s data governance solution should exist cross-departmentally and shares that people often miss the mark when their data privacy exists in a vacuum.
Tableau: Keep Your Data Private and Secure
Sheng Zhou, Product Manager at Tableau, writes about the importance of data privacy and protection – specifically from the perspective of protecting and securing PHI to meet HIPAA requirements. Zhou shares that, regardless of the type of data you’re protecting, it is a critical business component to be vigilant about securing your sensitive data. Zhou notes that data governance and data privacy are so important that these processes have to be part of normal, everyday business operations.
BigID: How Strong Data Governance With BigID Drives Down Privacy Compliance Costs
Peggy Tsai, Chief Data Officer at BigID, discusses how having strong data governance can help drive down your privacy compliance costs and, at the end of the day, save your company a lot of money. Tsai begins this blog by explaining certain data privacy laws (the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA)) and what can happen when your company receives a Data Subject Access Request (DSAR) under GDPR. Tsai provides an in-depth analysis outlining the importance of strong data governance, so your company can avoid errors, legal fines, and the headache when you receive a DSAR.
Alation: Becoming a Data Driven Organization in 4 Steps
In this blog post, Steve Neat, GM EMEA at Alation, walks through tangible steps you can take to ensure your organization is data-driven. Neat explains that becoming a data-driven organization isn’t just about adding new technologies to your tech stack; it truly requires full investment from all stakeholders. Neat shares how data governance plays a huge role in being a data-driven organization, by creating processes surrounding where your data is stored and who can access it. Agility is a key factor in data governance – ensuring your organization is in the driver’s seat of protecting your data.
As uncertainty continues to rise in numerous business sectors across the globe, we’re seeing people recognize the need for strong data governance as well. We’re here to help you ensure your organization is ahead of implementing and streamlining a data governance plan. ALTR’s free plan is the perfect place to start - you can automatically discover, classify, and tag sensitive data with a checkbox. It’s easy to get going in less than an hour with no SnowSQL required.
Jan 18
Snowflake Data Classification: DIY vs ALTR
ALTR Blog
Have you ever walked into a store and noticed that while some items are displayed freely on shelves, some are visible, yet locked behind glass? We can guess that those items are higher quality, higher value, higher risk. It's pretty clear when inventory comes into the store which items fit this category. It can be less clear when data comes into your database. That's where data classification can help.
In this blog post, we’ll explain what data classification is, why data classification is an important step in your data security strategy, and how you would classify data yourself with SQL versus doing it automatically with ALTR.
What is Data Classification?
Data classification is the process of identifying the type of information contained in each column of your database and categorizing the data. This is an important first step to securing your data. Once you know the type of information contained in each column, you will be able to compare the type to a list of information types that your business considers sensitive. This in turn will make it possible to protect that data with appropriate data access policies. Before you create a column-level policy, you should classify the column. By implementing data classification, you can minimize the risk of a sensitive data compromise.
Data Classification Factors
To protect your company’s sensitive data, you must first know what type of data you have. Therefore, data classification is a must to avoid having your data hacked by cybercriminals or leaked by individuals inside your business. To determine how to apply data classification consider the following factors:
- Timing: In order to enforce a data policy, you must know which columns contain sensitive data. So, you need to classify your data before implementing data access policies. You should also reclassify any time you add new sources of data.
- Methods: The method you use should involve sampling actual data values found in the data. Avoid relying completely on the name of the column.
- Automation: Classification can be tedious when done manually. A typical database will have hundreds if not thousands of tables, and each table can have hundreds of columns giving rise to missed columns and errors in copy/pasting results.
- What Data is Sensitive: Have a list of the information types that are sensitive in your situation. For example, what data security regulations apply to your company, what does your internal data security team require, and so on.
These factors will help to ensure that your data classification efforts are efficient and thorough.
How Snowflake Data Classification Works DIY
Read on to learn what’s required to classify data in Snowflake yourself with SQL via three different methods: good, better and best.
Who Can Do It: A software developer who can manually write SQL code AND categorize and manage data well
Downsides to manually classifying data in Snowflake:
- Time-consuming
- Higher risk of missing data that needs to be classified
- You’ll have to manually store your results in a database, making it difficult for non-technical users to analyze the results
1) “Good” Method: Column Name
This is a way to identify what type of data is in a column by looking at the column name. You can run a query that uses a conditional expression for each data type against the information schema inside of Snowflake.
The query result will display every column of data that matches your condition in your Snowflake account. The downsides are that you must run the query for every data type you want to identify, and you might miss columns that need to be identified if they weren’t named clearly. For example, if you’re trying to identify all columns of ‘email’ but it’s abbreviated as ‘eml,’ then it won’t be returned in your query.
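The weakness of the name-based method can be simulated without a Snowflake account. In Snowflake you would query the INFORMATION_SCHEMA.COLUMNS view; this sketch substitutes a hypothetical list of column names to show how an unclear name slips through a pattern match.

```python
import re

# Hypothetical column names standing in for an INFORMATION_SCHEMA query result.
columns = ["customer_name", "email_address", "eml", "order_total"]

def find_by_name(columns, pattern):
    """Return columns whose names match the pattern -- the 'Good' method."""
    return [c for c in columns if re.search(pattern, c)]

matches = find_by_name(columns, r"email")
# 'eml' holds email addresses but is missed, because the match is
# on the column name alone, not its contents.
```

Any column named ambiguously (here, `eml`) is silently skipped, which is exactly why the methods below that sample actual data values are stronger.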
2) “Better” Method: Sample Rows of Data
This is better than the column name method because it will grab a sample of rows and then you can clearly see the content of each column. However, it’s still not the ‘best’ approach. Because the query will display multiple rows and column values for you to view, this can be time-consuming and overwhelming.
3) “Best” Method: Extract semantic categories
This data categorization method is the best one because it does the sampling for you. You can run extracted categories against a table, and a JSON object with scored classification results will be generated in the query result. The caveats are that you must run this across each table in your database, and you must manually store and present the results to use them to create access policies.
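The idea behind scored classification can be sketched as follows: sample values from a column and score how many match known sensitive-data patterns. This is a simplification for illustration, not Snowflake's actual semantic-category algorithm, and the patterns and sample values are hypothetical.

```python
import json
import re

# Toy patterns for two sensitive categories.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def score_column(values):
    """Score a sampled column: fraction of values matching each category."""
    scores = {}
    for category, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        scores[category] = hits / len(values)
    return scores

sample = ["a@example.com", "b@example.com", "not-an-email"]
result = json.dumps(score_column(sample))
# A JSON object of scored results, loosely analogous to the output
# described above: {"EMAIL": 0.666..., "US_SSN": 0.0}
```

A downstream process would then tag any column whose top score clears a chosen confidence threshold.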
How Snowflake Data Classification Works in ALTR
While you could choose one of the ‘good, better, and best’ approaches above to classify your data manually in Snowflake, using ALTR to automate data classification is the ‘supreme’ approach.
Who can do it: Anyone can do it and you don’t have to write SQL or log in to Snowflake.
Downsides to classifying data in ALTR: None
There are only four steps to ALTR Snowflake data classification.
- Simply choose the database that you’d like to classify (shown in figure 8).
- Check the box beside Tag Data by Classification.
- Choose from the available tagging methods.
- Click on Update. This starts the process of classifying all the tables in that database. When the job is complete, you’ll receive an email to let you know it’s done.
NOTE: An object tag is metadata (such as a keyword or term) that is assigned to a piece of information as a description of it for easier searchability. ALTR can use object tags assigned to columns in Snowflake to classify data or, if those are not available, ALTR can assign tags to columns using Google DLP classification.
The classified data results will be integrated into a Data Classification Report.
Snowflake Data Classification Use Cases
Here are a couple of use case examples where ALTR’s automated data classification capability can benefit your business as it scales with Snowflake usage.
Use Case 1. Protected health information
Your data team is integrating an employee dataset from a recently acquired hospital into your main data warehouse in Snowflake. You need to determine which database columns have healthcare-related data in them (e.g., social security numbers, diagnostic codes). The original developers of the dataset are no longer available, so you use ALTR to classify the dataset and identify those sensitive columns.
Use Case 2. Financial records information from sales
You are a healthcare product manufacturer, and you have just signed a new online reseller for your products. The reseller's sales data will be dumped into your Snowflake database every week and will contain sales transaction data including addresses, phone numbers, and payment information; however, you don't know where this data is located in the database.
What You Could be Doing: Automating Snowflake Data Classification with ALTR
In today’s world, implementing data classification as part of your company’s security strategy is critical. You can’t afford to put your company at risk of fines and lawsuits due to data breaches that could’ve been prevented. Do you or your security team have hours in a day to spend manually writing SQL code each time that you add data to your databases? Do you want to spend hours trying to figure out why a query didn’t generate any results due to unclear column names or other issues? We’ve made using ALTR such a convenience that you don’t even have to write any SQL code or log into Snowflake! It’s a simple point-and-click four-step procedure in ALTR and you’re done!
Watch the ‘how-to’ comparison video below to see what it looks like to manually classify Snowflake data versus automating it with ALTR.
Ready to give it a try? Start with our Free Plan today