Thinkers360

Data Governance — A QuickStart With Azure Purview

Apr



“When we talk about assets on the balance sheet, Data deserves its row” (Satya Nadella, Microsoft CEO).

As an organization, you face a big question: “How should we handle users’ data?” That data can be used to support your business, or to give your end-users a better experience.

With enough data and a roadmap to use that data effectively, you can accelerate your company’s growth. But using data effectively is impossible without data governance. Here’s every “Why? How? Where?” you need to know about data governance and Azure Purview.

Why Data Governance?

Data is the new currency of the digital age, and data within organizations is growing at exponential rates. 90% of data today was created in just the last two years. And by 2025, 80% of data will be unstructured data. This influx of data has multiplied the challenges organizations face manyfold.

To get real business value from Data, the organization needs to know:

  1. What Data exists within the organization?
  2. Who owns the Data? Who can access the data?
  3. For what purposes can they use the Data responsibly and ethically?
  4. Data lineage (traceability of data flow and its usage in solutions)
  5. Duplicate data
  6. Quality of data and common taxonomy
  7. Security and compliance for the data captured
  8. Where and How the Data is stored or archived (and overall lifespan of data)
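One way to make these questions concrete is to treat each of them as a field on a per-asset record and flag the fields nobody has filled in yet. The sketch below is purely illustrative: the `DataAssetRecord` class, its field names, and the sample values are hypothetical and are not part of Azure Purview.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataAssetRecord:
    """Hypothetical governance record for one data asset (illustrative only)."""
    name: str
    owner: Optional[str] = None                                  # who owns the data
    allowed_readers: List[str] = field(default_factory=list)     # who can access it
    permitted_purposes: List[str] = field(default_factory=list)  # responsible, ethical use
    upstream_sources: List[str] = field(default_factory=list)    # data lineage
    storage_location: Optional[str] = None                       # where it is stored or archived
    retention_days: Optional[int] = None                         # overall lifespan


def governance_gaps(record: DataAssetRecord) -> List[str]:
    """Return the names of governance fields that are still unanswered."""
    gaps = []
    for field_name, value in vars(record).items():
        if field_name == "name":
            continue
        if value is None or value == []:
            gaps.append(field_name)
    return gaps


# Made-up example asset with only some of the questions answered.
sales = DataAssetRecord(
    name="sales_orders",
    owner="finance-data-team",
    allowed_readers=["analysts"],
    storage_location="raw/sales",
)
sales_gaps = governance_gaps(sales)
```

Here `sales_gaps` lists the questions that remain open for the asset (permitted purposes, lineage, and retention), which is the kind of gap a governance program is meant to close.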

Lack of understanding of any of the above can create operational inefficiencies, confusion about the data and information being distributed internally and externally, and poor business decisions based on flawed or misunderstood data. And that’s only part of the problem, as regulators are cracking down on companies for lapses in compliance, data privacy, and data sovereignty (and I won’t be surprised if we soon start seeing regulations around the ethical use of data).

What is Data Governance?

According to Gartner, “Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.”

Data governance helps ensure the data is usable, accessible, and protected. It also supports better-informed data analytics, because an organization can draw its conclusions from data it trusts. Data governance also improves the consistency of the data, removes redundancies, and helps make sense of garbage data, which can save an organization from costly decision-making mistakes.

Data governance also provides organizations with:

  • Data consistency.
  • Reduced data management costs.
  • Increased data access for everyone involved, for better data-driven decision-making.
  • Improved employee experience (and thus higher engagement and productivity).
  • Improved customer experience, by enabling faster insights into customer behavior and patterns and facilitating 360-degree views to drive personalized experiences at scale.
  • Overall brand value.

What’s Microsoft Azure Purview?

Microsoft Azure Purview is a fully managed, unified data governance service that helps you manage and govern your on-premises, multi-cloud, and SaaS data. Purview creates a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. Purview empowers data consumers to find valuable, trustworthy data.

It’s built on Apache Atlas, an open-source project for metadata management and governance of data assets. Azure Purview also has a data share mechanism that securely shares data with external business partners without setting up extra FTP nodes or creating large redundant datasets. Azure Purview does not move or store customer data outside the region in which it is deployed.

Purview is Available for Public Preview

There is currently no licensing cost associated with Purview; you pay for what you use. The pay-per-use model offered by Microsoft as part of Public Preview is exciting for Microsoft customers looking to move quickly without having to create a business case to secure an additional budget. Azure Purview reduces costs on multiple fronts, including cutting down on manual and custom efforts to discover and classify data and eliminating hidden and explicit costs of maintaining homegrown systems and Excel-based solutions.

Data Sources Supported by Azure Purview

It supports the following types of data sources at the time of writing:

  1. SQL Server on-premises
  2. Azure Data Lake Storage Gen1
  3. Azure Data Lake Storage Gen2
  4. Azure Blob Storage
  5. Azure Data Explorer
  6. Azure SQL DB
  7. Azure SQL DB Managed Instance
  8. Azure Synapse Analytics (formerly SQL DW)
  9. Azure Cosmos DB
  10. Power BI
  11. Teradata
  12. ERP sources like SAP S/4 HANA and SAP ECC.
  13. Oracle DB as a data source
  14. Amazon S3: Azure Purview customers can now scan and classify data residing in Amazon S3 with the help of automated scanning, AI-powered built-in and custom classifiers, and Microsoft Information Protection sensitivity labels.

Critical Capabilities of Azure Purview

Azure Purview consists of the following main features:

1. Azure Purview Data Map

Azure Purview Data Map provides the foundation for data discovery and effective data governance. It’s a cloud-native PaaS service that captures metadata about enterprise data in analytics and operational systems, both on-premises and in the cloud. Purview Data Map is automatically kept up to date by a built-in automated scanning and classification system. Business users can configure and use the Purview Data Map through an intuitive UI, and developers can interact with the Data Map programmatically using the open-source Apache Atlas 2.0 APIs.
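To give a feel for what programmatic access looks like, here is a minimal sketch that builds the JSON body for an Apache Atlas v2 entity request. The body shape and the `/api/atlas/v2/entity` path follow the Apache Atlas v2 REST API; the `build_atlas_entity` helper, the type name, and the qualified name are made up for illustration, and the Purview host name and authentication are deliberately left out because they depend on your account.

```python
import json
from typing import Any, Dict

# Path of the Apache Atlas v2 entity endpoint (the host prefix is not shown).
ATLAS_ENTITY_PATH = "/api/atlas/v2/entity"


def build_atlas_entity(type_name: str, qualified_name: str, name: str,
                       description: str = "") -> Dict[str, Any]:
    """Build the JSON body for an Atlas v2 entity create-or-update request."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "qualifiedName": qualified_name,  # unique key for the entity within its type
                "name": name,
                "description": description,
            },
        }
    }


# Illustrative values only: the server, database, and table names are invented.
payload = build_atlas_entity(
    type_name="DataSet",
    qualified_name="mssql://example-server/salesdb/dbo/orders",
    name="orders",
    description="Customer orders table",
)
body = json.dumps(payload, indent=2)
```

The resulting `body` string is what a client would send in an authenticated POST to the entity endpoint; the sketch stops short of making the request.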

Purview Data Map powers the Purview Data Catalog and Purview Data Insights as unified experiences within the Purview Studio.

Data Map extracts metadata, lineage, and classifications from existing data stores. It lets you enrich your understanding of that data by classifying it at cloud scale, using 100+ built-in classifiers as well as your own custom classifiers. With Purview Data Map, organizations can centrally manage, publish, and inventory metadata at cloud scale and extend it further using the open Apache Atlas APIs.
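The idea behind a pattern-based custom classifier can be sketched in a few lines: sample the values in a column, test each against a regular expression, and apply the label when enough values match. This is a simplified illustration, not Purview's implementation; the labels, patterns, and the 60% threshold are all assumptions chosen for the example.

```python
import re
from typing import Dict, List

# Hypothetical custom classification rules: label -> regular expression.
CLASSIFIERS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "US_PHONE": re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-\d{4}$"),
}


def classify_column(values: List[str], threshold: float = 0.6) -> List[str]:
    """Return labels whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    labels = []
    for label, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            labels.append(label)
    return labels


# Sampled column values (invented); one value is not an email address.
sample = ["ana@example.com", "li.wei@example.org", "n/a", "sam+news@example.co.uk"]
email_labels = classify_column(sample)
```

Using a match threshold rather than requiring every value to match keeps a few dirty values, such as the `n/a` above, from hiding an otherwise obvious classification.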

Sensitivity labeling of data is supported consistently across database servers, Azure, Microsoft 365, and Power BI. In addition, Purview lets you integrate all your data systems using the open-source Apache Atlas APIs.

2. Purview Data Catalog

With Data Catalog, Purview enables rich data discovery: you can search by business and technical terms and understand data by browsing the associated technical, business, semantic, and operational metadata.

Data catalog, along with information on the data source and interactive data lineage visualization, empowers data scientists, engineers, and analysts with business context to drive BI, analytics, AI, and machine learning initiatives.

Purview helps companies understand their data supply chain from raw data to business insights. From a data lineage perspective, Purview currently lets you:

  1. Scan your Power BI environment and Azure Synapse Analytics workspaces with a few clicks and automatically publish all discovered assets and lineage to the Purview Data Map.
  2. Connect Azure Purview to Azure Data Factory instances to automatically collect data integration lineage. Quickly determine which analytics and reports already exist without reinventing the wheel.
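Conceptually, lineage is a directed graph, and answering "where did this report's data come from?" is a walk back along its edges. The sketch below shows that walk on a small hand-written graph; the asset names and the `UPSTREAM` mapping are invented for illustration and do not reflect how Purview stores lineage internally.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each asset maps to the assets it is derived from.
UPSTREAM: Dict[str, List[str]] = {
    "powerbi:sales_dashboard": ["synapse:sales_mart"],
    "synapse:sales_mart": ["adf:copy_orders", "adf:copy_customers"],
    "adf:copy_orders": ["sqlserver:orders"],
    "adf:copy_customers": ["sqlserver:customers"],
}


def trace_upstream(asset: str, edges: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk from an asset back to everything it depends on."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque(edges.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


dashboard_lineage = trace_upstream("powerbi:sales_dashboard", UPSTREAM)
```

For the sample graph, `dashboard_lineage` runs from the Synapse mart through the two Data Factory copy steps down to the two SQL Server source tables, nearest dependencies first.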

3. Purview Data Insights

Using Purview Data Insights, data officers and security officers can get a bird’s-eye view and, at a glance, understand what data is actively scanned, where sensitive data is, and how it moves.

This component gives users a bird’s-eye view of the organization’s data landscape, so they can quickly determine which analytics and reports already exist. That helps stakeholders maintain and use the organization’s data efficiently. The view surfaces crucial insights such as data distribution across environments, how data is being moved, and where sensitive data is stored.

4. Purview Studio

Purview Studio is the environment in which you work with the Azure Purview services after creating an account. The studio is a central control area that allows developers, administrators, and end-users to work with Purview. Opening it is the next step in the process of using Azure Purview.

Challenges of Azure Purview

Azure Purview is in its early days and has a few gaps that need to be addressed. Here are a few limitations of Azure Purview:

  1. Purview has a minimal list of data sources; even most Azure data services are not accessible for scanning, not to mention other extensive management systems and BI tools.
  2. The user interface is missing basic data management capabilities in the data catalog. For example, once classified, assets cannot be deleted through the UI.
  3. No support for the classification of zip file content.
  4. No support for a data marketplace.
  5. No support for automation and alerting.
  6. Relations between assets are set manually, and it’s not possible to specify the type or nature of the relationship.
  7. The maximum length of an asset name and classification name is just 4 KB.
  8. Currently, Azure Purview only provides 10 GB of storage capacity for four-capacity-unit platforms and 40 GB for 16-capacity-unit platforms.
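Both storage figures in the last item work out to the same ratio, 2.5 GB per capacity unit. The small helper below just makes that arithmetic explicit. Treating the ratio as linear for other platform sizes is an assumption of this sketch, not something the source states.

```python
# Ratio derived from the quoted figures: 10 GB for 4 capacity units.
GB_PER_CAPACITY_UNIT = 10 / 4  # 2.5 GB per unit; 16 units gives the quoted 40 GB


def storage_gb(capacity_units: int) -> float:
    """Estimate storage for a platform size, assuming the ratio stays linear."""
    if capacity_units <= 0:
        raise ValueError("capacity_units must be positive")
    return capacity_units * GB_PER_CAPACITY_UNIT


small_platform_gb = storage_gb(4)
large_platform_gb = storage_gb(16)
```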

Azure Purview is not currently a one-stop-shop solution for enterprise-level data governance, but based on the roadmap shared, it won’t be long before the Purview team pulls up its socks and covers enough ground to make Azure Purview an enterprise-grade data governance suite.

How Azure Purview helps with Data as Asset

Azure Purview is there to help you manage your data better. Here’s how it helps you process your data and convert it into an asset:

a) Inventory

Azure Purview allows you to catalog your data and attach custom tags to it, making it easier for you, the end-user, to locate and understand it.

b) Quality Control

It also helps you maintain Data Quality in situations where your data must be complete, unique, valid, accurate, consistent, relevant, reliable, and accessible. Governance tools such as the data catalog will help you with this.

c) Security Compliance

As an organization, it falls on you to provide the utmost security for end-user data. Under government laws and data mandates, end-users can demand that their data be removed from a company’s servers, or that its content be changed, at any point. Azure Purview lets you create an automated process that streamlines these service requests and produces the documentation required by law.

d) Unified Map

It provides a unified map of your data assets. This helps in forming an effective data governance system.

e) Provides Semantic Search Options

You can run searches based on technical, business, and operational terms. You can also identify the sensitivity level of the data and explore its lineage interactively.
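As a rough picture of what searching across these kinds of metadata means, the sketch below keeps technical, business, and operational terms on each catalog entry and matches a query against all of them. It uses plain case-insensitive substring matching rather than true semantic search, and the `CatalogEntry` class, the sample entries, and the sensitivity values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry holding three kinds of metadata."""
    name: str
    technical: List[str] = field(default_factory=list)    # e.g. column names
    business: List[str] = field(default_factory=list)     # e.g. glossary terms
    operational: List[str] = field(default_factory=list)  # e.g. refresh schedule
    sensitivity: str = "General"


def search(entries: List[CatalogEntry], term: str) -> List[str]:
    """Return names of entries whose name or metadata mention the term."""
    needle = term.lower()
    hits = []
    for entry in entries:
        haystack = [entry.name, *entry.technical, *entry.business, *entry.operational]
        if any(needle in text.lower() for text in haystack):
            hits.append(entry.name)
    return hits


# Invented sample catalog.
catalog = [
    CatalogEntry("dbo.orders", ["order_id", "customer_id"], ["Sales Order"],
                 ["daily refresh"], "Confidential"),
    CatalogEntry("dbo.customers", ["customer_id", "email"], ["Customer"],
                 ["hourly refresh"], "Highly Confidential"),
    CatalogEntry("dbo.products", ["sku"], ["Product"], ["weekly refresh"]),
]
customer_hits = search(catalog, "customer")
```

A technical term (`customer_id`), a business term (`Sales Order`), and an operational term (`weekly refresh`) can all be used as queries, and each matching entry carries its sensitivity level alongside.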

f) Constant Update of Data Running Through the System

Get continuous updates about the location of the data and continuous insight into its movement through your multi-layer data landscape. Along with this, Azure Purview provides services like a Data Catalog and a Business Glossary.

g) Data Catalog

It is a core element of any data governance software: it scans all the data sources and identifies, indexes, connects, and classifies registered users’ data sets.

h) Business Glossary 

It is a collection of terms with brief definitions that connect to other terms. With the Business Glossary, it’s possible to automate the classification of data sets and annotate them with the correct business terms so end-users can understand them more easily. A business glossary is the foundation of the semantic layer that an organization uses to establish a common language for its business.
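The annotation step can be pictured as a lookup from cryptic column names to glossary terms. The following is a minimal sketch under that assumption; the `GLOSSARY` contents, the name-fragment matching rule, and the column names are all invented, and a real glossary would use richer matching than substring hints.

```python
from typing import Dict, List

# Hypothetical glossary: business term -> column-name fragments that signal it.
GLOSSARY: Dict[str, List[str]] = {
    "Customer Identifier": ["cust_id", "customer_id"],
    "Order Date": ["order_dt", "order_date"],
    "Net Revenue": ["net_rev", "net_revenue"],
}


def annotate_columns(columns: List[str]) -> Dict[str, List[str]]:
    """Map each column to the glossary terms whose hints appear in its name."""
    annotations: Dict[str, List[str]] = {}
    for column in columns:
        lowered = column.lower()
        annotations[column] = [
            term for term, hints in GLOSSARY.items()
            if any(hint in lowered for hint in hints)
        ]
    return annotations


tagged = annotate_columns(["CUST_ID", "order_dt", "net_rev_usd", "row_hash"])
```

Columns that match no term, such as `row_hash` here, come back with an empty list, which makes them easy to route to a data steward for manual review.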

With features like these, Microsoft Azure Purview allows your data to become a crucial asset.

Summary

Data governance is a must-have strategy for all enterprises that want to use data as an asset. It is complex, yet it is a foundational pillar in any enterprise’s data journey. Data governance helps to democratize data responsibly through accessible, trusted, and connected enterprise data at scale.

Microsoft Azure Purview provides a good starting point for cloud-native data governance. Azure Purview helps answer the who, what, when, how, where, and why of data. From a feature point of view, I would say it has the potential to be a game-changer, with features like the Data Catalog, Data Insights, Data Map, Business Glossary, and pipelines to manage your data sources and destinations.

Azure Purview has solid potential to shape a new Data Governance as a Service (DGaaS) industry and open up new opportunities for businesses to explore.

By Gaurav Agarwaal

Keywords: Cloud, COVID19, Digital Twins
