And Why Governance is the Gordian Knot to all Your Business Problems

"Data governance is a measure of a company's control over its data"

Data governance is a data management concept. It is a measure of the control a company has over its data. This control can be achieved through high-quality data, visibility on data pipelines, actionable rights management, and clear accountability. Data governance encompasses the people, processes, and tools required to create consistent and proper handling of a company's data. By consistent and proper handling of data, I mean ensuring availability, usability, consistency, understandability, data integrity, and data security.

The most comprehensive governance model — say, for a global bank — will have a robust data-governance council (often with C-suite leaders involved) to drive it; a high degree of automation with metadata recorded in an enterprise dictionary or data catalog; data lineage traced back to the source for many data elements; and a broader domain scope with ongoing prioritization as enterprise needs shift.

A good data governance and privacy model is a mix of people, processes, and software.

Data Governance has a Direct Business Impact

Data governance isn't just that rusty process that companies have to deploy in order to comply with regulation. Of course, part of it is a legal obligation, and thank god, but clean governance can also have a strong impact on business outcomes.

Here are the main goals of data governance:

When Did Data Governance Become a Thing?

Timeline and key milestones in the space.

For the past twenty years, the challenge around data has been to build an infrastructure to store and consume data efficiently and at scale. Producing data has become cheaper and easier over the years with the emergence of cloud data warehouses and transformation tools like dbt. Access to data has been democratized thanks to BI tools like Looker, Tableau, or Metabase. Now, building nice dashboards is the new norm in Ops and Marketing teams. This gave rise to a new problem: decentralized, untrustworthy & irrelevant data and dashboards.

Even the most data-driven companies still struggle to get value from data - up to 73% of all enterprise data goes unused.

→ 1990-2010: the emergence of the first regulations on data privacy

In the 1970s, the first data protection regulation in the world was passed in Hesse, Germany. Since then, data regulation has kept expanding. The 1990s marked the first regulations regarding data privacy with the EU directive on data protection.

Yet, compliance with regulation really became a worldwide challenge in the second half of the 2010s with the arrival of GDPR, alongside existing rules such as HIPAA and other regional regulations on personal data privacy. These regulations drove data governance for large enterprises and created an urgency to build tools to handle the new requirements.

→ 2010-2020: first tools to comply with regulation. The C-level realizes that data governance is a strategic advantage to drive business value.

With the increasing complexity of data resources and processes on the one hand and the first fines for GDPR infringement on the other, companies started to build regulatory compliance processes. The first pieces of software to organize governance and privacy were born with companies like Alation and Collibra.

The challenge was simple: enforce traceability across the various data infrastructures in the company. Data governance was then a privilege of enterprise-level companies, the only ones able to afford those tools. On-premise data storage made it expensive to deploy this software. Indeed, companies like Alation and Collibra had to send technology specialists into the field to connect the data to their software. The first generation of data governance tools aimed at collecting and referencing data resources across the organization's departments.

There were several forces at play in this period. It became easier to collect data, cheaper to store it, and simpler to analyze it. This led to a Cambrian explosion in the number of data resources. As a result, large companies struggled to have visibility over the work done with data. Data was decentralized, untrustworthy & irrelevant. This chaos brought a new strategic dimension to data governance. More than a compliance obligation, data governance became a key lever to bring about business value.

→ 2020+: Towards automated and actionable data governance

With the standardization of the cloud data stack, the paradigm changed. It is easier to connect to the data infrastructure and gather metadata. Where it took 6 months to deploy a data governance tool on a multitude of siloed on-premise data centers in 2012, it can take as little as 10 minutes in 2021 on the modern data stack (for example: Snowflake, Looker, and dbt).

This gave rise to new challenges: automation and collaboration. Data governance in Excel means manually maintaining 100+ fields on thousands of tables and dashboards. This is impossible. Data governance with a non-automated tool means maintaining 10+ fields on thousands of tables: this is time-consuming. Data governance with a fully automated tool means maintaining only 1 or 2 fields on thousands of tables (namely table and column/field descriptions). For that last part of manual work, you want to leverage the community. Prioritize work based on data consumption (high documentation SLA for popular resources) and democratize usage through a friendly UX.
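To make the "prioritize work based on data consumption" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Resource` fields, the `monthly_queries` metric, the `min_queries` threshold, and the sample table names are hypothetical and do not come from any particular governance tool.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A table or dashboard tracked in the catalog (hypothetical shape)."""
    name: str
    monthly_queries: int    # assumed usage metric pulled from query logs
    has_description: bool   # the one field still maintained by hand


def documentation_backlog(resources, min_queries=100):
    """Return popular resources that lack a description, most-used first."""
    popular_undocumented = [
        r for r in resources
        if not r.has_description and r.monthly_queries >= min_queries
    ]
    return sorted(
        popular_undocumented,
        key=lambda r: r.monthly_queries,
        reverse=True,
    )


# Made-up sample catalog, for illustration only.
catalog = [
    Resource("analytics.orders", 4200, False),
    Resource("analytics.customers", 3100, True),
    Resource("staging.tmp_events", 12, False),
    Resource("finance.revenue_daily", 950, False),
]

backlog = documentation_backlog(catalog)
for r in backlog:
    print(f"{r.name}: {r.monthly_queries} queries/month, needs a description")
```

With this sample data, the backlog contains `analytics.orders` first and `finance.revenue_daily` second. The rarely used staging table is left out, which is the point: manual documentation effort goes where consumption is highest.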

Additionally, you want that data governance tool to be integrated into the rest of the data stack. Define something once and find it everywhere: whether this is a table definition, a tag, a KPI, a dashboard, access rights, or data quality results.

Data Governance Challenges Are Not the Same for Everyone

Governance use cases vary with industry needs and company size.

There are two main drivers for data governance programs:

The level of complexity increases with the scope of business operations (number of lines of business and geographies covered), the velocity of data creation, or the level of automation (decision-making, processes) based on data.

How Do You Set Up a Good Data Governance and Privacy Model?

Several building blocks are needed to enforce data management.

Where Does Data Governance Fit in the Modern Data Stack?

Data governance brings trust from the raw data sources to domain expert dashboards.

The typical data flow is the following:


Also published at: https://www.castordoc.com/blog/what-is-data-governance-and-privacy.