This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: gZxaiokDEpr6A_43md0J7bDRebZgDDpgWeqwXXBQMC4

Why the Mall Failed, and What It Teaches Us About Synthetic Data

Written by @yawetse | Published on 2026/4/7

TL;DR
The mall didn't fail because people stopped buying. It failed because we mistook infrastructure for retail. Synthetic data is at risk of the same mistake.

Synthetic Data as Infrastructure, Not Estimation

Introduction

This piece started from a series of conversations I kept coming back to over the past year around a simple but unresolved problem: how do we make data usable, safely, at scale? If data is the foundation of AI, then access to data is the real bottleneck. Much of that thinking has been framed under what is often referred to as Data Governance Transformation (DGT). Jacob Pasner has done a strong job categorizing this idea, but in practice it shows up under many different names across enterprises. The core of it is consistent: how do we make data usable, safely, in environments that are heavily regulated and increasingly driven by AI?

I first connected with Jacob at Georgetown University's Privacy and Public Policy Conference (PPPC), where we were both thinking about similar questions from different angles. At the same time, I’ve been spending a lot of time on this topic with researchers and practitioners, including work with Mayana Pereira and our recent talk at RSA on anonymization, differential privacy, and what that does to the power of synthetic data. Across all of these conversations, one theme kept showing up.

If you want to unlock data for innovation, especially for AI, you don’t start with models. You start with access.

DGT is fundamentally about that layer. It’s about how lineage, metadata, and policy automation can dramatically reduce the time it takes to access and use data. It’s about how privacy-enhancing technologies can allow us to safely expand where data can be used. And increasingly, it’s about how governance itself becomes part of the platform, not something layered on top of it.

Around the same time, I found myself going down a rabbit hole on the history of the American shopping mall. What started as casually watching a YouTube video turned into a deeper dive into the underlying economics, architecture, and policy decisions that shaped an entire system.

The story stuck with me because it felt familiar. The mall did not fail because people stopped buying things; it failed because it was scaled on a misplaced assumption about what it was. Synthetic data is at risk of the same misunderstanding. Synthetic data does not create new information. It does not increase the amount of information we have about the underlying problem. It does not improve estimation in isolation. What it does is expand the set of places where data can be used safely, quickly, and at scale.

That is infrastructure.


The Mall Was Never Designed to Be Retail

The modern shopping mall was not originally conceived as a retail machine. Victor Gruen, the architect credited with inventing the enclosed mall, had a very different vision. Influenced by European urban design, he imagined malls as community centers: dense, walkable, mixed‑use spaces that would recreate the social fabric of city life in the suburbs.

In his original concept, malls included apartments, schools, medical facilities, and civic services. They were meant to be “pedestrian‑friendly squares”: a substitute for the town center that suburban America lacked.

What got built was something else.

By the time Southdale Center (widely recognized as the first fully enclosed shopping mall) opened in 1956, the concept had already begun to shift. Climate control, retail density, and foot traffic optimization became the dominant design goals. What remained was a highly efficient system for one thing: separating consumers from their money.

The Mall Scaled Because of Policy, Not Demand

To understand why the mall scaled the way it did, you have to look beyond architecture and into the underlying economic engine.

One of the most surprising insights from the research was that the rise of the mall was not purely driven by consumer demand; it was driven by tax policy.

The 1954 Internal Revenue Code introduced accelerated depreciation for income‑producing buildings. This allowed developers to write off large portions of construction costs early, creating substantial tax advantages. The result was a wave of mall construction that was not tied to organic demand, but to financial incentives.

Developers were incentivized to build quickly, on cheap land, and often with shorter lifespans in mind. The industry shifted toward what could be described as disposable infrastructure.

The outcome was predictable in hindsight:

  • Structural oversupply of retail space
  • Rapid suburban expansion
  • A system optimized for financial engineering rather than long‑term utility

At its peak, the United States had significantly more retail space per capita than any other country (see https://capitaloneshopping.com/research/mall-closure-statistics/; https://www.worldfinance.com/markets/the-rise-and-fall-of-the-us-mall).

The system worked, until it didn’t.

The Mall’s Product Was Always Access

Once that economic engine was in place, the mall’s success came down to a much simpler idea.

The mall did not win because it had better products; it won because it reduced friction. It made consumption easier, faster, and more repeatable for millions of people. The real innovation was not what was being sold, but how easily people could engage with it.

It centralized:

  • Access to goods
  • Comparison shopping
  • Social interaction
  • Climate‑controlled comfort

All of this combined to create a predictable, repeatable environment where consumption could happen efficiently at scale. In that sense, the mall was not a retail innovation; it was an access innovation: an infrastructure layer that optimized how people discovered, evaluated, and purchased products.

The Mall Failed When a Better Access Model Emerged

That advantage, however, was not durable.

When e-commerce emerged, it did not immediately replace retail; it replaced the mall’s advantage. Convenience became effectively infinite.

Search, comparison, and purchasing moved from a physical, centralized environment to a distributed, on‑demand system. Even modest shifts in behavior were enough to destabilize the mall ecosystem because it was overbuilt and tightly coupled to foot traffic.

By the late 2010s, analysts were projecting that 20 to 25% of U.S. malls would close. The so‑called “retail apocalypse” was not the end of commerce. It was the failure of a specific infrastructure model.

The Mall Survived by Becoming Infrastructure Again

What’s interesting is not just that malls declined, but how the system responded.

Malls did not disappear; they were repurposed. As retail demand shifted and the original economic model weakened, owners and municipalities began to reuse these large, centralized, climate‑controlled structures for entirely different purposes, often ones that aligned more closely with the community‑oriented vision Gruen originally had in mind.

Examples include:

  • Medical facilities and clinics
  • Community colleges, libraries, and civic offices
  • Housing and mixed‑use developments
  • Distribution and fulfillment centers

In many cases, these transformations brought malls closer to their original intent as multi‑purpose community hubs rather than single‑purpose retail environments. The physical structure (the centralized layout, shared infrastructure, and accessibility) remained valuable, but the use case evolved. The system did not disappear; it adapted to a different function.


Synthetic Data Is Being Evaluated on the Wrong Axis

This pattern should feel familiar.

There is a similar pattern emerging in how synthetic data is discussed. The dominant framing is still technical, focused on questions like whether synthetic data matches real data, preserves statistical distributions, or improves model performance. These are valid questions, but they are incomplete on their own because they assume synthetic data exists primarily as a modeling tool.

In reality, synthetic data does not exist to improve estimation; it exists to change the operating conditions under which data can be used. The real constraint in most organizations is not a lack of data, but the friction around using it: access latency, compliance restrictions, privacy risk, fragmented tooling, and governance bottlenecks. These constraints define how much of the available data can actually be used in practice.

In many cases, creating a training dataset is not a technical exercise but an operational one. It can take months, requiring navigation across multiple systems, approvals, and workflows. Teams often respond by building their own solutions to manage this complexity, which introduces additional risk, duplication, and inefficiency across the organization.

Synthetic data addresses this problem by enabling safe, reusable representations of sensitive data that can move more freely across environments. It does not increase the underlying information in the system, but it expands where that information can be used. In doing so, it reduces friction across the data lifecycle and makes it possible to operate at a fundamentally different speed.
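To make the "no new information" point concrete, here is a deliberately minimal sketch (standard library only, toy column names, a single Gaussian fit) of the kind of transformation synthetic data performs: it resamples from a model estimated on the real data, so the output can be shared more freely, but every statistical property it has was already present in the source.

```python
import random
import statistics

random.seed(42)

# Toy "sensitive" dataset: account balances we cannot copy to a dev environment.
real_balances = [random.gauss(mu=5_000, sigma=1_200) for _ in range(10_000)]

# Fit a simple parametric model to the real data...
mu = statistics.fmean(real_balances)
sigma = statistics.stdev(real_balances)

# ...and sample a synthetic stand-in from the fitted model.
synthetic_balances = [random.gauss(mu, sigma) for _ in range(10_000)]

# The synthetic data tracks the real distribution closely...
print(round(statistics.fmean(synthetic_balances)))  # ~5,000
print(round(statistics.stdev(synthetic_balances)))  # ~1,200

# ...but no individual real record appears in it. The transformation expands
# *where* the information can be used, not *how much* information exists.
print(set(real_balances) & set(synthetic_balances))  # empty set
```

A production generator models joint distributions and layers on formal privacy guarantees such as differential privacy; the sketch only illustrates that the synthetic output is a derived artifact of the input, never new signal.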


Governance Is Becoming the Layer That Makes Access Possible

If synthetic data is about access, then the next question is: what makes access possible at scale?

This is where data governance transformation becomes central. Across conversations with research institutions, federal agencies, and groups like NIST, the direction is consistent: governance is moving from a set of policies and approvals to a set of capabilities embedded directly into the data platform. In practice, that means the controls that once slowed work down are being re‑implemented as systems that make work possible at scale.

Data governance is no longer just documentation or approval workflows. It is becoming an integrated layer that continuously ensures data is usable, safe, and compliant at the moment it is accessed and used. Instead of asking teams to navigate policy, the platform enforces it.

It ensures:

  • Data is fit for permitted use
  • Privacy is protected
  • Data is findable, accessible, interoperable, and reusable (FAIR)
  • Compliance is enforced automatically

The emerging model is best understood as a pipeline:

Data generation → governance → access → innovation

In this model, governance is not a gate at the end of the process; it is the layer that makes the rest of the process possible. When governance is implemented as infrastructure, access accelerates. When it remains a manual overlay, access breaks.
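One way to picture governance implemented as infrastructure rather than as a manual overlay: policy becomes machine‑readable data that the platform evaluates at the moment of access. The schema and function names below are invented for illustration (real policy engines such as Open Policy Agent are far richer), but the shape of the idea is the same.

```python
# A minimal sketch of governance-as-code: policy is data the platform
# evaluates on every access request, not a document a committee interprets.
# All dataset names, classifications, and functions here are illustrative.

POLICIES = {
    # dataset -> environments where each data classification may be used
    "customer_accounts": {
        "pii":       {"production"},
        "synthetic": {"production", "staging", "dev"},
        "public":    {"production", "staging", "dev"},
    },
}

def grant_access(dataset: str, classification: str, environment: str) -> bool:
    """Return True if this classification of data may be used in this environment."""
    allowed = POLICIES.get(dataset, {}).get(classification, set())
    return environment in allowed

# Raw PII stays fenced into production...
assert grant_access("customer_accounts", "pii", "dev") is False
# ...while the synthetic representation of the same dataset moves freely.
assert grant_access("customer_accounts", "synthetic", "dev") is True
```

Because the check runs programmatically on every request, access accelerates instead of queuing behind approvals; the gate has become part of the pipeline itself.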


Synthetic Data Only Makes Sense as Infrastructure

Once governance becomes a platform capability, the role of synthetic data becomes much clearer.

When synthetic data is embedded into this governance layer, its role becomes clearer. It is not just a dataset that can be evaluated in isolation; it is part of the infrastructure that enables data to move safely across environments and use cases. The key shift is from thinking about synthetic data as an artifact to thinking about it as a capability.

Synthetic data enables organizations to decouple access from sensitivity. Instead of requiring direct access to production data for every use case, teams can work with high‑fidelity, privacy‑preserving representations that are easier to distribute, test, and iterate on. This changes the speed and shape of how data is used across the enterprise.

It enables:

  • Safe data demotion from production environments
  • Faster experimentation in non‑production environments
  • Privacy‑preserving collaboration
  • Real‑time analytical workflows
  • Reduced compliance overhead

These capabilities align with a broader shift toward automated, computation‑driven systems where governance is enforced programmatically rather than manually. In that world, the limiting factor is no longer access approvals or policy interpretation, but how effectively the platform encodes and enforces those rules.

Synthetic data is one of the core mechanisms that makes that shift possible, because it allows organizations to expand access without expanding risk.
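As one concrete illustration of "safe data demotion," consider a step that runs before any production record is copied to a lower environment. The field names and substitution rule below are invented for the sketch; a real pipeline would use a trained generator and formal privacy guarantees rather than per‑character replacement.

```python
import random

random.seed(0)

# Illustrative only: a "demotion" step that synthesizes sensitive fields
# before a production record is copied to a non-production environment.
SENSITIVE_FIELDS = {"name", "ssn"}

def demote(record: dict) -> dict:
    """Return a copy of `record` that is safe for non-production use."""
    safe = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            # Replace the sensitive value with a synthetic placeholder that
            # preserves its length (useful for testing) but not its content.
            safe[key] = "".join(random.choice("0123456789") for _ in str(value))
        else:
            safe[key] = value
    return safe

prod_record = {"name": "Ada Lovelace", "ssn": "123-45-6789", "balance": 5000}
dev_record = demote(prod_record)

assert dev_record["balance"] == 5000             # non-sensitive data passes through
assert dev_record["ssn"] != prod_record["ssn"]   # sensitive data is replaced
```

The design choice worth noting is that demotion is automatic and repeatable: teams get realistic, correctly shaped data in dev and staging without ever requesting access to the production originals.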


The Failure Mode Is Treating Synthetic Data as a Product

All of this leads to a very specific risk.

The mall failed when it became single‑purpose. Over time, it stopped being a broader social and access infrastructure and became narrowly defined as a place to buy things. Once that happened, it became fragile. When a better model for purchasing emerged, the mall had no flexibility left in the system to adapt.

Synthetic data faces a similar risk. If it is reduced to a single use case, whether as a modeling substitute, a compliance workaround, or a niche privacy technique, it will inevitably be evaluated on the wrong axis and underperform expectations. That framing narrows its role too early and ignores where the real leverage exists.

The value of synthetic data is not in replacing real data or competing with it on fidelity alone. The value is in expanding where and how data can be used safely and efficiently across the enterprise. When viewed through that lens, synthetic data becomes part of a broader system that enables access, experimentation, and collaboration at scale, rather than a standalone artifact that must prove its worth in isolation.


The Lesson Is About Infrastructure, Not Data

At a high level, the lesson is straightforward.

The mall didn’t fail because people stopped buying things; it failed because it was scaled on a misplaced assumption about what it was. We treated it as retail when, in reality, it was infrastructure. Synthetic data sits in a similar position today. If we treat it as a modeling technique, we will be disappointed. If we treat it as infrastructure, it becomes one of the highest‑leverage systems we can build. That is the shift: not from real data to synthetic data, but from restricted data systems to usable ones.

[story continues]


Written by
@yawetse
VP Engineering @capitalone, previously @invitae CTO @DigiFi @AmericanExpress @CondeNast | Entrepreneur, Developer, Designer, Tech Enthusiast

Topics and tags
ai-governance|synthetic-data-generation|data-privacy|data-governance|data-access|enterprise-ai|privacy-enhancing-tech|hackernoon-top-story