Data Governance’s New Clothes

When citizens, consumers and stakeholders can’t hold institutions accountable for their promises, there’s little reason to trust those promises.

July 5, 2021

The “governance” of data, without real rights, is an embarrassing illusion.

Recently, the United Kingdom’s National Health Service (NHS), often described as that country’s pride, received legal and press blowback over a policy to share patients’ personal health information with third parties, including Palantir, a controversial American technology company. The move was concerning enough to inspire a lawsuit, months of inquiry-based delays and a protest from doctors. But, perhaps more worryingly, the UK Government described its need for trust and adherence to data laws, while forcing patients (likely illegally) through a convoluted “opt-out” process, rather than obtaining affirmative consent.

The move significantly centralizes decisions about the governance and rebrokerage of data collected by UK health services. It creates a new locus of potential power and, more concerningly, it’s something the Tory British government has been trying to bring about for the past eight years. The original attempt failed because it couldn’t gain the public’s trust. But things have changed. In 2021, with emergency powers in force and a global pandemic as cover, Westminster is more inclined to feign apology than ask for permission.

We should recognize at the outset that data governance is like any design — all data is governed. What differs, from case to case, is how thoughtfully, equitably and accountably this is done. An easy way to predict the popularity of a new governance trend is to look for solutions that seem like justice, and sound like justice, but don’t meaningfully account for politics or power. The exploitation of data architecture to maximize private influence over public services, and their digital footprint, is not only growing during the COVID-19 pandemic; it’s making it increasingly obvious how easily supposedly benign plans for managing data can be misused, under the guise of conferring a public benefit or social good.

Whether these initiatives are focused on “open source,” or “tech for good,” or “digital public infrastructure,” beneath the hood, they are usually drawing on publicly privileged resources and invested in making a set of services, tools or access rights available to a small group of already-privileged people, without meaningfully addressing the relative cost to those the technology leaves out. That isn’t very “good” at all.

And this cleavage is at the root of an important and growing rift in the field of data governance, one rapidly embedding itself in all the various systems evolving to rely on digital tools and data. As a result, the digital transformation projects that rely most on public legitimacy and trust are often betraying that trust. They offer public assurances of abstract privacy and security protections, while developing no meaningful path to enforcement or justice. The inevitable outcome is fragility and injustice when such systems turn out to be flawed.

In practice, the rift begins with problem definition. There are, broadly, two animating frames for data governance projects. The first is data value maximization: “We are making a lot of data — how do we maximize its value?” The second is rooted in solving a problem in context: “Can we help solve a problem for a group of people with data or technology?” Value maximization and contextual problem-solving approaches to data governance are inherently political, but they measure and perform their politics very differently. Value maximization incentivizes first-movers and broad problem definitions, while minimizing the costs of due diligence, rightsholder participation and accountability. By contrast, contextual problem-solving approaches tend to incentivize due diligence, working with rightsholders, and accountability – all of which are costs, if not barriers, to data reuse.

And, as a result, value-maximization approaches prefer broad, often infinite problem definitions. Broad problem definitions ostensibly grant broad authorities (whereas concrete problem definitions typically create clearer and firmer limitations on the justifiable data use). The NHS explained, for example, that its data grab is to “save lives,” which is an impossibly broad and endlessly reusable purpose. Contrast the breadth of purpose with something like “develop treatments and cures for COVID-19,” which is a more specific pursuit, with clearer boundaries, indicators for success, and – to an extent – logic for prioritizing resources.

The common response to this dichotomy? “We can do both!” And that’s true — but not within the same systems, or at the same time.

The reason is that data maximization and contextual problem-solving clash in their basic operational principles and design. When digital transformation is justified by appeals to vaguely defined values (e.g., “open,” “public interest,” “for good”), rather than by a clear articulation of the problem it purports to solve, and for whom, it can provide political cover for the plan’s architects and their allies to self-deal. And while the systemic oppression of the underserved by the connected, via technocratic policy delivery, is of course not new, technology, as they say, amplifies everything. Governance is supposed to insert friction, deliberation, transparency and validation into systems that are fundamental to our well-being, a function we can only hope works in most places, most of the time, despite the evidence.

Data governance systems that ignore the politics and context of their data’s use are weaker and less effective and, ultimately, suffer. Think about it as a basic maxim: Any data system that works well enough to create commercial and political value will also acquire adversarial attention (people trying to game the system). Data governance designers with a vested interest in their system’s success try to anticipate what adversaries might do, in order to build resilience and scalability, which requires understanding their motives. If you don’t understand the context of your data’s likely use, it’s significantly more difficult to predict how people will exploit your system, or to build the protections you’d need to stop them. At present, rather than build strong systems, most data governance works to be as unaccountable as possible, while company leadership repeatedly, performatively apologizes for predictable failures.

A core foundation of any trust is accountability. Put simply, if citizens, consumers and stakeholders can’t hold institutions accountable for their promises, there’s little reason for them to trust such promises, ever. And the reality is that most institutions holding troves of data lack the leverage, jurisdiction, mechanisms, capacity and tools to enforce most digital rights. Without enforcement, commitments are empty words. Even if existing institutions were enforcing their laws, it wouldn’t be enough. New uses of data and technology create new problems. We cannot safely adapt to a future without first considering the protections we may need. Right now, that’s not only made harder, but often proactively prevented, by the ways we design the governance of data.

Building pathways for users to constructively participate in adapting and improving a system over time is, basically, the point of democratic governance. Unfortunately, digital systems are still so speculative and focused on short-term capture, they rarely invest in meaningful governance – and, as a result, we’ve seen years of continuous scandals and a race to rein in exploitation and abuse. While there are many productive ways forward (from building due diligence into digital procurement, to political resilience in experimentation processes, to dispute resolution), the problem isn’t the lack of options. It’s the absence of political will to build accessible digital rights.

In this article, I will outline the basic elements of data governance design to articulate the operational differences between value maximization and contextual problem-solving as approaches to data governance, with a view to better protecting user rights.

Data as Fact versus Representation of Fact

To understand how we got here, it’s helpful to go back to first principles. There’s actually no consensus on what “data” is, despite there being a deep literature comparing data, digital transformation and the related impacts on everything from climate change, to the future, to fascism.

These analogies are apt, not because of what data is, but because of how it is allowed to be used, and the attendant impacts on the justice of that use. Here the fundamental, definitional issue is whether a system treats gathered data as unquestionably true or as a fallible representation — created in and for a context. In other words, do data governance systems believe everything they read on the internet? And if so, why should we believe what they tell us or that they are serving the public interest? The answers to these questions get to the foundation of how digital systems define data inputs to form “truth.” Whose truth a digital system prioritizes has a big impact on who it serves, and how. And, understandably, on who trusts and participates in it.

Digital systems that don’t anticipate bad data or adversarial attention typically fail to build the kinds of governance mechanisms and processes necessary to mitigate their impact. Take, for example, “AI” products such as GPT-3 and Microsoft’s Tay – abused, gamed, and ultimately shut down for offensive, racist content. While there are many examples, these problems are evident in the designs of digital systems (seen in the rise of supply-chain attacks and ransomware), as well as in the failure to provide adequate redress when things go badly wrong (such as when discriminatory bail recommendations or illegal extensions of jail sentences occur because of poorly maintained software).

As a result, the easiest way to identify data governance systems that treat fallible data as “facts” is by the measures they don’t employ: internal validation; transparency processes and/or communication with rightsholders; and/or mechanisms for adjudicating conflict, between data sets and/or people. These are, at a practical level, the ways that systems build internal resilience (and security). In their absence, we’ve seen a growing number and diversity of attacks that exploit digital supply chains. Good security measures, properly in place, create friction, not just because they introduce process, but also because, when they are enforced, they create leverage for people who may limit or disagree with a particular use. The push toward free flows of data creates obvious challenges for mechanisms such as this; the truth is that most institutions are heading toward more data, with less meaningful governance.

One of the primary values of Europe’s General Data Protection Regulation (GDPR) is that it compels organizations to create and consider the supply chain of data prior to use, as well as the roles of each link in the chain. The technical and legal approach to building GDPR compliance, as a result, has motivated digital platforms to build the diligence, documentation and practical infrastructure needed for documented data supply chains. Data protection’s (arguably) largest failure, however, is that it too often protects the integrity of data systems but not the people or the purposes these systems are supposedly operated to serve. Data protection has helped build important data governance infrastructure, but it doesn’t provide for meaningful accountability based on data’s use, or its abuse.

Focusing on data as a constructed representation, as opposed to an objective truth, however, forces a data governance system to recognize the context of the decision to use the data. If data is a fallible representation of reality, it must be created or constructed by someone or something, and then shared with someone or something else, ostensibly in service of an articulable purpose. Systems that track that context not only factor in the context of data’s technical creation and exchange but also its fitness for purpose.

One of the tricky parts about the idea of a “representation,” like any statement, is that it happens at a moment in time, like a transaction, but then (especially in digital and legally significant contexts) remains as an ongoing testament to that fact. But facts change, according to context and over time. So the longer we rely on that representation — the more we reuse it — the less reliable its fitness. Systems that recognize data as representations not only help track the technical provenance of data, but are also designed to understand and minimize the costs and liabilities that come from reusing data, acontextually.

Moreover, systems that recognize data as a representation not only track the provenance of data to ensure compliance with applicable law, but also build in due diligence that can track the rights and limitations attached to data. The most obvious examples, often taken to the extreme, are copyright and trademark protections of artistic works. While they can certainly be weaponized, digital content markets and protections are an early model for what it means to acknowledge, track and fairly bargain for the rights attached to a digital representation. Digital content markets are designed to honour the rights attached to content, both because they create revenue for institutional interests and because they have been a frontier for defining “ownership rights” for years. Systems that focus on digital content not only respect rightsholders; they create accountability for how people make representations about people’s rights and agency — regardless of whether they are aware, affected or able to seek redress. Data governance’s broad failure to adhere to that same standard is common for digital systems, but it’s a newly lowered standard of practice for systems with public authority.

User versus Representative

The biggest difference between data governance systems that treat data as fact and those that treat it as a representation is that the former don’t account for the appropriateness of the representative. Historically, if you wanted to represent a person’s interests before a government institution, you’d have to meet an elevated standard of accountability, both to your profession and, directly, to the people you serve. In a practical sense, you’d either need their vote; a legal mandate, certification and education; or a willfully entered contract. And the majority of technology platforms today would argue that they have at least the latter. Terms of service and privacy policies are notorious for creating broad data reuse permissions and even, in some cases, direct “consents” around using a data subject’s likeness.

But, rather than create means of accountability, most technology platform contracts maximize the ability to make affirmative representations about a person to another, by framing these as “data” behaviours, and avoiding most meaningful authority or accountability to the person they’re purportedly representing. Some of the most prominent examples are emotional facial recognition systems and contact-tracing apps — both technologies that communicate inferences about their user without evidence or accountability for their underlying efficacy. In other words, data governance systems that focus on data maximization do so by avoiding the political and legal questions around the data user’s legitimacy in context — whereas systems that focus on contextual problem-solving are chiefly concerned with the integrity of the relationships, which confers legitimacy to representation or representative.

For example, a number of institutions and organizations have used a range of office technologies to deliver mutual aid and connect people with vaccination appointments, as part of COVID-19 relief. Data has played a role in the deployment of these technologies, but so have direct human engagement, public capacity-building and available care infrastructure, all of which are focused on public health, not on the technology.

Perhaps one of the strangest aspects of technology’s high modernism is that the possession of data, regardless of its quality, has become a substitute for professional integrity. High modernism is, broadly, the belief that science and technology will independently reorder the world — and it’s increasingly clear that there are large interests ensuring that data, whether by volume or political choice, does the same. Companies such as Google and Apple have made large investments, for example, in becoming medical technology providers — not because of their expertise in medicine, but because of their ability to collect and sell large amounts of data. And medical systems have been, by and large, willing, if not enthusiastic, consumers.

Whereas most professions acknowledge a responsibility to represent their clients’ interests, there is no similar or obvious designation for the legitimacy of a data user. The term data use itself is an abstraction so dangerously vague that it almost willfully obscures any usable information about an action. But more often than not, data is used in a context — often in ways that influence or inform decisions with significant implications for the rights of others.

That is, people and organizations often use data as an affirmative representation in ways that impact people’s agency. So, for example, when urban planners use digital tools, such as cellphone records, to model human mobility patterns, rather than directly engage with communities affected by a proposed development, they both accept a number of assumptions about that data’s relevance and remove an important component of civic education and participation. The degree to which a system relies on data collected out of context, instead of having active relationships with the people whose data is at issue, tells the tale. Data, in individual bits and in the aggregate, is often a statement intended to reflect a fact about a person, in a context that affects their legal rights.

Of course, there are rules, cultural norms and practices, and laws (and there should be more of all these) that dictate the kinds of representations and representatives that are allowed to influence people’s rights in various contexts. And virtually none of these allowable representations and representatives are “I found it on the internet.”

The GDPR has two primary pathways to legal data sharing: legitimate purpose and consent. Legitimate purpose means that an actor must have a good reason for using data. Consent means that someone with rights to the data agreed to share it. The obvious problem with both pathways is that it’s extremely difficult to negotiate for, let alone monitor or enforce, reasonable limits on data reuse. As a result, most digital services either refuse service if users don’t agree to exploitative practices, or find new ways to work around the compliance requirement. By contrast, Apple recently updated the terms of its app store, requiring apps to obtain affirmative consent to collect or share their users’ data. Ninety-six percent of users opted out of tracking. But for the apps that received their users’ consent, there is no continuing oversight of their data use or reuse. In any case, neither unchecked data sharing nor completely failed trust ecosystems are an ideal outcome. Data governance approaches that centre on data use but don’t create models for appropriate, contextual relationships between data users and rightsholders typically polarize data sharing by forcing binary approaches. Ultimately, the integrity of the underlying system erodes.

Likewise, data governance systems that focus on maximizing the value of data tend to prioritize reuse of that data over building processes that ensure its use is appropriate. Put another way, it is significantly harder to earn the qualification and relevance to make professionally appropriate representations on behalf of a person than it is to acquire the data necessary to do so. Data governance systems that enable widespread data collection and focus on maximization through reuse typically do so by ignoring that basis of expertise, oversight and accountability in favour of increased scale. The result tends to be less expert, less accountable systems.

By contrast, data governance systems focused on solving a problem in context are, typically, able to establish the relative legitimacy and authority of a data user and a representative. These systems also recognize that problem definition has a lot to do with the legitimacy of a representative and a data point. We trust surgeons to alter our physical bodies and lawyers to maintain our freedom, for example — but we would never (I hope) allow them to do each other’s jobs. That’s because expertise, as one basis of sometimes-legitimate representation, isn’t generalizable. Nor is the data that you’d collect in either context, despite the fact that they’re both highly regulated, albeit in very different ways.

Most societies that entrust the management of a person’s fundamental rights to a professional apply significantly heightened standards for oversight, professional certification and duties to the person or people represented. If, for example, you take on the guardianship of a child, the representation of someone in court or provide people with medical care, governments typically create heightened, if not fiduciary, duties. Nearly every mature profession built on significant asymmetries of information, access, expertise and/or power has higher standards than those involved in data governance.

Standards versus Duties

A primary difference between the two data governance approaches, in practice, is how the systems that implement each define their responsibilities. Value-maximization approaches are, typically, rule-based — essentially, “if our work meets this set of standards, we are free to reuse data with minimal friction or ongoing requirements.” Data governance that focuses on equitably solving a problem, by contrast, governs through transparency, participation and accountability. Put simply, data governance aimed at solving problems accepts and acknowledges the professional duties that define most mature, representational professions. It’s worth acknowledging up front, as legal scholar Julie Cohen has noted, that duties do not scale. Whereas standards-based approaches enable data users to manage their liability by meeting a set of fixed conditions, duties to a fixed and specifically aligned set of interests are often costly sources of friction.

There are a significant number of public and private data governance systems whose approach is to pool data and/or the associated rights of the people who are potentially representable through that data, in order to minimize the friction involved in aligned reuse. The goal of standards-based approaches, then, is to claim a person or organization is a legitimate user of that data, implying that they also have a legitimate basis to make such a representation. Once the underlying conditions are fulfilled, there are relatively few limitations on their ongoing reuse of that data, and even fewer that are exercised in practice.

The problem is that, as described above, valid data and legitimate right of representation are just the tip of the iceberg. Professional standards are a starting point for high-impact professions, but they are not a substitute for responsibility for the impact of one’s actions. Data governance focused on value maximization thus runs the risk of using standards of care performatively, while doing legal backflips to avoid meaningful liability.

There is a large and well-funded lobby, especially in the technology sector, that aims to frame fundamental rights protections around standards. Not only do technology companies advocate for this approach, but software systems operate through primarily rule-based logics. This is especially true for digital rights protections, which can be over-implemented through technology, in ways their framers did not intend. Take, for example, the EU Copyright Directive, initially designed as a way to ensure that platforms work to prevent infringement, and implemented through filters that are sometimes “weaponized” through their inability to interpret context. There are cases where police officers have turned on copyrighted music while being filmed as a tactic to prevent cellphone video of police encounters from being uploaded to internet platforms and going viral, in the belief that the automated copyright filters will prevent their spread.

Data governance approaches that focus on equitably solving a problem recognize the two additional elements of mature professional services: the duty of loyalty and the duty to operate with transparency and accountability. Ultimately, standards of care are only one necessary-but-not-sufficient component of the standards we expect of high-impact professions. A duty of loyalty means the service provider makes decisions that are explicitly, and exclusively, in their clients’ best interests. That duty doesn’t mean best interests in a broadly construed “public benefit” sense or “as-a-collective” way; it means avoiding conflicts of interest, especially those that benefit the service provider, and serving that specific individual’s interests in context. It also means that providers can face liability based on the impact of their choices. So when a decision turns out to cause harm, the provider bears the burden of proving the mistake was unknowable, or made in good faith.

Similarly, most professions that involve others making decisions on a person’s behalf bind their members to the duty of providing ongoing transparency and seeking specific, informed consent when making critical decisions. In other words, data governance approaches intending to model professional integrity of representation are designed to proactively support participation from and accountability to the people they serve. Much like the duties in professional standards, that approach to data governance system design doesn’t scale particularly well. And data, in small amounts, isn’t particularly valuable, especially if it is governed on the basis of people’s interests rather than on its availability to others.

Here’s where this leads: A group of people represented in a data set in one context, such as patients contributing to research for a medical condition they share, may be competitive in a similar-seeming context, such as getting access to new treatments. So, a data governance system that justifiably represents all of their interests when they are aligned is fine. But if that same system tries to reuse data in another context, those people may have serious conflicts of interest. The rights around brokering the same data set, representing the same group of people, could be dramatically different from case to case, and may involve introducing new political concerns, such as prioritizing access based on urgency, need or, as happens more often, the ability to pay for service. And so, data governance initiatives that focus on equity must find a balance between the interests of the individual or group, the legitimacy of the representation of those interests, and their ongoing accountability for the impacts of the data’s use.

As any lawyer is legally required to tell you, even the best representatives can’t promise outcomes. In the American legal tradition, there’s an unfortunate saying that “a good lawyer knows the law, a great lawyer knows the judge.” And, in data governance, there’s a similar analogy: a good advocate can help you defend your rights; a great advocate will help you pick the systems that prioritize your rights by looking at how they determine value.

Value and Adoption versus Politics and Impact

Data governance systems have different priorities, which lead to disparate measures of value, relevance and impact for the use and reuse of data. Arvind Narayanan, a Princeton computer scientist, has illustrated some of these in his famous lecture “21 fairness definitions and their politics.” For commercial data brokers, value maximization, in its most literal sense, is the only and defining goal — whereas public institutions typically frame value maximization around “the data economy” and the growing of industry. By contrast, efforts to govern the equity of data’s impacts rarely have well-defined metrics for success. But they do have clear and contextual logic for their decisions. Value maximization embeds valuation logics, which are political and inequitable, especially when prioritized over the direct participation of the people involved.

To get one, hopefully obvious, point out of the way: there is not a single system, of any type, in the world that reaches every human being. Any attempt to use data, whether to create financial value, improve the operating scale of a service, or identify and solve important problems, is also an exercise of power that distributes unevenly. Often this happens in ways that accelerate the already historic rate of wealth and power centralization taking place in economies all over the world.

Among the most prominent arguments for digital transformation and data governance are those that emphasize efficiency, namely, ensuring “the public” gets “value for money” out of its digital investments. In the United States, one oft-cited motivation for making landmark public investments in artificial intelligence is to “beat China.” At the same time, ransomware attacks on public infrastructure, such as the 65,000 reported against American infrastructure in 2020, are becoming so frequent that insurers are refusing to cover them.

The fact is that data systems are deeply, practically and functionally political. And that means the most important elements of our era-defining politics are obscured in technological abstraction. What does it mean to “beat” another country in the development of a definitionally ambiguous technology? Why would we invest in doing that over, say, providing health care or education? Who are digital public investments serving? Whose needs are prioritized in digital transformation initiatives, and what balancing logic or policy can be applied to bridge the divide for those they ignore? What does it mean to create novel technology but then make no investments in the justice systems to provide citizens with protections from the novel harms this technology is likely to create?

Recognizing that these questions transcend their digital manifestations, most data governance systems are left to decide whether to engage with such questions at all. And, just as in analog governance, there are “benevolent” systems that abet structural autocracy and participatory systems that focus entirely on upholding the status quo. Data governance systems, like all systems, are faced with difficult decisions about whether, let alone how, to balance the value and costs of engaging with their politics.

To conclude: Using data as a representation of a person is an activity in fundamental tension with that person’s interest in representing themselves. While there are tools available to ensure the quality, legitimacy and relationships that underpin equitable representation, virtually none of them are present in modern data governance — certainly, not enough to justify the use of governance as a term. The tools and practices of maintaining equitable relationships, especially across scale, speed and value exchange, are an inherent source of friction and cost. Attempts to maximize any perceivable value of data through reuse pit short-term incentives against the slow, expensive and human process of building relationships.

And so, rather than negotiate those relationships with integrity, governments, technology companies and lawyers have leveraged the asymmetrical tools at their disposal, whether these be policy, centralized platforms or exploitative contracts, to dictate the terms of those relationships. No matter the nomenclature, nor the technology, the question that fundamentally determines the equity of a data governance system is what entitles a data user to make a representation on behalf of the person or people affected. Data governance systems are, in the aggregate, what happens when speculative value comes into tension with the integrity of all of our relationships. Choosing value maximization as the defining purpose of all those relationships is, at the very least, a nakedly political choice — and one that no amount of professed equity or digital abstraction can paper over.

Today, data governance decisions are as often acts of omission as acts of commission. But the people affected are under no illusion that their interests are being championed. And, perhaps as importantly, the people designing digital transformations are beginning to recognize the long-term costs of using data governance as political cover, one embarrassing disclosure at a time. It’s high time for policy makers, platform designers and data governance professionals to be honest about the impacts of digital transformation — and to stop using the mirage of equity as data governance’s new clothes.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Sean Martin McDonald is a CIGI senior fellow and the co-founder of Digital Public, which builds legal trusts to protect and govern digital assets.