In a big data environment, the public interest must be front and centre. (Shutterstock)

In 1988, the Supreme Court of Canada heard an appeal in a criminal case in which the accused had been charged with theft of a list of names and contact information. In acquitting him, the Supreme Court ruled that information could not be stolen; the crime of theft required a person to take something away from another person. In this case, one party acquired information held by another without actually depriving them of it. The decision highlights some of the challenges raised by thinking about ownership of information or data.

Thirty years later, there is a surging interest in the important questions around data ownership in today’s big data environment. Discussion of data ownership rights has tended to come from two directions. Companies that acquire or generate data claim proprietary interests in order to support its commercialization through licensing and to protect against its use by competitors or others. At the same time, individuals, increasingly concerned over the use and abuse of their personal information, have also started to turn to concepts of ownership, in search of greater control. Start-up companies now offer individuals ways to monetize their personal information by selecting what information they are prepared to share in exchange for payment.

Any discussion on data ownership should begin with three basic questions. First, is the right to data ownership necessary or desirable? Those seeking to control or monetize data want rights, but society may be better off — economically, socially and politically — if data and information are free. Second, do existing laws already provide ownership rights in some form and in some circumstances? They do, but the protection is patchwork and uncertain. Third, if there is some form of data ownership right, what are its limits? Limits are almost as important as the concept of ownership itself. Even owning something as “commonplace” as real estate does not give the owner the right to do anything she pleases with her property. Limits are the means by which the public interest is balanced against private rights.

Balancing Public Interest with Private Rights

Intellectual property law is specifically tailored to create rights in intangibles, and it is fair to ask whether and how data can be protected by copyright law. It is a fundamental principle of copyright law that there can be no copyright in facts or ideas. Both are in the public domain and must be available to all, in order to avoid the stifling of innovation and creativity. Only original expressions of facts or ideas can be protected in copyright law.

There has been some case law — predominantly in the United States — in which plaintiffs have argued that, while facts are in the public domain, data is capable of being a “work” in which copyright subsists. For example, one might argue that data is the result of complex calculations or algorithms, such that resulting data points are not facts; rather, they are authored bits of information. Courts in the United States have indicated that such arguments may have some merit; nevertheless, they have also applied the “merger doctrine” in such cases, finding that even if the resulting data are authored, the ideas they represent have so closely merged with their expression that to enforce any copyright would be to give the plaintiff a monopoly over the underlying idea.1

While it is possible for clever lawyers to build arguments for why data itself can be protected under copyright law, the public policy reasons for why facts and ideas are in the public domain are important. Although there is a public interest in recognizing and protecting original works (to provide incentives to create and to disseminate those works publicly), there is also a public interest in ensuring that no one person or company obtains a monopoly over the building blocks of innovation and creativity — facts and ideas. In a big data environment, this public interest must be front and centre.

Copyright Is Contingent and Uncertain

Although individual data may not be protectable under copyright law, copyright law does recognize the possibility that a compilation of data (including a data set) could be an original expression of that data, where the selection or arrangement of the data is original. This means that many compilations of data could be protected under copyright. However, to infringe such a copyright, there would have to be a substantial taking of the original expression, that is, of the selection or arrangement. If the only original element is the arrangement, someone who takes all the data and arranges it differently will not have infringed copyright. This means that both the existence and the scope of copyright in any compilation of data are contingent and uncertain. Unfortunately, in a dispute, uncertainty favours the party with the deepest pockets, since the existence and scope of rights can only be conclusively determined through costly litigation.

Ownership Rights Are Never Absolute

Even if copyright subsists in a compilation of data, it would be subject to a range of users’ rights, the most important of which is fair dealing (or fair use, in the United States). Users’ rights are meant to balance copyright with the broader public interest in activities such as research, criticism or comment, education, and news reporting. In the United States, the even broader concept of fair use additionally recognizes the importance of allowing transformative uses of protected works.

Successfully asserting users’ rights, however, may depend on the individual user’s ability to bear the costs of litigation. Users’ rights may also not be sufficiently broad to facilitate the kind and scale of innovation that is desirable in the big data and artificial intelligence (AI) environment. Current debates over whether text and data-mining activities infringe copyright illustrate the implications of excessive intellectual property protection for rapid and free-flowing data-related innovation.

Claims to copyright in data or data sets raise two additional challenges that are part of a legacy of copyright policy making that has paid too little attention to the importance of users’ rights in fostering innovation and creativity. The first is that Canada’s Copyright Act was amended in 2012 to make circumventing “technological protection measures” (TPMs) a violation of copyright law. If a work is protected by a TPM (which could include a simple password or encryption), any circumvention of that TPM breaches the owner’s rights, even if the goal was to accomplish a fair dealing purpose such as research. The use of even a simple TPM to protect a compilation of data can thus completely disrupt the copyright balance by eliminating users’ rights.

The second issue is that nothing in the Copyright Act provides that users’ rights cannot be eliminated or diminished by contract. In other words, a copyright owner can prohibit fair dealing by contract. In a digital environment, in which works are increasingly subject to non-negotiable licences, this can also dramatically alter the copyright balance. When it comes to publicly accessible data on the internet, the tendency of courts to find website terms of use to be binding means that such terms can prohibit data scraping and thus turn the harvesting of publicly accessible data into a breach of contract.

“Ownership” of Confidential Information

Another form of protection for data is found in the law of confidential information. Information that is kept confidential can be protected in law, although the basis for such protection is not property rights. Rather, the law protects the relationships that give rise to obligations of confidentiality. Information assets that can be protected as confidential information include customer lists, trade and commercial secrets, recipes, formulae and inventions. Algorithms can also be protected as confidential information, as can data assets.

There are many instances in which there may be a public interest in either the disclosure of confidential information or, at least, in government oversight or review of such information. In the big data and AI context, these may include instances where it becomes necessary to understand what data is being used to reach automated decisions or what processes are being used to arrive at those decisions. Laws can create exceptions to confidentiality where there is a public interest in doing so.

As the need for governance of algorithmic decision making evolves, so too may the need for legislated exceptions to provide access to or oversight of confidential data and algorithms. This area will demand more government attention in the future.

The Limits of Personal Data Ownership

It is increasingly common to hear people talk about “owning” their own personal information. Ownership certainly seems to be one paradigm that might allow individuals to assert control over their personal information, and even to demand compensation for its use in a range of circumstances. However, there is little to support such a right of ownership in our current legal regime.

Consent-based data protection laws give individuals the right to make choices about the collection, use and disclosure of their personal information. However, this right falls far short of an ownership interest. Canada’s Personal Information Protection and Electronic Documents Act was enacted to balance the rights of individuals to exercise some control over the collection, use and disclosure of their personal information with the rights of organizations to collect, use and disclose that information for their own commercial purposes. And, while data protection laws can place limits on a company’s entitlement to use its collection of personal information in certain ways, they do not stop those companies from simultaneously treating these collections of personal data as compilations in which they hold copyright, or as confidential commercial information.

Addressing the Legal and Governance Gaps

The patchwork of laws that can be used to support claims to data ownership is poorly adapted to the big data and AI context. Yet, there are strong arguments for making data as free of constraint as possible — with the exception of personal data. The public interest in data-based innovation and in the free flow of information militates against commercial strategies that seek to restrain access to and reuse of non-personal data.

Excessive concentrations of data in the hands of a few will surely have adverse impacts; competition authorities have already warned against them. Uncertainty around the nature and limits of data ownership rights favours those with the deepest pockets and can dampen research and start-up innovation. This is an area where there is a need for greater clarity, and for thoughtful policy making that does not lose sight of the complex public interests at stake.

1 See e.g. NY Mercantile Exch, Inc v IntercontinentalExch, Inc, 497 F (3d) 109 (2d Cir 2007); BanxCorp v Costco Wholesale Corp, 978 F Supp (2d) 280 (SDNY 2013).

  • Teresa Scassa

    Teresa Scassa is a senior fellow with CIGI’s International Law Research Program. She is also the Canada Research Chair in Information Law and Policy and a full professor at the University of Ottawa’s Law Faculty, where her groundbreaking research explores issues of data ownership and control. Teresa is an award-winning scholar, and is the author and editor of five books, and over 65 peer-reviewed articles and book chapters. She has a track record of interdisciplinary collaboration to solve complex problems of law and data, and is currently part of the Geothink research partnership.