Statistics Canada Should Be Central to a National Data Reuse Framework

April 29, 2021
Tunney's Pasture, the home of Statistics Canada, in Ottawa. (Rob Kelk/Wikimedia Commons)

This article is a part of a Statistics Canada and CIGI collaboration to discuss data needs for a changing world.

Canada is lagging in the global race to assert sovereignty over data. Although governments are making progress in protecting consumers against data misuse, Canada is falling behind in managing the risks associated with data sharing or reuse. Data scientists and statisticians cannot train algorithms without large inputs of data, but in Canada they face a significant hurdle because the practices and the tools we have today don’t address this need for reuse. One of the most important actions the federal government can take to enable data sharing and manage the associated risks is to build a national framework for data reuse. Statistics Canada should be central to this framework as it already has the expertise and the infrastructure for data storage, labelling, access and controls at scale.

Reduce Misuse First, Then Strategize for Safe Reuse

Several initiatives are already under way in Canada to reduce data misuse:

  • National programs are being developed to protect citizens, consumers and patients through a new digital identity framework.
  • Work is slated to begin on national standards for open banking at the Chief Information Officers Strategy Council.
  • The federal government is implementing its federal data strategy to provide online services to Canadians.
  • The government recently tabled a Digital Charter, giving individual Canadians the power to manage personal data collected by organizations, small- and medium-sized enterprises (SMEs) and big tech platforms alike. The government has also promised to review the Statistics Act, as part of commitments made under the Digital Charter.
  • A new version of the Privacy Act is being contemplated to modernize privacy protections and to help manage data sets held by federal departments and agencies.
  • A new regulatory framework to monitor hate speech on online platforms will soon be tabled in the House of Commons.
  • Finally, major investments have been announced to introduce broadband internet to rural regions of the country.

Taken together, these wide-ranging initiatives against data misuse will give citizens the means to exercise their rights, thereby helping to level the playing field between citizens, on the one hand, and organizations and big tech platforms, on the other.

The next step is for Canada to turn its attention to treating data as a strategic asset, which entails creating a strategy for data reuse. Data reuse is required to help solve public policy problems in sectors such as education and health care. Making data available for reuse is an essential part of an industrial strategy and of boosting innovation and competitiveness in all sectors of the economy. Data reuse also makes sense from an economic and efficiency perspective: with the right framework in place, it will be cheaper and more efficient to recycle existing data sets than to create them from scratch. Industry and thought leaders have been calling for made-in-Canada sectoral strategies to spur data sharing between organizations.

What Other Jurisdictions Are Doing

In the United States, big tech platforms continue to expand their global reach under a laissez-faire, unfettered, winner-takes-all approach. Big tech platforms have acquired a daunting level of market power, thanks to a business model based on the ownership and control of data generated by billions of users. China, through its artificial intelligence (AI) strategy and the largest investment of public resources in AI research and commercialization in the world, also aims to become a global AI superpower. China’s AI strategy, its recently released intellectual property (IP) strategy, its upcoming China Standards 2035 plan and its quantum research work will become the building blocks of a seamless and centrally controlled infrastructure for data reuse. China is also capturing data from outside its borders via its Belt and Road Initiative.

The United Kingdom and the European Union have recently launched responses to keep US and Chinese interests from corralling and hoarding vast troves of data to feed their own algorithms and machine-learning tools. These comprehensive, government-led initiatives to build frameworks for data reuse may provide useful insights as Canada considers how to approach data reuse in a way that both respects individual rights and fosters Canada’s democratic institutions.

In September 2020, the UK government unveiled its draft National Data Strategy. The strategy sets goals for the government, industry and the non-profit sector as the country transitions to a digital society and economy. It aims to unlock the value of data by setting the correct conditions to make data usable, accessible and available across the economy, while protecting people’s data rights and private enterprises’ intellectual property. The strategy recognizes that for data to have the most effective impact, it needs to be appropriately collected, accessible, mobile and reusable. That means encouraging better coordination; enabling access to and sharing of data of appropriate quality between organizations in the public sector, private sector and third sector; and ensuring appropriate protections for the flow of data internationally. The government is expected to submit a five-year implementation plan in 2021, which will provide more specifics on a framework for data reuse.

The European Union recently unveiled its own approach to data sovereignty. Early in 2020, it released its European Strategy for Data. Recognizing that data can transform all sectors of the economy and is crucial for AI, it proposed the creation of a common European data space and a single market for data, where data can flow within the European Union and across sectors. This is needed because there is currently not enough data available for reuse to train algorithms. The strategy proposes to build new European data processing and storage solutions, along with comprehensive data governance approaches to increase data sharing among companies, and to make more data available overall.

The strategy is to be deployed through four pillars:

  • a cross-sectoral governance framework for data access and use;
  • a high-impact project focused on creating European data spaces/federated cloud infrastructures;
  • competencies (including dedicated capacity building for SMEs); and
  • the rollout of common European data spaces in crucial economic sectors and domains of public interest.

Common European data spaces will ensure that more data becomes available for use in the economy and society, while keeping companies and individuals who generate the data in control (Figure 1). These spaces will be created to support data sharing in crucial sectors, including health, the environment, energy, agriculture, mobility, finance, manufacturing, public administration and skills. The European Union will be investing between €4 and €6 billion to develop data-processing infrastructures, data-sharing tools, architectures and governance mechanisms, to foster data sharing and to federate energy-efficient and trustworthy cloud infrastructures and related services.

Figure 1: Common European Data Spaces

Source: Directorate-General for Communications Networks, Content and Technology, European Commission. Note: GDPR = General Data Protection Regulation.

These initiatives are part of a broader effort to wrest digital influence from tech platforms in the United States, and from China as it expands the reach of its telecommunications offerings and big tech platforms. “The battlefield for industrial data is starting now,” Thierry Breton, European commissioner for the internal market, said when the legislation was proposed, adding, “While being an open continent, we are not naive.” Under the new sharing mechanism, industrial and government data used by industry could be exported overseas, but companies would need to ensure the data is processed with the same protections as required within Europe. According to a report in the Wall Street Journal, “Officials don’t rule out future regulations to limit some exports in certain sensitive sectors” in order to maintain data sovereignty.

A Canadian Framework for Data Reuse

Canada’s response to foreign data harvesting needs to go beyond applying restrictions on data misuse. It also requires asserting sovereignty, control and ownership of data collected from public, private and industrial sources to foster data reuse. This response can only be achieved through the development of a coherent framework composed of several inter-related initiatives:

  • Sectoral data strategies to kick-start Canada’s economic recovery post COVID-19. They have been called for by industry and thought leaders from manufacturing, agri-food, natural resources, bioscience and digital industries. The same calls for sector-specific national data strategies have been made by thought leaders in sectors delivering public goods, such as health care, public health, education and smart cities.
  • Standardized access to data from governments to kick-start the creation of comprehensive data sets covering various sectors. The federal government recognizes that data is an asset. In its 2018 Data Strategy Roadmap, it proposes actions to ensure that it collects the data it needs to support policy, programming and regulatory objectives. It also recognizes the importance of ensuring that government-held data can be combined with data from other sources so that Canadians can unlock its value.
  • Data value chains, that is, new constructs stringing together data from multiple points across existing supply chains to link data collection and labelling, data storage and access, and data analytics activities. Data value chains will spur data sharing between organizations and the implementation of sectoral data strategies.
  • New professional classes to support specific segments of data value chains. In addition to data scientists focusing on the development and training of AI and machine-learning tools, Canada needs to train data engineers to focus on data collection and labelling, as well as data controllers to manage data access rights and sharing centres. To ensure that valid conclusions are drawn from data and to benefit from solidly established scientific frameworks, organizations may need to turn to statisticians to obtain insights from the data.
  • A standardized data governance rulebook to help organizations share and reuse data across data value chains. Standardized guidance is needed to handle cross-cutting issues such as data ownership, IP, copyright and tracking, data residency requirements, privacy and ethics. This rulebook is a prerequisite for upstream firms’ sharing of data with downstream AI-specialized firms.
  • Interoperability standards to properly frame data collection, sharing, access and analytics activities and to allow for data sharing across sectors.
  • A fifth-generation (5G) safety code to set rules and performance requirements for the emerging infrastructure underpinning data collection, transmission and storage. This new infrastructure will be made up of billions of Internet of Things (IoT) devices and 5G networks connecting hundreds of thousands of antennas affixed to buildings, roads and other infrastructure. The infrastructure needs to be safe for users and workers, and it needs to be secure.
  • An international data free-trade zone to allow for data sharing between like-minded countries. Canada’s data reuse framework should encourage international data collaboratives while asserting Canada’s sovereignty over national data.

A Focal Point for Data Reuse

One of the most important actions the federal government can take to enable data sharing is to build a data-sharing infrastructure, not unlike what the European Union is planning to undertake through its data strategy. Investments will need to be made in creating common data spaces serving specific sectors in order to manage access rights and generate much-needed trust among participants.

One option that should be explored is to entrust Statistics Canada with the mandate to establish and run data-sharing facilities in support of key sectors of the Canadian economy. In such a scenario, participants wishing to share data for reuse across data value chains would sign data-sharing agreements with Statistics Canada, which would act as the data controller for the data sets slated for data reuse. Through this arrangement, AI and other firms would be able to access relevant and necessary data sets from various sectors through one venue. Access rights and limitations could be managed by using a user credentials access system.

Taking on these responsibilities appears consistent with the Statistics Act, where Statistics Canada has the duty to “collect, compile, analyse, abstract and publish statistical information relating to the commercial, industrial, financial, social, economic and general activities and condition of the people.” It also has the duty to “promote and develop integrated social and economic statistics pertaining to the whole of Canada and to each of the provinces thereof and to coordinate plans for the integration of those statistics.”

In the data-driven economic era, data has never been more valuable. The data that resides within firms is an asset for firms, for innovation and for the public good, yet at this point most of it is scattered and not treated as an asset, leaving a valuable resource untapped. Large volumes and varieties of data are necessary for technologies such as AI, and collecting the data that resides in firms could be enormously beneficial for the development of data analytics that would boost innovation and competitiveness in all sectors of the economy. Indeed, as explored in a recent article by Chantal Bernier, this point is featured in the proposed changes to federal personal privacy legislation in Bill C-11, which has provisions for the sharing of data for “socially beneficial purposes,” although it limits this sharing to specified public institutions or an organization mandated by such an institution.

In many respects, taking on this role would be a continuation of the services that Statistics Canada already provides and is an area where it has world-renowned expertise. It would take the data that firms have agreed to share via agreements, aggregate it and make it available publicly under conditions set out in the contract. Statistics Canada is currently managing data collaboratives, for example, the Canadian Research Data Centres network, with its Microdata Access Portal, to provide access to social, economic and health data (such as publicly accessible microdata files). It could also make de-identified microdata available to firms and individuals, subject to safeguards as set out in the contracts.

Statistics Canada has a number of well-established safeguards in place:

  • By law, Statistics Canada cannot hand over anyone’s personal information — not to the police, the Royal Canadian Mounted Police, the Canada Revenue Agency or even the courts.
  • Final results are carefully screened before release to prevent published statistics from being used to derive personal information.
  • The Statistics Act contains very strict confidentiality provisions that protect collected information from unauthorized access:
    • Statistics Canada uses state-of-the-art tools, software and processes that prevent disclosure and ensure the confidentiality and privacy of individual data;
    • Statistics Canada does not share personal information with other organizations, unless consent is given; and
    • Statistics Canada employees are responsible for ensuring the security of confidential information.
  • Statistics Canada has long-established experience in data stewardship and is internationally recognized as a leader in many data issues and techniques.
  • Statistics Canada has developed and used proven directives, guidelines and frameworks in matters of data quality, collection, ethics, privacy, confidentiality and transparency.
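
One common way to screen results before release, as described in the safeguards above, is small-cell suppression: counts below a minimum size are withheld so that a published table cannot single out a few individuals. The following is a minimal sketch of that general technique and not a description of Statistics Canada's actual procedures. The function name `suppressed_counts`, the threshold of 5 and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def suppressed_counts(records, key, min_cell_size=5):
    """Count records per category and withhold counts below the threshold.

    A withheld (suppressed) cell is reported as None.
    The default threshold of 5 is an illustrative assumption.
    """
    counts = Counter(record[key] for record in records)
    return {
        category: (count if count >= min_cell_size else None)
        for category, count in counts.items()
    }


# Hypothetical microdata: six records from region "A", two from region "B".
sample = [{"region": "A"}] * 6 + [{"region": "B"}] * 2
table = suppressed_counts(sample, "region")
print(table)
```

In this sketch the count for region "B" is withheld because only two records fall in that cell. A production system would also need complementary suppression, so that a withheld cell cannot be recovered by subtracting published cells from a published total.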

Carrying out data collection and dissemination requires substantial expertise. A sizable proportion of Statistics Canada staff is already engaged in data labelling, cataloguing, storing and access control functions, which are at the core of a data-sharing commons. In addition, Statistics Canada has embarked on a transition to become an active data steward. It is investing in the infrastructure needed to access, share and generate insights from data, including cloud technologies and real-time remote access for third-party users. Statistics Canada is conducting pilot projects to use alternative data sources, such as IoT sensor data, scanner data, Global Positioning System data, Earth observation data and crowdsourcing, as Anil Arora and Rohinton P. Medhora have described. These new data sources are expected to play a critical role in addressing issues identified by sectoral data strategies.

In summary, Statistics Canada appears to have the necessary skills, protocols and experience to run the data commons. Moreover, Statistics Canada is the only existing organization that could, realistically, quickly set up the commons, which will be essential to help drive innovation in Canada and to keep Canada from falling further behind in the collection and use of big data.

There are some potential issues that may need to be addressed:

  • Statistics Canada’s enabling legislation may have to be amended to allow for the creation of a data-sharing commons that can be accessed by third parties. The data envisaged here is akin to information in the broad sense: it would be a combination of structured and unstructured data. Changes may be required to deal with “big” data collected from or by firms, since Statistics Canada can only mandate the collection of existing data records, which implicitly assumes some structure to the data. To the extent that such issues interfere with existing data collection responsibilities, consideration could be given to the creation of a Data Commons Centre in the same spirit as the Research Data Centre program.
  • Currently, under existing Statistics Canada guidelines, microdata is only available to vetted researchers. The types of occupations that would need access would likely need to be broadened, since an important goal is to allow the data to be used by firms to get the benefits of big data sets. One option would be to make data available to data stewards in firms who would be tasked to ensure that guidelines for data use are respected and enforced. In addition, the purpose of the proposal is the voluntary sharing of data among firms that participate in the commons. As data analytics become more widespread, it is likely that the demand to use the data may also become more widespread, including from firms that do not have data or those that may not be part of the commons, for example, data analytics firms that could provide services to firms in the commons.
  • The data commons envisaged in this article would see the data ultimately reside at Statistics Canada, but this does not have to be the case. Data could remain with firms but be managed by Statistics Canada, with data-sharing arrangements among firms. The technology exists for the secure sharing of data between firms and Statistics Canada. New technologies that allow third parties to perform calculations on encrypted data without decrypting it (known as “homomorphic encryption”) could also be considered.
  • As the value of large data sets becomes more obvious, there may be additional demands for mandatory compilation of such data. Similarly, as the regulatory frameworks adapt to the digital economy, new and different types of data will inevitably be needed. For example, one option being considered by the Canadian government is a regulator for social media platforms to deal with, among other things, transparency of their operations. The regulator would require substantial amounts of information from the social media platforms. Statistics Canada could be the designated body to collect this information, a recommendation also recently made by the Canadian Commission on Democratic Expression. Given Statistics Canada’s world-renowned expertise in standard setting, it could also help to define the standards for the collection of such data.
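
To make the idea of homomorphic encryption mentioned above more concrete, the sketch below shows a toy version of the Paillier scheme, in which a third party can add two encrypted values without ever seeing them. It is illustrative only. Paillier supports addition but not arbitrary computation, the primes used here are far too small to be secure, and the function names and sample figures are assumptions rather than anything proposed in this article. It requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two primes (insecurely small here)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse of phi modulo n
    return n, (phi, mu)   # public key n, private key (phi, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under the public key n."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext using the private key."""
    phi, mu = private_key
    u = pow(ciphertext, phi, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Combine two ciphertexts so that the result decrypts to the sum."""
    return (c1 * c2) % (n * n)


# Two firms encrypt their figures; a third party adds them without decrypting.
public_key, private_key = keygen(1009, 1013)
firm_a = encrypt(public_key, 1200)
firm_b = encrypt(public_key, 345)
encrypted_total = add_encrypted(public_key, firm_a, firm_b)
print(decrypt(public_key, private_key, encrypted_total))  # 1545
```

Only the holder of the private key can read the total, which is the property that would let data remain with firms while still being combined for analysis.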

Conclusion

Canada will benefit by taking an integrated approach to the development of a national framework for data reuse that respects individual rights and fosters democratic institutions and values. The framework should rely on a suite of measures that protect citizens from data abuse, on national sectoral data strategies and on a 5G infrastructure that is safe and secure.

A national framework for data reuse should be seen as a nation-building initiative. For voluntary data reuse to work in Canada, institutions will need to design a framework that reinforces social solidarity and democratic values. Implementing it will require vision, commitment and resources. The federal government has a unique opportunity to take a leadership role and establish a balanced framework to facilitate data sharing. Data integration is needed to support open-data policies and to feed algorithms and machine-learning tools. AI needs large quantities of data to generate new insights, and more data can also reduce the potential for unintended biases. Data sharing between federal departments and agencies can help meet public needs, improve the delivery of public services and result in more informed decisions.

In order to generate new insights to help solve problems of public interest, governments, including provincial and territorial governments as well as other interested stakeholders, will need to work together. Sharing data controlled by governments through a clear access rights process is key to success. A modernized Privacy Act should manage privacy not by limiting data collection, but by encouraging data sharing and reuse while closely managing access rights. With that in mind, there should be a focus on establishing an appropriate framework for managing data access.

The federal government should seriously consider entrusting the data-sharing commons to Statistics Canada as a way to kick-start the creation of Canada’s data reuse framework. Creating the framework may also require new legislation, regulations, funding and program management, as well as coordination not only across federal departments and agencies, provinces, territories and Aboriginal governments, but also with key sectors of the economy.

The opinions expressed in this article are those of the authors and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Authors

Robert (Bob) Fay is a CIGI senior fellow and an expert in the field of digital economy research.

Michel Girard is a CIGI senior fellow and an expert in the area of standards for big data and artificial intelligence.