Now More than Ever, the World Needs Data Stewards

October 1, 2020

This essay is available in French.

This article is a part of a Statistics Canada and CIGI collaboration to discuss data needs for a changing world.


The coronavirus (COVID-19) crisis is the latest illustration of the importance of statistics in guiding policy and individuals’ actions. Because testing rates and the way cause of death is registered in public records differ across jurisdictions, the true extent of the pandemic is unknown, which complicates efforts to counter it. For example, Sub-Saharan Africa did not record its first case of COVID-19 until late February 2020, and even after that, its caseload lagged behind other regions of the world. This might have been due to the genuinely slow arrival and spread of the virus; to “co-immunity” within the population (because of exposure to other microbes whose interaction with COVID-19 is not currently understood); or to the paucity of widespread testing, and therefore data, on the continent.

As another example, there is a belief among US intelligence analysts that the extent of the pandemic and the resulting deaths in China were understated by the authorities. Varying approaches to identifying cause of death during the pandemic have also resulted in disparate fatality rates attributable to the virus, leading to undercounting in some US states and overcounting in others. And, because people can be infected but asymptomatic, the actual rate of infection could be higher than the number of confirmed cases.

Without detailed and comparable statistics — and, furthermore, widespread political willingness, institutional capacity and other capabilities to gather, disseminate and make sense of them — we will never be able to distinguish fact from fiction or coincidence from causation.

Data, Both Grand and Granular

At the onset of the pandemic, policy makers in many countries — Canada included — took bold actions to prioritize public health, reduce or eliminate people’s mobility, and provide financial support to individuals and businesses. The social and economic impacts of these necessary actions were swift and severe. 

Statisticians’ tools — surveys and administrative data — showed the differential impact of COVID-19 on specific segments of the population and on certain sectors, as well as the emerging social impacts of the shutdown. In addition to drawing on the instruments that provide key socio-economic indicators, statisticians supplemented their toolbox with a number of innovative methods, such as crowdsourcing, web panels, web scraping, and mobility and satellite data, to provide a picture of not only the widespread but also the targeted impacts of the virus and the ensuing policy decisions in near real time. 

These innovations — flash estimates of GDP and trade; macro-level estimates of mobility (people and goods); adjustments to concepts such as “unemployed”; and self-reported experiences of the population in areas as diverse as mental health, anxiety and stress, changes in savings and consumption patterns, financial hardship, education, and trust in science and public authorities, to name a few — have been helpful to policy makers in charting the course toward some form of recovery. Statisticians are leveraging common international standards and classifications to provide an integrated picture of different national experiences with the virus, related policies, and their impacts on societies and economies.

Foundations of Strong Data Stewardship

The pandemic has also exposed data gaps (for some subpopulations and geographic regions) and timeliness gaps, and it has raised questions about privacy, confidentiality, trust and access to personal data by private and public organizations, along with their associated legal frameworks. But it has also demonstrated the fundamental importance of data. Good statistical rigour in determining prevalence and propagation in all their spatial and temporal dimensions forms the basis of the models that are used to guide decision making, be it in normal times or extraordinary ones.

The foundations of current economic governance were laid in the 1940s, particularly in the conferences held in Bretton Woods (1944) and San Francisco (1945), which established the major international economic and social development institutions that function to this day. For these institutions — the International Monetary Fund, the World Bank, and the United Nations and its agencies — all of which rely on global cooperation to achieve their mandates, the first requirement has always been good data. They need reliable, timely and pertinent numbers that enable sound decision making and comparisons across jurisdictions and over time.

Less well known than the meetings in Bretton Woods and San Francisco (but at least as important) was a gathering in New Jersey in December 1945. Under the leadership of Richard Stone, the Sub-Committee on National Income Statistics of the League of Nations Committee of Statistical Experts met in Princeton and prepared Measurement of National Income and the Construction of Social Accounts, a report (published in 1947) that subsequently led to the establishment of the United Nations Statistical Commission (UNSC). The System of National Accounts (SNA) that the UNSC introduced has since served the world well, for standardized ways of measuring things are a true global public good, accessible to all (that is, non-excludable and non-rivalrous), and underproduced if left to the market alone. The SNA structure fundamentally changed the nature of economics from a deductive social science to one based on empirical observation and measurement.

Statistics Canada’s evolution is intertwined with statistical developments on the international scene and with the development of the statistical frameworks that ensure coherent and comparable statistics. In the decades after its establishment in 1918, the Dominion Bureau of Statistics (DBS) steadily improved its measurement of economic activity. By the end of the 1940s, it had started producing a set of balancing accounts. It followed this work in the 1950s by producing estimates of the flow of goods and services. By that time, Canada was significantly involved internationally, having been named chair of the UNSC when that body was created in 1947. During the 1960s, the DBS started to use seasonal adjustments for its quarterly accounts and made great strides in improving the classification of goods and products. This was also the period when social statistics became more refined, with the modernization of the employment definition in the Labour Force Survey. In 1963, the International Statistical Institute held its World Statistics Congress in Ottawa, and the DBS played an important role in the conference and its program. With demands for more granular data came the development of enhanced confidentiality practices.

In 1971, the DBS became Statistics Canada. By then, Canada was a mature country in terms of statistics. Multiple developments in the field of national accounting brought Canada to the leading edge, and the head of national accounts was named director of the United Nations Statistics Division in 1972. Over time, Canada became a leading player in the international statistical world by constantly developing new methods to measure socio-economic activity. As part of the United Nations, Canada was instrumental in creating a series of focused groups (called city groups) to concentrate research and development on how to measure new phenomena.

In the 1990s, Statistics Canada significantly developed the social statistics portfolio, with, among other programs, longitudinal surveys related to income dynamics and children and youth. The agency also worked hard to develop analytical capacity for economic, social and health domains. Beyond its borders, Canada was an influential voice, advocating successfully for the creation of the position of chief statistician at the Organisation for Economic Co-operation and Development (OECD). As well, during this decade, Statistics Canada was twice ranked as the leading statistical agency in the world. It was also during these years that the agency undertook a major project to improve provincial economic statistics. This project was created to provide estimates of provincial shares related to the new harmonized sales tax. It overhauled how economic statistics were produced and also affected some social statistics programs.

Statistics Canada was one of the first statistical organizations to provide its products through a website (1996) and an early adopter of online census collection (2006) and administration. As changes in the socio-economic context continued to take place, Statistics Canada kept on innovating. Three years ago, the agency embarked on an ambitious modernization agenda founded on five pillars: user-centric service delivery, leading-edge methods and data integration, sharing and collaboration, statistical capacity-building and leadership, and a modern workforce and flexible workplace. Advances related to each of the pillars have allowed the agency to continue to provide data and analysis necessary for the recovery phase of the pandemic. More recently, in response to the pandemic, Statistics Canada has developed new tools, such as the Canadian Economic Dashboard and COVID-19, the Canadian Statistical Geospatial Explorer, and preliminary data on the number of confirmed COVID-19 cases in Canada, to facilitate quick access to a variety of information. The agency also heavily influenced the data strategy for the federal public service, acting as a data steward to build greater data-management capacity, strengthen the underlying infrastructure and governance, build greater capacity and expertise, and increase the value of data as an asset for Canadians. It works closely with entities such as the Standards Council of Canada on the newly formed data governance collaborative, the CIO Strategy Council (a forum for chief information officers), and the Centre for International Governance Innovation — to name a few — in the pursuit of a responsible data-driven society.

Timely Data Is Essential

Timeliness in data collection is important in many ways. The current pandemic has demonstrated the value of rapidly gathering and disseminating data on key aspects of COVID-19, both to develop responses and to provide public information and education.  

National statistical offices have been transforming to better use alternative data sources. For example, the requirement for more granular and real-time information is leading to the increased use of sensor data — that is, a more continuous data stream that includes scanner data, GPS data (for example, of truck movements) and earth observation data (including from satellites). Crowdsourcing is being used as a flexible collection method to provide timely insights on the impacts of COVID-19 on Canadians’ mental health, their ability to finish school, their businesses, and more. These alternative methods will continue to be developed and may play an important role as Canada begins to navigate toward recovery.

The phenomenon of “nighttime lights” is one instance of an alternative data source improving timeliness. Starting in the mid-2000s, space-based monitoring of ground illumination patterns as a proxy for GDP growth has added speed and accuracy to conventional forms of data gathering, particularly in countries where human capacity and infrastructure are weak. A recent survey of developments in this area found that satellite-based data is robust compared with more conventional approaches to measuring economic activity across a wide spectrum of countries. It even provides insights not available through traditionally gathered data. Similarly, satellite monitoring of greenhouse gas emissions is being explored for its potential to provide fast, accurate and dispassionate indicators of progress toward climate change goals and commitments.

The pandemic has challenged Statistics Canada and other national statistical offices to go beyond their traditional roles of providing good data and analysis, and toward the roles of a more active data steward and a provider of both data and microdata repositories, which require infrastructure for access, sharing and analytics. Modest investments made in cloud technologies, data analytics as a service, real-time remote access to researchers and virtual data labs are paying dividends in connecting challenging policy questions to data analytics expertise. 

Data Must Be Accurate

In 2016, it was estimated that bad data cost the US economy US$3 trillion annually. This figure includes the costs of hunting for, cleaning, organizing and correcting data, as well as the costs of inefficient decisions taken on the basis of bad data. No comparable estimate has been made for Canada, but given the similarity in corporate and government processes between the two countries, it would be reasonable to assume that the annual loss from bad data in Canada is about one-tenth of the US estimate: roughly US$300 billion, or about CDN$400 billion.

To minimize errors in the official statistics it produces, Statistics Canada has developed and uses rigorous data quality guidelines, which have recently been expanded to cover non-survey sources such as big data. The agency makes a considerable effort to test its numbers before publication, release initial estimates as “experimental series” and proactively seek feedback before moving a series into regular production.

For instance, when Statistics Canada was moving to satellite-based estimates of crop yield, its evaluation included “ground-truthing” the earth observation data, which entailed direct on-the-ground verification of crop types. In addition, the model was validated by running the traditional survey and the modelled (satellite-based) series in parallel. Once Statistics Canada was satisfied that these models worked as planned, it sought feedback from key stakeholders. Only after it was comfortable with the increased quality of the modelled data did Statistics Canada cancel the relevant respondent-completed survey and move to a satellite-model approach.
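
The parallel-run check described above can be illustrated with a minimal sketch. It is not Statistics Canada's actual validation procedure: the function names, the 5 percent tolerance and the yield figures are all hypothetical, chosen only to show how a survey series and a modelled series covering the same periods might be compared.

```python
from statistics import mean


def parallel_run_report(survey, modelled):
    """Compare two series covering the same periods, value by value."""
    if not survey or len(survey) != len(modelled):
        raise ValueError("series must be non-empty and of equal length")
    # Absolute gap between modelled and survey values, as a percentage of
    # the survey value (survey values are assumed to be positive).
    pct_diffs = [abs(m - s) * 100 / s for s, m in zip(survey, modelled)]
    return {
        "mean_abs_pct_diff": mean(pct_diffs),
        "max_abs_pct_diff": max(pct_diffs),
    }


def within_tolerance(report, max_pct=5.0):
    """True if no period diverges by more than max_pct (a hypothetical threshold)."""
    return report["max_abs_pct_diff"] <= max_pct


# Hypothetical yields for three seasons; these are NOT real Statistics Canada data.
survey_yield = [40.0, 50.0, 60.0]
modelled_yield = [42.0, 49.0, 60.0]

report = parallel_run_report(survey_yield, modelled_yield)
print(report)
print("modelled series acceptable:", within_tolerance(report))
```

In practice, a statistical agency would likely look at many more periods and at additional diagnostics, such as bias by region or crop type, before retiring a survey. The sketch only captures the basic idea of running both series side by side.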

The Importance of Relevant Data

Beyond timeliness and accuracy, the relevance of data is fundamental. What to measure and why are major preoccupations of national statistical offices around the world. As described above, the evolution of the statistical system began with a need to understand the drivers of economic activity at the time. In 1953, statisticians were asking:

  • What part of the total product of an economy is devoted to consumption as opposed to capital formation?
  • To what extent is the economy dependent on foreign trade?
  • What part does foreign aid play in providing the goods and services absorbed by the economy?
  • What are the relative amounts of production originating in various industrial sectors, such as agriculture, manufacturing and trade?
  • How do different parts of the economy make saving available for capital development?

Society and the economy have evolved considerably since the 1950s, and so have international statistical frameworks. However, the current pandemic has highlighted that many data gaps still exist and that official statistical frameworks need to evolve at a faster pace to match the fundamental changes occurring in society and the economy.

The emergence of the digital and intangible economy is a strong example. As the pandemic is showing, the impact of the digital economy is more apparent than ever, as physical distancing has affected all facets of human interaction. To understand the impacts of the digital economy, new sets of statistics to interpret the economy and society are needed. For example, are firms and countries using common (or even comparable) definitions of intangible products? Are they measuring the various components of intangibles? The extent of the growth of the intangible economy is typically portrayed by the value of intangibles in firm-level valuations.

Figure: Components of S&P 500 Market Value
Source: Ocean Tomo (2020).

A new set of conventions on what constitutes intangibles and how these might be measured is called for. Given the fast pace of change in this area, and its important socio-economic impacts, there is a need to increase the frequency of surveys and to disaggregate statistics into their key socio-economic elements.

The emergence of so-called big data, which drives artificial intelligence systems and is itself considered a commercial asset, has also created demand for a new set of conventions on data. One of the more fraught areas in international negotiations is data localization, the requirement that the data of a given country’s citizens be stored on servers located within that country’s borders. As countries cement their positions on the importance of their citizens’ data, the global community is likely to require a convention that defines data classes, thus facilitating a discussion about which data must be localized and which need not be. Statistics Canada is at the forefront of the international work on addressing measurement issues in these areas and will continue its pioneering work in measuring the value of data and the impact of digitalization on Canadian society and the Canadian economy.

Recent events have shone a light on significant underlying social issues in Canada and around the world. Racial injustice and economic inequality have become flashpoints of protest and civil disobedience. The rise of movements seeking racial and social equity in response to historical injustices has clearly shown the need for all levels of government to acknowledge and address these inequities. To do so will require a consistent approach to understanding the barriers preventing the full and equal participation of marginalized populations, such as Indigenous peoples and racialized communities, in civil society. The COVID-19 pandemic has further exacerbated such barriers. As the pandemic unfolded and governments responded to protect their citizens against economic hardship and social injustice, it became more apparent that marginalized communities were not being protected in the same way as the general population. In Canada, this manifested itself in many different ways, including limited access to health care, financial benefits and child care, which affected the mental, physical and financial well-being of vulnerable populations. The pandemic also exposed significant gaps in knowledge about the experiences of members of these communities and the impacts of these experiences on their daily lives. These information gaps are directly rooted in a lack of high-quality data for these communities, a lack that has made responding to these issues very difficult. Robust and trusted data is necessary for targeting policy solutions that address not only the effects of these issues but also their root causes.

Statistics Canada has always attached great importance to producing high-quality estimates that can lead to effective decision making and keep Canadians well informed. That is why Statistics Canada has made sure over the years to attract, train and nurture experts. It has also been involved in research and development activities with the aim of ensuring that statistical activities keep pace with societal changes. Furthermore, international cooperation has always been a significant dimension of this work, so that Canada’s experts can exchange ideas with colleagues from other countries and keep abreast of the best methods and approaches to measure socio-economic activities.

Conclusion

The experiences faced by societies and economies — as seen through a statistical lens — demonstrate in equal parts human resilience and human fragility, while also raising issues of social equity and justice. Statistics Canada, arguably the best statistical agency in the world, has a unique vantage point, both domestically and internationally. As a bureau member of the UNSC, the chair of the UNSC Friends of the Chair Group on Economic Statistics, the vice chair of the United Nations Economic Commission for Europe and the Conference of European Statisticians, the chair of the High-level Group on the Modernization of Official Statistics, and the chair of the OECD’s Committee on Statistics and Statistical Policy, Canada punches well above its weight in influencing concepts, classifications, methods, definitions, and technical and operational aspects of statistics in social, economic and environmental domains. Statistics Canada’s ability to see the macro-level picture and context, as well as the relationships between these domains, brings greater value to the data and analysis that Canadians need during this time.

The expertise of statisticians is a key asset in the current context. When data is proliferating at an unprecedented rate, and when decisions based on that data could have life or death consequences, the quality of data — not the quantity — is the defining factor when it comes to having confidence in analysis. Statistical rigour, transparency and good data-management practices that protect privacy are the underpinnings of new and emerging applications that enable contact tracing and of testing strategies that allow us to better understand the rate of prevalence, spread and immunity. Good statistical sampling techniques will tell us more and better inform policy decisions than mass (and often biased) testing methodologies. 
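
The claim about sampling versus mass testing can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any real testing program: the population size, the 5 percent prevalence and, above all, the bias mechanism (infected people are assumed to come forward for testing ten times as often as uninfected people) are hypothetical values chosen for illustration.

```python
import random


def simulate_estimates(pop_size=100_000, prevalence=0.05, sample_size=2_000,
                       p_test_infected=0.30, p_test_healthy=0.03, seed=0):
    """Return (true rate, self-selected estimate, random-sample estimate)."""
    rng = random.Random(seed)
    # True/False infection status for every person in the simulated population.
    population = [rng.random() < prevalence for _ in range(pop_size)]
    true_rate = sum(population) / pop_size

    # Self-selected "mass" testing (assumed bias): infected people are far
    # more likely to come forward, so positives are over-represented.
    tested = [infected for infected in population
              if rng.random() < (p_test_infected if infected else p_test_healthy)]
    self_selected_rate = sum(tested) / len(tested)

    # Simple random sample: every person has the same chance of selection.
    sample = rng.sample(population, sample_size)
    random_sample_rate = sum(sample) / sample_size

    return true_rate, self_selected_rate, random_sample_rate


true_rate, self_selected, sampled = simulate_estimates()
print(f"true prevalence:        {true_rate:.3f}")
print(f"self-selected estimate: {self_selected:.3f}")
print(f"random-sample estimate: {sampled:.3f}")
```

Under these assumptions, the self-selected group typically contains more tests than the random sample yet gives a much worse estimate of prevalence, which is the sense in which the quality of data matters more than its quantity.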

While statisticians cannot control COVID-19 or guarantee good policy decisions that will lead to a healthy recovery, they can and must continue to play a prominent role in increasing society’s understanding of the impacts of the current crisis. They can do so by providing good data stewardship and methodological support and by allowing policy makers to focus on measuring the outcomes of decisions and correcting course as needed. For over a century, Statistics Canada has been with Canadians through challenging times, always playing a prominent and proactive role as a trusted, independent and credible data steward.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.