Canadians are surrounded by data that impacts their opinions and actions. A recent innovative study from Statistics Canada estimated the value of the stock of data, databases and data science in Canada at $217-billion in 2018. From consuming suggested content on social platforms to apps that help us manage our interactions during a pandemic, data is a constant presence. Because of its strong influence, it is important to understand the difference between good, unclear and bad data, and the impact each can have in shaping policy. It is equally important to be critical of the source, quality, and positioning of data and statistics in our decision-making processes.
The COVID-19 crisis is the latest illustration of the importance of statistics in guiding policy and individuals’ actions. For example, differences in rates of testing, in approaches to attributing the cause of death, and in how deaths are registered in public records make it challenging to paint an accurate picture of the extent of the pandemic in many jurisdictions across the globe. As a result, public policy decisions to counter the pandemic are even more complicated.
The pandemic has exposed gaps in data and in timeliness, and has raised questions about privacy and confidentiality, trust, and access to personal data by private and public organizations. But it has also demonstrated the fundamental importance of data and the key aspects of good data that should always be considered: pertinence, timeliness and accuracy.
At the start of data gathering, “what to measure and why?” should always be the question.
The pandemic has illustrated that statistics and statistical systems are not static. Well before the pandemic hit, the emergence of digital and intangible economies called for new sets of statistics. The extent of the growth of the intangible economy is typically portrayed by the value of intangibles in firm-level valuations. A new set of conventions on what constitutes intangibles and how these might be measured is required.
Timeliness in data-gathering is manifest in many ways. The current pandemic has demonstrated the value of rapidly gathering and disseminating data on key aspects of COVID-19, both to develop responses and to inform and educate the public. The development of digital technologies is a key part of timeliness, even for traditional measures such as GDP.
Statistics Canada, along with other national statistical offices, has gone beyond its traditional role of providing good data and analysis towards that of an active data steward and provider of micro data repositories, including the infrastructure needed for access, sharing and analytics. In the case of Statistics Canada, investments in cloud technologies, data analytics as a service, real-time remote access for researchers and virtual data labs, combined with new approaches such as web panels and flash estimates, are paying dividends in addressing data gaps and connecting challenging public policy questions with data expertise and analytics. As well, Statistics Canada is breaking new ground with the use of crowdsourcing as a flexible collection method to provide timely insights on the impacts of COVID-19 on Canadians’ mental health, their ability to finish school and their businesses, as well as the impact of the pandemic on people with disabilities or from visible minority groups. This approach will continue to be developed and may play an important role as Canada begins to navigate toward recovery.
The development of digital technologies has also enabled national statistical offices to make better use of alternative data sources. The requirement for more granular and real-time information is leading to the increased use of sensor data: a more continuous data stream that includes scanner data, GPS positioning data (e.g. from trucking loggers) and Earth observation data. This is the case with Statistics Canada’s AgZero project, which aims to employ satellite and AI technology to provide the agri-food community with timely, accurate and detailed data, while requiring farmers to complete fewer traditional surveys.
The phenomenon of “night-time lights” is another instance. Starting in the mid-2000s, space-based monitoring of ground illumination patterns has added speed and accuracy to conventional forms of data gathering. A recent survey of developments found that satellite-based data is more robust in measuring economic activity across a wide spectrum of countries and even provides new insights missing in traditionally gathered data. Similar possibilities are being exploited in monitoring emissions from space, thus providing fast, accurate and dispassionate indicators of, for example, progress towards climate change goals and commitments.
In 2016, it was estimated that bad data cost the U.S. economy US$3-trillion annually. This includes the costs of hunting for data and cleaning, organizing and correcting it, and the costs of inefficient decisions that were taken based on the bad data. Although an estimate for Canada has not been made, it would be reasonable to assume that the loss from bad data in Canada amounts to about C$400-billion annually (roughly a tenth of the U.S. estimate, converted to Canadian dollars), given the similarity in corporate and government processes in the two countries. To minimize errors in the use of big data, testing, releasing estimates as “experimental series” and gathering feedback all play a key role before regular production begins.
In the context of COVID-19, while agencies and statisticians cannot control the course of the virus or the recovery from it, they can and must continue to play a prominent role in helping society better understand the impacts of the current crisis, by providing good data stewardship and methodological support that allow for better tracking of changes and allow policy makers to focus on measuring the outcomes of decisions and to course correct as needed.