Minding the FLoCs: Google’s Marketing Moves, AI, Privacy and the Data Commons

May 20, 2021

In early 2021, Google announced it would remove third-party cookies from the Chrome browser and move to a more privacy-centric advertising model, sending shockwaves through the advertising, academic and tech worlds. Although much has been written on the transition and its technical workings — including by us — the true scale of this transition has been lost in the conversation. Google’s move represents just a fragment of a momentous shift in Alphabet’s corporate strategy.

Shoshana Zuboff showed how Google revolutionized online advertising by discovering that the “data exhaust” created by internet users as they visited websites and looked at pages could be collected, analyzed, packaged and sold to other companies that wanted to target particular people with marketing. Over almost two decades, Google has parlayed this exhaust into a dominant share of the online advertising market: its online ad revenue in 2020 was US$134.8 billion, in a total global market estimated at US$325 billion in 2019. Indeed, some estimate that 83.3 percent of the revenue of Google’s parent company, Alphabet, comes from Google’s ad business. Third-party cookies have been a primary means of data acquisition, but if their time is at an end, what exactly comes next, and why?

By following the bread crumb trail of Google’s new advertising strategy, we can get a clear sense of what Alphabet believes is its future: artificial intelligence (AI). Although an analysis by the International Data Corporation (IDC) valued the global AI market at US$156.5 billion in 2020, well below the online ad business, the market is in its infancy and is expected to grow exponentially; the IDC predicts that even when the COVID-19 pandemic’s effects on the market are accounted for, the AI market will grow to US$300 billion by 2024. Further, a report by Grand View Research predicts that the market could surpass US$700 billion by 2027, more than double the size of the current online ad market. But that may be a conservative estimate.


Alphabet is banking on obtaining a substantial share of the AI market, which will be far more lucrative than its current dominant share of the advertising market, but this is still just the beginning. Alphabet is further interested in developing AI that will allow the company to become an industry leader in fields with which it is not commonly associated, from pharmaceuticals and health care (a market worth trillions) to self-driving transport trucks.

Google’s pivot to AI is consistent not only with its corporate strengths but also with overall market trends. Google has positioned itself as an early investor in promising technologies and was an early leader in AI development. It has consistently spearheaded breakthroughs in the field, such as the development of AlphaGo, an early indicator of AI’s future potential. Google further finds itself in an advantageous position due to its previous business goal of “organizing the world’s information,” which gives it access to vast amounts of data for AI training. Google plans to cement this dominance, and Canada (and other national governments) should prepare their response.

Tending the FLoCs

In the absence of third-party cookies, the technology that has allowed Chrome (and other browsers) to track a user’s actions across the Web, Google has introduced a technology known as Federated Learning of Cohorts, or FLoCs. FLoCs are the beginning of a shift toward a targeted advertising model in which the advertising company does not need to amass identifiable data on a subject in order to serve effective ads. In essence, FLoCs operate via a deep learning system, a roving AI that periodically enters a user’s device through the Chrome browser while the device is plugged in, so that the user does not notice any power-usage interruptions. The AI then accesses the user’s temporal data — the most recent interactions on the browser, potentially the user’s URL history — and uses the data to classify the user into a cohort (a group of users identified as having similar traits and preferences). Once the AI has identified the user’s cohort, and trained itself on the user’s data (more on this below), it departs the Chrome browser, leaving behind only the user’s new cohort ID.
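Google’s public FLoC proof of concept computed cohort IDs with SimHash, a locality-sensitive hash over the domains in a user’s recent browsing history, so that users with similar histories land in the same cohort. The sketch below is a simplified illustration of that idea under our own assumptions (the `simhash_cohort` function, the 8-bit cohort space and the hash choice are ours), not Chrome’s actual implementation:

```python
import hashlib

def simhash_cohort(domains, bits=8):
    """Assign a coarse cohort ID via SimHash over visited domains.

    Each domain's hash "votes" on each bit of the fingerprint, so
    similar browsing histories tend to collide into the same cohort.
    """
    counts = [0] * bits
    for domain in domains:
        digest = hashlib.sha256(domain.encode()).digest()
        for i in range(bits):
            bit = (digest[i // 8] >> (i % 8)) & 1
            counts[i] += 1 if bit else -1
    # A bit is set in the cohort ID when more domains voted 1 than 0.
    cohort = 0
    for i, count in enumerate(counts):
        if count > 0:
            cohort |= 1 << i
    return cohort
```

Because the hash is deterministic, identical histories always produce the same cohort ID, and only the small ID, never the history itself, needs to leave the browser.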

In this system, advertisers will no longer market to individuals whom, on the basis of their individual profiles, the advertisers have determined may be interested in their product or service; instead, they will market to specific cohorts of hundreds of people whom Google has determined are alike enough for specific marketing purposes. Although marketing to groups instead of individuals may seem less effective, it is already a common tactic in retail marketing, wherein consumers are targeted based on a number of “segments”; further, Google has claimed that FLoCs are 95 percent as effective as its former privacy-invasive method.

However, FLoCs were not designed by Google specifically for advertising. They are the first widely visible application of a new AI-driven technique called federated learning, which allows an AI model to train on a user’s device, eliminating the need for mass centralized data collection for AI training. Consider, for example, how an in-home assistant, such as Google Home, or Amazon’s Alexa, learns from users’ patterns of responses and provides answers that are supposedly more attuned to their desires.

If we designed our own smart home assistant — let’s call it “HAL9000” — we would first need to train it on a large, tailored data set. This original data set could be obtained in numerous ways, including by producing it in-house or purchasing a custom vocal-command AI-training data set from a company such as Lionbridge. Once the initial system was trained, HAL9000 would be sent out for a beta test and, eventually, to consumers. However, the secret to a good AI is that the more data the operating model is exposed to, the better it becomes. This means that the data being collected by our HAL9000 devices in customers’ homes would be extremely valuable to the company, as using it to train the AI models that run HAL9000 would make it exponentially better and, therefore, more competitive in the market.

Traditionally, when data is extracted, it is exported from the user’s device to a centralized location to be processed and analyzed. This transfer has been the source of most privacy concerns around online ad systems. If AI-based systems such as assistants are to become part of customers’ personal lives, neither those customers nor regulators will be pleased by big tech constantly capturing personal conversations with these devices. Federated learning, so it seems, offers an elegant solution.

Extending our HAL9000 example, instead of centralizing the data and training in one location, our AI training model would “hop” between all of the HAL9000 devices in the world. When the model arrived on a device, it would train on the data stored on the device, creating a new updated model of itself. The AI model would then delete the old model running the HAL9000 device and duplicate itself, leaving one version of the updated model behind while the other hopped to the next HAL9000 device to train on its data. Through this method, AI models can be trained on extremely sensitive data sets with far more data security and privacy, because the users’ data never leaves their devices. The only product that would leave a user’s HAL9000 device would be the now-encrypted new model, which, even if decrypted, would not provide any insights into the user’s personal data upon which it trained.
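The hopping procedure described above can be sketched as a toy loop. This is only an illustration of the description in this article, under assumed names (`train_on_device`, `federated_hop`) and a deliberately trivial single-number “model”; production systems such as Google’s federated averaging instead aggregate model updates sent back from many devices in parallel:

```python
def train_on_device(model, local_data):
    """One local training pass: nudge the model's single weight toward
    the mean of this device's data (a stand-in for real gradient steps).
    The raw data is read here and never returned to the caller."""
    if not local_data:
        return model
    local_mean = sum(local_data) / len(local_data)
    learning_rate = 0.5
    return model + learning_rate * (local_mean - model)

def federated_hop(devices, initial_model=0.0, rounds=3):
    """Hop the model from device to device for several rounds.

    Only the updated model parameters travel between devices; each
    device's data stays where it was collected."""
    model = initial_model
    for _ in range(rounds):
        for local_data in devices:
            model = train_on_device(model, local_data)
    return model
```

The point of the sketch is what crosses the device boundary: the single evolving model value, never the per-device data sets.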

The security and data privacy in the federated learning model is fundamental to Alphabet’s new mission: becoming an “AI-first” company — that is, Alphabet and Google’s transition from being an advertising-funded company to a corporation whose primary focus is AI. Evidence for this shift comes both from Google’s actions and from CEO Sundar Pichai’s recent statements. Specifically, Pichai has stated in multiple investor calls that Google is now an AI-first company, and further cemented this changing corporate focus during his Google I/O address, when he stated, “We are moving from a company that helps you find answers to a company that helps you get things done.” In 2017, Pichai clarified to investors that this has always been a long-term strategy, and that although the goal of organizing the world’s information — a goal inseparable from the economic value of Google Search, Google’s original product — will remain the company’s guiding principle for the coming decade, Google has also “been laying a foundation for the next decade as we pivot to an AI first company, powering the next generation of Google products like the Google Assistant.” Alphabet’s focus on a transition to AI reveals the motivation behind another major Google investment: hardware.

Google has not typically been associated with hardware. However, in recent years it has increasingly produced physical Google products, such as Chromebooks, Pixel phones, Nest smart home products and Google Home virtual assistants. Google is funding the development of its less profitable hardware projects through its revenue from advertising as part of a longer-term strategy. Specifically, Google has been ramping up its integration of AI into its hardware, and envisions Google Assistant as a future selling point for all Google hardware. Google’s goal is to create an interface integrated across all of our devices that could interact with others as if it were a real person and could perform tasks, such as making phone calls, in the user’s place. Undoubtedly, the producer of the devices that could support this advanced AI assistant would have a huge competitive advantage.

Here we can begin to see again the importance of federated learning to the AI-first strategy. Federated learning would allow Google to integrate the data sets across all of the devices running its AI systems, training on vast stores of real-world data. However, Google has further adapted its model to extract non-identifiable data that would allow better AI training in other ways. Specifically, Google is utilizing a technology known as a web beacon. A web beacon, put simply, is a first-party cookie that acts like a third-party cookie. The best current example of a web beacon is not a Google product, but Facebook’s. When a company adds a Facebook Like button to its website, it embeds a small line of JavaScript into the code of the website. This code allows Facebook to place a first-party cookie on the website, providing access to information about the customers who visit the site. This is how Facebook advertisements become tailored to your recent internet searches even if you have third-party cookies turned off.

The catch is that for this technique to be successful at gathering mass data, it must be voluntarily installed on thousands of independent websites across the internet. Google has a way of doing exactly this: Google Analytics, a website analytics platform that can be used free of charge. The system offers substantial value to any website operator, as it provides insights into their users’ on-site behaviour. However, unbeknownst to many who opt to use Google Analytics, Google obtains the ability, through web beacons, to access much of the data collected through the platform.

Web beacons were originally used for advertising and for building user portfolios across the web. However, Google has recently added an “anonymization” feature, which allows companies to withhold the complete Internet Protocol (IP) addresses of their users from Google. The feature allows the data customer to know the geographic region of its users, but not their exact physical or online locations. Removing the need for IP addresses means that web beacons would no longer be so useful in tracking individuals, which fits with the move to FLoCs. Thus, while Google is still collecting “data exhaust” and can construct vast data sets, because its web beacons use first-party cookie technology, they are not covered by regulation of the use of third-party cookies.
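Google Analytics’ documented IP-anonymization feature works by zeroing the final octet of an IPv4 address (and truncating IPv6 addresses) before the address is stored, which preserves coarse geography but not the individual host. A minimal sketch of the IPv4 case, with a function name of our own choosing:

```python
def anonymize_ipv4(ip: str) -> str:
    """Zero the last octet of an IPv4 address, keeping the network
    prefix (enough for coarse geolocation) but dropping the host part."""
    octets = ip.split(".")
    octets[-1] = "0"
    return ".".join(octets)
```

For example, `anonymize_ipv4("203.0.113.42")` yields `"203.0.113.0"`: the region is still inferable from the remaining prefix, but the specific device behind the address is not.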

AI, Privacy and the Data Commons

The transition to AI makes for several important shifts in critical focus, which are perhaps long overdue. To begin with, we need to ask whether FLoCs, or any of the operations that go into making them, are covered by the more advanced existing and proposed personal data protection and privacy regulations, for example, the European Union’s General Data Protection Regulation (GDPR) or Canada’s Bill C-11. The conception of privacy associated with these laws revolves around our right not to be known at a deep personal level by unaccountable organizations and not to have our actions and beliefs manipulated through the extraction and analysis of our data. Privacy laws protect a highly individualized self through conventional legal mechanisms that are themselves focused on the individual.

The movement to AI adds a new problem, as the data collected is no longer primarily utilized to profile individuals but instead to target groups and to create general machines and applications. By using FLoCs, corporations such as Google can claim to be respecting the privacy of individual users, since they are no longer collecting and storing personal data off the users’ devices. However, they are able to leverage the same level of knowledge about the individual user and, potentially, to generate similar profits, by accepting that we are not as unique as we would like to believe, and that we can be segmented and predicted if treated as members of a social whole. The big difference from conventional marketing segmentation is that FLoCs are being constantly updated and improved through AI systems. This distinction leads to important philosophical and policy questions: Is it how an organization comes to know us that is important, or the fact that they know? And does the focus on legal maintenance of privacy matter quite as much in these circumstances?

To answer these questions is beyond this article’s scope, but we do not see them as settled, as Alphabet appears to believe they are. Clearly, there are fewer direct privacy implications from the operation of cohorts once they are constructed. But, to construct them, personal data needs to be collected and processed, and it is unclear that simply doing so on the user’s device or browser means that Google can escape regulation under, say, the GDPR.

However, it may be that in deploying arguments about privacy and data protection, we could be using the wrong human-rights tools for this new situation. Among the most challenging questions that must be asked about the new AI development marketplace is the one about data usage and ownership. The metaphor that “big data is the new oil” is vastly overused and actively unhelpful in many aspects, but it might still be useful in drawing attention to the idea of data as a resource.


Over the past few generations, data has been extracted from our everyday interactions — slowly, over the first 150 years or so, and then, in the past couple of decades, much more rapidly and precisely, so that our lives are now more readily transformable into data. A common current proposal, with the objective of accelerating the process of extraction — in other words, of fully subsuming data into the marketplace — is to do what was done with land in the pre-industrial revolution and colonial period, and enclose it as fully private property. According to this view, if data has value and is the basis of the new economy, it follows that users should be paid for their data. Accordingly, all data should be commodified and its value “paid” to those from whom it is extracted. Dollar signs always make for a seductive argument, but the proposal fails on three grounds. First, while massive data sets are very valuable, the specific value of any individual’s data is negligible; indeed, it only acquires value through combination. Second, this move would further reinforce existing inequalities and economies of conventional wealth and power, as well as of image: some people’s data is assigned vastly more significance, and is therefore more valuable, than others’. Finally, of course, and worst of all, this move would basically legally enshrine data exploitation as unchallengeable and just “how things are.”

We believe that the individual orientation of not only data protection and privacy laws but also data privatization proposals is inadequate in dealing with data as a public good extracted from the commons. Ironically, an understanding of data as commons, sparked by AI-based technologies such as FLoCs, means that one can imagine different ways in which mass data sets could be made available and accessible to all; unlike physical commons, data commons cannot be subject to exhaustion or over-exploitation. One can imagine multiple common-data futures, such as the civic data trusts described by Bianca Wylie and Sean McDonald, and a vast array of transformative economic effects delivered through AI.

But harnessing AI in Canada is not just about industrial policy and state investment; it is also about regulatory innovation. Partly, this is because we can’t trust a corporation as large as Alphabet to self-regulate, especially when we have abundant evidence that its internal AI ethics on race and gender are woeful, leading recently to the pushing out of two of its leading critical ethicists, Timnit Gebru and Margaret Mitchell, the resignation of a third, Samy Bengio, and widespread internal disarray. Issues with AI around data and training bias are now well understood among critics, if not acknowledged by platform corporations themselves, and the social (in)justice implications of these issues require external intervention. While Google and other major platform corporations might be able to follow the letter of the law on data protection and privacy, even in a regulation as complex as the European Union’s GDPR, this does not mean that their use of data is fair or just. In an emerging world of data commons, data justice will have to be as important as privacy as an organizing, if not legal, principle.

Regulation is also no barrier: it would create new opportunities while protecting the data commons and individual rights, putting a stop to damaging opportunism and enclosure. Two simple developments could solidify Canada’s place as a leader in just AI and common data. First, we need comprehensive guidelines and regulations for the development and use of AI in Canada, similar to, but even more progressive than, the European Union’s proposed AI regulation. Second, we must require that all Canadian mass data, whether collected by a Canadian company or by a foreign company in Canada, be treated as a common resource, accessible to any Canadian wishing to utilize the data for the development of new technologies, civic and social programs, art or, indeed, any ethical purpose. This could spark new waves of AI innovation, both in terms of conventional capitalist competition and in terms of civil and social innovation for the common good.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Authors

David Eliot is an M.A. student in the Department of Sociology at Queen’s University in Ontario and a member of the Surveillance Studies Centre.

David Murakami Wood is the former Canada Research Chair (Tier II) in Surveillance Studies and an associate professor at Queen’s University in Ontario.