The Imperfect System Behind Digital Identities

July 10, 2018
Escaping a data trail is an impossible dream. (Shutterstock)

Anyone who grew up in an era of paper records was warned that misdeeds and successes alike would find their way into a permanent record: an unknowable manila folder stuffed with report cards and teachers’ comments that resided in some secret filing cabinet locked away in a school closet. It was possible to imagine that if you could somehow access this file — the single copy of all the paper documenting who you were — you could remove things, purge them, alter them, and escape into the sunset as a new person.

In the digital era, escaping a data trail is an impossible dream. Every keystroke, page view and online purchase affects how you might be targeted online, by advertisers, for example. Do some research online about a toy or a vitamin, and ads for those toys and vitamins — or for competing products — shadow you doggedly across the internet. This seems harmless enough, so it’s easy to forget that all of those interactions result in data points that are harvested, stockpiled and analyzed to create a lasting profile — a twenty-first-century permanent record. As more data is collected and analyzed, each profile becomes more and more comprehensive.

But what happens when incorrect information — generated, gathered or incorrectly assigned by imperfect algorithms — or disinformation — purposefully generated to deceive or influence — becomes a part of this opaque, permanent profile?

A discussion on the subject begins with an understanding of modern information overload. Simply put, there is now too much digital information for humans to sort through and verify. This presents a significant challenge for the public and private sectors alike, whether that means corporations defending their brands against a flood of defamatory videos, governments looking to combat the spread of fake news or federal forces focused on online counterterrorism.

Artificial intelligence (AI) and machine learning tools are increasingly relied on to sort and curate this glut of information in a data-driven world. These tools can sift through massive amounts of information and, in theory, make related decisions. These technologies have taken on a sort of magic fairy dust quality: when sprinkled across any problem in the data-driven universe — intelligence, warfare or market valuations, among other things — machine learning and AI are supposed to yield a solution.

This notion, of course, is problematic: new technology alone doesn’t solve complicated issues, and the algorithmic opacity often associated with machine learning and AI requires people to cede a lot of sovereignty to complicated processes. There is also no guarantee that machines are better at resisting bias than the humans that program them.

“Bias is always inherent in the rules governing the machines,” says Bryan Jones, an Austin-based emerging data technologies entrepreneur and executive. “It’s very easy to add small things to a model that can have huge impacts.” This bias comes from the programmers who script the algorithms, who are often unaware that they are writing it in.
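As a rough illustration of Jones’s point, consider the minimal sketch below. The feature names, weights and threshold are invented for this example and are not drawn from any real system; the point is only that one small, innocuous-looking addition to a scoring rule can flip outcomes for the people it scores.

```python
# A minimal, hypothetical sketch: the feature names, weights and threshold below
# are invented for illustration, not drawn from any real scoring system.

def score(applicant, weights):
    """Linear scoring rule: weighted sum of the applicant's features."""
    return sum(weights.get(k, 0.0) * v for k, v in applicant.items())

applicants = [
    {"income": 0.62, "on_time_payments": 0.90, "postal_prefix_flag": 1.0},
    {"income": 0.58, "on_time_payments": 0.85, "postal_prefix_flag": 1.0},
    {"income": 0.60, "on_time_payments": 0.88, "postal_prefix_flag": 0.0},
]

base_weights = {"income": 0.5, "on_time_payments": 0.5}
# One "small thing" added to the model: a modest penalty on a postal-code proxy.
biased_weights = {**base_weights, "postal_prefix_flag": -0.15}

THRESHOLD = 0.70  # approve if score >= threshold

for person in applicants:
    before = score(person, base_weights) >= THRESHOLD
    after = score(person, biased_weights) >= THRESHOLD
    print(f"approved before: {before}, after adding proxy feature: {after}")
```

In this toy example, two of the three hypothetical applicants are approved under the original rule and rejected once the proxy feature is added, even though nothing about them has changed.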

When we’re considering the efficiency of the world around us, AI seems like a useful tool — making traffic flow more smoothly, or ensuring the goods we need are manufactured and distributed as we need them. It’s a different matter when these same technologies are applied to individuals. It’s uncomfortable to think about how personal data — everything from demographic details to online behaviours — is used to “train” algorithms, or how those same algorithms are later used to “score” and categorize people.
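The train-then-score loop described above can be sketched in a few lines. This is a hypothetical example: the behavioural features and outcome labels are invented, and scikit-learn’s logistic regression is simply a stand-in for whatever model a real system might use.

```python
# A hypothetical, minimal sketch of the train-then-score loop.
# The behavioural features and labels are invented; real systems use far richer data.
from sklearn.linear_model import LogisticRegression

# "Training" data: rows of behavioural signals (e.g. pages viewed, purchases,
# missed payments) paired with an outcome label the operator cares about.
X_train = [
    [12, 3, 0],
    [40, 1, 2],
    [ 5, 0, 4],
    [30, 5, 0],
    [ 8, 0, 3],
    [25, 4, 1],
]
y_train = [0, 1, 1, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)

# "Scoring": the same kind of data, harvested from a new individual, becomes a
# number that can gate access to credit, insurance or other services.
new_person = [[20, 2, 1]]
print("risk score:", model.predict_proba(new_person)[0][1])
```

The mechanics are trivial; the governance questions come from what goes into the training data and what the resulting score is allowed to gate.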

“More and more, closed-loop systems are used to determine the successful distribution of digital assets,” says Jones. These systems are, by nature, designed to alter how people interact with, and are impacted by, the online environment as it collects their data.

This concept is already actively applied in China, where the government has established a system of “social scoring” to assess — and purposefully alter — the behaviour of citizens. “Model” citizens are given additional benefits for “good” behaviour, and others are penalized and excluded from national systems — including everything from credit to dating and travel apps — for “bad” behaviour. In short, it is explicitly designed to incentivize behavioural change toward conforming to a defined standard.

“The future will be AI trying to predict and persuade humans — not just for marketing, but in politics and conflict itself,” says P. W. Singer, a fellow at New America and the author of LikeWar: The Weaponization of Social Media. “In the hands of demagogues and dictators and those without ethics, it is a scary future, indeed…We will understand less and less of the machines’ choices in this conflict over us and our behaviours. We will trick ourselves into thinking we know why we act, but won’t know a thing.”

In the United States, too, there have been notionally well-intentioned efforts to use predictive analytics to pre-empt and change human behaviour. In 2012, a data analytics firm called Palantir began partnering with the New Orleans Police Department to secretly test its predictive policing technology — which was designed to “forecast” potential perpetrators and victims of crime and intervene. Palantir negotiated free access to New Orleans policing and demographic data in exchange for the analysis of that data — all in all, a good deal for Palantir, which needs these huge, specific data sets to train its algorithms and AI. The program — which the public and much of the local government were largely unaware of until it was reported in the media — was used by Palantir to develop other forecasting models that are now sold to foreign intelligence services.

“With the combination of large data sets, user-level tracking and monitoring, and increased computational power, the ability to identify and target at a micro level is not just possible, but occurring,” says Jones. “This means that…leveraging AI optimization can occur on a granular level, revealing the tendencies and behaviour of each user.”

These technologies are designed, at their core, to model human behaviour and devise ways to alter it. If these tools are framed as intended to prevent and disrupt gang violence or terrorism, that doesn’t raise many hackles. But what other behaviours might people want to pre-empt?

Google already claims it can predict when a person (in a hospital) will die, with 95 percent accuracy, using available data. Facebook, Amazon and others have been trying to buy access to health care and medical data. If predictive medical algorithms become the norm in patient screening, the impact on insurance markets and the availability of care won’t be far off.

As data is used more often to pre-score and define individuals’ opportunities — access to jobs, mortgages, credit, health care and more — there is reason to worry that bad data, including potentially purposeful disinformation, will impact individuals’ data identities.

“You can pollute — but also game — the system,” says Jones, “assuming that the rules are somewhat known.” In other words, any person or machine that knows how to influence algorithmic outcomes could have a significant advantage.
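What “gaming the system” might look like in practice can be sketched simply, assuming an actor has a rough guess at the scoring rule. Everything in the example below, including the rule, the features and the threshold, is invented for illustration.

```python
# A hypothetical sketch of gaming a scoring rule whose weights are roughly known:
# the actor pads exactly the feature the rule rewards most heavily.
ASSUMED_WEIGHTS = {"followers": 0.4, "posting_frequency": 0.3, "engagement": 0.3}
THRESHOLD = 0.75  # accounts above this are treated as "authentic/influential"

def score(profile):
    return sum(ASSUMED_WEIGHTS[k] * profile[k] for k in ASSUMED_WEIGHTS)

def game_profile(profile, step=0.05):
    """Inflate the most heavily weighted feature until the threshold is crossed."""
    target = max(ASSUMED_WEIGHTS, key=ASSUMED_WEIGHTS.get)
    gamed = dict(profile)
    while score(gamed) < THRESHOLD and gamed[target] < 1.0:
        gamed[target] = min(1.0, gamed[target] + step)  # e.g. buy followers
    return gamed

fake_account = {"followers": 0.2, "posting_frequency": 0.9, "engagement": 0.6}
print("before:", round(score(fake_account), 2),
      "after:", round(score(game_profile(fake_account)), 2))
```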

There are plenty of signs that these attacks and manipulations are coming. Corporations have been tracking instances where disinformation has been used to outsmart stock-trading algorithms or to spike or collapse share prices, says Cameron Colquhoun, managing director of Neon Century, a private intelligence firm that specializes in open-source and social media analysis.

Such examples highlight a simple reality: human and machine decisions are both shaped by the information fed into the system. If the information is false or biased, the decisions made will be similarly flawed. Fake content — often designed to look real — plays a major role in this process. Citizens, corporations and politicians alike can generate fake content, share it widely and influence public opinion about anything from an individual’s reputation to an entire political campaign.

“Political forces will soon begin targeting people with individualized, AI-generated content designed to evoke a response based on their personality preferences,” says Colquhoun.

In addition to changing people’s minds, the disinformation generated and accelerated by new technologies can impact digital identities (a modern permanent record of sorts) — and this presents major challenges for policy makers. If law makers, like most people, have no idea how algorithms work, how can they possibly regulate their use and application? What kind of ethics and review should be in place to mitigate the cascading impacts of false information that can shape individual “scores” and, eventually, impact access to services?

Singer worries that potential solutions will be beta-tested on the world, leaving a lot of room for unfortunate impact on individuals in the process.

Right now, data generated by countless human actions hones the machine processes that score or categorize people based on their behaviour. That categorization can then shape human life — access to health care, mortgages, credit and the like. Then more data is gathered, and the cycle repeats itself. Add disinformation into the mix, and the cycle gets complicated. As disinformation becomes more widespread and sophisticated, it is likely to play a significant role in data identities — people could be poorly categorized, marginalized or discriminated against.
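That cycle can be simulated in miniature. In the hypothetical sketch below, where every number is invented, a score gates a decision, the decision is written back into the record as new data, and a couple of planted false entries are enough to tip the record into a spiral that each new scoring round reinforces.

```python
# A hypothetical simulation of the feedback loop: all values are invented.
record = {"true_events": [0.8, 0.7, 0.9]}   # genuine behavioural data
record["false_events"] = [0.1, 0.2]          # planted, misleading data points

def score(rec):
    events = rec["true_events"] + rec["false_events"]
    return sum(events) / len(events)

for cycle in range(3):
    s = score(record)
    approved = s >= 0.6                      # e.g. access to credit or services
    # The decision itself becomes new data: a denial reads as another negative signal.
    record["true_events"].append(0.8 if approved else 0.3)
    print(f"cycle {cycle}: score={s:.2f}, approved={approved}")
```

Without the planted entries, the same record would clear the threshold on every pass; with them, each denial generates another negative data point and the score keeps sliding.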

This complex intersection of AI, disinformation and human identity will likely, in the near future, impact access to critical services and opportunity — as China is already trying to pioneer. With this in mind, a question that Singer posed seems pertinent for both programmers and policy makers: where is the line between an algorithm predicting human potential and an algorithm determining it?

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Molly K. McKew (@MollyMcKew) is a writer and expert on information warfare; she currently serves as narrative architect at New Media Frontier, a social media intelligence company. As an analyst and author, she has written for Politico Magazine, Wired, the Washington Post and other publications. She is a frequent radio/TV commentator on Russian strategy, briefs military staff and political officials on Russian doctrine and hybrid warfare, and lectures for psychological defense courses. McKew is also CEO of Fianna Strategies, a consulting firm that advises governments, political parties, and NGOs on foreign policy and strategic communication. Her recent work has focused on the European frontier — including the Baltic states, Georgia, Moldova, and Ukraine — where she has worked to counter Russian information campaigns and other elements of hybrid warfare.