DataDownload: The true colors of the political spectrum
A weekly summary of all things Media, Data, Emerging Tech
Welcome back. It’s fall — which feels strangely a lot like summer, but a bit chillier.
We’re gearing up for the NYC Media Lab Summit. If you haven’t grabbed a ticket, there’s a free one waiting for you HERE (or, for $25, we’ll ship you a modest swag box and our appreciation).
Super excited that Tristan Harris will be kicking us off with a powerful Keynote on October 7th. …
Innovation Monitor: Gartner Hype Cycle Trend #2 — Algorithmic Trust
Welcome to this week’s Innovation Monitor. Previous editions of this newsletter dug into the dangers of algorithmic bias, but what about its flip side — algorithmic trust?
As companies, governments, and the public become increasingly aware of the challenges behind the notion of an objective algorithm, many are trying to build and check solutions to create more transparency. We can thank researchers and journalists uncovering countless examples of discrimination, false positives, dataset bias, sheer carelessness, and much more in tech today. And these solutions encompass Gartner’s second major hype cycle trend — Algorithmic Trust.
We discussed this during one of NYC Media Lab’s virtual Machines+Media panels with Cathy O’Neil, who raised an important guiding question: For whom will [this tech] fail? She also warned that the use of these algorithmic tools is replacing difficult, complex conversations on everything from teacher evaluations to prison reform.
Here’s Gartner’s take on it: “Increased amounts of consumer data exposure, fake news and videos, and biased AI, have caused organizations to shift from trusting central authorities (government registrars, clearing houses) to trusting algorithms. Algorithmic trust models ensure the privacy and security of data, provenance of assets, and the identities of people and things.”
One example is authenticated provenance — “a way to authenticate assets on the blockchain and ensure they’re not fake or counterfeit.” (Check out The New York Times R&D team’s News Provenance Project.) Other emerging technologies include differential privacy, responsible AI, and explainable AI. This week, we’re diving deep into each.
Finally, if you’re looking for a documentary to watch this weekend, we’re recommending The Social Dilemma on Netflix. As always, we wish you and your community safety, calm and solidarity as we support each other through this unprecedented time. Thank you for reading!
Erica Matsumoto
BLOCKCHAIN
Gartner’s definition stresses blockchain as a way to solidify trust in the technology sphere. The provenance and authentication that blockchain allows could be applied to such a wide array of use cases that it still has the potential to revolutionize the world.
There have been countless startups, bank pilot programs, academic initiatives, etc. that have tested authentication and provenance using a public or private blockchain in the past few years, and I understand that nearly every reader of this newsletter has read a great deal on the topic. One of my favorite recent pieces was from the NY Times R&D group on fighting misinformation with blockchain. But, as I mentioned in the intro, I wanted to use this newsletter to dig into the other three concepts, as they’re all incredibly timely and important.
DIFFERENTIAL PRIVACY
While big data can help unearth invaluable patterns that can be employed in health research, identifying discrimination, and even reducing traffic, the downside is that you’re aggregating massive amounts of personal information… and we’ve seen for decades how that can go sideways. Anonymizing the data isn’t foolproof either. In fact, we’ve known it wasn’t for years.
De-anonymization can take supposedly anonymized data and trace it back to an actual person. And while the NY Times’ 2018 investigation, “Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret,” put the dangers of so-called “anonymous” data in the spotlight, this isn’t something we’ve just discovered. Back in 2007, a few researchers from the University of Texas took anonymous data from the $1M Netflix Prize competition and traced it back to real IMDb reviewers. Does this 13-year-old paragraph sound familiar?
“Privacy worries have heightened in the past few years following a number of data breaches that have leaked sensitive information on millions of people. In November, the head of HM Revenue & Customs, the United Kingdom’s tax agency, resigned after two data discs containing sensitive, yet unencrypted, personal details of 25 million U.K. citizens were lost in the mail. In January, retail giant TJX Companies announced that data thieves had stolen the credit- and debit-card details on, what currently is estimated to be, more than 94 million consumers.”
You wouldn’t be surprised if that happened yesterday — actually, you’d likely be less surprised than 2007 you. Andrew Trask, who leads OpenMined, an open-source community that builds privacy tools for artificial intelligence, says that “just erasing a piece of someone’s fingerprint doesn’t get rid of the whole thing.” Multiple sources can help threat actors connect the dots or re-identify real-life counterparts based on seemingly anonymous data points.
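To make the "connect the dots" idea concrete, here is a minimal sketch of a linkage attack. Everything in it is an assumption made for illustration: the records are made up, the names `anonymized_ratings`, `public_reviews`, and `reidentify` are hypothetical, and the simple overlap count is a stand-in technique, not the method used in the University of Texas study.

```python
# Toy, made-up records: every name, title, and date here is illustrative.
anonymized_ratings = [
    {"user_id": "u1", "ratings": {("Film A", "2005-03-01"),
                                  ("Film B", "2005-03-04"),
                                  ("Film C", "2005-04-10")}},
    {"user_id": "u2", "ratings": {("Film D", "2005-05-02"),
                                  ("Film E", "2005-05-09")}},
]

public_reviews = [
    {"name": "Alice Example", "ratings": {("Film A", "2005-03-01"),
                                          ("Film B", "2005-03-04")}},
    {"name": "Bob Example", "ratings": {("Film E", "2005-06-20")}},
]


def reidentify(anon, public, min_overlap=2):
    """Link anonymous IDs to public names by counting shared (title, date) pairs."""
    matches = {}
    for record in anon:
        # Pick the public profile sharing the most (title, date) pairs.
        best = max(public, key=lambda p: len(record["ratings"] & p["ratings"]))
        overlap = len(record["ratings"] & best["ratings"])
        # Only claim a match when enough data points line up.
        if overlap >= min_overlap:
            matches[record["user_id"]] = best["name"]
    return matches


print(reidentify(anonymized_ratings, public_reviews))
```

In this toy data, no names appear in the "anonymous" set, yet two matching rating-and-date pairs are enough to tie `u1` to a public profile, while `u2` stays unlinked because nothing lines up.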
Differential privacy (also known as epsilon indistinguishability) might help here. According to Built In: “Differential privacy makes data anonymous by deliberately injecting noise into a data set — in a way that still allows engineers to run all manner of useful statistical analysis, but without any personal information being identifiable.”
Like de-anonymization, this practice isn’t new. In fact, it’s been around for well over a decade. Brookings links it to a 2006 research paper — Calibrating Noise to Sensitivity in Private Data Analysis.
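As a rough illustration of "injecting noise," here is a minimal sketch of a noisy counting query using Laplace noise, one common way to add calibrated noise. The function names (`laplace_noise`, `private_count`), the sample ages, and the choice of epsilon are all assumptions made for this example; real deployments, such as the ones mentioned in this newsletter, involve far more than this.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Smaller epsilon means more noise (more privacy, less accuracy).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity of a counting query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: ages of eight made-up survey respondents.
ages = [23, 35, 41, 29, 67, 52, 38, 44]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(f"Noisy count of respondents aged 40+: {noisy:.2f}")
```

Each run prints a slightly different answer near the true count. Averaged over many people the statistic stays useful, but no single respondent's presence can be confidently inferred from one released number. This is also where the accuracy trade-off comes from: lowering epsilon widens the noise.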
For 2020’s census, the Census Bureau will incorporate differential privacy as it collects population data. Google used it this year too for its mobility reports, which reported on population movement patterns during the pandemic (also see the company’s differential privacy repo). Apple uses the technique to analyze user data.
The practice can enable privacy, but it also does something far more effective: it incentivizes privacy. “If the data is cloaked so that no one can pick out an individual, it can be shared — and therefore analyzed and monetized — around the globe, even if it’s ‘going’ to a place with stringent privacy regulations,” says Built In.
Still, Brookings notes, differential privacy has its drawbacks (besides decreased accuracy due to noise injection). It requires resources and a large dataset, and there is concern that organizations might be exaggerating how much privacy they’re providing. Cynthia Dwork and researchers at UC Berkeley have proposed an Epsilon Registry in response: “a publicly available communal body of knowledge about differential privacy implementations that can be used by various stakeholders to drive the identification and adoption of judicious differentially private implementations.” Read more about the idea here.
RESPONSIBLE AI
Wait, you might be thinking. Isn’t explainable AI, in a sense, responsible? And how can AI be responsible? Hasn’t the notion that an algorithm can be both the scapegoat and the solution to machine-caused bias been thoroughly quashed by researchers and investigative reporters? OK, so let’s back up. What’s the difference between explainable AI and responsible AI? Futurist Anand Tamboli wrote this nice explainer in a Medium post:
“Think of an air crash investigation; it is a classic example with which we can compare explainable AI. In the air crash investigation, when something goes wrong, say there was an accident. You first find the Black Box, open it, analyze it, and go through the whole sequence of operations. Then understand what happened, why it happened, and how you can prevent it next time. But that is the post-facto operation, a postmortem. You are not avoiding the incident in the first place.
As a responsible approach, you train your pilots, your crew to avoid these kinds of mishaps. You build your operations in such a way that it prevents these accidents from happening. When it is explainable AI, it is post-facto. It is necessary as an after-the-fact. But when it comes to responsibility AI, it is essential to prevent mishaps from happening.” …
DataDownload: A skeptical democracy
This week I think the articles we’ve curated speak for themselves.
This Pew Research survey is important. Very. Ev Williams’ update on how Medium is changing is important and rare (ok, pun intended). Stanford’s Cable TV News Analyzer is critically valuable. CNN’s fact-checker is barely keeping up.
Plus, the On The Media podcast, with our pal Bob Garfield and Brooke Gladstone, is our must-listen for this week. …