DataDownload: How our weird shopping behavior is messing with AI models

8 min read · May 16, 2020

A weekly summary of all things Media, Data, Emerging Tech

AI is confused by the pandemic, but who isn’t? Testing is a freedom of information issue, says CJR. We explore the future of 5G with the Director of Verizon’s 5G Lab. And as events all go virtual, our Events listing captures some of the coolest.

Meanwhile, if you’re not wearing a mask you’re choosing luck over science. And since you’re reading this newsletter, that can’t be true. Mask up.

As always, email us with ideas, tips, appreciation, or criticism: Steve@NYCMediaLab.org. Hard problems require lots of smart people paddling in the same direction.

Best,

Steven Rosenbaum
Managing Director
The NYC Media Lab

Must-Read

Our Weird Behavior During the Pandemic Is Messing With AI Models

Instead of LEGO sets and phone cases, the top searches on Amazon during the pandemic have been for toilet paper, masks, and hand sanitizer. Customer analytics platform Nozzle captured the rise in COVID-19-related searches on Amazon (see below), and you can almost track the progress of the virus by looking at search patterns. But the sudden jump in pandemic-related products has rippled into algorithms running in inventory management software, fraud detection systems, marketing analytics platforms, and more. Machine learning models trained on “normal” human behavior were thrown out of whack as the definition of “normal” took a sharp turn.

One company that supplies sauces and condiments to Indian retailers had to fix its automated inventory management system because bulk orders broke its predictive algorithms; another firm, which uses AI to gauge news sentiment for investment recommendations, gave skewed advice; and a streaming firm had trouble with its recommendation algorithms due to a spike in content consumption. Phrasee, a company that uses NLP to create email and Facebook copy, had to ban phrases like “going viral” and anxiety-inducing language such as “OMG” and “stock up.”
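
As a rough illustration of why a model tuned to “normal” demand gets thrown off, the sketch below flags a shift when recent order volumes sit far outside a historical baseline. The function names, the three-standard-deviation threshold, and the order figures are all hypothetical; none of them come from the companies mentioned above, and a production system would likely use more robust drift tests.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Distance of the recent mean from the baseline mean,
    measured in baseline standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as extreme.
        return 0.0 if mean(recent) == mu else float("inf")
    return abs(mean(recent) - mu) / sigma


def has_drifted(baseline, recent, threshold=3.0):
    """True when recent behavior is more than `threshold`
    standard deviations away from the baseline (assumed cutoff)."""
    return drift_score(baseline, recent) > threshold


# Hypothetical weekly unit orders for one product (made-up numbers).
normal_weeks = [100, 104, 98, 101, 97, 103, 99, 102]
pandemic_weeks = [180, 240, 310]

print(has_drifted(normal_weeks, normal_weeks[-3:]))  # False
print(has_drifted(normal_weeks, pandemic_weeks))     # True
```

A check like this does not fix a forecast, but it can signal when a model trained on older data should be retrained or temporarily overridden by a person.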

5 min read

“It’s tempting to see our existing knowledge as a universe that expands outwards when we learn more, but it might be more useful to think of it as the contents of a house. Currently, we’re only able to see what’s visible through the keyhole. The point isn’t to make the house bigger; it’s to open the door.”

The White House’s not-so-humblebrag about America leading the world in testing (at least we’re in the top 10) is just a symptom of bad public-health policy and data illiteracy, says CJR. Factors that play into the confused media narrative include the interchangeable use of “case” and “confirmed case” and the fact that coverage of rising cases in the US ignored the effect of increased testing, among other context-related snafus.

CJR suggests that “in the interim, when sharing data, we should aim to always tell news consumers about its limitations, what removing those limitations might show us, and, when necessary, whose fault they are.” (Somewhat related: take Nieman Lab’s survey about the effect of the pandemic on journalism.)

11 min read

Read More

For the Media

Zoom Is Giving Fandoms a New Place to Hang

A fandom is defined as “a subculture composed of fans characterized by a feeling of empathy and camaraderie with others who share a common interest.” For the current events equivalent, just tack on, “on Zoom.” As the de facto hub for meetings, weddings, and classes, the video platform has also been attracting book clubs, music fans, crafting groups, partygoers, and other fandoms.

Fans that normally make internet friends behind avatars and text are speaking to each other live for the first time, in the process diminishing the stigma around meeting people pseudo-anonymously online. The Verge details one such event: a listening party for 5 Seconds of Summer, which attracted a few hundred attendees.

7 min read

Read More

Twitch Announces a New Safety Advisory Council to Guide Its Decision-Making

Twitch’s new Safety Advisory Council is an eight-person group composed of experts with “deep understanding of the Twitch platform, its content and community.” The difference from, say, Facebook’s oversight board or Twitter’s trust and safety council, is that Twitch is including creators in the group, who can bring “unique challenges and viewpoints,” according to the company.

Twitch saw 48% month-over-month growth in hours watched between March and April, and is up 101% in hours watched year-over-year, as of April. Due to the viewership surge, Twitch brought the council together to draft new policies, update existing ones, and promote healthy streaming and work-life balance habits. Check out the TechCrunch piece for the full list of council members.

6 min read

Read More

This AI-Generated Dictionary Is Very Cool and Also Terrifying

“Patiefarge” is a noun meaning “a group of four dogs,” and it’s completely made up. The word was generated by This Word Does Not Exist, a project by former Instagram developer Thomas Dimson, who trained a GPT-2 language model using the Oxford English Dictionary (for technical info on how Dimson set up the site on Google Cloud, see this Reddit response). You can find the source code here.

1 min read

The NYC Media Lab Virtual Open House hosted the Verizon 5G Retail Ventures Challenge, which pairs startups with university faculty and students to prototype 5G solutions that can bring ideas to life for local retailers. In the video, three companies share their retail solutions, and Christian Guirnalda, Director of Verizon’s 5G Lab, discusses the future of 5G after COVID.

Facenote is a biometrics platform that uses facial recognition to help companies recognize and increase engagement with their most valuable customers.
Rilla Voice provides voice analytics for brick and mortar stores using voice recognition AI.
echoAR is a cloud platform for AR that provides tools and network infrastructure to help developers and companies quickly build and deploy AR apps and content.

55 min watch

Watch Now

What We’re Listening To

Podcast: Explore Explain S1E3: John Burn-Murdoch

In this episode of Explore Explain (“a video and podcast series all about data visualisation design”), FT’s John Burn-Murdoch shares the design story behind his visualizations for the publication’s Coronavirus Tracker.

1 hr 30 min listen

Listen Now

Virtual Events

Event: Pablo Torre on the Future of Sports
Date: May 19, 12PM
The future of all sports remains uncertain. With large gatherings temporarily banned, there seem to be two key questions: what does the world of sports look like on the other side of this, and what does it look like in the meantime? Register Here.

Event: Travel’s Path Forward: Online Travel
Date: May 21, 11AM-12:30PM
Despite the hurdles of this crisis response, online travel agencies also have the ability to instill confidence in the market to foster manageable, long-term recovery once travel returns. Join Skift editors and research analysts as they discuss the current landscape and future preparedness with industry leaders. Register Here.

Event: Building & Deploying Models with AutoAI
Date: May 18, 12:30PM-2:30PM
Learn how to build and deploy your very own predictive models with IBM Watson Studio AutoAI, which automatically analyzes data and generates customized predictive model pipelines. Register Here.

A Deeper Look

Common Sense Comes Closer to Computers

It’s easy to mistake the surprisingly realistic — but ultimately shallow — results by GPT-2 for intelligence, as anyone who’s played with talktotransformer.com or AI Dungeon can attest. But it turns out to be a Clever Hans effect, as demonstrated by AI researcher Gary Marcus, who asked GPT-2 what happens when you drop a match onto a stack of kindling. The answer, as you can test for yourself on Talk to Transformer, never makes sense, and that’s because computers lack common sense — the “dark matter of AI.”

Algorithms don’t read between the lines. Hand-crafted approaches are tenuous, due to the ambiguity of language and the fickleness of hard-coded relations. Deep learning hasn’t fared well either, as seen with GPT-2, but that hasn’t stopped researchers from building common-sense systems that take a middle approach. Commonsense transformers, or COMET, opt for generating plausible yet imperfect responses to new input, rather than using deterministic models (see the image below).

Commonsense transformers began life when professor Yejin Choi and her team started compiling a “textbook for neural networks to learn faster about the world,” called ATOMIC. When GPT-2 came out, Choi saw the opportunity to combine a language model like GPT-2 with a common-sense knowledge base like ATOMIC, resulting in COMET, a “fusion of symbolic reasoning with a neural network.” You can try COMET here.