Using artificial intelligence to produce news insights

By Francesco Marconi

As we looked for additional ways to use artificial intelligence in our journalism, we identified U.S. President Donald Trump’s tweets as a potential source of insight into how his public discourse has evolved over time.

We collaborated with Cortico, a media analytics nonprofit recently launched from the Laboratory for Social Machines at the MIT Media Lab, to parse a data set of Trump’s posts from his first 100 days in office. We analyzed the level of attention the president gave to certain issues, as well as how Twitter users responded.

“When the idea of a collaboration was brought up, the newsroom was excited about the opportunity to do something new and dig up new information about a subject that is, frankly, very well-covered territory,” said Kathleen Hennessey, AP’s White House editor.

“We let the data drive the story, rather than coming in with a predetermined set of facts we wanted to uncover.”

Our results showcased the benefits of using machine learning in reporting:

- Exuberance drives engagement: Tweets featuring words in all capital letters drew, on average, 5,000 more retweets and 20,000 more favorites than those that did not. Similarly, tweets with exclamation points generated an extra 25,000 favorites, on average.

- Timing and delivery: Trump’s most-engaged tweets came in the early morning hours, but his overall output was fairly evenly divided among mornings, afternoons and evenings.

- Content matters: Tweets mentioning Russia or “fake news” averaged about 70 percent more engagements than those that did not.

- Twitter fatigue and fading novelty: During Trump’s first 50 days in office, 60 percent of his tweets received more than 50,000 engagements. In his second 50 days, only 9 percent reached that level.

- Leveraging influencers: About 70 percent of users who quoted Trump’s tweets have verified accounts, including journalists, celebrities and politicians.

- Responses shaped by partisanship: Reactions to the president’s tweets broke sharply along partisan lines, with conservatives more likely to retweet than liberals.
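Comparisons like "tweets with all-caps words drew more retweets on average" come down to a difference of two group means. Below is a minimal Python sketch of that arithmetic. The records, field names and helper functions are invented for illustration and are not the AP or Cortico data or code.

```python
import re
from statistics import mean

# Toy records invented for illustration; NOT the AP/Cortico data set.
TWEETS = [
    {"text": "We will WIN big!", "retweets": 30, "favorites": 120},
    {"text": "Meeting with the cabinet today.", "retweets": 10, "favorites": 40},
    {"text": "FAKE NEWS again!", "retweets": 50, "favorites": 200},
    {"text": "Thank you, Ohio.", "retweets": 20, "favorites": 60},
]


def has_all_caps_word(text):
    """True if the text contains a word of two or more capital letters."""
    return re.search(r"\b[A-Z]{2,}\b", text) is not None


def has_exclamation(text):
    """True if the text contains at least one exclamation point."""
    return "!" in text


def average_lift(tweets, predicate, metric):
    """Mean of `metric` for tweets matching `predicate`, minus the mean for the rest."""
    matching = [t[metric] for t in tweets if predicate(t["text"])]
    rest = [t[metric] for t in tweets if not predicate(t["text"])]
    if not matching or not rest:
        raise ValueError("both groups need at least one tweet")
    return mean(matching) - mean(rest)


caps_retweet_lift = average_lift(TWEETS, has_all_caps_word, "retweets")
exclaim_favorite_lift = average_lift(TWEETS, has_exclamation, "favorites")
print(caps_retweet_lift, exclaim_favorite_lift)
```

A difference in averages describes the data but does not by itself show that capital letters or exclamation points cause the extra engagement.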

“This collaboration shows how machine learning and AI research applied to the practice of storytelling can make a real impact in newsrooms, empowering journalists to bring stories to life creatively and quantitatively,” said Deb Roy, director of the Laboratory for Social Machines and associate professor at MIT.



The process of how machine learning informs journalism

Researchers at the Laboratory for Social Machines built a series of classifiers to categorize Twitter users by age, gender, political ideology and location. They did so by using a type of artificial intelligence called supervised learning, in which a model learns from labeled input data how to label examples it has not seen before.

In this case, researchers labeled tweets as belonging to a certain demographic (for example, left- or right-leaning citizens) and allowed the AI to differentiate among the labels and develop its own rules to later classify new exemplars.
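To make the idea of learning from labeled examples concrete, here is a small Python sketch of a supervised text classifier. It uses naive Bayes over word counts as a stand-in technique; the article does not say which algorithm the researchers used. The class name, the training sentences and the labels `group_a` and `group_b` are all hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        # Count how often each label occurs and which words appear under it.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior for this label
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labeled examples; real training data would be far larger.
TRAIN_TEXTS = [
    "love the new stadium and the game",
    "great game last night",
    "new recipe for dinner tonight",
    "dinner recipe was great",
]
TRAIN_LABELS = ["group_a", "group_a", "group_b", "group_b"]

model = NaiveBayesText().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("watching the game"))
print(model.predict("what is for dinner"))
```

The model is never told a rule such as "the word game means group_a". It infers that from the labeled examples, which is the sense in which the AI develops its own rules.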

John West, a data journalist at Cortico, next built an explorer tool, a navigable digital platform that allowed AP journalists to examine unique subsets of data in conjunction with others — a virtual pivot table. This explorer allowed the team to eventually parse data by dimension and metric.

- Dimensions refer to the characteristics of Trump’s tweets, such as which account he is tweeting from, what time of day, the tone of his tweet, etc.

- Metrics refer to the public reaction to his posts, including the age, gender, political ideology and location of the responder.
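The pivot-table idea can be sketched in a few lines of Python: pick one dimension and one metric, then count reactions for each combination. This is only an illustration of the concept, not Cortico's explorer. The rows, the field names `time_of_day` and `responder_age_group`, and the function `pivot_counts` are assumptions made up for the example.

```python
from collections import defaultdict

# Invented rows, one per reaction to a tweet; NOT the project's data.
REACTIONS = [
    {"time_of_day": "morning", "responder_age_group": "18-29"},
    {"time_of_day": "morning", "responder_age_group": "30-49"},
    {"time_of_day": "morning", "responder_age_group": "30-49"},
    {"time_of_day": "evening", "responder_age_group": "18-29"},
    {"time_of_day": "evening", "responder_age_group": "18-29"},
]


def pivot_counts(rows, dimension, metric):
    """Count rows for every (dimension value, metric value) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[dimension]][row[metric]] += 1
    # Convert nested defaultdicts to plain dicts for easy display.
    return {dim_value: dict(cells) for dim_value, cells in table.items()}


by_time_and_age = pivot_counts(REACTIONS, "time_of_day", "responder_age_group")
print(by_time_and_age)
```

Swapping in a different dimension or metric name re-slices the same rows, which is what lets a reporter examine one subset of the data alongside another.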

“At Cortico, we’re designing accessible artificial intelligence tools with and for journalists to enable deeply reflective storytelling,” said Eugene Yi, deployment strategist and co-founder at Cortico.

The team then derived insights by combining different dimensions and metrics and evaluating what the combinations meant, which led to the visualization below. The graph reveals a newsworthy trend: Trump’s Twitter engagement is fading.



Informed by those insights, AP visual journalist Maureen Linke was able to build upon the initial findings and contextualize the data through traditional journalistic research.

“Cortico played an integral role by sharing its initial findings and analysis, which allowed AP to further investigate interesting data points that were compelling for use in both the story and graphics,” Linke said.

The findings were featured in a story by AP White House reporter Jonathan Lemire in collaboration with Linke, which was picked up by more than 270 media outlets including The New York Times, Politico and The Washington Post, and featured as a must-read by industry think tanks such as Nieman Lab and NYC Media Lab.

A brief video segment was also produced by AP’s head of U.S. video, Denise Vance, featuring a 3-D data visualization by MIT researcher Ann Yuan.

Key takeaways

Artificial intelligence and machine learning allow journalists to analyze data, identify patterns and trends from multiple sources and uncover hidden insights — this project being one example.

While these tools will help augment journalism, they will never replace it. AI might aid in the reporting process, but journalists will always need to put the pieces together and construct a digestible, creative narrative.


Francesco Marconi

Francesco is the manager of strategy and corporate development at The Associated Press. He is also an affiliate researcher at the MIT Media Lab and an Innovation Fellow at the Tow Center for Digital Journalism at Columbia University.
