A research scientist from the MIT Media Lab explains how a data analytics machine mines news archives to open a window into the intersection of news and social media.
Editor’s note: This Q&A is the first of a two-part series examining how data from news archives can be mined to power research. It has been lightly edited from a recent conversation between Bill Powers and Dwayne Desaulniers, an AP director of regional media.
Dwayne: I hear the Media Lab is working on some projects around the election this year. Anything you can share?
Bill: One thing my group at the MIT Media Lab does is map and analyze human networks. Case in point: We’re interested in understanding how the democratic conversation is evolving in the digital age, and in developing new tools to help advance democracy.
It’s a difficult challenge, but we realized there was an opportunity to do something interesting around this year’s presidential election.
So we have a project called the Electome, funded partly by a generous grant from the Knight Foundation, that is all about offering a new window on modern democracy. And AP data helped us build it.
How does the Electome open that window?
For several generations, journalistic coverage of presidential campaigns has been pretty tightly focused on the so-called “horse race” — who’s winning and who’s losing. Obviously, there’s a natural interest in that among the public. But we feel the ideas and policy questions that are at stake in a big election like this one, and that ultimately matter most of all, can get lost in the conversation.
In response, we built the Electome, a data analytics machine that aims to shine a light on what we call “the horse race of ideas” — the actual issues facing the country.
How does it work?
The Electome pulls in thousands of news stories and about 500 million tweets per day and, using sophisticated algorithm-based tools we developed, determines which ones are about the U.S. presidential election. The AP’s archive was the model for our topic classification system, which allows us to organize this content by issues, such as foreign policy, the economy and immigration.
These classifications also enable us to run share-of-conversation analyses about the issues. What’s getting the most attention? Immigration? The economy? What’s being overlooked?
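As a rough illustration of a share-of-conversation analysis, here is a minimal Python sketch. The interview does not describe how the Electome's classifier works, so the keyword lists, the `classify` helper and the sample documents are hypothetical stand-ins, not the Media Lab's method.

```python
from collections import Counter

# Hypothetical keyword lists, for illustration only. The interview does not
# describe the Electome's real classifier.
ISSUE_KEYWORDS = {
    "immigration": {"immigration", "border", "visa"},
    "economy": {"economy", "jobs", "taxes"},
    "foreign policy": {"treaty", "diplomacy", "nato"},
}

def classify(text):
    """Return the set of issues whose keywords appear in the text.

    Splits on whitespace only, so punctuation is not stripped.
    """
    words = set(text.lower().split())
    return {issue for issue, kws in ISSUE_KEYWORDS.items() if words & kws}

def share_of_conversation(docs):
    """Return each issue's fraction of all issue mentions across docs."""
    counts = Counter()
    for doc in docs:
        counts.update(classify(doc))
    total = sum(counts.values())
    return {issue: n / total for issue, n in counts.items()} if total else {}

# Made-up sample documents.
docs = [
    "New jobs report shows the economy growing",
    "Candidates clash over border security and immigration",
    "Debate on taxes and jobs",
    "Weather is nice today",
]
shares = share_of_conversation(docs)
```

In this toy sample, an issue that no document mentions is simply absent from `shares`, which is one way to surface what is being overlooked.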
We can also see how influence is distributed among large political and news organizations, individual journalists, and other influencers, as well as the small public voices on Twitter. It’s a way of making sense of this sprawling new election conversation, and hopefully transforming it into something more useful than a lot of chatter.
We turn the resulting analysis into journalistic articles, often published with major media partners such as The Washington Post. The overall objective is to give the public a window into the new public sphere that’s forming where news meets social media.
The conversation around the election this year has been pretty interesting. What are some examples of what you’ve learned?
To take one example, we invented a new kind of data analysis called a “Chat Scan” that looks at how people are talking about the various contenders for the presidency, and what issues they associate with each candidate.
From last August through mid-February, our Chat Scan visualizations showed that Hillary Clinton was associated, in the public’s mind, with the economy, health care and national security. With Donald Trump, it’s a different story: He was associated with race issues, immigration and national security/foreign policy.
This is what the Twitter public was saying, not the candidates themselves. These Chat Scans offer a new way of listening to the public, distinct from traditional polls.
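One way to picture a candidate-to-issue association of this kind is a simple co-mention count. The sketch below is an assumption, not the Media Lab's Chat Scan method: the `issue_associations` function, the placeholder candidate names and the tagged posts are all hypothetical.

```python
from collections import Counter, defaultdict

def issue_associations(tagged_posts):
    """Rank issues by how often they are mentioned alongside each candidate.

    tagged_posts: iterable of (candidates, issues) pairs, each a set of labels.
    This is an assumed, simplified stand-in for a Chat Scan.
    """
    counts = defaultdict(Counter)
    for candidates, issues in tagged_posts:
        for candidate in candidates:
            counts[candidate].update(issues)
    return {
        candidate: [issue for issue, _ in counter.most_common()]
        for candidate, counter in counts.items()
    }

# Made-up posts, already tagged with candidates and issues.
posts = [
    ({"Candidate A"}, {"economy"}),
    ({"Candidate A"}, {"economy", "health care"}),
    ({"Candidate B"}, {"immigration"}),
    ({"Candidate A", "Candidate B"}, {"immigration"}),
    ({"Candidate B"}, {"immigration", "economy"}),
]
ranked = issue_associations(posts)
```

Here `ranked["Candidate A"][0]` is the issue most often mentioned together with that candidate in the sample.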
In another recent project using the Electome, we developed metrics to determine which people and organizations have had the most influence on the election conversation so far.
To do this, we looked at the news coverage and the tweets that were resonating most in the public sphere, and wound up with a list of the top 150 influencers. It’s a mix of organizations and individuals — with the candidates themselves naturally at the top.
Have any of your findings stood out to you?
Looking at the influencer list, we found some interesting surprises. The first is that despite conventional wisdom that Twitter skews liberal — because it’s dominated by younger people and so forth — we noticed a strong presence on Twitter of conservative voices, at least within the election conversation.
The second surprise was the number of non-famous individuals who made the list. Naturally, there were celebrities who transcend politics — the Pope made it, Cher made it. But there were quite a few influencers we had never heard of at all.
These are people who don’t get quoted in the news coverage, yet their stuff is echoing on Twitter. That was a revelation to us. Social media appear to be a new ecosystem of influence, related to but different from the traditional news ecosystem.
You mentioned that the Electome uses data from news archives as a source of information. Which attributes of that data have helped you?
What we are finding is that AP data, and the AP archive in particular, can serve as a ground truth for us and assist with both our topic classification and our quality assessment. We’re able to return over and over to our valuable AP data and see how everything else stands up within the context and framework it provides.
Dwayne is the Boston-based director of regional media for The Associated Press, responsible for member and customer relationships, strategic analysis, revenue, and planning and business development in New England.