How news can help data scientists predict the future

By Dwayne Desaulniers

As access to data science and analytics becomes more affordable, companies large and small are jumping in. Our New England director of regional media explains how news can enhance data outcomes.

The Harvard Business Review calls it the “sexiest” job of the 21st century.

“It” is a data scientist, and in Cambridge, Massachusetts, dozens of startups are competing to bring the power of analytics to the corporate masses. It’s a growing market with significant potential, given the investment, activity and brainpower flooding into the sector.

Data science, often lumped together with “big data,” has been around for years, but until recently it was accessible to only a few companies because of the cost of the technology and the skills required to turn data into corporate strategy and insights. Now, in Cambridge and other U.S. tech hubs, access to the craft is being democratized.

Here’s how it works: Startups build software that third parties can use for a fee. These programs give businesses of all sizes, and their researchers, the computing power and number-crunching algorithms to begin mining their own data for new insights, trends and other findings that humans generally can’t detect on their own.

A financial trader evaluates the Dow Jones Industrial Average chart. (AP photo/file)

Most firms or academics eager to tap into these platforms bring their own data for analysis. Often, the data include internal sales figures or transactions, stock prices or publicly available data from governments – everything from census to crime stats. So far, the results are promising; the machines are seeing things humans do not.

For example:

- On Wall Street, hedge funds use data science to “back-test” their models against archival data, studying why stocks or other traded contracts rose or fell in the past in the hope of spotting the next peak or valley early enough to profit from it.

- In cities across the United States, police departments are increasingly turning to data scientists to generate heat maps that guide patrols to streets the data show to be problematic at particular times of day and for particular reasons.
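Back-testing, the Wall Street example above, can be sketched in a few lines. This minimal Python sketch replays a deliberately simple, hypothetical rule (hold the asset the next day only when today’s price is above its recent moving average) over a short price history and reports how much the holding would have grown. The function names, the rule and the prices are illustrative assumptions, not any fund’s actual method; real back-tests also account for trading costs and use far longer histories.

```python
from typing import List

def moving_average(prices: List[float], window: int, i: int) -> float:
    """Average of the `window` prices ending at index i (inclusive)."""
    return sum(prices[i - window + 1 : i + 1]) / window

def backtest_ma_rule(prices: List[float], window: int = 3) -> float:
    """Replay a hypothetical rule over past prices and return the growth factor.

    Rule (illustrative only): hold the asset from day i to day i+1 if day i's
    price is above its `window`-day moving average, otherwise hold cash.
    Each decision uses only prices up to day i, so nothing from the
    "future" leaks into the test.
    """
    growth = 1.0
    for i in range(window - 1, len(prices) - 1):
        if prices[i] > moving_average(prices, window, i):
            growth *= prices[i + 1] / prices[i]
    return growth

# Hypothetical closing prices, oldest first.
prices = [10, 11, 12, 11, 13, 14]
print(round(backtest_ma_rule(prices, window=3), 3))  # → 0.987
```

Here the rule holds the asset on two of the three days it can act and ends slightly down, while simply holding from the third day’s price onward would have gained about 17 percent. Surfacing that kind of gap before risking real money is the point of a back-test.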

But even data scientists get bored with internal analysis. They say the “magic,” the real surprises, come when multiple data sets are combined: say, mixing commodity data with barometric pressure data. Doing so can reveal broader, more detailed findings.
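Combining data sets usually starts with a join on a shared key such as the date. The Python sketch below, using made-up commodity prices and barometric pressure readings, keeps only the dates present in both series and then measures how closely they move together with a Pearson correlation coefficient. The data and function names are hypothetical, and a correlation found this way is a lead to investigate, not proof that one series drives the other.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

def join_on_date(a: Dict[str, float], b: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Inner join on ISO-format date strings: keep only dates present in both series."""
    shared = sorted(a.keys() & b.keys())  # ISO dates sort chronologically as strings
    return [(d, a[d], b[d]) for d in shared]

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series (-1 to 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical daily figures; note the two series cover different days.
commodity = {"2024-01-02": 70.1, "2024-01-03": 71.0, "2024-01-04": 72.3, "2024-01-05": 71.5}
pressure_hpa = {"2024-01-03": 1012.0, "2024-01-04": 1009.5, "2024-01-05": 1010.8, "2024-01-06": 1015.0}

rows = join_on_date(commodity, pressure_hpa)
print([d for d, _, _ in rows])  # → ['2024-01-03', '2024-01-04', '2024-01-05']
r = pearson([price for _, price, _ in rows], [hpa for _, _, hpa in rows])
print(round(r, 2))
```

With only three overlapping days, the coefficient printed here means little statistically; a real analysis would join months or years of data before drawing any conclusion.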


To explain these insights, researchers look at environmental factors, such as news, that can influence human behavior. The more data the software has, the better it can understand the past and predict the future.

For this reason, data scientists and companies across industries mine The Associated Press’ archives for signals or triggers that influenced or caused certain actions. They also ingest AP’s live news feeds to watch for those signals recurring in real time.

We’ve conducted our own analysis of customer feedback, and we continually evaluate the quality of our content. Because our news is accurate, we can offer clean data that leads to good decisions.

We also include timestamps and other metadata in our stories to help programmers scour the news for relevant information. And because we store all versions of our articles, data software can recognize an event as it is unfolding, which is significantly more instructive than using only the final version of a story after the fact.
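Storing every timestamped version of a story makes it possible to ask what was known at a given moment, rather than reading only the final version after the fact. The Python sketch below assumes a hypothetical, simplified record (a story identifier, a publication timestamp and a headline); it is not AP’s actual data format. Given a point in time, the illustrative `as_of` function returns the latest version of each story published by then, which is also how a back-test can avoid peeking at information that did not yet exist.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List

@dataclass(frozen=True)
class StoryVersion:
    story_id: str        # hypothetical identifier shared by all versions of a story
    published: datetime  # when this version went out (timezone-aware, UTC)
    headline: str

def as_of(versions: List[StoryVersion], t: datetime) -> Dict[str, StoryVersion]:
    """Return the latest version of each story that had been published by time t."""
    latest: Dict[str, StoryVersion] = {}
    for v in versions:
        if v.published <= t:
            current = latest.get(v.story_id)
            if current is None or v.published > current.published:
                latest[v.story_id] = v
    return latest

# Hypothetical versions of one developing story.
versions = [
    StoryVersion("s1", datetime(2024, 5, 1, 14, 2, tzinfo=timezone.utc), "Fire reported at chemical plant"),
    StoryVersion("s1", datetime(2024, 5, 1, 14, 40, tzinfo=timezone.utc), "Fire forces plant shutdown"),
    StoryVersion("s1", datetime(2024, 5, 1, 18, 15, tzinfo=timezone.utc), "Plant to stay closed for weeks"),
]
snapshot = as_of(versions, datetime(2024, 5, 1, 15, 0, tzinfo=timezone.utc))
print(snapshot["s1"].headline)  # → Fire forces plant shutdown
```

Replaying a feed this way, one timestamp at a time, lets software learn from how an event looked while it was unfolding rather than from the settled account written hours later.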

We are always looking for ways to deliver value to our customers, and our goal is to work with these startups to highlight the role news can play in data science. If their software is going to learn to predict the future, there’s no better way to perfect those algorithms than with AP’s record of the past.

Dwayne Desaulniers

Dwayne is the Boston-based director of regional media for The Associated Press, responsible for member and customer relationships, strategic analysis, revenue, and planning and business development in New England.
