A field guide for understanding automated journalism

By Andreas Graefe

A fellow at the Tow Center for Digital Journalism at Columbia University researched the growth of automation in journalism and produced a guide discussing key questions and potential implications of its adoption by media companies.

Editor’s note: Andreas Graefe presented the following overview of his research (republished here with permission) on automated journalism to staff of The Associated Press. For the full report, please visit the Tow Center for Digital Journalism’s website or view the PDF.

Executive Summary

In recent years, the use of algorithms to automatically generate news from structured data has shaken up the journalism industry, especially since The Associated Press, one of the world’s largest and most well-established news organizations, started automating the production of its quarterly corporate earnings reports.

Once developed, not only can algorithms create thousands of news stories for a particular topic, they also do it more quickly, cheaply and potentially with fewer errors than any human journalist. Unsurprisingly, then, this development has fueled journalists’ fears that automated content production will eventually eliminate newsroom jobs, while at the same time scholars and practitioners see the technology’s potential to improve news quality.

This guide summarizes recent research on the topic and thereby provides an overview of the current state of automated journalism, discusses key questions and potential implications of its adoption, and suggests avenues for future research. Some of the key points can be summarized as follows:

Status quo

  • Market phase

    • Companies worldwide are developing software solutions for generating automated news.
    • Leading media companies such as The Associated Press, Forbes, The New York Times, Los Angeles Times and ProPublica have started to automate news content.
    • Although the technology is still in an early market phase, automated journalism has arrived in newsrooms and is likely here to stay.
  • Conditions and drivers

    • Automated journalism is most useful in generating routine news stories for repetitive topics for which clean, accurate, and structured data are available.
    • Automated journalism cannot be used to cover topics for which no structured data are available, and it is difficult to apply when data quality is poor.
    • The key drivers of automated journalism are an ever-increasing availability of structured data, as well as news organizations’ aim to both cut costs and increase the quantity of news.
  • Potential

    • Algorithms are able to generate news faster, at a larger scale and potentially with fewer errors than human journalists.
    • Algorithms can use the same data to tell stories in multiple languages and from different angles, thus personalizing them to an individual reader’s preferences.
    • Algorithms have the potential to generate news on demand by creating stories in response to users’ questions about the data.
  • Limitations

    • Algorithms rely on data and assumptions, both of which are subject to biases and errors. As a result, algorithms could produce outcomes that are unexpected, unintended or erroneous.
    • Algorithms cannot ask questions, explain new phenomena, or establish causality and are thus limited in their ability to observe society and to fulfill journalistic tasks, such as orientation and public opinion formation.
    • The writing quality of automated news is inferior to human writing but likely to improve, especially as natural language generation technology advances.
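
To make the points above concrete, the following is a minimal sketch of template-based story generation from structured data, written in Python. It is an illustration only, not a description of how The Associated Press or any vendor actually builds its system. The record fields, the company names, the figures and the `generate_story` function are all hypothetical. The sketch shows two of the points made above: a story can only be produced when complete structured data are available, and the wording follows from rules and assumptions written in advance by a person.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarningsRecord:
    """Hypothetical structured input; the field names are illustrative."""
    company: str
    quarter: str
    eps: Optional[float]           # earnings per share, in dollars
    eps_expected: Optional[float]  # analysts' consensus estimate


def generate_story(record: EarningsRecord) -> str:
    """Fill a fixed sentence template from one complete record."""
    # Without complete structured data there is nothing to write from.
    if record.eps is None or record.eps_expected is None:
        raise ValueError(f"incomplete data for {record.company}")

    # The only "editorial" judgment is a rule a person wrote in advance.
    if record.eps > record.eps_expected:
        verdict = "beat"
    elif record.eps < record.eps_expected:
        verdict = "fell short of"
    else:
        verdict = "matched"

    return (
        f"{record.company} reported {record.quarter} earnings of "
        f"${record.eps:.2f} per share, which {verdict} analysts' "
        f"expectations of ${record.eps_expected:.2f}."
    )


# Invented example records; the same function scales to thousands of rows.
records = [
    EarningsRecord("Acme Corp.", "third-quarter", 1.25, 1.10),
    EarningsRecord("Globex Inc.", "third-quarter", 0.80, 0.95),
]
stories = [generate_story(r) for r in records]
```

Because the template and the comparison rule are fixed, any error in the input data or in the rule itself is reproduced in every story generated from it, which is one reason the limitations listed above matter in practice.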

Key questions and implications

  • For journalists

    • Human and automated journalism will likely become closely integrated and form a “man-machine marriage.”
    • Journalists are best advised to develop skills that algorithms cannot perform, such as in-depth analysis, interviewing and investigative reporting.
    • Automated journalism will likely replace journalists who merely cover routine topics, but will also create new jobs in the development of news-generating algorithms.
  • For news consumers

    • People rate automated news as more credible than human-written news but do not particularly enjoy reading automated content.
    • Automated news is currently most suited for topics where providing facts in a quick and efficient way is more important than sophisticated narration, or where news did not exist previously and consumers thus have low expectations regarding the quality of the writing.
    • Little is known about news consumers’ demand for algorithmic transparency, such as whether they need (or want) to understand how algorithms work.
  • For news organizations

    • Since algorithms cannot be held accountable for errors, liability for automated content will rest with a natural person (e.g., the journalist or the publisher).
    • Algorithmic transparency and accountability will become critical when errors occur, in particular when covering controversial topics and/or personalizing news.
    • Apart from basic guidelines that news organizations should follow when automatically generating news, little is known about which information should be made transparent regarding how the algorithms work.
  • For society

    • Automated journalism will substantially increase the amount of available news, which will make it even harder for people to find the content that is most relevant to them.
    • An increase in automated—and, in particular, personalized—news is likely to reemphasize concerns about potential fragmentation of public opinion.
    • Little is known about potential implications for democracy if algorithms are to take over part of journalism’s role as a watchdog for government.

Andreas Graefe

Andreas is a research fellow at LMU Munich, Germany, and at the Tow Center for Digital Journalism at Columbia University.
