How to become a data journalist: day 1
It is not every day that one gets to attend a training session taught by two Pulitzer Prize-winning journalists. The occasion is even rarer when the training in question is about data journalism and the location is the small medieval town of Perugia, in Italy, a country where this journalistic method is still in its infancy.
The audience attending the first workshop of the Data Journalism School organised by the European Journalism Centre and the Open Knowledge Foundation probably did not know what to expect from a session named “Precision Journalism and Pulitzer Prizes”: perhaps a strategy for winning a Pulitzer through data journalism, perhaps a dive into the personal experience of speakers Steve Doig and Sarah Cohen.
All doubts were quickly swept away when Prof. Doig opened a Microsoft Excel file and showed it to the waiting audience. “The data stories we typically produce in the U.S. mostly involve something as simple as counting, summing and sorting,” he said as he searched his computer for a spreadsheet of Italian crime statistics broken down by region and province.
“You have to start thinking in terms of columns and rows: any time you can put information in this format, you can do data journalism,” Doig continued.
Although an hour and a half is too short a time to fully grasp Excel’s potential for unearthing powerful stories in the public interest, the session provided a good introduction to the many tricks and shortcuts the program offers journalists. “Excel is a great tool that will do very boring and tedious tasks for you,” observed the professor of the Walter Cronkite School of Journalism as the audience fought against a hostile wi-fi connection to keep up with the exercise.
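For readers who want to try the same three operations outside Excel, here is a minimal sketch in Python with the pandas library; the file and column names are invented for illustration, not taken from Doig’s spreadsheet.

```python
import pandas as pd

# Hypothetical spreadsheet of crime statistics, one row per province;
# the file and column names ("region", "province", "thefts") are invented.
crimes = pd.read_excel("crime_stats.xlsx")

# Counting: how many provinces does each region contribute?
print(crimes["region"].value_counts())

# Summing: total thefts per region.
totals = crimes.groupby("region")["thefts"].sum()

# Sorting: rank regions from most to fewest thefts.
print(totals.sort_values(ascending=False))
```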
When asked how to cope with the reliability of data, Sarah Cohen suggested comparing “the same data compiled by different people or organisations, to check if the information matches or is reliable”. “Of course, you can also move beyond the statistics and go out into the real world to check for yourself how an incident looks on the ground,” she continued. “You may go out with the police one day and see how their reports are filed and where they go, and so on. Police statistics often don’t match up with official statistics, but if you know the reason behind it you actually don’t worry too much about it.”
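Cohen’s cross-checking advice translates naturally into code. The sketch below, again with invented file and column names, lines up two versions of the same figures and surfaces the provinces where they disagree.

```python
import pandas as pd

# Two hypothetical datasets covering the same incidents, compiled by
# different organisations; file and column names are invented.
police = pd.read_csv("police_reports.csv")      # columns: province, incidents
official = pd.read_csv("official_figures.csv")  # columns: province, incidents

# Put the two sources side by side and measure where they diverge.
merged = police.merge(official, on="province", suffixes=("_police", "_official"))
merged["gap"] = (merged["incidents_police"] - merged["incidents_official"]).abs()

# The provinces with the largest discrepancies are where the reporting starts.
print(merged.sort_values("gap", ascending=False).head(10))
```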
The second workshop of the first day of the Data Journalism School was led by Dan Nguyen of ProPublica and Friedrich Lindenberg of the Open Knowledge Foundation, who introduced the audience to the art of cracking PDF files and scraping websites.
“It would take you an eternity to look through a PDF file to find what you are looking for. What we really want is to get the information into a spreadsheet,” explained Nguyen, as the screen showed a long, seemingly unsearchable PDF file. “My best piece of advice is: learn regular expressions and you will be happy ever after,” he added, inviting the audience to download the slides of his presentation from the link he shared on the screen.
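As a taste of what Nguyen meant, here is a minimal sketch of the regular-expression approach. It assumes the text has already been pulled out of the PDF (for instance with the pdftotext command-line tool), and the line layout it matches is invented.

```python
import csv
import re

# Assumes the PDF's text has already been extracted to report.txt
# (e.g. with pdftotext); the line layout matched below is invented.
with open("report.txt", encoding="utf-8") as f:
    text = f.read()

# Hypothetical pattern: a province name followed by two numbers.
row = re.compile(r"^([A-Za-z ]+?)\s+([\d.,]+)\s+([\d.,]+)$", re.MULTILINE)

# Write every matching line into a spreadsheet-friendly CSV file.
with open("report.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["province", "reported", "solved"])
    for match in row.finditer(text):
        writer.writerow(match.groups())
```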
The immense potential of web scraping was introduced by Lindenberg, who explained that “every site on the web is a potential database. When we look at a website we do not think of it as data, but it is. It is accessible to read but not for other types of analysis.” He offered a short overview of HTML tags and their meaning, and of how best to navigate websites to extract information from them.
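Along the lines Lindenberg described, a minimal scraping sketch using Python’s requests and BeautifulSoup libraries might look like this; the URL is invented, and a real page would need its own selectors.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical page holding an HTML table of statistics (URL invented).
html = requests.get("https://example.org/statistics").text
soup = BeautifulSoup(html, "html.parser")

# The same tags that structure the page for readers structure it for us:
# each <tr> in the first <table> becomes one row of data.
for tr in soup.find("table").find_all("tr"):
    print([cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])])
```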
Judging from the audience’s enthusiastic comments on seeing what tools like ScraperWiki can achieve, the second day of the Data Journalism School looks set to replicate the success of the first.
See you tomorrow.
Claudia Costa