For this webinar I took a bit of a back seat and passed the spotlight over to our Data Intelligence guru, Bea Schofield. Some of you may recognize her as our resident data journalism expert, and I thought this would be a great opportunity to have her show you how it’s done.
How to do Data Journalism
There are generally two approaches to Data Journalism. The first is to pull live or recent data, usually to build infographics and visualizations that help people understand your story. The second is to monitor data over time to identify interesting patterns and trends.
For the majority of data journalism data needs (say that 3 times fast!), you should be able to use an Extractor or a Crawler. When you start off doing data journalism, it’s helpful to have a problem in mind that you want to solve or a question that you want to answer. From there, you can figure out where the data you need lives and, depending on what that data looks like and how much of it you need, make an informed decision about which of our data-grabbing tools will work best.
The Grand Prix
First, Bea showed you how to grab a table of data on Formula 1 by building an Extractor and using one of our new tools, Table Auto Extract. This dataset may look simple, but since import.io can get it for you live, it’s a great one for creating an interactive infographic for a story.
Formula 1 Extractor
Inc 5000 list
Another popular way to use data in journalism is to collect historical data and compare it over time to identify interesting trends. To illustrate this, Bea showed you how to build an API to a company’s jobs page and then use that schema to monitor lots of companies on the Inc 5000 list. As a journalist, you could use the number of jobs posted over time as an indicator of a company’s growth and spot some interesting trends in the way tech companies grow.
Sparc Jobs Extractor
Tip: Click and drag to highlight more specific pieces of data for either rows or columns
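Once you have collected job counts for the same companies on several dates, the comparison over time can be as simple as subtracting the first count from the last. Here is a minimal Python sketch of that idea. It assumes you have exported your data into rows of (company, date, number of open roles); the company names, dates and numbers below are made up for illustration and are not real extracted data, and `job_growth` is a hypothetical helper, not part of import.io.

```python
from collections import defaultdict
from datetime import date

# Hypothetical snapshots: (company, date the jobs page was extracted, open roles found).
# These rows are made-up illustration values, not real extracted data.
snapshots = [
    ("ExampleCo", date(2014, 4, 1), 12),
    ("ExampleCo", date(2014, 5, 1), 15),
    ("ExampleCo", date(2014, 6, 1), 21),
    ("SampleSoft", date(2014, 4, 1), 8),
    ("SampleSoft", date(2014, 6, 1), 6),
]


def job_growth(rows):
    """Return {company: change in open roles between its first and last snapshot}."""
    by_company = defaultdict(list)
    for company, taken_on, open_roles in rows:
        by_company[company].append((taken_on, open_roles))

    growth = {}
    for company, points in by_company.items():
        points.sort()  # tuples sort by date first, so this orders snapshots in time
        growth[company] = points[-1][1] - points[0][1]
    return growth


# Print companies from fastest growing to fastest shrinking.
for company, change in sorted(job_growth(snapshots).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{company}: {change:+d} open roles")
```

A raw difference is the simplest possible indicator; for a real story you would probably want to compare percentage change as well, since a jump of ten roles means something very different at a 20-person company than at a 2,000-person one.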
Finally, Bea demonstrated how to build a Crawler for election data from the recent Indian elections. As a journalist, you could use this information to analyse which party was most popular in each state or to create a cool visualisation. Crawlers are a great way to get lots of data at once, and a powerful tool for data journalists.
Indian Elections Dataset
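To give a feel for the "most popular party in each state" analysis, here is a short Python sketch. It assumes the crawled data has been exported as rows of (state, party, votes); that column layout is an assumption, the state and party names are placeholders rather than real results, and `leading_party_by_state` is a hypothetical helper written for this example.

```python
from collections import defaultdict

# Hypothetical rows: (state, party, votes). Placeholder values, not real election results.
rows = [
    ("State A", "Party X", 1200),
    ("State A", "Party Y", 900),
    ("State A", "Party X", 300),
    ("State B", "Party Y", 700),
    ("State B", "Party Z", 650),
]


def leading_party_by_state(rows):
    """Sum votes per party within each state and return the top party per state."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, party, votes in rows:
        totals[state][party] += votes
    # For each state, pick the party with the highest vote total.
    return {state: max(parties, key=parties.get) for state, parties in totals.items()}


for state, party in leading_party_by_state(rows).items():
    print(f"{state}: {party}")
```

The same per-state totals could feed a map or bar chart directly if you would rather visualise the result than report a single winner.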
She also showed you how to correct your data extraction if you make a mistake part way through. Say you chose single results, but two pages into training the crawler you realise that you actually need multiple results. You can simply go back a few steps (by hitting the back button), choose another option and retrain your examples. If you’ve already mapped a few pages, you will need to go back and check them to make sure the tool has been able to remap them to your new specifications.
Moral of the Story
By collecting and analysing data from across the web, journalists can identify newsworthy trends, break stories and engage in more accurate reporting practices. Data-driven journalism improves the overall quality of reporting by allowing for a more open and transparent look at the facts of a story. Here’s a real-life use case from our friends at GigaOm!
If you’re interested in learning more about Data Journalism, Bea will be doing a video series with Journalism.co.uk in a few weeks, showing you how she helped Oxfam make headlines by exposing the wealth inequality gap in the UK.
What tools would you recommend if you need to do additional data cleansing?
import.io integrates easily with Excel and Google Sheets, both of which are excellent tools if you need to manipulate or clean up your data a bit more. Depending on the type of data you are using, they can also be great for analysis and a bit of visualization.
What is your favourite source of data?
When you’re doing data journalism, one of the best places to look for data is government websites.
What are the legalities around scraping all this data?
At import.io we are working to build a “Structured Web” where data can be easily accessed via a vast network of user-created APIs. As such, import.io is part of an ecosystem containing both data users and data owners, and we act as the pipeline that allows data to flow from one to the other. It is the responsibility of the data user to respect the terms and conditions of the websites they are extracting data from. For journalists specifically, you should always quote the source and link back to it in your story if possible.
Chris and I will be back for next week’s webinar, where we’ll be showing you how to create an awesome app and answering all your questions about integration. Sign up now!