Getting started tutorial
This tutorial follows the basic workflow and guides you through creating and running your first extractor. By the end of it, you will be able to turn a webpage of your choice into structured data. The tutorial is available as a video as well as in the text that follows.
Choose the webpage from which you want to extract data. This tutorial uses https://www.yelp.co.uk/search?cflt=restaurants&find_loc=London%2C+XGL%2C+GB, a list of restaurants in London, England, with useful data such as star ratings, location, price range, and address. For the purposes of this tutorial, we extract data only from the first page of search results; we’ll show you how to add the rest later.
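The search URL carries its filters in the query string. As an optional illustration (Import.io does not require this step), Python's standard `urllib.parse` module can decode it. The meanings of `cflt` (category filter) and `find_loc` (search location) are inferred from their values here, not taken from Yelp documentation.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.yelp.co.uk/search?cflt=restaurants&find_loc=London%2C+XGL%2C+GB"

# Split the URL into scheme, host, path, and query, then decode the query
# string into a dict mapping each parameter name to a list of values.
parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.netloc)        # www.yelp.co.uk
print(params["cflt"])      # ['restaurants']
print(params["find_loc"])  # ['London, XGL, GB']  (%2C decodes to ",", + to a space)
```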
To start creating your new extractor, perform the following steps:
- Access dash.import.io from your web browser. The Import.io dashboard appears.
- Note: If a login page appears instead of the dashboard, follow the prompts to log in to your account.
- Note: If you do not yet have an account, click Sign up here and create your account.
- In the upper left of the dashboard, click New Extractor. The Create a new Extractor dialog box appears with Go disabled.
- Enter the URL of the page containing the data you’d like to extract. When you enter a syntactically correct URL, Go changes from disabled to enabled.
- Click Go. The following message appears:
Import.io loads the webpage located at the given URL and analyzes the page for data.
This process can take a few seconds to complete.
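The Go button stays disabled until the URL is syntactically correct. The tutorial does not say exactly which check Import.io applies, so the following is only a rough approximation under an assumed rule: the URL must use an `http` or `https` scheme and include a host name. The function name `looks_like_valid_url` is hypothetical.

```python
from urllib.parse import urlsplit


def looks_like_valid_url(text: str) -> bool:
    """Hypothetical syntactic check: an http(s) scheme plus a host name.

    This approximates "syntactically correct"; it is not Import.io's own rule.
    """
    try:
        parts = urlsplit(text.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed input, such as an
        # unclosed IPv6 bracket in the host part.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in [
    "https://www.yelp.co.uk/search?cflt=restaurants&find_loc=London%2C+XGL%2C+GB",
    "www.yelp.co.uk/search",  # no scheme, so the host is parsed as part of the path
    "https://",               # scheme but no host
]:
    print(candidate, "->", looks_like_valid_url(candidate))
```

A real validator might accept more (or fewer) forms, for example URLs typed without `https://`; treat this as a sketch of the idea, not of the product's behavior.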
If the analysis finds information that looks like useful data, Import.io creates and populates a data table, launches the editor, and displays the table in the editor’s data table view.
If you are happy with the data Import.io found, you can skip ahead and save your extractor. Otherwise, you can edit the data. For this tutorial, let’s edit some of it.
When you are new to editing extractors, the webpage view is the most intuitive place to work. Click the Edit tab to switch to webpage view. Import.io renders a copy of the Yelp webpage inside the editor and lists all the data it finds as columns in the column headings bar across the screen.
Here, you can add, edit, or delete any of the columns. This tutorial demonstrates collecting four data points: restaurant name, image, price range, and cuisine type. Because the Import.io analysis identified many columns we don’t need, we can either delete each unwanted column or start from scratch.
To populate the data table for this tutorial, perform the following steps:
- In the editor commands bar, click Start over with empty table. All the table columns disappear from the column headings bar along with the data in the floating data column window, and the column heading name box in the floating window automatically highlights, prompting you to name your first column.
- Type Name and either press the Enter key or click elsewhere on the page.
- Hover your mouse pointer over the name of the first restaurant in the list until a thin pink box surrounds only the restaurant name (do not include the Ad icon), and then click. The thin pink box changes to a thicker green box, indicating a successful selection. Repeat for the second restaurant name, and so on. When Import.io recognizes the pattern, it populates the table with all the restaurants listed on the page.
- If Import.io identifies a link behind an item you select, a Question? dialog box appears, giving you the option to extract the link along with the name. Note: Although the name and link appear in one column in the editor, when your extractor runs it stores them in separate columns in your CSV file, or as separate data points in the JSON response.
- In the editor commands bar, click Add column and follow the same steps as for the first column to create a data column of the images displayed to the left of the restaurant names.
- If you make a mistake or something doesn’t look right, click the reverse circle (undo) icon in the editor commands bar to undo your actions.
- Continue the process to add the price range and cuisine type columns using the same method.
- Click the Data tab to review your table in the data view.
- Click +2 items in the cuisine type column to see that Import.io extracted more than one cuisine type for some restaurants. Note: When multiple items are extracted in one cell, they appear separated by semicolons in the CSV file.
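If you later process the downloaded CSV file yourself, you may want to split those semicolon-separated cells back into individual items. The sketch below uses Python's standard `csv` module. The column names (`Name`, `Price range`, `Cuisine type`) and the rows are hypothetical placeholders, not real extracted data; check the header row of your own file for the actual names.

```python
import csv
import io

# Hypothetical CSV excerpt: placeholder column names and rows for illustration only.
sample_csv = """Name,Price range,Cuisine type
Restaurant A,££,Italian;Pizza;Wine Bars
Restaurant B,£££,French
"""


def split_multi(cell: str) -> list[str]:
    """Split a cell holding several extracted items separated by semicolons."""
    return [item.strip() for item in cell.split(";") if item.strip()]


# DictReader maps each row to a dict keyed by the header row's column names.
reader = csv.DictReader(io.StringIO(sample_csv))
for row in reader:
    print(row["Name"], split_multi(row["Cuisine type"]))
# Restaurant A ['Italian', 'Pizza', 'Wine Bars']
# Restaurant B ['French']
```

To read a real download instead, replace `io.StringIO(sample_csv)` with `open("your-file.csv", newline="", encoding="utf-8")`, where the file name and the UTF-8 encoding are assumptions to adjust for your file.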
If you are happy with the look of the table, perform the following steps:
- Click Save in the upper right of the editor. The Save Extractor dialog box appears.
In this dialog box, you name your extractor, set its run schedule, and choose whether to receive email notifications when runs complete.
- In the Extractor name box, enter a name for your extractor.
- From the Schedule to run list, select Once. (There’s no need to run this tutorial extractor on a regular basis.)
- Click Save and run. When you first create an extractor, you need to save and run it to review a test run of your dataset. The extractor runs, and Import.io returns to the Run history tab of the dashboard to show you the results of your test run.
- Review the details of your run, including the icons for previewing the data, downloading the log file, and downloading your dataset as an Excel, CSV, or JSON file.
Congratulations, you have created your first extractor!