Import.io User Guide

The editor

Use the editor to train your extractor to select the data that you want to collect from the website. Use the features in the editor to manage both the columns themselves and the data in the columns.

To access the editor, perform the following steps:

  1. In the left-side navigation pane of the dashboard, select the extractor you want to edit from the list of your extractors.
  2. Click Edit in the extractor commands menu at the upper right of the dashboard.

Note: When creating a new extractor, Import.io automatically opens the editor for you.

Viewing your training data

The editor offers two ways to view your data: data table view and webpage view. Use the tabs near the top left of the browser window just above the data table headings to change between the views.

Data table view

Click the Data tab to view your data in a table format. Data table view presents a preview of how the data will look in the file created when the extractor runs. Use data table view to preview your data, manage table column organization, and manage the data in the columns.

When there are more data columns than fit in your browser window, a scroll bar appears at the bottom of the browser window. Use the scroll bar to access hidden columns. Additionally, left and right arrows appear at the right end of the table headings. Hover the mouse pointer over the arrows to auto scroll the columns.

Webpage view

Click the Edit tab to display a copy of the webpage you are using to train your extractor. Use webpage view to see how items on the webpage correspond to data in the table and to point-and-click to select data for inclusion in table columns.

When there are more column headings than fit in your browser window, left and right arrows appear at the right end of the table headings. Hover the mouse pointer over the arrows to auto scroll the columns.

To view the data in a column, click the column name in the column headings bar. Data from the column appears in a floating window and green boxes appear around items on the webpage that correspond to data in the column.

Elements of the editor’s user interface

You access the editor commands and options in the following places in the editor:

  • Column headings bar
  • Column options menu
  • Floating data column window
  • Editor commands bar

Column headings bar

The column headings bar displays the data table column heading names in both data table view and webpage view. Clicking a name behaves differently in each view:

  • In data table view, click a name to select and highlight the column.
  • In webpage view, click a name to populate the floating data column window and display green boxes around items on the webpage that correspond to data in the column.

In both views, click the down arrow to the right of each column heading name to access the column options menu.

Column options menu

The column options menu contains options for managing columns and managing data within columns. To access the menu, click the down arrow to the right of the name in any column heading, in either data table view or webpage view. Options available on the menu act on the data within that column only.

Floating data column window

The floating data column window is present in webpage view only and displays the data for the column currently selected in the column headings bar.

To view the data for a column, click the column name in the column headings bar. Data from the column appears in the floating window and green boxes appear around items on the webpage that correspond to data in the column.

The following options are available within the floating data column window:

Reposition the window

To reposition the floating window, click and drag the floating window heading bar.

Rename column

Renaming columns describes how to change the name of a column.

Clear

This option removes all the data in the column and keeps the empty column. Clear data describes how to remove data from a column.

Delete

This option removes the column and all the data in the column. Deleting columns describes how to remove a column.

Editor commands bar

The editor commands bar is present in webpage view only and contains the following functionality:

Undo/Redo

The editor tracks every action you take. Use Undo and Redo to move backwards and forwards through the actions you have taken.

The standard keyboard shortcuts are available:

  • To undo, use Cmd+Z on a Mac and Ctrl+Z in Windows.
  • To redo, use Cmd+Shift+Z on a Mac and Ctrl+Shift+Z in Windows.

Start over with empty table

This option removes all columns and all data from your table.

Add Column

This option adds an empty column to the right end of your table. Adding columns describes how to add a column.

Advanced / Standard toggle

This option controls the display of the advanced features on the editor commands bar. To show or hide the advanced features, click the slider icon near the right end of the bar.

Advanced features of the editor commands bar

This section describes the following advanced features of the editor:

Page

Page provides access to options involving the underlying structure of webpages.

JavaScript On/Off

The JavaScript option turns JavaScript on or off for the page.

CSS On/Off

The CSS option turns cascading style sheets (CSS) on or off for the page.

The option is on by default. Keep the option on to view the webpage as the author intended. Turn this option off to locate hidden elements within the webpage.
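
To picture what counts as a hidden element, the following Python sketch lists text that a page hides with an inline display:none style. It is an illustration only, not part of Import.io: the sample markup and the HiddenTextFinder class are hypothetical, and real pages often hide elements through stylesheet rules that this sketch does not inspect.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HiddenTextFinder(HTMLParser):
    """Collects text found inside elements styled inline with display:none."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside a hidden element
        self.hidden_text = []   # text fragments found while hidden

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Once inside a hidden element, every nested element is hidden too.
        if self.depth or "display:none" in style:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.hidden_text.append(data.strip())


# Hypothetical page fragment with two hidden pieces of data.
SAMPLE = (
    '<div><p>Visible</p>'
    '<p style="display: none">SKU 123</p>'
    '<span style="DISPLAY:NONE"><b>In stock</b></span></div>'
)

finder = HiddenTextFinder()
finder.feed(SAMPLE)
hidden = finder.hidden_text
```

With the CSS option turned off in the editor, text like the two hidden fragments in this sample would be displayed on the page and could be selected by point-and-click.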

Rows

Rows controls how elements of the webpage translate into rows in the data table.

Single row

Single row collapses similar information (for example, URLs for multiple color options of a clothing product) into a single row of data in the table. Use this option when you need similar output in a single row of data.

Multiple rows

Multiple rows analyzes the data and determines whether to present similar information in a single row of data in the table or expand the information into multiple rows. This option is the default.
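
The two possible outcomes can be pictured with a short Python sketch. The product record, the field names, and the "; " separator are illustrative assumptions; this guide does not state how Import.io joins collapsed values, and Multiple rows may still keep a single row after analyzing the data.

```python
# Hypothetical extracted record: one product with several color option URLs.
product = {
    "name": "Shirt",
    "color_urls": ["/shirt/red", "/shirt/blue", "/shirt/green"],
}


def as_single_row(record, separator="; "):
    """Collapse the similar values into one cell of a single row."""
    return [{
        "name": record["name"],
        "color_url": separator.join(record["color_urls"]),
    }]


def as_multiple_rows(record):
    """Expand the similar values so that each one gets its own row."""
    return [
        {"name": record["name"], "color_url": url}
        for url in record["color_urls"]
    ]


single = as_single_row(product)
multiple = as_multiple_rows(product)
```

In this sketch, single holds one row whose color_url cell contains all three URLs, while multiple holds three rows that repeat the product name.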

Row XPath

Row XPath allows you to use an XPath expression to manually define how rows are separated.
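
As a rough illustration of the idea, the following Python sketch applies a row XPath to a small page fragment: each element that the XPath matches becomes one row. The markup, the XPath, and the column names are hypothetical. The sketch uses Python's xml.etree.ElementTree module, which supports only a subset of XPath and requires well-formed markup, so it may behave differently from the editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; real webpages are rarely this tidy.
PAGE = """
<html><body>
  <ul id="results">
    <li class="product"><span class="name">Shirt</span><span class="price">$20</span></li>
    <li class="product"><span class="name">Hat</span><span class="price">$12</span></li>
    <li class="ad">Sponsored</li>
  </ul>
</body></html>
"""

# Assumed row XPath: every <li class="product"> element starts a new row.
ROW_XPATH = ".//li[@class='product']"


def extract_rows(page, row_xpath):
    """Return one dict per element matched by the row XPath."""
    root = ET.fromstring(page)
    rows = []
    for element in root.findall(row_xpath):
        rows.append({
            # Column values are looked up relative to the row element.
            "name": element.findtext("span[@class='name']"),
            "price": element.findtext("span[@class='price']"),
        })
    return rows


rows = extract_rows(PAGE, ROW_XPATH)
```

Because the XPath matches only the product list items, the sponsored item does not produce a row.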

Train with additional URLs

The purpose of the editor is to train your extractor to select the data that you want to collect from a website. Often, using a single webpage to train your extractor is sufficient. However, when the underlying structure of webpages on a website varies, you can improve the accuracy of your extractor by training with additional URLs.