Import.io User Guide

My extractor doesn’t work


Sometimes things don’t work as expected. This topic explores the following common issues and how to resolve them:

  • My extraction was successful, but 0 rows were extracted.
  • My extractor is not pulling data correctly from one or more columns.

1. My extraction was successful, but 0 rows were extracted.

Main cause

The HTML structure of the website has changed since the extractor was trained.

How do I diagnose the problem?

Check the log file.

For this problem, the log file typically reports the URL as successful but shows 0 rows extracted.

The URL is reported as successful because the extractor rendered the page correctly. No rows are extracted because the HTML structure of the website no longer matches the extractor’s training, so the extractor finds no data in the locations where it looks.
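
The following Python sketch mimics this situation with a toy extractor. It is only an illustration and does not show how Import.io works internally. The `ClassTextCollector` class, the `extract` helper, the `price` and `amount` class names, and both HTML samples are hypothetical. The "training" here is simply a rule that says "take the text of every element with class `price`".

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0   # greater than 0 while inside a matching element
        self.rows = []   # one entry per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.target_class in classes:
            self.depth = 1
            self.rows.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.rows[-1] += data.strip()


def extract(html, target_class):
    """Parse the page and return one row per element matching the rule."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page as it looked when the rule was "trained".
OLD_HTML = '<ul><li class="price">10.00</li><li class="price">12.50</li></ul>'
# Hypothetical page after the site renamed the class.
NEW_HTML = '<ul><li class="amount">10.00</li><li class="amount">12.50</li></ul>'

old_rows = extract(OLD_HTML, "price")
new_rows = extract(NEW_HTML, "price")
print(len(old_rows), len(new_rows))  # prints: 2 0
```

Both pages parse without error, which corresponds to the successful URL in the log. Only the second page yields no rows, because the rule no longer points at anything on the page.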

How do I fix it?

To resolve the issue, perform the following steps:

  • In the left-side navigation pane of the dashboard, select the extractor you want to edit from the list of your extractors.
  • Click Edit in the extractor commands menu at the upper right of the dashboard.
  • Click the Edit tab.
  • In the editor commands bar, click the Advanced/Standard slider switch until the advanced options appear.
  • Click Train with additional URLs.
  • In the Add training URLs box, enter the original URL.
  • Click Go. If the website no longer matches the training, the following message appears:
  • Click Save and Close.
  • In the column headings bar, click the column heading of a column you are fixing.
  • Notice that the floating data column window shows no data.
  • In the floating window, click Clear data to remove the current training.
  • Use the point-and-click interface to retrain the column.
  • For each column, repeat the clear and retrain steps.
  • Click Save in the upper right of the editor. The Save Extractor dialog box appears.
  • Click Save and Run. The editor closes, returning you to the dashboard. The dashboard switches to the Run History tab and displays the current progress of the run.
  • When the extractor run completes, click the Download icon for the run and download the data in CSV format.
  • Open the CSV file. We have data!
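
If you prefer to check the downloaded file with a script rather than opening it by hand, a minimal Python sketch such as the following counts the data rows. It assumes the first line of the CSV is a header row. The `count_data_rows` helper, the column names, and the sample contents are hypothetical.

```python
import csv
import io


def count_data_rows(csv_file):
    """Return the number of non-empty rows after the header line."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if any(cell.strip() for cell in row))


# Hypothetical output of a run that extracted 0 rows: header only.
before_fix = io.StringIO("url,title,price\n")
# Hypothetical output after retraining the extractor.
after_fix = io.StringIO(
    "url,title,price\n"
    "http://example.com/a,Widget,10.00\n"
    "http://example.com/b,Gadget,12.50\n"
)

rows_before = count_data_rows(before_fix)
rows_after = count_data_rows(after_fix)
print(rows_before, rows_after)  # prints: 0 2
```

To run the check against a real download, pass an open file instead of the in-memory samples, for example `open("run.csv", newline="", encoding="utf-8")`, where `run.csv` stands for whatever name you gave the downloaded file.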

Why did the data look right in the extractor editor?

The extractor editor shows a cached copy of the webpage, a snapshot taken when the extractor was trained. Since that snapshot was taken, the structure of the live website has changed so that the extractor training no longer matches it. The solution is to retrain the extractor against the current state of the website.

2. My extractor is not pulling data correctly from one or more columns.

Main cause

As with the previous issue, the HTML structure of the website has changed, but here only for a particular piece of data, so that part of the page no longer matches the original training.

How do I diagnose the problem?

The CSV or JSON data looks normal, except for data missing from one or more columns.
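
When the output has many rows or columns, a short script can confirm which columns are affected. The following Python sketch counts blank cells per column in a CSV file. It assumes the first line is a header row. The `blank_cell_counts` helper, the column names, and the sample data are hypothetical.

```python
import csv
import io


def blank_cell_counts(csv_file):
    """Map each column that has blank cells to its number of blank cells."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    blanks = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            if not (row.get(name) or "").strip():
                blanks[name] += 1
    # Keep only the columns that are missing at least one value.
    return {name: count for name, count in blanks.items() if count}


# Hypothetical run output in which the "rating" column came back empty.
sample = io.StringIO(
    "title,price,rating\n"
    "Widget,10.00,\n"
    "Gadget,12.50,\n"
)

missing = blank_cell_counts(sample)
print(missing)  # prints: {'rating': 2}
```

A column whose blank count equals the total number of rows is a likely candidate for retraining. A column with only a few blanks may simply reflect items that lack that value on the website.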

How do I fix it?

To resolve the issue, perform the following steps:

  • In the left-side navigation pane of the dashboard, select the extractor you want to edit from the list of your extractors.
  • Click Edit in the extractor commands menu at the upper right of the dashboard.
  • Click the Data tab.
  • Notice that the training looks fine in the editor: data populates the columns that are missing data in the output of the extractor run.
  • Click the Edit tab.
  • In the editor commands bar, click the Advanced/Standard slider switch until the advanced options appear.
  • Click Train with additional URLs.
  • In the Add training URLs box, enter the original URL.
  • Click Go. Because most of the data is correct in the CSV file, the page likely loads successfully in the extractor and the following message appears:
  • Click Save and Close.
  • In the editor commands bar, click the Rows dropdown list and select Single row to ensure the data for each item returns in one row. Having each item in a single row enables you to compare the two versions of the webpage, the trained one and the current one, more easily.

    Note: If you have multiple rows per URL, use Multiple rows because Single row compresses all of your rows into one.
  • In the column headings bar, click the column heading of a column you are fixing.
  • Notice the floating data column window shows no data.
  • In the floating window, click Clear data to remove the current training.
  • Use the point-and-click interface to retrain the column.
  • For each column of missing data, repeat the clear and retrain steps.
  • Click Save in the upper right of the editor. The Save Extractor dialog box appears.
  • Click Save and Run. The editor closes, returning you to the dashboard. The dashboard switches to the Run History tab and displays the current progress of the run.
  • When the extractor run completes, click the Download icon for the run and download the data in CSV format.
  • Open the CSV file. Success! We’ve resolved the issue!

Why did the data look right in the extractor editor?

The extractor editor shows a cached copy of the webpage, a snapshot taken when the extractor was trained. Since that snapshot was taken, the structure of the live website has changed so that the extractor training no longer matches it. The solution is to retrain the extractor against the current state of the website.