Import.io User Guide

Extracting hidden data


When Import.io renders a copy of a webpage in the editor, you can no longer interact with the page elements to click tabs, expand or collapse sections, select items from dropdown lists, and so on. Because you cannot interact with the webpage itself to reveal hidden elements, you cannot always point and click to select the data that you need. This topic describes how to access data that is hidden inside, or obscured by, other elements.

The portions of a webpage that are visible at any given time are often determined by the cascading style sheet (CSS) for the page. By ignoring the cascading style sheet, Import.io can display all data for the webpage at one time, in an unstyled format. Without the website’s styling, the page doesn’t look neat and tidy, but all the data is accessible to the Import.io point-and-click selection process.
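The reason this works is that hidden data is usually still present in the page's HTML; the style sheet only controls whether it is shown. The following is a minimal Python sketch of that idea, not a description of how Import.io works internally. The markup in SAMPLE_HTML and the names TextCollector and hidden_items are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: a size list that the page's styling hides from view.
SAMPLE_HTML = """
<div class="size-picker" style="display: none">
  <ul>
    <li>Small</li>
    <li>Medium</li>
    <li>Large</li>
  </ul>
</div>
"""


class TextCollector(HTMLParser):
    """Collects the text of every <li>, paying no attention to styling."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a list item.
        if self.in_item and data.strip():
            self.items.append(data.strip())


def hidden_items(html):
    """Return the list-item text found in the markup, styled or not."""
    parser = TextCollector()
    parser.feed(html)
    return parser.items


sizes = hidden_items(SAMPLE_HTML)
print(sizes)
```

Although a browser would not display the list, the parser still finds all three sizes, because the text never left the markup.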

Consider the following example from https://www.superdry.com/mens/holiday-shop/details/69939/vintage-logo-duo-t-shirt-grey:

When browsing the actual website, shoppers select available sizes from a dropdown list.

 

In the Import.io editor, however, the point-and-click interface selects data rather than the elements that contain it. Interaction with the actual webpage is frozen, and the dropdown list contents are out of reach.

 

To make the editor ignore cascading style sheets and reveal the dropdown list contents, perform the following steps:

  • In the editor commands bar, click the Advanced/Standard slider switch to display the advanced options.
  • In the Page list, select CSS to remove the checkmark and turn off the use of cascading style sheets. The formatting changes, and the dropdown list contents appear.

 

You can now select the dropdown list contents with the point-and-click selection process to train your columns.

Note: After adding the hidden information, turn CSS back on to continue training as normal.

Note: If this method is unsuccessful, try extracting data directly from the HTML source code using manual XPath.
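As a rough illustration of what an XPath expression for dropdown contents can look like, the following Python sketch runs one against a small sample. The markup and the select name "size" are hypothetical, not taken from the example site. Python's xml.etree.ElementTree supports only a subset of XPath and requires well-formed markup, so the expression that works in the Import.io editor for a real page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a page's dropdown list.
SAMPLE = """
<form>
  <select name="size">
    <option value="S">Small</option>
    <option value="M">Medium</option>
    <option value="L">Large</option>
  </select>
</form>
""".strip()

# Select every <option> inside the <select> whose name attribute is "size".
SIZE_OPTIONS_XPATH = ".//select[@name='size']/option"

root = ET.fromstring(SAMPLE)
options = [option.text for option in root.findall(SIZE_OPTIONS_XPATH)]
values = [option.get("value") for option in root.findall(SIZE_OPTIONS_XPATH)]
print(options, values)
```

The expression targets the options by their position in the HTML structure, so it does not depend on whether the dropdown list is expanded or visible.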