Import.io User Guide

What data can be extracted?

With Import.io, you can extract data from across the web. From e-commerce to science journals, Import.io enables you to get web data into a structured format. The following examples show some of the types of data you can extract. When you are ready, you can create your own extractors.

Example 1. Details for a single product on a webpage

Some webpages contain many details about a single product. Amazon product pages are a good example.

Use a details extractor to capture details such as price, savings, the department in which the product is sold, and other information directly from the page. Then use the same extractor to analyze other webpages and capture the same types of data from those pages too. The following table shows data for two products we collected from Amazon:

Example 1 shows how to extract data from single-item pages, that is, pages with details about a single item (in this case, cameras). You can use an extractor to compare details from different single-item pages, as shown in the example table. This one simple extractor works on most Amazon product pages.

Example 2. Details for multiple products on a webpage

Some webpages contain information about multiple items on a single page. Trip Advisor pages are a good example.

From this kind of page, use a details extractor to collect information about multiple items at once. The editor marks each selected item with a pink box during the point-and-click selection process. The following table shows data for all the items we collected from a single Trip Advisor page.

Example 2 shows how to extract data from a multi-item page, that is, a page with details for multiple items (in this case, hotels). You can use an extractor to collect a large amount of information from a single page.

Example 3. Pages of links to other pages

Some webpages contain a list of links to other pages that contain the details. The BBC News website is a good example. The home page lists the top stories, with links to the stories themselves.

Use a links extractor to capture the list of links to the stories. Then pass the list to a details extractor to collect data from the individual stories by chaining the extractors together.

The links extractor looks at the front page of the BBC News site and obtains the URLs of the top stories.

The details extractor collects the following key pieces of data from one of the top stories pages:

By chaining the extractors together, the details extractor collects details from each top story in the list created by the links extractor. The following table (opened in Excel) shows the resulting collected information.

Example 3 shows how to extract URLs from a links page and then extract data from the single-item pages at those URLs (in this case, news articles). Using extractors for media articles provides a way to quickly obtain mass media information.
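For readers who want to see the chaining idea in code, it can be sketched in plain Python. This is only an illustration under stated assumptions. It is not Import.io's API and not how Import.io is implemented. The PAGES dictionary, the "story" class name, and the function names extract_links, extract_details, and chain are all hypothetical, and the pages are small inline HTML strings rather than the real BBC News site.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a website: URL -> HTML. No network access is used.
PAGES = {
    "/news": (
        '<a class="story" href="/news/1">First</a>'
        '<a class="story" href="/news/2">Second</a>'
        '<a href="/about">About</a>'
    ),
    "/news/1": "<h1>First headline</h1><p>Body one.</p>",
    "/news/2": "<h1>Second headline</h1><p>Body two.</p>",
}


class LinkParser(HTMLParser):
    """Plays the role of a links extractor: collects hrefs of story links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "story" and "href" in attrs:
            self.links.append(attrs["href"])


class DetailParser(HTMLParser):
    """Plays the role of a details extractor: captures the first <h1> text."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headline = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.headline is None:
            self._in_h1 = True
            self.headline = ""

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headline += data


def extract_links(url):
    """Return the story URLs found on a links page."""
    parser = LinkParser()
    parser.feed(PAGES[url])
    return parser.links


def extract_details(url):
    """Return one row of details for a single-item page."""
    parser = DetailParser()
    parser.feed(PAGES[url])
    return {"url": url, "headline": parser.headline}


def chain(start_url):
    """Feed every URL from the links step into the details step."""
    return [extract_details(url) for url in extract_links(start_url)]


rows = chain("/news")
for row in rows:
    print(row["url"], row["headline"])
```

The design mirrors the two-step flow described above: the links step produces only a list of URLs, and the details step runs once per URL, so the result has one row per story.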

Note: Search results pages and product list pages are great places to obtain a list of links to single-item pages. These pages typically contain minimal details and rely on the associated single-item pages to present the details. Learn more about the differences between single-item, multi-item, and links pages.

Example 4. Ever-changing pages

Some webpages contain information that changes frequently. The BBC News website from Example 3 is a good example. The top stories on the home page change daily.

Schedule your extractor to run every day to collect information about the ever-changing news articles on the BBC front page.

Example 4 shows how to run an extractor on a regular schedule (in this case, daily). Combine extractors for BBC News and a few other websites to quickly gather what is happening in the world without having to open each article one by one.

Example 5. Tracking data trends

Some webpages contain information that changes frequently in trackable patterns. XE.com is a good example. The website lists current currency exchange rates.

Create an extractor to capture current currency data into the following format:

Then schedule the extractor to run as often as you need to keep up to date with current currency values.

Example 5 shows how to extract data over time to analyze trends. For example, an economist can extract this data repeatedly to examine changes in the currency exchange market. Alternatively, you might track this data to judge whether investing in a foreign currency is worthwhile.
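Once the scheduled runs have collected several snapshots, the kind of trend analysis described in Example 5 might look like the following minimal Python sketch. It assumes each run has produced one (date, rate) pair. The values shown are made-up sample data, not real exchange rates, and percent_changes is a hypothetical helper name rather than anything Import.io provides.

```python
from datetime import date


def percent_changes(snapshots):
    """Return (date, percent change versus the previous snapshot) pairs."""
    ordered = sorted(snapshots)  # sort by date so runs can arrive in any order
    changes = []
    for (_, previous), (day, current) in zip(ordered, ordered[1:]):
        change = (current - previous) / previous * 100
        changes.append((day, round(change, 2)))
    return changes


# Made-up sample data: one exchange-rate snapshot per scheduled run.
snapshots = [
    (date(2024, 1, 2), 1.25),
    (date(2024, 1, 1), 1.00),
    (date(2024, 1, 3), 1.00),
]

for day, change in percent_changes(snapshots):
    print(day.isoformat(), f"{change:+.2f}%")
```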

Now that you have seen a few examples, try creating your own extractor.