Import.io User Guide

What is Import.io?

Import.io enables you to extract data directly from the web, a process commonly known as web scraping, but it is much more than that. Our point-and-click interface transforms websites into data with a few simple clicks, letting you get the data you need, even from behind a login. When you are ready, use our getting started tutorial to begin transforming the web into data.

Who uses Import.io and for what purpose?

Import.io is used by both individuals and companies. Some of the world’s most iconic brands use our product to collect critical market data, enabling them to adjust to a rapidly changing online market. Customers include retailers, rental listing sites, delivery services, SaaS providers, universities, news organizations, consulting firms, and more. Import.io helps them react faster and with more information, giving them a competitive edge.

Individuals use Import.io to research startup ideas or to gather data for an article, project, or thesis. Because it is so easy to get data, individual users come from a range of backgrounds, including designers, developers, journalists, marketers, analysts, and academics. Academics, for example, are normally reliant on secondary data or small-sample studies; with Import.io they can collect primary data directly from websites. A few days or weeks of using our products can provide them with millions of data points.

The collected data can be applied to a wide range of questions, from analyzing success in online funding to capturing the key features of successful online companies. Essentially, our product enables you to get the data you need from the web. How you use that data is up to you, within copyright restrictions of course.

What exactly is Import.io?

Import.io is a SaaS product that allows you to collect data from the web with no coding required. It is web-based and runs on an online platform accessible through your browser, so there is nothing to download or install. All you need to know is what data you want and where it is; it really is that simple.

Import.io contains a built-in crawl service designed specifically for querying multiple URLs. It uses dynamic rate limiting and a retry system to handle errors and restrictions, resulting in high-quality, reliable extraction. When querying multiple webpages, the crawl service queries 10 URLs asynchronously, each from a different IP address in a rotating pool, making the process more efficient. If a URL fails for whatever reason (errors can happen), it is requeued and tried again from a different IP address. The crawl service also monitors website response time, ensuring that the extraction does not put too much load on a website.
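
To picture what this looks like in principle, here is a minimal sketch in Python of querying a batch of URLs concurrently and requeueing failures. It is only an illustration of the concept, not our crawl service: the batch size, retry limit, and fetch logic are stand-ins, and details such as the rotating IP pool and dynamic rate limiting are omitted.

    # Conceptual sketch only: query a batch of URLs concurrently and
    # requeue failures. Not the production crawl service.
    import concurrent.futures
    import urllib.request

    MAX_CONCURRENT = 10  # the guide notes 10 URLs are queried at a time
    MAX_ATTEMPTS = 3     # illustrative retry limit

    def fetch(url: str) -> str:
        # In the real service each attempt would also go out from a different
        # IP address in a rotating pool; that detail is omitted here.
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    def crawl(urls: list[str]) -> dict[str, str]:
        results: dict[str, str] = {}
        queue = [(url, 1) for url in urls]
        with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
            while queue:
                batch, queue = queue[:MAX_CONCURRENT], queue[MAX_CONCURRENT:]
                futures = {pool.submit(fetch, url): (url, attempt) for url, attempt in batch}
                for future in concurrent.futures.as_completed(futures):
                    url, attempt = futures[future]
                    try:
                        results[url] = future.result()
                    except Exception:
                        if attempt < MAX_ATTEMPTS:
                            queue.append((url, attempt + 1))  # requeue and try again
        return results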

How does data extraction with Import.io work?

You create an extractor and give it an example URL that contains the data you want to extract; this can be a single-item page or a multiple-item page. Once Import.io loads the webpage, you simply click to select the data you want to collect and organize it into a tabular column structure that suits you.

As you select the data you want, Import.io identifies the underlying structure of the webpage and where each piece of data resides on the page. As you continue to select data, Import.io learns what data you are looking for. Then, when you add additional URLs with the same underlying structure, Import.io automatically knows what data to collect and where on the page to find it.
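
To make the idea of "learning the underlying structure" concrete, the sketch below shows what such structure amounts to: a set of selection rules that yield the same columns on every page built from the same template. The CSS selectors, column names, and parsing code are hypothetical stand-ins for what the extractor learns from your clicks, not its internal representation.

    # Conceptual sketch only: the learned structure written out by hand as
    # CSS selectors. Requires beautifulsoup4 (pip install beautifulsoup4).
    import urllib.request
    from bs4 import BeautifulSoup

    # Hypothetical column-to-selector mapping, standing in for what the
    # extractor learns when you click on example data.
    COLUMNS = {
        "title": "h2.product-title",
        "price": "span.price",
    }

    def extract(url: str) -> list[dict[str, str]]:
        with urllib.request.urlopen(url, timeout=30) as response:
            soup = BeautifulSoup(response.read(), "html.parser")
        # Every page built from the same template shares this structure, so
        # the same selectors yield one row per item on any such page.
        columns = {name: soup.select(selector) for name, selector in COLUMNS.items()}
        row_count = min(len(nodes) for nodes in columns.values())
        return [
            {name: columns[name][i].get_text(strip=True) for name in COLUMNS}
            for i in range(row_count)
        ]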

Where to go from here?

Once you have trained your extractor on what to look for, you can do more with it: add multiple URLs to collect large volumes of data, schedule it to run on a regular basis, create a change report, and download your data in CSV, Excel, or JSON format. You can also query your extractor using our RESTful API.
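
As a simple illustration, a scheduled job could pull the latest run of an extractor over the API with a few lines of Python. The endpoint path, the newline-delimited JSON handling, and the placeholder credentials below are assumptions made for the sake of the example; check the API documentation for the exact URL and response format for your account.

    # Illustrative only: consult the Import.io API documentation for the exact
    # endpoint, authentication scheme, and response format for your account.
    import json
    import urllib.parse
    import urllib.request

    API_KEY = "YOUR_API_KEY"            # placeholder: your account's API key
    EXTRACTOR_ID = "YOUR_EXTRACTOR_ID"  # placeholder: the extractor's ID

    # Assumed endpoint shape for downloading the latest run as JSON.
    url = (
        f"https://data.import.io/extractor/{EXTRACTOR_ID}/json/latest"
        f"?{urllib.parse.urlencode({'_apikey': API_KEY})}"
    )

    with urllib.request.urlopen(url, timeout=60) as response:
        body = response.read().decode("utf-8")

    # Assumes the response is newline-delimited JSON, one row per line;
    # adjust the parsing to whatever format the API actually returns.
    rows = [json.loads(line) for line in body.splitlines() if line.strip()]
    print(f"Downloaded {len(rows)} rows")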

Check out these examples of the kinds of data you can collect. Alternatively, you can start collecting your own data.