Just enter the URL where your data is located, and Import.io takes you there. If your data is behind a login, behind an image, or you need to interact with a website, Import.io has you covered. Once you are at a web page, you simply point and click on the items of interest, and Import.io learns to extract them into your dataset. Once extractors are fully trained, they can be set to run on a schedule over multiple web pages, creating large datasets ready for transformation, analysis, and integration into your applications and internal systems. Web data extraction has never been easier or more valuable.
Import.io makes it easy for you to show us how to extract data from a page. Simply select a column in your dataset, and point at the item of interest on the page.
Record sequences of actions that you need to perform on a website: for example, navigating between pages, entering a search term, or changing the default sort order on a list.
Set up your web data extraction to run “on the regular” using pre-set or custom schedules: weekly, daily, hourly, whatever your business needs. Set it and forget it.
Reliable, high-quality data, every time
Machine Learning auto-suggest
When you first enter a URL, Import.io attempts to auto-train your extractor by using advanced machine learning techniques. Go from URL to dataset with one click. And we’re constantly getting better.
Download images and files
Download images and documents along with all the web data in one run. Retailers pull product images from manufacturers; data scientists build training sets for computer vision.
Data behind a login
Authenticated extraction allows you to get data that is only available after logging into a website. You provide the appropriate credentials and Import.io will do the rest.
Import.io helps ensure compliance and accuracy by allowing you to capture and save screenshots of every page from which data was extracted. This feature is easily accessible and useful, as it creates an auditable record of the extracted data.
Choose to obey a website's robots.txt file, and thus avoid gathering data from pages that the site owner has asked not to be crawled.
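How robots.txt directives gate crawling can be sketched with Python's standard library; this is an illustration of the mechanism, not Import.io's implementation, and the rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt (hypothetical rules, for illustration only).
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Pages under /private/ are off-limits; everything else is allowed.
print(parser.can_fetch("*", "https://example.com/products"))   # True
print(parser.can_fetch("*", "https://example.com/private/x"))  # False
```

A compliant crawler runs a check like this before every request and simply skips disallowed pages.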
Be notified as soon as data is extracted. Receive email notifications or use webhooks to make sure that you always know when the latest data is available. This will help you stay on top of your workload.
Operate at scale, web scale
Extract data from multiple pages at the press of a button. We automatically detect paginated lists, or you can explicitly click on the “next” page to help us learn.
List page, detail page
List pages contain links to detail pages that contain more data. Import.io allows you to join these into a chain, pulling all of the detail page data at the same time.
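The list-to-detail chain can be sketched in a few lines of Python. This illustrates the pattern, not Import.io's internals; the markup and URLs are hypothetical, and the standard library's html.parser stands in for a full extractor:

```python
from html.parser import HTMLParser

# A toy list page: each item links to a detail page (hypothetical markup).
LIST_PAGE = """
<ul class="results">
  <li><a class="item" href="/products/1">Widget</a></li>
  <li><a class="item" href="/products/2">Gadget</a></li>
</ul>
"""

class LinkExtractor(HTMLParser):
    """Collect hrefs from anchors with class="item"."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self.links.append(attrs["href"])

extractor = LinkExtractor()
extractor.feed(LIST_PAGE)

# In a real chain you would now fetch each detail URL and run a
# detail-page extractor on it; here we just build the queue.
detail_urls = ["https://example.com" + href for href in extractor.links]
print(detail_urls)
```

The key idea is that the list-page extractor's output (the links) becomes the detail-page extractor's input, so one run covers both levels.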
Use patterns such as page numbers and category names to automatically generate all of the URLs that you need in seconds.
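As a rough illustration of the idea (the template, categories, and page counts are hypothetical), generating URLs from a pattern is just a cross-product of the variable parts:

```python
from itertools import product

# Hypothetical URL template: substitute category names and page numbers.
TEMPLATE = "https://example.com/{category}?page={page}"
categories = ["shoes", "hats"]
pages = range(1, 4)

urls = [TEMPLATE.format(category=c, page=p)
        for c, p in product(categories, pages)]
for url in urls:
    print(url)
```

Two categories across three pages yields six URLs, ready to feed to an extractor.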
Whenever you save your extractor, Import.io automatically optimizes it to run in the shortest time possible.
Train the same extractor with multiple different pages. When a website displays different variations of data on the same page type, you will want to train against all of them.
Upload custom datasets
Combine web data with data from sources outside of Import.io. Simply upload a CSV or Excel file, and it becomes a table that can be used in any of your reports.
Country specific extraction
Control the geographical location from which your web data extraction runs. Extract pricing data in a local currency. All countries are supported.
Automatically remove personally identifiable information (PII) when extracting web data. We can detect and redact PII such as names, phone numbers, and addresses, helping to keep everyone's personal data private and protected.
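As a simplified illustration of the redaction idea (real PII detection is far more sophisticated than these toy regular expressions, and formats vary by country):

```python
import re

# Very rough, illustrative patterns -- production PII detection needs
# much more than regexes (e.g. names require NLP-based recognition).
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact(text):
    """Replace phone numbers and email addresses with placeholders."""
    text = PHONE.sub("[REDACTED PHONE]", text)
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    return text

sample = "Contact Jane at jane@example.com or 555-123-4567."
print(redact(sample))
# Contact Jane at [REDACTED EMAIL] or [REDACTED PHONE].
```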
XPath & Regex
Write your own custom extraction rules using XPath and RegEx. This can be especially useful for pulling hidden data and setting up advanced configurations.
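For instance, a hand-written rule might pair an XPath-style query with a regular expression. The sketch below uses Python's standard library on hypothetical markup; note that ElementTree supports only a subset of XPath, whereas full engines do far more:

```python
import re
import xml.etree.ElementTree as ET

# A toy product snippet (well-formed XML so the stdlib parser accepts it).
PAGE = """
<div>
  <span class="price">Now only $19.99!</span>
  <span class="price">Was $24.99</span>
</div>
"""

root = ET.fromstring(PAGE)

# XPath-style query: every <span> with class="price".
spans = root.findall(".//span[@class='price']")

# RegEx pass: pull the bare number out of the surrounding text.
prices = [re.search(r"\$(\d+\.\d{2})", s.text).group(1) for s in spans]
print(prices)  # ['19.99', '24.99']
```

The XPath selects *where* the data lives; the regular expression cleans up *what* was selected.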
Web scraping FAQ
What is Web Scraping?
Web scraping (or screen scraping) is a way to get data from a website. By using a web scraping tool, sometimes called a website scraper, you are able to extract large amounts of data through an automated process. The tool works by sending a request to the target pages, then combing through the HTML for specific items. Without automation, taking that data and saving it for future use would be time-consuming. Many web scraping tools offer different features that can be used to scrape web pages and convert the data into handy formats you can then use.
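That fetch-then-comb loop can be sketched in a few lines of Python. This illustrative example parses a static snippet in place of a fetched page so it is self-contained; the markup is hypothetical:

```python
from html.parser import HTMLParser

# In practice you would fetch the page first, e.g.:
#   from urllib.request import urlopen
#   html = urlopen("https://example.com/products").read().decode()
# Here a static snippet stands in for the downloaded page.
html = """
<html><body>
  <h2 class="title">Blue Widget</h2>
  <h2 class="title">Red Widget</h2>
</body></html>
"""

class TitleScraper(HTMLParser):
    """Comb through the HTML, keeping the text of 'title' headings."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

scraper = TitleScraper()
scraper.feed(html)
print(scraper.titles)  # ['Blue Widget', 'Red Widget']
```

A scraping tool automates exactly this: request the page, walk its HTML, and save the matched items in a structured format.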
Why Use Web Scraping?
With so much information now online, getting that data can often prove the difference between success and stagnation. Web data can be extremely valuable not only because it is accurate but also because it is kept up to date. With the right data in hand, you can analyze what you need to determine new insights and find exciting discoveries. In essence, having data at the ready through web data scraping allows your organization to plan and act using current information. Whether it is called web scraping, screen scraping, or even data scraping, it allows businesses to be more agile and versatile in the present while planning for the future. Web scraping is also handy for pretty much any business, not just those in technical fields. If there is data on the internet that will help your organization, you are going to want it.
Do You Need Special Training for Web Scraping?
Of course, the use of code and scraping software to extract data can seem intimidating at first, but no extensive coding experience is needed when using Import.io. Some training will be helpful, such as the point-and-click training mentioned above, but Import.io provides an easy-to-use interface that allows you to perform a variety of data scraping tasks, all without needing to be deeply familiar with coding or machine learning. Since most of the technical side of data scraping is handled by Import.io, and helpful APIs integrate that high-quality information into your organization, you will know you are in good hands as you extract data from the internet.