The Traffic Manager determines the most successful network route for page requests and responds dynamically to network failures, website latency or attempts to block website access.
Always available web data, as if you were browsing on your own computer
The Traffic Manager constantly monitors for error codes, redirects and the presence of a variety of proprietary blocking methods that we have observed over the years.
If an IP address that we are using is blocked by a website, the Traffic Manager will detect the block, swap out the IP address and retry automatically.
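The detect, swap and retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not Import.io's implementation: the function name `fetch_with_rotation`, the injected `fetch` callable, the `max_attempts` limit and the choice of HTTP 403 and 429 as block signals are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: these HTTP status codes are treated as signs of a block.
BLOCK_STATUSES = {403, 429}

# Hypothetical fetch signature: (url, ip) -> (status_code, body)
Fetch = Callable[[str, str], Tuple[int, str]]


def fetch_with_rotation(
    url: str,
    ip_pool: Sequence[str],
    fetch: Fetch,
    max_attempts: int = 3,
) -> Optional[Tuple[str, int, str]]:
    """Try the request through successive IPs until one is not blocked."""
    for ip in ip_pool[:max_attempts]:
        status, body = fetch(url, ip)
        if status not in BLOCK_STATUSES:
            # Not blocked: report which IP worked along with the response.
            return ip, status, body
        # Blocked: fall through and retry with the next IP in the pool.
    return None  # every attempted IP was blocked
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client, so the rotation logic can be exercised without touching the network.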
The Traffic Manager has access to multiple types of IP address from many different sources, meaning that we are always able to find a route through.
If a website requires data extraction from a particular geographical region, then IP addresses from that region will be selected by the Traffic Manager.
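As a rough sketch of region-based selection, the pool can be filtered by a region tag before an address is chosen. The `PoolIP` record, its `region` and `kind` fields and the `ips_for_region` helper are hypothetical names introduced for this example; how the Traffic Manager actually labels its addresses is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PoolIP:
    address: str
    region: str  # assumed to be a country code such as "US" or "DE"
    kind: str    # assumed label for the IP type, e.g. "datacenter"


def ips_for_region(pool: List[PoolIP], region: str) -> List[PoolIP]:
    """Return only the addresses tagged with the requested region."""
    wanted = region.upper()
    return [ip for ip in pool if ip.region.upper() == wanted]
```

An empty result would mean that no address in the pool matches the region, which a caller would need to handle explicitly.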
The Traffic Manager automatically detects CAPTCHAs if and when they arise and sends them to be solved by humans through third-party services that we have integrated.
Import.io’s automated web browsers look and behave like human browsers. They have full cookie jars, browsing history and valid user agents, and they move the mouse around the screen, among other human-like behaviors.
Making our automated browsers behave like humans is not motivated by a desire to mask our identity. Import.io has these features in place so that target websites will behave the same way regardless of whether they are responding to an automatic or a human browsing request.
Do no harm!
The Traffic Manager has multiple methods in place to ensure that our automatic web browsing does not adversely affect the performance of the target website. We use the following three principles to guide the operation of our web data extraction: go slow, monitor websites, limit concurrency.
In conjunction with the Robot Scheduler in the Data Operations Center, the Traffic Manager will force the extraction to go as slowly as possible while still extracting all of the required data and meeting the timeliness requirements of the project.
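One simple way to read "as slowly as possible while still meeting the deadline" is to spread the remaining requests evenly over the time available. The sketch below assumes exactly that; the function name `pacing_delay` and the `safety_margin` parameter are illustrative and are not taken from the Robot Scheduler or the Traffic Manager.

```python
def pacing_delay(
    pages_remaining: int,
    seconds_until_deadline: float,
    safety_margin: float = 0.8,
) -> float:
    """Longest delay between requests that still finishes before the deadline.

    safety_margin is an assumed allowance: only this fraction of the
    remaining time is budgeted, leaving headroom for retries.
    """
    if pages_remaining <= 0:
        return 0.0  # nothing left to fetch, so no delay is needed
    usable_seconds = max(seconds_until_deadline, 0.0) * safety_margin
    return usable_seconds / pages_remaining
```

For example, 1,000 pages with one hour remaining and `safety_margin=1.0` works out to one request every 3.6 seconds.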
The Traffic Manager automatically monitors the performance and responsiveness of all websites that we extract data from. If it notices an increase in error messages or an increase in latency from a particular website, it automatically reduces the speed at which we are extracting data in order to find a speed more suitable for the website.
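The slow-down behavior can be illustrated with a small feedback loop that lengthens the delay between requests whenever a response is an error or is slow. This is a simplified sketch under stated assumptions: it reacts to individual responses against a fixed `latency_limit` rather than tracking a trend, and the class name `AdaptiveThrottle`, the multiplicative `backoff` and the gradual `recovery` toward the base delay are choices made for the example, not documented Traffic Manager behavior.

```python
class AdaptiveThrottle:
    """Lengthen the delay between requests when a site shows signs of strain."""

    def __init__(
        self,
        base_delay: float = 1.0,     # assumed starting delay, in seconds
        max_delay: float = 60.0,     # assumed upper bound on the delay
        latency_limit: float = 2.0,  # assumed threshold for a "slow" response
        backoff: float = 2.0,        # multiply the delay by this on trouble
        recovery: float = 0.9,       # shrink the delay by this when healthy
    ) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.latency_limit = latency_limit
        self.backoff = backoff
        self.recovery = recovery
        self.delay = base_delay

    def record(self, latency_seconds: float, is_error: bool) -> float:
        """Update the delay from one observed response and return it."""
        if is_error or latency_seconds > self.latency_limit:
            # Site looks stressed: slow down, but never beyond max_delay.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            # Site looks healthy: ease back toward the base delay.
            self.delay = max(self.delay * self.recovery, self.base_delay)
        return self.delay
```

A caller would sleep for `throttle.delay` seconds between requests and call `record` after each response.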
The Traffic Manager monitors and limits the number of concurrent extractors that can be operating on any individual website across all of our customers at any single point in time.
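A per-website concurrency cap of this kind can be sketched as a shared counter keyed by site. The class name `SiteConcurrencyLimiter`, the default limit of two and the non-blocking `try_acquire` design are assumptions for illustration; a production system spanning many customers would need a counter shared across machines rather than an in-process lock.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class SiteConcurrencyLimiter:
    """Cap how many extractors may work on the same website at once."""

    def __init__(self, max_per_site: int = 2) -> None:
        self._max = max_per_site
        self._lock = threading.Lock()
        self._active: DefaultDict[str, int] = defaultdict(int)

    def try_acquire(self, site: str) -> bool:
        """Claim a slot for the site; return False if the site is at its limit."""
        with self._lock:
            if self._active[site] >= self._max:
                return False
            self._active[site] += 1
            return True

    def release(self, site: str) -> None:
        """Give back a slot once an extractor has finished with the site."""
        with self._lock:
            if self._active[site] > 0:
                self._active[site] -= 1
```

An extractor that fails to acquire a slot would wait and try again later, so the target website never sees more than the configured number of simultaneous extractors.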