Before embarking on a project to write your own web scrapers, here are 10 questions to ask yourself and your team:
- Does your team have experience writing code for web scraping? (It is harder than it looks.)
- What are the server, networking, and storage costs of running your web scrapers continuously?
- What happens when your IP addresses get blocked by the websites you are scraping? (It will happen.)
- How will you deal with CAPTCHAs and the slew of increasingly sophisticated anti-scraping measures?
- What happens when the websites you need data from change their structure? How quickly can you rewrite your code? (See the first sketch after this list.)
- Are you OK with gaps in your data when your web scraping fails?
- How will you deal with a “dynamic” website, where the content changes but the URL stays the same? (See the second sketch after this list.)
- What happens when a website requires a login or a form submission before displaying any data?
- What is the opportunity cost to your business of pulling engineers away from their core responsibilities?
- What happens when the engineers who built your web scrapers leave?
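To make the “harder than it looks” point concrete, here is a minimal scraping sketch in Python. The URL and the CSS selectors are hypothetical, chosen only for illustration; the point is that the whole scraper hinges on the site keeping exactly this HTML structure, so a simple redesign breaks it overnight and the code has to be rewritten.

```python
# Minimal scraper sketch (hypothetical site and selectors) showing how tightly
# the code is coupled to the page's current HTML structure.
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/products"  # hypothetical target page


def scrape_prices():
    response = requests.get(URL, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    prices = []
    # This assumes prices live in <span class="price"> inside
    # <div class="product-card">; the moment the site renames or restructures
    # these elements, the scraper silently returns nothing.
    for card in soup.select("div.product-card"):
        price = card.select_one("span.price")
        if price is not None:
            prices.append(price.get_text(strip=True))
    return prices


if __name__ == "__main__":
    print(scrape_prices())
```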
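The “dynamic website” question is often the first surprise: a plain HTTP request only returns the initial HTML, so content that the page builds with JavaScript never shows up. One common workaround is to drive a headless browser. The second sketch below uses Playwright as one option; the URL and the selector it waits for are assumptions made purely for illustration.

```python
# Sketch: fetching a JavaScript-rendered page with a headless browser so the
# page's scripts run before the HTML is captured.
from playwright.sync_api import sync_playwright

URL = "https://example.com/dashboard"  # hypothetical page that loads data via JS


def scrape_dynamic_page():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(URL)
        # Wait for the element the site fills in after its scripts run;
        # this selector is an assumption for illustration only.
        page.wait_for_selector("table#results")
        html = page.content()
        browser.close()
    return html


if __name__ == "__main__":
    print(len(scrape_dynamic_page()))
```

Even this only covers rendering; logins, form submissions, and anti-bot checks each add another layer of code to write and maintain.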
If you need high-quality web data at scale, partnering with Import.io works out 20 to 30 times cheaper than building and maintaining your own scrapers.
How do we know? We did the maths. Read our whitepaper, “The Path to Web Data: Build or Buy”, to see how we arrived at this conclusion.