For many companies with capable engineers, the build vs. buy question comes up whenever they need a new software solution. Web data extraction is no exception. Before embarking on a project to build your own web data extractor, here are 10 questions to consider:
- Does your team have experience in writing code for web data extraction?
- What are the server/networking/storage costs to continually run your data extractions?
- What happens when your IP addresses get blocked by the websites you are extracting data from?
- How will you deal with CAPTCHAs?
- What happens when the websites you need data from change? How quickly can you rewrite the code?
- Are you OK with gaps in the data when your extraction fails?
- How will you deal with a URL that doesn’t change, despite changing content on the page?
- What happens when the website requires a login or a form fill before data is displayed?
- What is the opportunity cost – time away from core business for your engineers?
- What happens if the engineer(s) who built it leave?
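To give a feel for what just one of these questions involves, here is a minimal sketch of one possible way to notice changed content behind an unchanged URL: store a fingerprint of each page and compare it on the next run. The names `content_fingerprint` and `ChangeTracker` are hypothetical, made up for this illustration, and the sketch assumes you have already fetched the page HTML by some other means. A real extractor would also need to ignore ads, timestamps, and other noise that changes on every load.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """Return a SHA-256 hex digest of the page text with whitespace collapsed."""
    # Collapse runs of whitespace so trivial formatting changes do not count.
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeTracker:
    """Hypothetical helper: remembers the last fingerprint seen for each URL."""

    def __init__(self) -> None:
        self._last_seen: dict[str, str] = {}  # url -> fingerprint

    def has_changed(self, url: str, html: str) -> bool:
        """Return True if this URL is new or its content differs from last time."""
        fingerprint = content_fingerprint(html)
        changed = self._last_seen.get(url) != fingerprint
        self._last_seen[url] = fingerprint
        return changed
```

Even this small piece needs storage for the fingerprints, a schedule for re-checking, and someone to maintain it, which is the kind of hidden cost the questions above are meant to surface.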
Bottom line: if you need regular, high-quality, accurate data, licensing a commercial web data extraction solution is approximately 20-30 times less expensive than building your own.
Import.io is continuously adding features, increasing performance, and enhancing its core web data extraction capabilities. If a website requires a login, a form fill, or proof that you are not a robot, Import.io can handle it. We manage storage, networking, and IP addresses. If you decide to partner with us, we take care of everything on this list and more.