10 questions to ask before deciding to build a web data extraction solution


For many companies with capable engineers, the build vs. buy question comes up whenever they need a software solution. Web data extraction is no exception. Before embarking on a project to build your own web data extractor, here are 10 questions to consider:

  1. Does your team have experience in writing code for web data extraction?
  2. What are the server/networking/storage costs to continually run your data extractions?
  3. What happens when your IP addresses get blocked by the websites you are extracting data from?
  4. How will you deal with CAPTCHAs?
  5. What happens when the websites you need data from change? How quickly can you rewrite the code?
  6. Are you OK with gaps in the data when your extraction fails?
  7. How will you deal with a page whose content changes while its URL stays the same?
  8. What happens when the website requires a login or a form fill before data is displayed?
  9. What is the opportunity cost – time away from core business for your engineers?
  10. What happens if the engineer(s) who built it leave?
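
To make questions 3 through 6 concrete, here is a minimal Python sketch of the kind of failure handling a home-built extractor ends up needing. Everything in it is an assumption made for illustration: the `Response` type, the `classify` and `count_gaps` helpers, the status codes treated as blocks, and the `required_marker` check are hypothetical heuristics, not how any particular website or commercial product behaves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    OK = "ok"
    BLOCKED = "blocked"                # question 3: IP address blocked
    CAPTCHA = "captcha"                # question 4: challenge page served
    LAYOUT_CHANGED = "layout_changed"  # question 5: the site was redesigned


@dataclass
class Response:
    """Hypothetical stand-in for whatever your HTTP client returns."""
    status: int
    body: str


def classify(response: Response, required_marker: str) -> Outcome:
    """Guess why a fetch did or did not yield usable data.

    These are rough heuristics (assumptions), not a complete detector.
    """
    # 403 Forbidden and 429 Too Many Requests often signal blocking.
    if response.status in (403, 429):
        return Outcome.BLOCKED
    # A challenge page usually mentions the word somewhere in its markup.
    if "captcha" in response.body.lower():
        return Outcome.CAPTCHA
    # If the markup we rely on is gone, the page layout probably changed.
    if required_marker not in response.body:
        return Outcome.LAYOUT_CHANGED
    return Outcome.OK


def count_gaps(outcomes: Iterable[Outcome]) -> int:
    """Question 6: every outcome other than OK is a gap in the data."""
    return sum(1 for outcome in outcomes if outcome is not Outcome.OK)


if __name__ == "__main__":
    marker = 'class="price"'  # assumed markup the extractor depends on
    responses = [
        Response(200, '<span class="price">9.99</span>'),
        Response(429, "slow down"),
        Response(200, "<html>Please solve this CAPTCHA</html>"),
        Response(200, '<span class="amount">9.99</span>'),
    ]
    outcomes = [classify(r, marker) for r in responses]
    print([o.value for o in outcomes], "gaps:", count_gaps(outcomes))
```

Each branch only detects a problem; deciding what to do next (rotate IP addresses, solve the challenge, rewrite the parser, backfill the missing data) is the ongoing work the questions above are asking about.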

Bottom line: if you need a regular supply of high-quality, accurate data, licensing a commercial web data extraction solution is approximately 20 to 30 times less expensive than building your own.

