Using import•io to get structured data from websites is generally pretty straightforward. Every site is different, though, and occasionally we run into a data problem that requires a little old-school ingenuity to solve. When that happens, we put our best man – that would be me – on the case! And, since I’m such a generous guy, I’m sharing these cases with you so that if you find yourself in a similar situation, you’ll know just what to do.
Building a Database
Jerome wanted to create a database of all the companies on Assintel, an Italian business network. He needed to get the company name, sector, URL, logo, address, country and telephone number.
When Crawling Just Won’t Do It
Generally, if you run into this type of issue, the best thing to do is to see if there is another way to get to the page with the data you’re after. In this case there was a search box on the homepage, which meant I could build a Connector that let me search for a company name and extract all the data on its profile page. Voila!
Of course, with 423 companies, manually inputting all of these names into the Connector would have been kind of a pain. So I built an Extractor and got all the company names. Then (using some Python magic) I fed the names through the Connector automatically and got all of the required data!
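For the curious, here is a rough idea of what that kind of "Python magic" could look like. This is only a minimal sketch, not the script I actually ran: `query_connector` is a hypothetical function standing in for whatever call sends one company name to the Connector and returns that company's profile as a dictionary, and the field names are simply my guesses based on the list of data Jerome asked for.

```python
import csv
import io

# Assumed field names, based on the data Jerome wanted (not a real schema).
FIELDS = ["name", "sector", "url", "logo", "address", "country", "telephone"]


def run_names_through_connector(names, query_connector):
    """Look up every company name and collect one row per company.

    `query_connector` is a hypothetical callable: it takes a company name
    and returns a dict of profile data, or raises if the lookup fails.
    """
    rows = []
    failed = []
    seen = set()
    for raw_name in names:
        name = raw_name.strip()
        # Skip blank lines and names we have already looked up.
        if not name or name in seen:
            continue
        seen.add(name)
        try:
            record = query_connector(name)
        except Exception:
            # Remember the failure so it can be retried or checked by hand.
            failed.append(name)
            continue
        # Keep only the expected fields; missing ones become empty strings.
        rows.append({field: record.get(field, "") for field in FIELDS})
    return rows, failed


def to_csv(rows):
    """Render the collected rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

You would pass in the names the Extractor collected plus your own lookup function, then write the result of `to_csv(rows)` to a file. Keeping the failed names in a separate list means a handful of bad lookups won't stop the other few hundred from going through.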
We’ll Find a Way
Jerome’s use case is a great example of how you sometimes need to use two of our tools in conjunction to get the data you need. At import•io we never give up in our quest to get you data. If you run into a site that just doesn’t seem to want to divulge its data secrets, get in touch (email@example.com) and we’ll do our best to get it for you.