My original plan for this webinar was to look at voice activation and some of the hacks we made a few months ago. Unfortunately, a few technical difficulties meant I wasn't able to do that. But, being the inventive guy I am, I decided to wing it and show you some other interesting tips and tricks for pulling data with our tool.
Crawling with infinite scroll
The first thing I showed you was how to get data from behind an infinite scroll. This is when only part of the data is loaded at first, and the rest only appears as you scroll down or click something like a "Load more" button. Because you can't scroll or click that button while you're in extraction mode, you won't be able to extract the additional pages using the usual methods. Luckily, you can use Chrome's developer tools to find the URLs that the "Load more" requests use, and then use that URL pattern to build a Crawler. Then, using the CONCATENATE function in Google Sheets, you can generate a complete list of the URLs for all of the data and paste them into the "Where to crawl" box. I've also written a full blog post and a matching tutorial on this topic, so be sure to check those out.
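If you'd rather script the URL list than build it in a spreadsheet, here is a minimal sketch of the same idea. It assumes the "Load more" requests you spotted in Chrome's developer tools differ only by a page number; the pattern `https://www.example.com/products?page={page}` is a hypothetical placeholder, so substitute whatever pattern your site actually uses. In Google Sheets the equivalent would be something like `=CONCATENATE("https://www.example.com/products?page=", A1)` filled down a column of page numbers.

```python
from typing import List

# Hypothetical URL pattern copied from the Network tab in Chrome DevTools.
# Replace it with the real pattern used by the site's "Load more" requests.
BASE_URL = "https://www.example.com/products?page={page}"


def build_crawl_urls(pattern: str, first: int, last: int) -> List[str]:
    """Fill the page number into the pattern for every page from first to last (inclusive)."""
    return [pattern.format(page=n) for n in range(first, last + 1)]


if __name__ == "__main__":
    # One URL per line, ready to paste into the "Where to crawl" box.
    print("\n".join(build_crawl_urls(BASE_URL, 1, 5)))
```

Some sites page by an offset or item count instead of a page number; in that case you would change the range step (for example `range(0, 200, 20)`) rather than the function itself.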
Next, I demonstrated how to get data that moves around from page to page. Our tool relies on patterns in a website's HTML to pull out the data you need. So if, for example, the data you want is in row 7 on one page and row 8 on another, the tool can't follow the pattern and gets confused. To get around this, you can write a custom XPath for that data. In this case you would use the "following-sibling" axis to grab the value that comes right after a specific label like "Battery capacity". For more detailed information, read our blog post or check out our tutorial.
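To make the idea concrete, here is a minimal sketch assuming the specs sit in a simple two-column table where each label cell is followed by its value cell; the table layout and the `following_sibling_text` helper are hypothetical. An XPath such as `//td[normalize-space()='Battery capacity']/following-sibling::td[1]` finds the label by its text rather than by its position. Python's standard-library `ElementTree` does not support the `following-sibling` axis, so the sketch emulates that expression by hand to show why it works even when the row number changes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The XPath you might use in the tool (hypothetical label and table layout).
XPATH = "//td[normalize-space()='Battery capacity']/following-sibling::td[1]"


def normalize_space(text: str) -> str:
    """Mimic XPath normalize-space(): trim and collapse internal whitespace."""
    return " ".join(text.split())


def following_sibling_text(markup: str, label: str, tag: str = "td") -> Optional[str]:
    """Emulate //tag[normalize-space()=label]/following-sibling::tag[1]."""
    root = ET.fromstring(markup)
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == tag and normalize_space("".join(child.itertext())) == label:
                # Return the first later sibling with the same tag, wherever the row sits.
                for sibling in children[i + 1:]:
                    if sibling.tag == tag:
                        return normalize_space("".join(sibling.itertext()))
    return None


# Two hypothetical product pages: the battery row is in a different position on each.
PAGE_A = """<table>
<tr><td>Weight</td><td>150 g</td></tr>
<tr><td>Battery capacity</td><td>3000 mAh</td></tr>
</table>"""

PAGE_B = """<table>
<tr><td>Weight</td><td>160 g</td></tr>
<tr><td>Screen</td><td>6.1 in</td></tr>
<tr><td>Battery capacity</td><td>4000 mAh</td></tr>
</table>"""

if __name__ == "__main__":
    print(following_sibling_text(PAGE_A, "Battery capacity"))
    print(following_sibling_text(PAGE_B, "Battery capacity"))
```

Because the lookup is anchored on the label text, the same rule works on both pages, which is exactly what a position-based pattern can't do.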
API publish failures
For our next webinar, we’re partnering with our good friends over at Infogr.am to do some cool data visualizations. It’s going to be a great look at how to use a simple but powerful tool to make your data look awesome. You can sign up for that one here.