How to use our advanced features

I took a back seat on the webinar this week and left you in the very capable hands of our developer duo Chris A and Chris B (Bamford), who showed you some of our more advanced features. Now you may think that you need to be a developer to use these features, but I’m here to tell you (as a non-dev myself) that the concepts are actually pretty simple. With just a little bit of extra knowledge you can make our tool do some pretty crazy things. And if you get stuck you can always shoot me an email and I’ll help you figure it out!

Ok, enough of that… Let’s get this show on the road!

Advanced Column Settings

The first advanced feature we offer to developers is the ability to get more targeted data during your extraction. Xpaths are what our app uses to pick out data from a webpage – when you highlight things with your cursor, it is actually looking at the Xpath underneath. But sometimes you want data that can’t necessarily be seen on the page (meta tags, for example), or data that moves around from page to page on the site. This is where our Xpath override feature comes in. You can use Chrome’s developer tools to help you find the bit of the Xpath you need and then modify it a bit to get exactly the data you want from the page!
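To make the meta tag case concrete, here is a minimal Python sketch of picking out data that isn't visible on the page. The HTML snippet is made up for illustration, and it uses Python's built-in ElementTree, which only understands a limited subset of XPath, so treat it as a way to get a feel for the idea rather than a picture of how our app evaluates Xpaths.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page used only for illustration.
PAGE = """
<html>
  <head>
    <title>Blue Widget</title>
    <meta name="description" content="A sturdy blue widget." />
    <meta name="keywords" content="widget, blue" />
  </head>
  <body><h1>Blue Widget</h1></body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# Select the <meta> element by its name attribute, then read the
# attribute that holds the value we are after.
meta = root.find(".//meta[@name='description']")
description = meta.get("content") if meta is not None else None
print(description)
```

In a full XPath engine (such as the one in Chrome's developer tools) the same target is usually written as a single expression ending in `/@content`.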

Regular Expressions (Regex), on the other hand, are a way of filtering the results from the extraction into something more specific. So, if the browser is pulling too much data you can use a Regex to refine it to get just what you want.
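As a rough illustration of that refining step, the sketch below uses Python's `re` module to trim an over-broad extracted value down to just the price. The sample text and the pattern are assumptions made up for this example.

```python
import re

# Made-up value, as if the extraction pulled back more text than we wanted.
raw = "Price: $1,299.99 (free shipping)"

# Capture only the digits, commas and decimals that follow the dollar sign.
PRICE = re.compile(r"\$([\d,]+\.\d{2})")

match = PRICE.search(raw)
price = match.group(1) if match else None
print(price)
```

The parentheses in the pattern mark the part to keep; everything outside them is matched but thrown away.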

Here is the configuration for the Amazon crawler Chris created: Amazon Crawler

You’re a star

One of the other things you can use our advanced column settings for is to get star ratings. When you see a star rating on a page, you can usually get it as an image, but that’s not always very helpful. A lot of the time what you really want is the star value (i.e. 3 of 5). Using a combination of Xpath and Regex you can pull exactly that!
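Here is a small Python sketch of that combination, assuming the rating lives in the image's alt text (a common pattern, but not necessarily how the site in the webinar does it). The markup, class names and wording of the alt text are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Made-up review markup: the rating is only shown as an image.
SNIPPET = """
<div class="review">
  <img class="stars" src="/img/stars-3.png" alt="3 out of 5 stars" />
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# Step 1 (Xpath): find the star image and read its alt attribute.
img = root.find(".//img[@class='stars']")
alt_text = img.get("alt", "") if img is not None else ""

# Step 2 (Regex): keep only the numbers from the alt text.
match = re.search(r"(\d+(?:\.\d+)?) out of (\d+)", alt_text)
rating = (float(match.group(1)), int(match.group(2))) if match else None
print(rating)
```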

If you’re looking for some flowers, here is the API Chris built: Flowers

Note: when using Xpath on a multiple results row, your Xpath is relative to the specific row not the entire page (like it is on a single results page). Chris shows you how to work this out in the video of the webinar.
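The row-relative idea can be sketched in Python like this. The markup and class names are made up, and ElementTree only supports a subset of XPath, so this is an illustration of the concept rather than of our extractor.

```python
import xml.etree.ElementTree as ET

# Made-up results list with one <li> per result row.
RESULTS = """
<ul>
  <li class="result"><span class="name">Roses</span><span class="price">$20</span></li>
  <li class="result"><span class="name">Tulips</span><span class="price">$15</span></li>
</ul>
"""

root = ET.fromstring(RESULTS.strip())

# First find every row, searching from the top of the page.
rows = root.findall(".//li[@class='result']")

items = []
for row in rows:
    # These paths start at the current row ("./"), not at the page root,
    # so each one only ever sees the data inside its own row.
    name = row.find("./span[@class='name']")
    price = row.find("./span[@class='price']")
    items.append((name.text, price.text))

print(items)
```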

Client Libraries

The development team has been very busy upgrading our client libraries for different languages to help you integrate your data more efficiently. Simply choose the programming language you want (from the Integrate page) and we’ll tell you where we host our script (if applicable), how to configure it with your credentials, and finally how to execute a query.

The great thing about our client library examples is that they are dynamic, meaning you can actually use the code we generate because it is specific to you. At the bottom of each of the language tabs on the Integrate page, you’ll see that we generate a full example of an integration using your data. If you open the Integrate page from your Dataset, it will fill in that example with the data you extracted. As an extra bonus, if you’re using the JavaScript example, you can even run it in your browser!

Download Crawler and Dataset data over the API

You can download the data saved for Crawlers and Datasets programmatically using the API. Chris has written a more in-depth blog post on how to do it, but in essence you paste the API link, with the GUID of your Data Source, into your URL bar. Then, from the information that is pulled back, you can use the GUID in the “snapshot” field to get the JSON file. You can also use this method to access your historical data from that source.
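If you would rather script the second step than do it by hand, the sketch below shows the general shape in Python. Only the “snapshot” field name comes from the description above; the sample response, the other field in it and the URL template are placeholders made up for illustration, so check the API documentation for the real addresses and response layout.

```python
import json


def snapshot_guid(metadata_json: str) -> str:
    """Return the GUID stored in the "snapshot" field of a response."""
    metadata = json.loads(metadata_json)
    return metadata["snapshot"]


def fill_guid(url_template: str, guid: str) -> str:
    """Drop a GUID into a URL template that contains a {guid} slot."""
    return url_template.format(guid=guid)


# Made-up response body; a real one will contain other fields.
sample_response = (
    '{"name": "Flowers", "snapshot": "00000000-0000-0000-0000-000000000002"}'
)

guid = snapshot_guid(sample_response)

# Placeholder template, NOT a real endpoint.
url = fill_guid("https://api.example.invalid/snapshots/{guid}", guid)
print(url)
```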

Question Time

Which character encodings do data sources support?

We detect a range of character encodings, including UTF-8, UTF-16, GBK, and many others, using a combination of HTTP headers and HTML meta tags. If you notice encoding issues, please drop us a line telling us which site you’re having issues with, and we’ll take a look.
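For the curious, here is a simplified Python sketch of how a header-then-meta-tag check can work in general. It is not our actual detection code: the function name, the patterns, the 2048-byte window and the UTF-8 fallback are all assumptions made for this example.

```python
import re
from typing import Optional

# Matches charset=... in a Content-Type header value.
HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
# Matches charset=... inside a <meta> tag in the raw page bytes.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE
)


def detect_encoding(
    content_type: Optional[str], body: bytes, default: str = "utf-8"
) -> str:
    """Guess a page's encoding from its header first, then its meta tags."""
    if content_type:
        header_match = HEADER_CHARSET.search(content_type)
        if header_match:
            return header_match.group(1).lower()
    # Only look at the start of the document, where <meta> tags live.
    meta_match = META_CHARSET.search(body[:2048])
    if meta_match:
        return meta_match.group(1).decode("ascii").lower()
    return default


print(detect_encoding("text/html; charset=GBK", b""))
print(detect_encoding("text/html", b'<html><head><meta charset="utf-16"></head>'))
```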

Is there a quick way to tell the difference between multiple versions of a Crawler or Dataset’s saved data?

We don’t currently provide this feature, but feel free to add it on our Ideas Forum (along with any other ideas you want!).

How do Crawlers deal with robots.txt?

Our Crawlers obey all robots.txt directives, and there is no way to override this.
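If you want to check in advance whether a page is likely to be off limits, Python's standard library can read robots.txt rules for you. This sketch uses a made-up robots.txt, a made-up user agent name and example URLs; it shows how the rules are interpreted in general, not how our Crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for illustration.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "example-crawler" is a placeholder user agent name.
allowed = parser.can_fetch("example-crawler", "https://www.example.com/products/1")
blocked = parser.can_fetch("example-crawler", "https://www.example.com/private/report")
print(allowed, blocked)
```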

Can you use Xpaths and regular expressions at the same time?

Yes! Watch our webinar video embedded above to see how we combine these two tools together to extract star ratings from a particularly tricky data source.

Where can we find documentation for the API?

All of our API documentation is currently on our docs site and information on integrating using client libraries is available on our Integrate page.

Also, a quick congratulations to Ralph for winning the best question data punk t-shirt!

Join us next time…

Your favorite double act is back! Chris A and I will be doing another Tips & Tricks webinar to show you a few more tricks of the trade. Make sure you come prepared with questions, because we’ll be giving away another t-shirt for the best one!

Sign up now!
