Making the most of datasets

An import.io Webinar Production

Thanks again to everyone who came to our webinar on Datasets yesterday! I think Chris and I are starting to really get the hang of these. As usual we’ve recorded the whole thing and put it up on YouTube so you can refer back to it whenever you need to.

A Short Recap

For those of you who don’t know, the Dataset page is where you can see all the data you’ve extracted! From there you can refresh your data, query your Connectors, download the data to your machine or share it with your friends. It’s also the place where you can combine multiple data sources, and a good place to access our integrate page.

First I showed you how to create a new Dataset and add your Data Sources to it. Then I walked you through all the options you have, such as refreshing your data, saving it, sharing it and downloading it to your machine! If you want a more in-depth refresher on these topics, check out this tutorial!

And just in case you happen to be as big a Football fan as I am: here is the data I used.

Next, I showed you the specifics of what you can do with each of the different types of data sources you can create using our tool!

  1. Extractors – add a new URL
  2. Crawlers – re-crawl the site
  3. Connectors – query from the Dataset page and run multiple queries

Then things got really exciting when I showed you how to combine multiple Connectors to create a Mix, which lets you search for one term across multiple sites! You can try my Mix of UK supermarkets for yourself, and find the cheapest place to do all your shopping.
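To make the idea behind a Mix a little more concrete, here is a minimal Python sketch of what "search one term across several sites and find the cheapest" amounts to once the results are in hand. It does not call import.io at all: the function name `cheapest_offers`, the `product` and `price` fields and the supermarket names are assumptions invented for this example.

```python
def cheapest_offers(results_by_site):
    """Merge per-site result rows into one list, cheapest first.

    results_by_site maps a site name to a list of rows, where each row
    is a dict with (assumed) "product" and "price" keys.
    """
    offers = [
        {"site": site, **row}          # tag every row with its source site
        for site, rows in results_by_site.items()
        for row in rows
    ]
    # Sort the combined rows by price so the cheapest offer comes first.
    return sorted(offers, key=lambda offer: offer["price"])


# Hypothetical results for one search term from three made-up sources.
sample_results = {
    "supermarket_a": [{"product": "milk 2l", "price": 1.45}],
    "supermarket_b": [{"product": "milk 2l", "price": 1.29}],
    "supermarket_c": [{"product": "milk 2l", "price": 1.60}],
}

best = cheapest_offers(sample_results)[0]
print(best["site"], best["price"])
```

The Mix does this combining for you, so a script like this is only useful if you have downloaded results from several sources and want to compare them yourself.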

Finally, I showed you how to do a simple integration of your Dataset with Google Sheets. You can learn more about integrations by reading the tutorials below or visiting our integrate page yourself!

Your Questions

Can you see which of the URLs have had data updated from the last crawl?

This isn’t currently possible to do with our UI – we’re working on it! You can get the previous crawls over the API though and then compare the data yourself with a simple script. If you want to know more about how to do this, just email us at support@import.io and we’ll show you!
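As a rough illustration of the kind of comparison script mentioned above, here is a minimal Python sketch. It assumes you have already fetched two crawl snapshots as lists of rows, and that each row has a `url` field identifying the page it came from. The function name `diff_crawls`, the field names and the sample rows are made up for this example and are not part of the import.io API.

```python
def diff_crawls(previous, current, key="url"):
    """Compare two crawl snapshots (lists of dict rows) by page URL.

    Returns the URLs that were added, removed, or whose row changed
    between the previous crawl and the current one.
    """
    prev_by_url = {row[key]: row for row in previous}
    curr_by_url = {row[key]: row for row in current}

    added = sorted(curr_by_url.keys() - prev_by_url.keys())
    removed = sorted(prev_by_url.keys() - curr_by_url.keys())
    changed = sorted(
        url
        for url in curr_by_url.keys() & prev_by_url.keys()
        if curr_by_url[url] != prev_by_url[url]   # any field differs
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots from two crawls of the same site.
previous_crawl = [
    {"url": "http://example.com/a", "price": "1.00"},
    {"url": "http://example.com/b", "price": "2.00"},
]
current_crawl = [
    {"url": "http://example.com/a", "price": "1.20"},
    {"url": "http://example.com/c", "price": "3.00"},
]

print(diff_crawls(previous_crawl, current_crawl))
```

If a page can produce more than one row, you would need a different key (or a combination of fields) to match rows between crawls.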

Can I get data from behind a JavaScript action?

You sure can! By default, whenever you create a data source, we first try to get the data with JavaScript turned off – because it’s easier. But import.io Connectors do support getting data from sites that require JavaScript. Simply follow the instructions in this tutorial and, if you can’t see your data in the Detect Optimal Settings step, click “No”. This will turn JavaScript on and you can carry on building your Connector as normal!

Are there any websites that cannot be crawled?

Every website is different. Some websites are easier to crawl than others – it all depends on how the HTML is structured. Because import.io Crawlers are really Extractors, we find that we have a pretty good success rate (especially now that you can crawl with JavaScript). Try these tips and tricks first. If you find you’re having trouble getting your Crawler to work, just email us at support@import.io. If a Crawler doesn’t work, you may also have more luck using an Extractor or a Connector, and still get the data you need.

Can you get product reviews?

You can definitely get product reviews. If you want both product information and product reviews, you will need to build two different Extractors (because the data is different) and then combine them in a Dataset. Chris A actually built a web app that does just that for Amazon music!

 

Are there limits to integrating with Google Sheets?

Because of the way the Google Sheets integration works, you can only get one page of data from one source at a time. This means that even if you train your Connector with pagination, when you integrate it with Google Sheets you’ll only be able to see the first page of results.

Join Us Next Time

For our next webinar, Chris and I will be teaming up with the lovely Jewel Loree from Tableau to show you how to get data and visualize it! Sign up here to join us on the 22nd of April at 4pm GMT.

If you have an idea for a webinar you’d like to see, email me at support@import.io!

Comments

Nice English accent – but seriously, the video is too long. Can you squeeze it down to five minutes?
