Bulk extract data using import.io and Google Sheets

Attention: There is now a NEW version of this post. Please click here.

Today’s webinar was brought to you by a very special guest, none other than Co-Founder and Product Evangelist, Andrew Fogg! It’s rare that I willingly relinquish the webinar spotlight, but when I heard what he had come up with, I just knew I had to let him tell you all about it.

To give you just a little bit of background, one of the things users request is the ability to bulk upload a bunch of URLs or queries to an Extractor or a Connector. At the moment, unless you know how to use them over the API, you have to add them one at a time from the data set page. And, if you have say 50 or 100 URLs, that can be a bit of a slow process. I’ve of course put the dev team hot on the trail of adding this feature, but in the meantime Andrew has come up with a handy work-around using Google Sheets.

For this webinar, in the interest of time, we will need to assume that you already know how to build import.io Extractors and Connectors. If you don’t, no need to worry, you can check out my latest Getting Started webinar which will teach you everything you need to know!

The Batch Search Sheet

Andrew has built this very handy Google sheet, which allows you to put the ID of any import.io Connector or Extractor and then put in as many queries or URLs as you want at once. In order to use his sheet for yourself, you will first need to make a copy of it to your own Google Drive.

The first two things you will need is your user ID and API key (you will need your password for this) which you can get from your My Account page. Simply copy and paste these into their corresponding boxes in spreadsheet (you’ll only have to do this once). These help us identify you and allow you access to your import.io sources.

The next thing you’ll need is the GUID for the Extractor or Connector that you want to query. You can get this from the My Data page or from the data set page itself.

Next, click on the “Get input names button” – this will go to import.io and look up the GUID you have provided and bring in the input you trained it on.

The final step is to map your inputs in sheet 2. This is where you put in all the queries you want to do (one per cell). Then you will need to go back to sheet one and specify which column you have put your query list in.

Now you’re ready to go. All you have to do is go to sheet 3 and hit the “Get results” button. And just like that, all that data will be pulled back into the sheet live from the site.

One Step Further

Let’s take this up a notch. One of the things people like to do is to use the output of one API as the input for another. In this case, Andrew used a Connector to get product URLs from the search results in REI and then an Extractor to get data from the individual product pages.

Using his batch search spreadsheet he used his Connector to generate the product URLs and then upload all of them into his Extractor. The benefit to using this method as opposed to building a Crawler, is that you can refresh the data whenever you want so that the data is always up-to-date.

And One More (just for fun)

You can also use this sheet with Connectors that need multiple inputs, such as the one Andrew built to Mandarin Oriental, which need both a “check in” and “check out” date to show the prices of hotels. Now, to find out when the cheapest time to go on holiday is, you can plot the rate change for the entire month.

How does he do it?

Lastly, Andrew gave a everyone quick peek behind the curtain at the Google Apps Script that he wrote to make his spreadsheet.

Everything Andrew showed you is available for you to go and play with and we want to hear from you! If you have any questions or improvements, please feel free to have a play around with it and let us know what you come up with (hello@import.io)!

Join Us Next Time

Next time I’ll once again be joined by the wonderful Kristaps from infogr.am. This time we’ll be showing you how to integrate live data into your infrographic. It’s going to be a cracking show so sign up now!

Comments

Works with normal extractor. Does not work with extractor with authentication. Giving error "TypeError: Cannot call method "push" of undefined."

Hi Ryan, can you send a message to support@import.io referencing this comment. When you do can you provide some information about the source that you are trying to access, what data you want etc. and we can have a go at trying to replicate it for you.

Hi Andrew,

Thank you for your reply. I sent a email to support@import.io under title "Problem with Bulk extract data using import.io and Google Sheets". I also include the link to the excel sheet that I was using.

Thank you,
Ryan

This webinar was just what I needed. Sorry to be a total amateur but is there a way to only return a set number of results when you hit the ‘Get Results’ button? I’d like to be able to return 5 results from the connector and then move onto the next search term from the inputs form. Thanks.

Hi Jimmy,

Interesting question. For now, what I would do is pull back all data and then filter out the rows from each query that are greater than 5. You can tell which rows returned from each query by inspecting the "query" column. Rows are returned in order so the top 5 rows from each query will be what you are after.

If you need more specific help then send an email to support@import.io and we will see what we can do.

v0.3.3 of the Batch Search spreadsheet is available. You can check it out here. http://go.import.io/batchsearch

Is there a maximum extraction or rate limit on it? If so, is there a way to structure it to batch the query?

Hello Andrew,
When I try to click on Get source button I got this error Script function getInputNames could not be found. Kindly help me why this is happening. Thanks in advance

Thanks Andrew for your answer. I copy the sheet again and it worked but I am getting an error

There appears to be a problem accessing Import.io right now for your source information.

Check our status blog and try again – http://blog.import.io/status-blog/

Exception: Request failed for https://api.import.io/store/connector/my user id?user=my user id&apikey=my api key returned code 404. Truncated server response: {"error":"Object not found.","code":"NOSUCHOBJECT"} (use muteHttpExceptions option to examine full response).

I have checked the status blog, it showed they have made some update. So i have opened the support ticket and now waiting for their help.

but anyways thank you for your help and response.

Khadija

Hello Andrew,
Thanks for sharing, this is what I’m looking for.
Well anyway, i have some problems here..
I have this kind of notification, "TypeError: Cannot call method "getRange" of null."
And i dont have an idea what i have to do with this form:
Name:
URL:
Status:
So i left it blank.

Kindly help me why this is happening. Thanks.

Follow these instructions.

  1. Make a copy of the spreadsheet "File > Make a copy"
    2. Fill out your User ID, API Key, and Data Source GUID.
    3. Hit the "Get source" button.
    4. This should fill out the details for: Name, URL, and Status and should list the headings for your data source on row 20.
    5. Put your queries below the headings on row 20 then click "Get Results".

If you run into any problems email support@import.io.

Andrew

Hi Andrew, thanks for your quick respond..
I have followed those instructions, and i got this error:

There appears to be a problem accessing Import.io right now for your source information.

Check our status blog and try again – http://blog.import.io/status-blog/

Exception: Request failed for https://api.import.io/store/connector/772e895b-f40f-4278-9f2d-145a473c266d?user=null&apikey=null returned code 400. Truncated server response: {"error":"Value for parameter user is not a valid UUID: Invalid UUID string: null","code":"BADPARAMETER"} (use muteHttpExceptions option to examine full response)

….

Anyway, i have upgraded from Batch Search V.0.3.3 to V0.3.5
But, i saw something missing, there is only "Data Source GUID:" cell.
Where is "Your User ID:" and "Your API key:" cells?

Thank you.

Hi,
Andrew is currently working to fix that bug you’ve seen (should be done soon). His latest version of the Batch Search sheet should remove the need to input your API key and user GUID – we’ll be putting up an updated blog post with new instructions today.

Hi Jen,

I think its great, i have tried the latest version V0.3.6, and it works like charm 🙂

Thank you for your great support 🙂

Hi Jen,

I think its great, i have tried the latest version V0.3.6, and it works like charm 🙂

Thank you for your great support 🙂

Comments are closed.

Turn the web into data for free

Create your own datasets in minutes, no coding required

Powerful data extraction platform

Point and click interface

Export your data in any format

Unlimited queries and APIs

Sign me up!