Download data over the API

You can download the data saved for Crawlers and Datasets programmatically using the API. You can see this process in full by watching our webinar video, but here is a step-by-step example complete with API calls.

For this example, I'll use an Amazon crawler I built during the webinar.

In order to get the data over the API, you need the data source’s GUID, which is available on your My Data page. In this case it is “83b8ad35-5a80-4889-80ec-d5718627e77e”.

Next, you need to get the data that you see on this web page over the API. To do this, simply append that GUID to this URL (https://api.import.io/store/connector/) and paste the result into your browser.

For all of these examples, you will need to authenticate. In order to do this, get your User GUID and API key from your account page. Then, URL-encode your API key. Finally, append them to the example URLs in this format:

?_user=YOUR_USER_GUID&_apikey=ENCODED_API_KEY
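If you'd rather build these URLs in code, here is a minimal sketch using only the Python standard library. The user GUID and API key below are placeholders, and the API key is deliberately given characters (`+`, `/`, `=`) that must be URL-encoded; `urlencode` handles that encoding for you:

```python
from urllib.parse import urlencode

# Placeholder credentials -- substitute your own from your account page.
USER_GUID = "YOUR_USER_GUID"
API_KEY = "abc+def/ghi=="  # raw API keys often contain +, /, = and must be URL-encoded

def authenticated_url(base_url, user_guid, api_key):
    """Append the _user and URL-encoded _apikey query parameters to an API URL."""
    query = urlencode({"_user": user_guid, "_apikey": api_key})
    return f"{base_url}?{query}"

url = authenticated_url(
    "https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e",
    USER_GUID,
    API_KEY,
)
print(url)
```

The same helper works for every URL in this post, since they all take authentication in the same query-string format.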

The URL for my data source looks like this: https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e (don’t forget to append your authentication parameters, as described above).

This is the data that comes back from the API:

From the response, check the “snapshot” field. The snapshot GUID identifies the current version of the data that has been saved for the Crawler or Dataset. Once you have that GUID, append /_attachment/snapshot/GUID to your first URL.

Like so: https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e/_attachment/snapshot/21c12f72-c7eb-47cf-9efd-61026efc8196

This URL will provide you with all of the Crawler or Dataset’s data as a JSON file.
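The two-step flow above can be sketched in Python. The response body here is a hypothetical excerpt of a connector response, trimmed to just the "snapshot" field this step depends on:

```python
import json

BASE = "https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e"

# Hypothetical excerpt of the connector response from the first request;
# only the "snapshot" field matters for this step.
response_body = '{"name": "Amazon crawler", "snapshot": "21c12f72-c7eb-47cf-9efd-61026efc8196"}'

# Extract the snapshot GUID and build the attachment URL for the JSON data.
snapshot_guid = json.loads(response_body)["snapshot"]
snapshot_url = f"{BASE}/_attachment/snapshot/{snapshot_guid}"
print(snapshot_url)
```

Fetching `snapshot_url` (with your authentication parameters appended) returns the Crawler or Dataset’s data as JSON.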

Here is an example of how that data looks:

Whenever you update the data for a Crawler or Dataset (by saving it), don’t forget to repeat both requests, as the snapshot GUID changes each time the data is saved.

You can see a list of all of the GUIDs for different versions by using the history API, with a URL that looks like this:

https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e/_attachment/snapshot/_history

Here is an example response from this request:

Each of the “_id” fields in the “hits” array is a snapshot GUID that you can plug into the attachment URL above to download that version of the Crawler or Dataset’s data.
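Walking the history looks something like the sketch below. The response shape is an assumption based on the description above (a "hits" array whose entries carry an "_id"); real responses will contain additional fields, and the second GUID is a made-up placeholder:

```python
import json

# Hypothetical excerpt of a _history response; assumed shape is a "hits"
# array whose entries each carry an "_id" snapshot GUID. The second GUID
# is a placeholder for illustration only.
history_body = (
    '{"hits": ['
    '{"_id": "21c12f72-c7eb-47cf-9efd-61026efc8196"},'
    '{"_id": "00000000-0000-4000-8000-000000000000"}'
    ']}'
)

BASE = "https://api.import.io/store/connector/83b8ad35-5a80-4889-80ec-d5718627e77e"

# Collect every snapshot GUID, then print the download URL for each version.
snapshot_guids = [hit["_id"] for hit in json.loads(history_body)["hits"]]
for guid in snapshot_guids:
    print(f"{BASE}/_attachment/snapshot/{guid}")
```

Each printed URL (plus your authentication parameters) downloads that version of the data.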

As always, if you have any questions then don’t forget to email us at support@import.io or visit our support site and we will be more than happy to help out!
