Tutorials and educational articles to help you make the most out of your data

White Paper: Using Web Data to Power Deep Learning

Once the sole purview of academics and a few of the largest high-tech companies, deep learning is now poised for rapid and widespread growth across a range of companies and industries. Artificial intelligence (AI) was introduced as a concept in the 1950s. Initially, AI was inherently a rules-based approach. Developers would […]

VR, AI and chatbots are the top technology trends for 2016

For venture capital (VC) firms, staying on top of fast-moving technology trends is critical. By leveraging Import.io, Madding King III at Camp One Ventures was able to establish a useful and objective way to track trending buzzwords in the VC sector. Read on to find out what he discovered, and how he did it. VC […]

What is Artificial Intelligence? Louis Monier explains everything.

What is Artificial Intelligence? Our Chief Scientist Louis Monier gives you the straight dope on AI. Artificial Intelligence, always a very polarizing subject, is back on top of the news. Unless you have been on a deep space mission for the past year, you have been exposed to opinions ranging from “this will change everything […]

Great alternatives to every feature you’ll miss from kimono labs

If you’re one of the 125k KimonoLabs users who got this message last week… “After almost 2 years of building and growing kimono, we couldn’t be happier to announce that the kimono team is joining Palantir.” …you’re probably wondering what to do next. There’s no denying that kimono was a useful service with some great […]

3 easy ways to get your data into R

If you haven’t heard of R before, you should know that it’s one of the most popular statistical programming languages in the world, used by millions of people. Its open-source nature fosters a great community which helps make data analysis accessible to everyone. If you want a better understanding of how R works, and its syntax, we recommend taking this free Introduction to R tutorial by DataCamp.

While import.io gives you access to millions of data points, R gives you the means to perform powerful analysis on that data and to turn it into beautiful visualizations. It’s a pretty nifty combo!

In this post, you’ll learn 3 easy ways to get your import.io data into R. This is a beginner tutorial so don’t worry if you’re not that familiar with R or import.io’s advanced features.

Let’s get started!

How to crawl a website the right way

The word “crawling” has become synonymous with any way of getting data from the web programmatically. But true crawling is actually a very specific method of finding URLs, and the term has become somewhat confusing.

Before we go into too much detail, let me just say that this post assumes that the reason you want to crawl a website is to get data from it, and that you are not technical enough to code your own crawler from scratch (or you’re looking for a better way). If one (or both) of those things are true, then read on, friend!

In order to get data from a website programmatically, you need a program that can take a URL as an input, read through the underlying code and extract the data into either a spreadsheet, JSON feed or other structured data format you can use. These programs – which can be written in almost any language – are generally referred to as web scrapers, but we prefer to call them Extractors (it just sounds friendlier).

A crawler, on the other hand, is one way of generating the list of URLs you then feed through your Extractor. But they’re not always the best way.
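
To make the distinction concrete, here’s a minimal sketch in Python. The HTML, URLs, and field names are all invented for illustration; the point is only the division of labour: a crawler finds more URLs, an Extractor turns one page into structured rows.

```python
import re

# A made-up listing page, standing in for real fetched HTML.
SAMPLE_PAGE = """
<a href="/product/1">Widget</a>
<a href="/product/2">Gadget</a>
<span class="price">9.99</span>
<span class="price">19.99</span>
"""

def crawl_links(html):
    # A crawler's job: discover more URLs to visit.
    return re.findall(r'href="([^"]+)"', html)

def extract_prices(html):
    # An extractor's job: pull structured data out of one page.
    return [float(p) for p in re.findall(r'class="price">([\d.]+)<', html)]

urls = crawl_links(SAMPLE_PAGE)      # feed these back through the extractor
rows = extract_prices(SAMPLE_PAGE)
print(urls)   # ['/product/1', '/product/2']
print(rows)   # [9.99, 19.99]
```

On real pages you’d want a proper HTML parser rather than regular expressions, but the two roles stay the same: one program grows the URL list, the other turns each URL into rows.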

How to get live web data into a spreadsheet without ever leaving Google Sheets

We are super excited to announce that Blockspring have just released an integration with us to let you automatically pull data from the web into a spreadsheet. They’ve created an awesome solution that lets you use a range of great APIs using only a spreadsheet. Their Excel and Google Sheets plugin enables you to bring data into your spreadsheet, run text-analysis and much more.

So today we want to show you how you can use live web data from a spreadsheet to do a bunch of cool things in just a few minutes. Currently Blockspring is using our Magic API, which automatically generates a table of data from a URL. You just have to provide the Blockspring integration a URL and it pulls the data from that site into a nice, orderly table – all without leaving your spreadsheet.

How to choose the right visualization for your data

In my job at Silk.co, I help lots of journalists, NGO workers and marketers build data visualizations from spreadsheets. Often we use Import.io to extract data from the public Internet into a spreadsheet, which we then upload into Silk for further analysis and visualization. (Here’s one we did with Import.io about Uber Jobs which was picked up by Mashable.) I spend considerable time thinking about how best to represent data with visualizations, though I am by no means an expert in data visualization on the level of Alberto Cairo or Edward Tufte.

That said, I do have some basic visualization guidelines that I use. These guidelines enable anyone to quickly match the goal of their data visualization to the visualization type (or types) that should work best for their data.

How to fix a Publish Request Failure

Ever get to the end of building your API and see the dreaded Publish Request Failure (PRF) message? We feel your pain! There are a couple of reasons for API publish request failures, but the most common by far is timeouts due to JavaScript.

Our new API docs include some exciting beta features to help get around sites with heavy JavaScript. But, we can’t let devs have all the fun – so if you don’t know your GET from your POST, here’s a quick hack to get all the same functionality without needing to know any code.

The fastest way to get data from the web

By now – if you’ve been following import.io at all – you’ve probably tried Magic, the one-click data extraction technique we blew you away with last fall. Well, hold on to your hats because we’re about to do it again. Magic just got more magical. And faster. Was that even possible?!

Why isn’t my Crawler working?

Hey everyone, Alex here again. For this week’s webinar I chose the ever popular topic of…Crawlers. This is always a really popular one because a Crawler is the easiest way to get lots and lots of data very quickly. But instead of showing you how to build a Crawler (you can watch that webinar here), I want to talk about some of the most common Crawler issues and how you can solve them.

Quick solutions to two common data extraction problems

In this week’s webinar, Alex gives you an insight into how to solve two of the most common issues from our support channel: un-clickable data and disappearing Connector actions. These tips may seem advanced, but they’re actually quite easy to do once you know what’s going on.

Advanced Crawling features and XPaths

This is a recap of our most recent webinar where we looked at advanced crawling techniques using import.io. Follow us down the garden XPath as we check out some features for confident users looking to get the most out of their crawlers.

This webinar is all about our advanced features. If you’re new to import, I recommend you watch this Getting Started webinar first, because we’ll be skipping some of the basics to get down into the real meat of what import can do. Advanced crawling, XPaths, URL templates – this webinar’s got all that and more.

Become a data extraction master

At import.io we have a lot of different options and tools for getting data from the web. And navigating them can sometimes be a bit tricky. In this webinar, I take you on a comprehensive journey of all that import has to offer. From the simple pasting of a URL to the ultra-powerful automating actions, by the time you’re done watching this video you will be a data extracting master!

Build a word cloud in 30 seconds

This morning I stumbled across Tagul, an online word cloud creation tool which has a couple of really cool features. The first is that you can upload your own image for it to put your words in (I obviously immediately uploaded our pink owl, Owen). The second is that you can put in a URL for it to pull the words from.

Updated: Bulk extract data using Google Sheets

Those regular blog readers among you will remember my previous batch search webinar in which I showed you how to use the Google Sheet I created to upload a lot of URLs (or search terms) into an extractor (or connector). It was an extremely popular post and I got lots of comments and questions about it. Well, there is now….a NEW version!

I’ve updated the spreadsheet to include some of the feedback I received and generally improve the sheet’s performance. So, let me introduce you to Batch Search v0.3.5…

5 questions to ask when crawling

Crawling can be a bit of a mystery if you’re not familiar with the principle. I’ve done a few (rather long) full “How to Crawl” tutorials, but to keep things simple, here are the 5 questions you should ask yourself while building a crawler…

Do I need to crawl in the first place?

Is the data you need dispersed across more than 10 pages? If not, then the crawler tool probably isn’t the one for you. In that case, it would be far more efficient to use our extractor tool and simply add the URLs you need to get data from within your dataset page.
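
When the URLs are already known (or follow an obvious pattern), the extractor-only approach can be sketched like this. `run_extractor` is a hypothetical stand-in for whatever extraction tool you use, and the URL pattern is invented:

```python
# Hypothetical stand-in for your extraction tool: in real use this
# would fetch the page and return its structured data.
def run_extractor(url):
    return {"url": url, "title": "placeholder"}

# Known, patterned URLs: no crawling needed, just list them.
urls = [f"https://example.com/listing?page={n}" for n in range(1, 4)]
dataset = [run_extractor(u) for u in urls]
```

The crawler only earns its keep when you *don’t* know the URLs up front and need something to go discover them.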

A direct line to Plot.ly

I’ll be the first to admit that data sets aren’t exactly the most exciting things to look at. They’re great for running analysis on, but spotting trends and patterns can be pretty difficult when everything is just lined up in rows and columns. Which is why data visualization tools are so important! Now, we’re not trying to reinvent the wheel over here at import (we’ll probably never make a viz tool), but we realize that getting the data is only half the story. So, we’ve set up a direct line to the guys over at Plot.ly (a web-based graphing platform), to let you send your data straight into an awesome viz. In this webinar, I’m going to give you an overview of how it works…

An import.io webinar: Christmas special

This latest webinar is another classic, but always popular, Getting Started. But instead of crawling for jeans on Asos (you regulars will know what I’m on about), I thought I’d take inspiration from the impending holiday season and make it a bit more festive!

Nothing says Christmas to me more than jumpers and delicious food, so that’s what I’ll be focusing on…apologies in advance for making you hungry :-). 

A bit too much madness…

So this week’s webinar didn’t exactly go as planned… I guess the internet couldn’t handle that much awesome and decided to cut out halfway through the webinar, which is a real shame, because it was shaping up to be one for the record books! Anyway, I apologize for the outage – rest assured our internet providers got a right talking to – it’s one of the hazards of having a hipster London office.

I know you guys were loving the Google Sheets madness (brought to you by the man, the myth, the legend: Andrew Fogg) and while I wasn’t able to salvage the recording, I have written down a brief outline of all the things we talked about (and were planning to talk about) along with links to relevant tutorials. 

Ask Alex (anything)

I get a lot of great questions through support and on the webinars – and instead of burying them at the end of my webinar recaps, I thought I should give them their due time in the spotlight – I think they deserve it, don’t you? So, without further ado, I’d like to introduce you to my brand new feature – Ask Alex! Each week I’ll take the most asked questions and share the answers with you, my adoring public.

These are meant to be interactive, so if you have any questions (even if they aren’t data related) please email them to me with the heading “Dear Alex” – I just ask that you keep your questions relatively generic so that they are applicable to everyone. 

Crawling Nemo – an import.io webinar

At the end of my last webinar I asked you guys to tell me what topic you were dying to hear about. Quite a few of you wrote in, and after careful analysis…it was clear that you wanted to hear more about Crawlers! More specifically, you said you wanted a more in-depth look at some of our more advanced features.

Well, I’m nothing if not a people pleaser, so I set to work and came up with a webinar I think you guys will love! If you’re not familiar with crawlers, don’t worry, you can watch this Crawling 101 webinar I did a little while ago, which should tell you all you need to know.

4 invisible data points every API should include

Q. What do a website and an iceberg have in common?

A. There is a titanic amount of important stuff lurking just below the surface of both.

Web designers and developers will know what I mean. Underneath the pretty facade of a website’s interface are thousands of lines of markup – an absolute gold mine for us data lovers.

…and import.io of course are here to show you how to access it.

A magical new webinar

It’s been two weeks since our last webinar – due to traveling for the Data Summit – and there’s loads to catch up on. For this week’s edition, I decided to give you guys an in-depth look at our newest tool, Magic, which we launched at the Summit on Oct 30th. Magic lets you extract data directly from your browser (or tablet/mobile) automatically – with no training or downloading or any of those pesky rows and columns. It’s a really great showcase of what our algorithms team has been working on, not to mention it’s really cool.

A Silk-y smooth data visualization webinar

We had another awesome joint webinar this week. Alex Salkever from Silk.co was here at the import.io webinar studios to help me show you how to create some stellar vizzes with your data. In less than an hour, we crawled Kickstarter and made a bubble map showing the difference in concentration of Technology vs. Journalism startups across the world.

Everything you ever wanted to know about Crawlers in one webinar!

I do these webinars not because I like talking about my beard – although I certainly do – but because I love nothing more than helping you guys learn to get data from the web. So when I get something through on the request line, I always do my best to accommodate it! I got quite a few questions about Crawlers last time, so I decided to make this webinar all about them. I’m going to take you through what one is, how to build one, a few advanced features, and then answer some FAQs. But first, a poem from a user!

Live data is beautiful too!

A couple of weeks ago Kristaps, from infogr.am, and I got together to do our first joint webinar on using web data to build an infographic. It got such a great response, that we thought “why not do it again?”. So, back by popular demand, this week I am once again joined by the fantastic Mr. Kristaps to show you how you can get data from the web and turn it into a cool infographic.

Now, we wouldn’t just want to repeat ourselves – that wouldn’t be much fun – so this time we’re doing it with a little twist. One of the key benefits of using import.io is that we pull data from websites in real-time. This means that you can build an infographic with data that changes regularly and have it always be up-to-date.

Bulk extract data using import.io and Google Sheets

Today’s webinar was brought to you by a very special guest, none other than Co-Founder and Product Evangelist, Andrew Fogg! It’s rare that I willingly relinquish the webinar spotlight, but when I heard what he had come up with, I just knew I had to let him tell you all about it.

Webinar: Getting started (again)

For today’s webinar I am joined by my favorite Northerner and technical marketing expert, Dan Cave. In light of all the new signups we got after our last joint webinar with infogr.am, we thought it’d be a good idea to give you guys a look at all the different data extraction tools we have to help you get data from the web.

Tips and tricks for using import.io

My original plan for this webinar was to look at voice activation and some of the hacks that we made a few months ago. Unfortunately, due to a few technical difficulties, I wasn’t able to do this. But, being the inventive guy I am, I decided to wing it and show you more interesting tips and tricks you can use to pull data using our tool.

Extract live pricing data from the web

For this webinar, Chris and I pretend to be the owners of a clothes shop – we’ll call it Chris and Alex Inc – C&A for short! Because Chris and I are savvy businessmen, we know that we need to compare the prices our competitors are selling their products at so we can stay competitive. Traditionally, we’d have to get someone to manually go through the whole website and note down (on paper?) the price and item name, leaving us with thousands of pieces of paper and some poor person in admin sorting it all out – not cool.

Crawl sites with infinite scrolling

Getting data from sites with infinite scroll can be somewhat challenging, so I’ve created this guide to help you out. It’s really easy, and you don’t need to be an amazing coder to do it – just a bit of a detective and an Excel wizard.
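
The detective work usually comes down to one observation (an assumption here, since the excerpt doesn’t spell it out): infinite-scroll pages typically fetch each batch of results from a paginated endpoint behind the scenes. Once you spot that endpoint in your browser’s network tab, you can generate one URL per batch instead of scrolling. A sketch with invented parameter names:

```python
def paged_urls(base, pages, per_page=20):
    # One URL per batch the page would have loaded as you scrolled.
    # The offset/limit names are illustrative; check what the site's
    # own endpoint actually uses before copying this pattern.
    return [f"{base}?offset={i * per_page}&limit={per_page}"
            for i in range(pages)]

urls = paged_urls("https://example.com/api/items", pages=3)
print(urls[1])  # https://example.com/api/items?offset=20&limit=20
```

That generated list is exactly the kind of thing you can build in a spreadsheet too – hence the Excel wizardry.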

Saving searches to sort your sources

All of your import.io data sources are listed on your My Data page for easy access. But, once you have a lot of them, it can be tricky to scroll through and find the sources you need quickly.

To help, we give you the ability to save searches for data sources so that you can list them more easily. Say you have a lot of data sources (like me) and just want to find the extractors for Amazon.

How to build a web app with data

Hello again lovely people. For this week’s webinar I mostly handed the reins over to our resident Developer Experience Engineer, Chris A, who showed you how to build an App with data!

What is an app?

So, first of all, why do you want to build an app? Well, generally it’s because you want to solve a problem. In Chris’ case he wanted to build his own computer and he needed computer parts. He can buy these parts online from a number of different sites, but there is no good way to compare all of them. So, Chris decided to build an app that would let him compare products from all the sites at once.
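
Once the data is extracted, the core of a comparison app like Chris’ is simple: merge the listings from every site and keep the best price per product. A sketch with made-up data and field names:

```python
# Listings scraped from two hypothetical sites, sharing one schema.
site_a = [{"name": "GPU X", "price": 399.0},
          {"name": "CPU Y", "price": 250.0}]
site_b = [{"name": "GPU X", "price": 379.0}]

def cheapest(*sources):
    # Flatten all sources and keep the lowest price seen per product.
    best = {}
    for listing in (item for src in sources for item in src):
        name, price = listing["name"], listing["price"]
        if name not in best or price < best[name]:
            best[name] = price
    return best

print(cheapest(site_a, site_b))  # {'GPU X': 379.0, 'CPU Y': 250.0}
```

The hard part in practice is getting every site’s listings into that one shared schema – which is exactly what the extraction step is for.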

Lean data journalism in 30 minutes

For this webinar I took a bit of a back seat and passed the spotlight over to our Data Intelligence guru, Bea Schofield. Some of you may recognize her as our resident data journalism expert, and I thought this would be a great opportunity to have her show you how it’s done.

How to do Data Journalism

There are generally two approaches to Data Journalism. The first is to pull live or recent data, usually to build infographics and visualizations that help people understand your story. The second is to monitor data over time to identify interesting patterns and trends.

More tips & tricks for extracting data from the web

We went old school for this week’s webinar, bringing back the usual suspects: myself (Alex) and our Developer Experience Engineer, Chris A. Since the last Tips & Tricks webinar was so popular (sold out in fact), we thought we’d do it again – this time with a moustache! These webinars are all about you, so our main aim for this one was to answer as many of your questions as possible.

If you have more questions as you’re using the tool, you can always click on the little pink question mark on our site or in the app and just type your question in the box. It’ll search our knowledgebase for you, and if you can’t find what you’re looking for you can submit your question to me and the rest of the support team.

Let’s mix it up

I’m back again! This time I had our wonderful Data Scientist Ignacio on hand to help me teach you all about making a Mix.

First off, what is a Mix?

A Mix is where you make a number of Connectors with the same schema, which you can then combine and query with one input.

The best way to understand this is to try an example. Let’s say we work for one of the major UK supermarkets and we want to keep track of what our competitors are charging week to week. Instead of visiting each site individually, you can build a Connector to each, then combine them into a Mix and search all of them at once!
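
As a rough sketch (the connector functions and schema here are invented for illustration, not import.io’s actual API), a Mix behaves like fanning one query out to several same-schema sources and concatenating the rows:

```python
# Each "connector" is sketched as a function taking a search term and
# returning rows that share one schema: store, product, price.
def supermarket_a(term):
    return [{"store": "A", "product": term, "price": 1.20}]

def supermarket_b(term):
    return [{"store": "B", "product": term, "price": 1.05}]

def mix(connectors, term):
    # One query fans out to every connector; because the rows share a
    # schema, they can be concatenated into a single result table.
    return [row for c in connectors for row in c(term)]

results = mix([supermarket_a, supermarket_b], "milk")
print(results)
```

The shared schema is what makes the combination work: if each Connector returned differently shaped rows, there would be no single table to query.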