Tutorials and educational articles to help you make the most out of your data

White Paper: Using Web Data to Power Deep Learning

Written by:

Once the sole purview of academics and a few of the largest high-tech companies, deep learning is now poised for rapid and widespread growth across a range of companies and industries. Artificial intelligence (AI) as a concept dates back to the 1950s. Initially, AI was inherently a rules-based approach. Developers would […]

VR, AI and chatbots are the top technology trends for 2016

Written by:

For venture capital (VC) firms, staying on top of fast-moving technology trends is critical. By leveraging Import.io, Madding King III at Camp One Ventures was able to establish a useful and objective way to track trending buzzwords in the VC sector. Read on to find out what he discovered, and how he did it. VC […]

What is Artificial Intelligence? Louis Monier explains everything.

Written by:

What is Artificial Intelligence? Our Chief Scientist Louis Monier gives you the straight dope on AI. Artificial Intelligence, always a very polarizing subject, is back on top of the news. Unless you have been on a deep space mission for the past year, you have been exposed to opinions ranging from “this will change everything […]

Great alternatives to every feature you’ll miss from kimono labs

Written by:

If you’re one of the 125k KimonoLabs users who got this message last week… “After almost 2 years of building and growing kimono, we couldn’t be happier to announce that the kimono team is joining Palantir.” …you’re probably wondering what to do next. There’s no denying that kimono was a useful service with some great […]

3 easy ways to get your data into R

Written by:

If you haven’t heard of R before, you should know that it’s one of the most popular statistical programming languages in the world, used by millions of people. Its open-source nature fosters a great community that helps make data analysis accessible to everyone. If you want a better understanding of how R works and of its syntax, we recommend taking this free Introduction to R tutorial by DataCamp.

While import.io gives you access to millions of data points, R gives you the means to perform powerful analysis on that data and to turn it into beautiful visualizations. It’s a pretty nifty combo!

In this post, you’ll learn 3 easy ways to get your import.io data into R. This is a beginner tutorial so don’t worry if you’re not that familiar with R or import.io’s advanced features.

Let’s get started!

How to crawl a website the right way

Written by:

The word “crawling” has become synonymous with any way of getting data from the web programmatically. But true crawling is actually a very specific method of finding URLs, and the term has become somewhat confusing.

Before we go into too much detail, let me just say that this post assumes you want to crawl a website to get data from it, and that you are not technical enough to code your own crawler from scratch (or you’re looking for a better way). If one (or both) of those things is true, then read on, friend!

In order to get data from a website programmatically, you need a program that can take a URL as an input, read through the underlying code and extract the data into either a spreadsheet, JSON feed or other structured data format you can use. These programs – which can be written in almost any language – are generally referred to as web scrapers, but we prefer to call them Extractors (it just sounds friendlier).
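To make the idea concrete, here is a minimal sketch of an Extractor written in Python with only the standard library. It assumes the data you want sits in a well-formed HTML `<table>` whose first row holds the column headers; the names `TableExtractor` and `extract_table` are hypothetical, and real pages usually need more robust parsing (and sometimes JavaScript rendering) than this.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row.

    Assumes well-formed markup with explicit closing tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Turn the header row plus data rows into a list of dicts."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    page = """
    <table>
      <tr><th>Product</th><th>Price</th></tr>
      <tr><td>Widget</td><td>9.99</td></tr>
      <tr><td>Gadget</td><td>24.50</td></tr>
    </table>
    """
    # Structured output, ready to save as a JSON feed.
    print(json.dumps(extract_table(page), indent=2))
```

Returning a list of dicts keeps the step from page to structure separate from the output format, so the same rows could just as easily be written to a spreadsheet (CSV) instead of JSON.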

A crawler, on the other hand, is one way of generating a list of URLs that you then feed through your Extractor. But it’s not always the best way.
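A crawler, by contrast, only discovers URLs. Below is a hypothetical, minimal breadth-first crawler in Python that stays on the starting host and stops after `max_pages` pages. To keep the sketch network-free, it takes a `fetch` callable (an assumption, not part of any import.io API) that returns a page’s HTML, or `None` if the page can’t be loaded. In practice you would plug in an HTTP client and respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first discovery of URLs on the same host as start_url.

    `fetch` is a placeholder: any callable mapping a URL to HTML or None.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it, keep crawling
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


if __name__ == "__main__":
    # A tiny in-memory "website" standing in for real HTTP requests.
    pages = {
        "https://example.com/": '<a href="/a">A</a><a href="https://other.org/">x</a>',
        "https://example.com/a": '<a href="b#top">B</a><a href="/">home</a>',
        "https://example.com/b": "",
    }
    print(crawl("https://example.com/", pages.get))
```

The returned list is exactly the kind of URL list you would then feed through an Extractor. Note how much of the work is bookkeeping (deduplication, staying on one host, capping the crawl), which is part of why a crawler isn’t always the most efficient way to build that list.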

How to get live web data into a spreadsheet without ever leaving Google Sheets

Written by:

We are super excited to announce that Blockspring have just released an integration with us to let you automatically pull data from the web into a spreadsheet. They’ve created an awesome solution that lets you use a range of great APIs using only a spreadsheet. Their Excel and Google Sheets plugin enables you to bring data into your spreadsheet, run text-analysis and much more.

So today we want to show you how you can use live web data from a spreadsheet to do a bunch of cool things in just a few minutes. Currently Blockspring is using our Magic API, which automatically generates a table of data from a URL. You just have to provide the Blockspring integration a URL and it pulls the data from that site into a nice, orderly table – all without leaving your spreadsheet.

How to choose the right visualization for your data

Written by:

In my job at Silk.co, I help lots of journalists, NGO workers and marketers build data visualizations from spreadsheets. Often we use Import.io to extract data from the public Internet into a spreadsheet, which we then upload into Silk for further analysis and visualization. (Here’s one we did with Import.io about Uber Jobs, which was picked up by Mashable.) I spend considerable time thinking about how best to represent data with visualizations, though I am by no means an expert in data visualization on the level of Alberto Cairo or Edward Tufte.

That said, I do have some basic visualization guidelines that I use. These guidelines enable anyone to quickly match the goal of their data visualization to the visualization type (or types) that should work best for their data.

How to fix a Publish Request Failure

Written by:

Ever get to the end of building your API and see the dreaded Publish Request Failure (PRF) message? We feel your pain! There are a couple of reasons for API publish request failures, but the most common by far is timeouts due to JavaScript.

Our new API docs include some exciting beta features to help get around sites with heavy JavaScript. But, we can’t let devs have all the fun – so if you don’t know your GET from your POST, here’s a quick hack to get all the same functionality without needing to know any code.

Advanced Crawling features and XPaths

Written by:

This is a recap of our most recent webinar where we looked at advanced crawling techniques using import.io. Follow us down the garden XPath as we check out some features for confident users looking to get the most out of their crawlers.

This webinar is all about our advanced features. If you’re new to import, I recommend you watch this Getting Started webinar first, because we’ll be skipping some of the basics to get down into the real meat of what import can do. Advanced crawling, XPaths, URL templates – this webinar’s got all that and more.

Build a word cloud in 30 seconds

Written by:

This morning I stumbled across Tagul, an online word cloud creation tool with a couple of really cool features. The first is that you can upload your own image for it to put your words in (I obviously immediately uploaded our pink owl, Owen). The second is that you can put in a URL for it to pull the words from.

Updated: Bulk extract data using Google Sheets

Written by:

Regular blog readers among you will remember my previous batch search webinar, in which I showed you how to use the Google Sheet I created to upload a lot of URLs (or search terms) into an extractor (or connector). It was an extremely popular post, and I got lots of comments and questions about it. Well, there is now... a NEW version!

I’ve updated the spreadsheet to include some of the feedback I received and generally improve the sheet’s performance. So, let me introduce you to Batch Search v0.3.5…

A direct line to Plot.ly

Written by:

I’ll be the first to admit that data sets aren’t exactly the most exciting things to look at. They’re great for running analysis on, but spotting trends and patterns can be pretty difficult when everything is just lined up in rows and columns. Which is why data visualization tools are so important! Now, we’re not trying to reinvent the wheel over here at import (we’ll probably never make a viz tool), but we realize that getting the data is only half the story. So, we’ve set up a direct line to the guys over at Plot.ly (a web-based graphing platform), to let you send your data straight into an awesome viz. In this webinar, I’m going to give you an overview of how it works…

4 invisible data points every API should include

Written by:

Q. What do a website and an iceberg have in common?

A. There is a titanic amount of important stuff lurking just below the surface of both.

Web designers and developers will know what I mean. Underneath the pretty facade of a website’s interface are thousands of lines of markup: an absolute gold mine for us data lovers.

…and import.io, of course, is here to show you how to access it.
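The post doesn’t name its four data points here, so as an illustration only: `<meta>` tags are one common kind of markup that never renders on the page but often carries useful data, such as descriptions or Open Graph image links. The hypothetical sketch below collects them with Python’s standard-library HTML parser; `MetaParser` and `hidden_metadata` are illustrative names, not part of any import.io feature.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Maps each <meta name=...> or <meta property=...> to its content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Standard tags use "name"; Open Graph tags use "property".
        key = attrs.get("name") or attrs.get("property")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"]


def hidden_metadata(html):
    """Return the invisible <meta> data of a page as a dict."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    page = (
        '<head><meta charset="utf-8">'
        '<meta name="description" content="Cheap widgets">'
        '<meta property="og:image" content="https://example.com/w.png"/></head>'
        "<body>Widgets</body>"
    )
    print(hidden_metadata(page))
```

Tags without a `name` or `property` (such as `charset`) are skipped, since they describe the page’s encoding rather than its content.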

Everything you ever wanted to know about Crawlers in one webinar!

Written by:

I do these webinars not because I like talking about my beard (although I certainly do), but because I love nothing more than helping you guys learn to get data from the web. So when I get something through on the request line, I always do my best to accommodate it! I got quite a few questions about Crawlers last time, so I decided to make this webinar all about them. I’m going to take you through what one is, how to build one, a few advanced features and then answer some FAQs. But first, a poem from a user!

Live data is beautiful too!

Written by:

A couple of weeks ago Kristaps, from infogr.am, and I got together to do our first joint webinar on using web data to build an infographic. It got such a great response that we thought “why not do it again?”. So, back by popular demand, this week I am once again joined by the fantastic Mr. Kristaps to show you how you can get data from the web and turn it into a cool infographic.

Now, we wouldn’t want to just repeat ourselves (that wouldn’t be much fun), so this time we’re doing it with a little twist. One of the key benefits of using import.io is that we pull data from websites in real time. This means that you can build an infographic with data that changes regularly and have it always be up to date.

Visualizing data with Tableau

Written by:

[Image: Data visualization by import·io user Michael Carper]

One of the coolest parts about working at import is seeing what our users can do with the data they extract using our platform. This week we got two very different, but equally cool, data visualizations from two users who plugged their data into Tableau. […]

Get Data the Sane Way!

Written by:

There is a lot of great data available on the web, but getting to that data is hard. Scraping is an endless chore and hard to maintain at scale. Import.io provides tools to solve these problems. Learn how our extractors can turn a page’s URL into structured data returned over a JSON API and how […]