Technical

Insight and technical features for the code savvy

Artificial Intelligence Regulation: Let’s not regulate mathematics!

On Wednesday, ahead of today’s White House Frontiers Conference, the White House Office of Science and Technology Policy released its report on Preparing for the Future of Artificial Intelligence. The report is optimistic, comprehensive and well-balanced. In summary: full-speed ahead.  But let’s be smart when it comes to Artificial Intelligence regulation. The premise is that […]

5 New Advanced Data Extraction Features To Try Out

You asked for more powerful data extraction features. We took your feedback and set our Engineers loose on the challenge. We are excited to announce 5 brand new advanced data extraction features that will help you get data out of more websites: Disable CSS Default Column Values Advanced Regex Support Require Column Values Raw HTML […]

Query API – Service Availability

Last updated: 2pm PST, Wednesday 9th March 2016 The new Query API has been in production for 24 hours now at 99.9% availability. We will continue to monitor the situation, but you can assume that normal service has resumed. If you have any questions, please reach out to support@import.io First published: 9am PST, Tuesday 8th March 2016 The Import.io […]

Neural nets: How Regular Expressions brought about Deep Learning

2016 is the year of Deep Learning, whose popularity has been on a steep incline ever since Google bought DeepMind at the start of 2014. Last year's technical breakthroughs, acquisitions, funding deals, and open source releases have all helped to cement Deep Learning as the hip artificial intelligence. Our CTO, Matt Painter, explains how Deep Learning got its […]

Using AWS Lambda and API Gateway to create a serverless schedule

At import.io, we believe that the best DevOps is NoOps.

That doesn’t mean that we don’t like DevOps people. On the contrary, we think they rock!

But ideally, we want everyone to be able to automate their work wherever possible. Instead of spending time on repetitive and mundane tasks, we push those jobs onto computers – leaving ourselves more time to work on great features.

With that in mind, we are starting to adopt microservice patterns as we scale our engineering efforts, so that new features are now delivered as components separate from the main platform. This allows us to iterate on new functionality quickly, test different technology stacks, and involve people without inside knowledge of the platform in the development process.

Our latest project, Scheduled APIs (which will let you run your Import.io APIs on a schedule), gave us a chance to revisit our technology stack and look at new approaches for building long-lasting components. Specifically, we used a pair of new AWS services: AWS Lambda and API Gateway.
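As a sketch of the pattern described here, below is a minimal AWS Lambda handler in Python that a schedule rule (or an API Gateway route) could invoke to run one API. The endpoint URL, the event shape, and the `api_id` field are illustrative assumptions, not the real Scheduled APIs internals:

```python
import json
import urllib.request

# Hypothetical endpoint template -- for illustration only.
API_URL = "https://api.import.io/apis/{api_id}/run"

def default_fetch(url):
    """Issue the real HTTP POST and return the response body."""
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def handler(event, context, fetch=default_fetch):
    """AWS Lambda entry point. A schedule rule invokes this function with
    the id of the API to run; `fetch` is injectable so the handler can be
    exercised locally without any network access."""
    api_id = event["api_id"]
    body = fetch(API_URL.format(api_id=api_id))
    return {"statusCode": 200, "body": json.dumps({"api_id": api_id, "result": body})}

# Local dry run with a stubbed fetch (no network):
print(handler({"api_id": "abc123"}, None, fetch=lambda url: "ok"))
```

Deploying this is then just a matter of zipping the module, pointing Lambda at `handler`, and attaching a schedule rule as the trigger.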

Using Kinesis and Kibana to get insights from your data

Matthew Painter, CTO at import.io, walks you through how we use Amazon Kinesis for managing our routing of event data, such as queries being made on the platform, and how to allow your product and user teams to analyze the events in the fantastic Kibana 4, a “flexible analytics and visualization platform” that is powered by Elasticsearch.

Create a live data viz with SAP Lumira

One of the greatest things about the import.io community is that they inspire each other to try new things and share fun projects. Recently, Shankar Narayanan SGS was inspired by Ronald Konijenburg's post about getting data from the World Series of Poker and visualizing it in SAP Lumira. In it, Ron used import.io to get data from the WSOP website and visualized it using SAP Lumira. But he did it with a CSV, which meant that the data was static.

XPath 101: 3 common XPaths for extracting data

XPaths are what make the import.io app go 'round. Without them, we wouldn't be able to extract data, so needless to say, they're pretty important. The reason XPaths matter so much is that when you click on data to train the tool, behind the scenes our algorithms are trying to work out the corresponding XPath for that data. These are the three most common uses of XPaths in import.io; but first, what is an XPath anyway?
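As a taste of what such XPaths look like in practice, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which supports a useful subset of XPath (the sample markup is invented for illustration):

```python
import xml.etree.ElementTree as ET

# A tiny HTML-like fragment standing in for a scraped page.
page = ET.fromstring("""
<div>
  <a href="/item/1">First</a>
  <p class="price">$10</p>
  <p class="desc">A thing</p>
</div>
""")

# 1. All links anywhere below the current node.
links = page.findall(".//a")

# 2. An element selected by attribute value.
price = page.find(".//p[@class='price']")

# 3. The n-th matching element (XPath positions are 1-based).
second_p = page.find(".//p[2]")

print([a.get("href") for a in links])  # ['/item/1']
print(price.text)                      # '$10'
print(second_p.text)                   # 'A thing'
```

Real web pages are messier than this fragment, which is exactly why having the tool work out the XPath for you is so handy.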

importSQL: The no hassle way to send your data direct to SQL

Every Friday at import.io we have a company-wide meeting where we get together and listen to one of the team talk about something data or company related that they’ve been working on. If we’re not convinced, we fire them. After that, we have cake.

Just kidding… there’s no cake.

Last week’s Friday talk revealed that a lot of our users really want a better way to put data into databases like SQL.

Occasionally we developers get a little time to ourselves to hack together some side projects or proof of concepts that could later get turned into features. Inspired by what I learned on Friday, I thought it would be a fun idea to come up with an easier way to send data from a website straight into an SQL database with zero hassle.
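The core idea, rows extracted from a website flowing straight into a SQL table, can be sketched in a few lines of Python with the standard-library `sqlite3` module. This is an illustration of the concept, not the actual importSQL implementation:

```python
import sqlite3

def rows_to_sql(rows, table, conn):
    """Create `table` from the keys of the first row, then bulk-insert all
    rows -- a sketch of the website-to-SQL idea with zero schema fuss."""
    cols = list(rows[0])
    col_defs = ", ".join(f"{c} TEXT" for c in cols)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    conn.commit()

# Example rows, shaped the way an extractor might return them.
rows = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "price": "19.99"},
]
conn = sqlite3.connect(":memory:")
rows_to_sql(rows, "products", conn)
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 2
```

Swapping SQLite for another database is mostly a matter of changing the connection and placeholder style.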

Create a D3 visualization

The Co-founder of Vida Lab used data to create this fascinating D3 map that shows obesity rates across the US. It turns out Colorado is the skinniest state, who knew? vida.io

Combine import.io with SAP Lumira and SAP HANA for business analytics

We always love to hear what people get up to with the data they extract using import.io data sources – there are more use cases than we could ever imagine, and we are constantly seeing beautiful, useful and innovative work built on our platform.

Recently we were contacted by Ronald who wanted to show us two integrations he has done using the import.io APIs.

Getting data from local files

import.io is pretty good at dealing with web pages, but what if you want to structure data in a file that is on your local machine?

One way would be to upload your file to a web server somewhere, but not everyone has that capability, or the data may be sensitive so shouldn’t be uploaded online.
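If the tool doing the extraction runs on your own machine, you don't even need a remote web server for the first workaround: Python's standard library can serve a local folder over HTTP in a few lines. Note this only makes the file reachable from your own machine or network, which also keeps sensitive data off the public internet:

```python
import http.server
import pathlib
import tempfile
import threading
import urllib.request
from functools import partial

def serve_directory(path, port=0):
    """Serve `path` over HTTP on an ephemeral localhost port.
    Returns the server object and the base URL to fetch files from."""
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=path)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"

# Demo: write a file, serve its directory, then fetch it back over HTTP.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "data.html").write_text("<table><tr><td>42</td></tr></table>")
server, url = serve_directory(tmp)
print(urllib.request.urlopen(url + "data.html").read().decode())
server.shutdown()
```

Any tool on the same machine can now point at that `http://127.0.0.1:…/data.html` URL as if it were a normal web page.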

import.io + Open Refine + Google Fusion Tables = Magic!

The University of Ottawa Library holds an employee training week every year, giving colleagues the opportunity to share experiences, skills, and insights with one another. I jumped on this opportunity to showcase import.io as a means of creating datasets from website content. The tutorial I developed demonstrated how to create a dataset from the City of Ottawa’s open data catalogue. It’s a really simple example to get users familiar with the functionality of import.io, an easy way to scrape web content via a simple interface and without having to code. In this post I will also demo how to use Open Refine to clean the data captured by import.io and how to visualize it using Google Fusion Tables.

Using import.io as a database feed

Martin Hawksey described how you could populate a Google Spreadsheet with regular updates from data that had been prepared by some scraping process in import.io.

Getting it into a spreadsheet is a good start, but what if you want to get the data into some other kind of format? A fusion table, an Orchestrate database or some other medium for example. Let’s say you are creating a mobile application that accesses parse.com for its data, and for this application we need some data processed by import.io, perhaps merged with some data from some other source. Here’s how we can use Google Apps Script to achieve that.

Integrating import.io with Google Apps Script

One of the refreshing things about import.io is that the service is accessible to users with a wide range of expertise, from seasoned coders to those with no coding experience at all. From point-and-click data extraction to basic integration using Google Sheets, within minutes you can be creating custom workflows and automations. One of the nice things about getting the data into a spreadsheet is that users can quickly and easily graph, filter, or further manipulate their data with a selection of formulas. There are, however, issues with spreadsheets.

Update: Client library compatibility with Ruby 2.1

We have just shipped a patch to our Ruby client library, providing support for Ruby 2.1.

Previously we required Ruby version 1.9+ for the client library, but the 2.1 branch introduced Ruby Bug #9718, in which cloning or duplicating Queues caused Ruby to segfault. Ruby versions 1.9 and 2.0 were unaffected and continue to operate effectively.

How to use our advanced features

I took a back seat on the webinar this week and left you in the very capable hands of our developer duo Chris A and Chris B (Bamford), who showed you some of our more advanced features. Now you may think that you need to be a developer to use these features, but I'm here to tell you (as a non-dev myself) that the concepts are actually pretty simple. With just a little bit of extra knowledge you can make our tool do some pretty crazy things. And if you get stuck you can always shoot me an email at support@import.io and I'll help you figure it out!

Ok, enough of that… Let’s get this show on the road!

DevNetwork releases infographic naming import.io as one of the leading companies in the developer technology landscape

We are thrilled that our technology was recognized in DevNetwork's Leadership Map of Developer Technology infographic as a leader in API Services! DevNetwork, producers of APIWorld, DataWeek, and DeveloperWeek, have organized over 120 companies into 9 core groups of technologies across 34 categories in this first edition of their map of the developer technology industry. The infographic aims to make developer technology understandable to the entire business and technology community.

The “DevOps” makes the developer

This morning I read a very interesting article by Jeff Knupp entitled “How DevOps is Killing the Developer”. In it, Jeff argues that developers are at the top of the technical expertise chain (having the most niche, specific expertise), and as such shouldn’t be distracted spending time doing “DevOps” (or indeed QA, sysadmin, database admin, and so on), which he claims require less specific expertise.

Implementing AWS at scale

On Tuesday, our Developer Experience Engineer, Chris Alexander, took an in-depth look at services available on the AWS cloud, so I made sure to tag along to watch the big show. The event was hosted at Skills Matter alongside the meetup group LJC: London Java Community. The audience of around 100 people was treated to an evening of details on using AWS at scale, plus tips and tricks around infrastructure and platform as a service. #LoveLearning

Building features in pods

The concept of breaking companies and teams into smaller groups, so that units of work can progress relatively independently, is by no means new. Despite its recent growth in popularity as a way of scaling teams, Amazon have done it for years, and 3M have been at it for decades.

Nowadays, flat-hierarchy organisation structures are all the rage, especially in the Valley. The concept is appealing for a number of reasons: the right people on the right jobs, the promise of easy scaling of the business as it grows, and work happening closer to the customer to name but a few.

Resolving merge conflicts

As soon as you begin working on a software project that exceeds a few lines of code, managing changes rapidly becomes trickier. To counter this, developers use version control systems to track changes to folders and files.

These systems typically work by tracking changes to files as individual “commits” of work, which then allow you to build up a file’s current state by looking at its initial state, then the series of changes (or commits) that have affected it since then.
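That "initial state plus a series of commits" model can be sketched directly in Python. Each commit below is modeled as a function that transforms the file's lines, and replaying them in order rebuilds the current state:

```python
from functools import reduce

def apply_history(initial_lines, commits):
    """Replay every commit, in order, to rebuild the file's current state.
    A commit here is any function from a list of lines to a list of lines."""
    return reduce(lambda lines, commit: commit(lines), commits, initial_lines)

# Three toy commits: add a function, add its body, then edit a string.
history = [
    lambda lines: lines + ["def greet():"],
    lambda lines: lines + ["    print('hello')"],
    lambda lines: [l.replace("hello", "hi") for l in lines],
]
print(apply_history([], history))  # ["def greet():", "    print('hi')"]
```

A merge conflict is what happens when two such histories diverge and a replayed commit no longer applies cleanly to the other branch's state.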

Sync Up! Keeping AWS Servers Inside a VPC on the Clock

Keeping Your Server Clocks Synced EC2 servers are started in sync with the VM host’s clock. They are also configured to use the OS’s standard time servers – for Amazon Linux AMIs, these are *.amazon.pool.ntp.org, and for Ubuntu it’s *.ubuntu.pool.ntp.org. This is great if your server is on “classic” EC2 so it can access the […]

Are PHP web crawlers dead?

These days there are so many better alternatives to wasting hours (sometimes days) coding PHP web crawlers. With today’s technology you can build a web crawler in minutes using a simple visual interface and a tool that learns what you want. If you can see it on the page, you can have it!

Rich Data Formats

The web crawler in import.io is highly targeted, so you only get the data you need from the pages you are interested in, and it allows you to extract that data in a variety of formats.

Analysing ELB logs with Logstash and Kibana

Last week, Amazon launched the ability to turn on per-request logging in Elastic Load Balancers (ELBs). This is a much sought-after feature that many users have been asking about for some time now, and finally it is here.

However, logging is only half of the battle. Once you have the logs you have to do something with them in order to be able to figure out if there are any problems that need addressing or improvements that can be made.
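As a first step toward doing something with them: each classic ELB access-log line is a space-delimited record with a quoted request string at the end, so parsing one line is straightforward. The field names below follow the classic ELB log format, and the sample line is invented; treat this as a sketch rather than a full Logstash replacement:

```python
import shlex

# Field order of a classic ELB access-log entry.
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request",
]

def parse_elb_line(line):
    """Split one log line into a dict. shlex.split keeps the quoted
    request string together as a single field."""
    return dict(zip(ELB_FIELDS, shlex.split(line)))

line = ('2014-03-20T10:00:00.123456Z my-elb 10.0.0.1:54321 10.0.1.2:80 '
        '0.000043 0.001337 0.000057 200 200 0 2310 '
        '"GET https://example.com:443/path HTTP/1.1"')
entry = parse_elb_line(line)
print(entry["elb_status_code"], entry["request"])
# 200 GET https://example.com:443/path HTTP/1.1
```

Once lines are structured like this, shipping them into Elasticsearch and slicing them in Kibana is the easy part.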

Introduction to Asynchronous APIs in Python

Our current Python client library is quite complex and can be tricky to get your head around; the good news is that soon we will be bringing out version 2, which is much easier to use and comes with much more help content.

While we are putting the final touches on the new version, I want to take a few moments to talk about some of the core concepts of the client library.
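One of those core concepts, issuing a query and then receiving results asynchronously through a callback until a final "disconnect" style message arrives, can be sketched in plain Python with threads. The message types and shapes below are illustrative assumptions, not the real import.io protocol:

```python
import queue
import threading

def async_query(query, on_message):
    """Run a query on a worker thread and stream results back through a
    callback -- the basic shape of an asynchronous client API. A final
    message of type 'DISCONNECT' signals completion."""
    def worker():
        for i in range(3):  # pretend these rows arrive over the wire
            on_message({"type": "MESSAGE", "data": f"{query}-row-{i}"})
        on_message({"type": "DISCONNECT"})
    t = threading.Thread(target=worker)
    t.start()
    return t

# Collect streamed messages and block until the query finishes.
results, done = [], queue.Queue()
def on_message(msg):
    if msg["type"] == "DISCONNECT":
        done.put(True)
    else:
        results.append(msg["data"])

async_query("example", on_message).join()
done.get(timeout=5)
print(results)  # ['example-row-0', 'example-row-1', 'example-row-2']
```

The key mental shift is that your code reacts to messages as they arrive instead of blocking on one big response.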

Could OpenID Connect solve federated sign-in problems?

Last week I took to our tech blog to outline how we implemented our just-launched Social Sign-in feature. In that post, I described a number of implementation issues that arose from the use of OAuth 2. For example:

“…once you have the OAuth 2 access token, all of the [federated auth] services offer completely different APIs for getting hold of data about the user. For this, in the end, we resorted to writing custom implementations for each provider.”
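A common way out of that problem is a small adapter per provider that maps each service's user payload onto one internal user model. The field mappings below are illustrative assumptions rather than the exact response shapes of each provider, and the HTTP fetching is omitted:

```python
# Per-provider mapping from our internal field names to the (assumed)
# field names in each provider's user-info payload.
PROVIDER_FIELDS = {
    "google":   {"id": "sub", "name": "name",          "email": "email"},
    "facebook": {"id": "id",  "name": "name",          "email": "email"},
    "github":   {"id": "id",  "name": "login",         "email": "email"},
    "linkedin": {"id": "id",  "name": "formattedName", "email": "emailAddress"},
}

def normalize_user(provider, raw):
    """Map a provider-specific user payload onto one internal user model,
    leaving missing fields as None."""
    fields = PROVIDER_FIELDS[provider]
    return {key: raw.get(src) for key, src in fields.items()}

print(normalize_user("github", {"id": 42, "login": "octocat", "email": "o@example.com"}))
# {'id': 42, 'name': 'octocat', 'email': 'o@example.com'}
```

A standard like OpenID Connect would make most of these adapters unnecessary by fixing the user-info shape across providers, which is precisely its appeal.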

Post-mortem on Recent API issues

Recently, we have encountered a number of occasions where there were issues on the website and APIs (and at one point, some significant downtime), as a result of clustering issues on our API servers. Firstly, we would like to apologise to everyone who was affected by these issues. We would like to take this opportunity to explain what the issue was, what we have done to fix it, and what future work we will be undertaking in order to improve the stability of the platform.

Building Social Sign-in

Recently, as part of our Nexus release, we launched "Social Sign-in": a feature that allows anyone to sign up for or log in to an import·io account using one of four external social services (LinkedIn, Facebook, Google+, GitHub).

When we first proposed the feature, one of our team – now rather infamously – said, “Why can’t you just put a Facebook button on it?” In this day and age, with social accounts being used for authentication all over the web, it can seem trivial to implement. And actually, in some cases, it can be trivial – but to do it properly takes some work. So, I wanted to take a few minutes to outline how we went about the implementation, and why we think this provides a superior user experience.

APIs as Art

Yesterday, Google launched a rather unusual new project called DevArt. In conjunction with the Barbican, Google is leveraging the creativity of developers to build some fantastic art, which will be exhibited at the Barbican along with the work of some commissioned artists in their Digital Revolution exhibition.