Insight and technical features for the code savvy

Artificial Intelligence Regulation

Artificial Intelligence Regulation: Let’s not regulate mathematics!

On Wednesday, ahead of today’s White House Frontiers Conference, the White House Office of Science and Technology Policy released its report on Preparing for the Future of Artificial Intelligence. The report is optimistic, comprehensive and well-balanced. In summary: full-speed ahead.  But let’s be smart when it comes to Artificial Intelligence regulation. The premise is that […]

5 New Advanced Data Extraction Features To Try Out

You asked for more powerful data extraction features. We took your feedback and set our engineers loose on the challenge. We are excited to announce 5 brand new advanced data extraction features that will help you get data out of more websites: Disable CSS, Default Column Values, Advanced Regex Support, Require Column Values, Raw HTML […]

Query API – Service Availability

Last updated: 2pm PST, Wednesday 9th March 2016. The new Query API has been in production for 24 hours now at 99.9% availability. We will continue to monitor the situation, but you can assume that normal service has been resumed. If you have any questions, please reach out to us. First published: 9am PST, Tuesday 8th March 2016. The […]

Neural nets: How Regular Expressions brought about Deep Learning

2016 is the year of Deep Learning, whose popularity has been on a steep incline ever since Google bought DeepMind at the start of 2014. Last year’s technical breakthroughs, acquisitions, funding deals, and open source releases have all helped to cement Deep Learning as the hip artificial intelligence. Our CTO, Matt Painter, explains how Deep Learning got its […]

Using AWS Lambda and API Gateway to create a serverless schedule

At import·io, we believe that the best DevOps is NoOps.

That doesn’t mean that we don’t like DevOps people. On the contrary, we think they rock!

But ideally, we want everyone to be able to automate their work wherever possible. Instead of spending time on repetitive and mundane tasks, we push those jobs onto computers – leaving ourselves more time to work on great features.

With that in mind, we are starting to adopt microservice patterns as we scale our engineering efforts, so that new features are now delivered as components separate from the main platform. This allows us to iterate on new functionality quickly, test different technology stacks, and involve people who do not have inside knowledge of the platform in the development process.

Our latest project, Scheduled APIs (which will let you run your APIs on a schedule), gave us a chance to revise our technology stack and look at new paths for building long-lasting components. Specifically, we used a set of new AWS solutions: AWS Lambda and API Gateway.

Using Kinesis and Kibana to get insights from your data

Matthew Painter, CTO at import·io, walks you through how we use Amazon Kinesis to manage the routing of event data, such as queries being made on the platform, and how to let your product and user teams analyze those events in the fantastic Kibana 4, a “flexible analytics and visualization platform” that is powered by Elasticsearch.

Create a live data viz with SAP Lumira

One of the greatest things about the import·io community is that its members inspire each other to try new things and share fun projects. Recently, Shankar Narayanan SGS became inspired by Ronald Konijenburg’s post about getting data from the World Series of Poker and visualizing it in SAP Lumira. In it, Ron used import·io to get data from the WSOP website and visualize it using SAP Lumira. But he did it with a CSV, which meant that the data was static.

XPath 101: 3 common XPaths for extracting data

XPaths are what make the app go ‘round. Without them, we wouldn’t be able to extract data, so needless to say, they’re pretty important. That’s because when you click on data to train the tool, our algorithms work behind the scenes to figure out the corresponding XPath for that data. These are the three most common uses of XPaths in import·io; but first, what is an XPath anyway?
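
To make the idea concrete, here is a minimal sketch that uses only Python's standard library rather than our own tooling. The page fragment, the class names, and the three expressions are illustrative assumptions; they are not necessarily the XPaths our algorithms generate, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html><body>
  <table>
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">19.50</td></tr>
  </table>
  <a href="/next">Next page</a>
</body></html>
"""

root = ET.fromstring(PAGE)

# 1. Select every cell whose class attribute has a given value.
names = [td.text for td in root.findall(".//td[@class='name']")]

# 2. Select by position: the second cell of each row (positions are 1-based).
prices = [td.text for td in root.findall(".//tr/td[2]")]

# 3. Find an element anywhere in the tree and read one of its attributes.
next_link = root.find(".//a").get("href")

print(names, prices, next_link)
```

Each expression describes a path through the document tree, which is why one XPath can pick out the same column from every row.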

importSQL: The no hassle way to send your data direct to SQL

Every Friday at import·io we have a company-wide meeting where we get together and listen to one of the team talk about something data or company related that they’ve been working on. If we’re not convinced, we fire them. After that, we have cake.

Just kidding… there’s no cake.

Last week’s Friday talk revealed that a lot of our users really want a better way to put data into databases like SQL.

Occasionally we developers get a little time to ourselves to hack together some side projects or proof of concepts that could later get turned into features. Inspired by what I learned on Friday, I thought it would be a fun idea to come up with an easier way to send data from a website straight into an SQL database with zero hassle.

Combine import·io with SAP Lumira and SAP HANA for business analytics

We always love to hear what people using import·io get up to with the data they extract – there are more use cases than we could ever imagine, and we are constantly seeing beautiful, useful and innovative work based on our platform.

Recently we were contacted by Ronald, who wanted to show us two integrations he has built using the import·io APIs.

How to use our advanced features

I took a back seat on the webinar this week and left you in the very capable hands of our developer duo Chris A and Chris B (Bamford), who showed you some of our more advanced features. Now you may think that you need to be a developer to use these features, but I’m here to tell you (as a non-dev myself) that the concepts are actually pretty simple. With just a little bit of extra knowledge you can make our tool do some pretty crazy things. And if you get stuck you can always shoot me an email and I’ll help you figure it out!

Ok, enough of that… Let’s get this show on the road!

DevNetwork releases infographic naming import·io as one of the leading companies in the developer technology landscape

We are thrilled that our technology was recognized in DevNetwork’s Leadership Map of Developer Technology infographic as a leader in API Services! DevNetwork, producers of APIWorld, DataWeek, and DeveloperWeek, have organized over 120 companies into 9 core groups of technologies across 34 categories in this first edition of their map of the developer technology industry. The infographic aims to make developer technology understandable to the entire business and technology community.

The “DevOps” makes the developer

This morning I read a very interesting article by Jeff Knupp entitled “How DevOps is Killing the Developer”. In it, Jeff argues that developers are at the top of the technical expertise chain (having the most niche, specific expertise), and as such shouldn’t be distracted spending time doing “DevOps” (or indeed QA, sysadmin, database admin, and so on), which he claims require less specific expertise.

Implementing AWS at scale

On Tuesday, our Developer Experience Engineer, Chris Alexander, took an in-depth look at services available on the AWS cloud, so I made sure to tag along to watch the big show. The event was hosted at Skills Matter alongside the meetup group LJC: London Java Community. The audience of around 100 people was treated to an evening of details on using AWS to scale, plus tips and tricks based around infrastructure and platform as a service. #LoveLearning.

Building features in pods

Breaking companies and teams into smaller groups that can progress with units of work relatively independently is by no means a new concept. Although it has recently grown in popularity as a way of scaling teams, Amazon have done it for years, and 3M have been at it for decades.

Nowadays, flat-hierarchy organisation structures are all the rage, especially in the Valley. The concept is appealing for a number of reasons: the right people on the right jobs, the promise of easy scaling of the business as it grows, and work happening closer to the customer to name but a few.

Resolving merge conflicts

As soon as you begin working on a software project that exceeds a few lines of code, managing changes rapidly becomes trickier. To counter this, developers use version control systems to track changes to folders and files.

These systems typically work by tracking changes to files as individual “commits” of work, which then allow you to build up a file’s current state by looking at its initial state, then the series of changes (or commits) that have affected it since then.
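
As a rough illustration of that idea, the sketch below replays a list of changes on top of an initial state to arrive at the current one. The `Commit` class and its three operations are hypothetical simplifications invented for this example; this is a toy model, not how any particular version control system stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One recorded change to a file (hypothetical, simplified model)."""
    op: str         # "insert", "replace", or "delete"
    index: int      # zero-based line position the change applies to
    text: str = ""  # new line content (unused for "delete")


def apply_commit(lines, commit):
    """Return a new list of lines with a single commit applied."""
    result = list(lines)
    if commit.op == "insert":
        result.insert(commit.index, commit.text)
    elif commit.op == "replace":
        result[commit.index] = commit.text
    elif commit.op == "delete":
        del result[commit.index]
    else:
        raise ValueError(f"unknown operation: {commit.op}")
    return result


def rebuild(initial, commits):
    """Replay every commit, in order, on top of the initial state."""
    state = list(initial)
    for commit in commits:
        state = apply_commit(state, commit)
    return state


initial = ["print('hello')"]
history = [
    Commit("insert", 0, "import sys"),
    Commit("replace", 1, "print('hello, world')"),
    Commit("insert", 2, "sys.exit(0)"),
]
current = rebuild(initial, history)
print("\n".join(current))
```

Replaying only a prefix of `history` gives the file as it looked at that earlier point, which is what makes it possible to inspect or return to older versions.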

Sync Up! Keeping AWS Servers Inside a VPC on the Clock

Keeping Your Server Clocks Synced EC2 servers are started in sync with the VM host’s clock. They are also configured to use the OS’s standard time servers – for Amazon Linux AMIs, these are *, and for Ubuntu it’s * This is great if your server is on “classic” EC2 so it can access the […]

Analysing ELB logs with Logstash and Kibana

Last week, Amazon launched the ability to turn on per-request logging in Elastic Load Balancers (ELBs). This is a much sought-after feature that many users have been asking about for some time now, and finally it is here.

However, logging is only half of the battle. Once you have the logs you have to do something with them in order to be able to figure out if there are any problems that need addressing or improvements that can be made.

Introduction to Asynchronous APIs in Python

Our current Python client library is quite complex and can be tricky to get your head around; the good news is that soon we will be bringing out version 2, which is much easier to use and comes with much more help content.

While we are putting the final touches on the new version, I want to take a few moments to talk about some of the core concepts of the client library.

Could OpenID Connect solve federated sign-in problems?

Last week I took to our tech blog to outline how we went about implementation for our just-launched Social Sign-in feature. In it, I spoke about a number of implementation issues that came about with the use of OAuth 2. For example:

“…once you have the OAuth 2 access token, all of the [federated auth] services offer completely different APIs for getting hold of data about the user. For this, in the end, we resorted to writing custom implementations for each provider.”

APIs as Art

Yesterday, Google launched a rather unusual new project called DevArt. In conjunction with the Barbican, Google is leveraging the creativity of developers to build some fantastic art, which will be exhibited at the Barbican along with the work of some commissioned artists in their Digital Revolution exhibition. 

Query up to 100 sources via One API call with import·io

One of the most powerful things we allow users to do with import·io is to combine multiple sources of data, commonly referred to as federation. In order to do this, we have created an architecture that allows you to pass in a single query as an input for many sources (up to 100 at the moment). That query is then executed in parallel and paginated (if available) against each of the sources specified. The results of these federated queries are then streamed back to the client – either one of our client libraries, or our data lab web app. The end result is that it is possible to retrieve multiple pages of data from many sources, in a consistent format, into your app with a single API call.
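
The general shape of such a federated query can be sketched in a few lines of Python. This is only an illustration of the fan-out pattern described above, not our actual architecture or client library: `query_source` is a hypothetical stand-in that returns two canned pages per source, and the worker count of 16 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_SOURCES = 100  # the per-query limit mentioned in the text


def query_source(source_id, query):
    """Hypothetical stand-in for querying one source; returns a list of pages."""
    return [
        [{"source": source_id, "query": query, "page": 1}],
        [{"source": source_id, "query": query, "page": 2}],
    ]


def federated_query(source_ids, query):
    """Run one query against many sources in parallel, yielding rows as they arrive."""
    if len(source_ids) > MAX_SOURCES:
        raise ValueError(f"at most {MAX_SOURCES} sources per query")
    workers = min(len(source_ids), 16) or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(query_source, s, query) for s in source_ids]
        # Hand results back in completion order, not submission order.
        for future in as_completed(futures):
            for page in future.result():
                yield from page


rows = list(federated_query(["source-a", "source-b", "source-c"], "laptops"))
print(len(rows), "rows from", len({r["source"] for r in rows}), "sources")
```

Because every row has the same keys regardless of which source produced it, the caller can treat the combined stream as a single result set.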

Comet vs REST Query APIs

When we talk about getting data out of import·io using our APIs, there are two main methods that we describe. These are known as REST queries and Comet queries. I want to quickly explain the key differences between them and what this means for developers.   REST Queries REST queries are the simplest way to […]

Exposing headers over CORS with Access-Control-Expose-Headers

Working with cross-domain HTTP requests in JavaScript is generally acknowledged to be a bit of a minefield. Recently I discovered a new CORS header, Access-Control-Expose-Headers, which I hadn’t known about previously. As I had to do a lot of digging to get any information about it, I thought I’d make a note. The context […]
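
For reference, a browser lets cross-origin scripts read only a small safelisted set of response headers (such as Content-Type and Cache-Control) unless the server names additional ones in Access-Control-Expose-Headers. The sketch below shows the server side of that in Python; the `cors_headers` helper, the origin, and the custom header names are hypothetical and chosen purely for illustration.

```python
def cors_headers(origin, exposed):
    """Build response headers that let a cross-origin script read extra headers."""
    return {
        # Which origin may read the response at all.
        "Access-Control-Allow-Origin": origin,
        # Which non-safelisted response headers that origin's scripts may read.
        "Access-Control-Expose-Headers": ", ".join(exposed),
    }


headers = cors_headers(
    "https://app.example.com",                   # hypothetical client origin
    ["X-Request-Id", "X-Rate-Limit-Remaining"],  # hypothetical custom headers
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Without the second header, a script on the client origin would receive the response but would not be able to read the custom headers from it.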

HTML5 Canvas toDataURL: WebM vs PNG vs JPEG

We use HTML5 Canvas elements for a number of features in our client apps, and we wanted to know once and for all – what would be the best format for us to export our results to? The toDataURL method of Canvas can handle a number of formats. In Chrome it can export “image/png”, “image/jpeg” […]

HTTPS Now Fully Supported on Query API

 I am pleased to announce that HTTPS is now fully supported on the Query API endpoints. Previously, only fully supported HTTPS requests. With a patch to our infrastructure yesterday, the * and endpoints now fully support HTTPS for CometD queries. We strongly recommend that all clients migrate to using HTTPS for querying. This […]

New Fields in the CometD Query API

We have just rolled out release Battlestar (if you were wondering, our first production release was Avengers) and with it come some new fields in the CometD messaging API for querying. MESSAGE objects returned through the CometD protocol now have two additional fields, the connector GUID and the connector version GUID that were used to return the […]

Keeping Up with import•io: System Status

On the dev team here at import•io we’re frustrated when web services we need and love to use go down, or perform maintenance when we’re not expecting it. We work hard to make our systems highly available and to make sure our system rollouts are as seamless as possible. However, we may have times where […]

“Why I Love Everything You Hate About Java”

I recently came across a post entitled “Why I love everything you hate about Java”. I love it. I would say that the Decorator Pattern is an important pattern in CS for modularity. In the comments there is a lot of to and fro from people, but I think I can encapsulate what it got […]

Omniscient Debuggers

I was discussing Omniscient Debuggers recently with someone at an LJC meetup. Omniscient debuggers drastically reduce the time needed to debug software by giving the programmer complete freedom with respect to time: they let you step forward and backward, and immediately answer questions like “when was this variable assigned that value?”. This is made […]

Node.js Garbage Collection

So I’m a fan of the concept of node.js, but am still unconvinced of its maturity. The big thing I am worried about is its garbage collection, which is comparable to very old Java GC in that it is a “stop-the-world” style GC, where all execution is paused while GC happens. According to this post: […]