Query up to 100 sources via one API call with import·io

The Federation of APIs

One of the most powerful things import·io lets users do is combine multiple sources of data, a technique commonly referred to as federation. To support this, we have created an architecture that lets you pass a single query as the input to many sources (up to 100 at the moment). That query is then executed in parallel against each of the specified sources, with pagination wherever the source supports it. The results of these federated queries are streamed back to the client – either one of our client libraries or our data lab web app. The end result is that you can retrieve multiple pages of data from many sources, in a consistent format, into your app with a single API call.
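The fan-out described above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not import·io's implementation: each source is assumed to be an object with an `id` and an async `fetchPage(query, page)` method that resolves to `{ rows, hasMore }`, and `federate`, `onRows` and `maxPages` are hypothetical names. All sources run concurrently, each one is paged through in turn, and every page is handed to the caller as soon as it arrives, which mirrors the streaming behaviour described in the post.

```javascript
// Hypothetical sketch of query federation, not the import·io implementation.
// Each source is assumed to expose `id` and an async `fetchPage(query, page)`
// that resolves to { rows: [...], hasMore: boolean }.
const MAX_SOURCES = 100; // per-query limit quoted in the post

async function federate(query, sources, onRows, maxPages = 5) {
  if (sources.length > MAX_SOURCES) {
    throw new Error(`At most ${MAX_SOURCES} sources per query, got ${sources.length}`);
  }

  // Page through one source sequentially, streaming each page to the caller.
  async function drain(source) {
    let count = 0;
    for (let page = 1; page <= maxPages; page++) {
      const { rows, hasMore } = await source.fetchPage(query, page);
      onRows(source.id, rows); // hand rows over as soon as this page arrives
      count += rows.length;
      if (!hasMore) break;
    }
    return count;
  }

  // Start every source at once; allSettled keeps one failure from sinking the rest.
  const results = await Promise.allSettled(sources.map(drain));
  const summary = {};
  sources.forEach((source, i) => {
    // Row count per source, or null if that source failed.
    summary[source.id] = results[i].status === "fulfilled" ? results[i].value : null;
  });
  return summary;
}
```

The design choice worth noting is `Promise.allSettled` rather than `Promise.all`: a source that errors is recorded as `null` in the summary instead of aborting the whole federated query, while rows from the other sources keep streaming to `onRows`.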

Using Traditional Methods

If you wanted to query 100 different APIs the traditional way from within one browser, you’d run into a number of problems. For example, web browsers only allow 6-8 concurrent connections to a single domain. APIs also differ in their request and response formats, need different headers and configuration, and have other niggly implementation details that you can quickly get hung up on. Even client libraries from different providers quickly expose the differing architectural assumptions each one makes about your code, and the complexity grows with every library you add.

A New Way Forward

Now, however, import·io lets you plug APIs into the platform as well as collect data from web pages. The result is a set of API and website data sources that share the same inputs, outputs, and data format, along with a consistent interface for querying them together as a group. Because the interface is common, adding or removing a source is a simple matter of configuration rather than the coding and integration work it would previously have required. More importantly, you only need to do one integration (with import·io) rather than 100, saving you time and effort and reducing your exposure to integration issues.
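To make the idea of a common interface concrete, here is a hypothetical sketch of the normalization it implies. The two providers, their field names and the `adapters`/`normalizeAll` helpers are all invented for illustration and say nothing about how import·io works internally; the point is that each adapter maps a differently shaped response onto the same `name`/`address`/`phone` schema, so the list of sources becomes plain configuration.

```javascript
// Hypothetical sketch: two made-up providers with different response shapes,
// each wrapped in an adapter that maps its records onto one shared schema.
const adapters = {
  // Provider A nests records under `results` and uses `storeName`/`addr`/`tel`.
  providerA: (raw) =>
    raw.results.map((r) => ({ name: r.storeName, address: r.addr, phone: r.tel })),
  // Provider B returns a bare array with `title`/`location`/`contact.phone`.
  providerB: (raw) =>
    raw.map((r) => ({ name: r.title, address: r.location, phone: r.contact.phone })),
};

// The set of sources is configuration: adding or removing one edits this list,
// not the code that consumes the normalized rows.
const sourceConfig = ["providerA", "providerB"];

// Run each configured source's raw response through its adapter and tag every
// row with the source it came from.
function normalizeAll(rawResponses, config = sourceConfig) {
  const rows = [];
  for (const id of config) {
    for (const row of adapters[id](rawResponses[id])) {
      rows.push({ source: id, ...row });
    }
  }
  return rows;
}
```

Adding a third provider in this sketch means writing one adapter and adding its key to `sourceConfig`; `normalizeAll` and anything downstream of it stay the same.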

API Abstraction Layer

When you consider the ability to federate a single query to up to 100 sources, it is more accurate to think of import·io as an API abstraction layer than as a scraper (although you can certainly use it that way). In February, we’ll also be releasing a DSL that lets developers hook APIs or web forms up to the import·io platform, alongside data sources created using our app. This powerful new functionality will let anyone with technical knowledge and an understanding of an API connect it to the import·io platform and use it just like any other data source.

See it in Action

As a brief example of federated querying, consider this connector (you will need an import·io account to view the data), which takes a UK postcode as input, queries the Marks and Spencer website, and returns data about the stores near that postcode, with the columns “name”, “address” and “phone”.

We can see how we would query this data source using our integrate page, in this example with JavaScript. The essence of the code required is this:

Now consider this Data Set, which includes a “Mix”: the way to construct a federated query in the import·io UI. If you have an account, you can try querying it yourself; an example input is “EC2M 4TP”. When you press “Refresh”, the input postcode is federated to each of the 5 data sources, and results appear in the data grid as each source returns them.

We can again look at the JavaScript integration for this Data Set. The code for the query itself is not much more complex than before; in fact, we have only added the GUIDs of the four additional data sources in the Mix:

The beauty of this change is that none of the logic you had already written needs to change to handle the data output by the additional sources. Every source in this Mix uses the same input and output names, so the new data needs no special handling beyond what you already had.
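The same point can be sketched from the client side. `buildQuery`, `formatStore` and the `CONNECTOR_n` placeholders below are hypothetical and do not reflect the import·io client library's API; the sketch only shows that moving from one connector to the five in the Mix changes the list of IDs, while the row-handling code stays exactly as it was.

```javascript
// Hypothetical sketch: `buildQuery`, its field names and the CONNECTOR_n IDs are
// placeholders for illustration, not the import·io client library API.
function buildQuery(postcode, connectorIds) {
  return { input: { postcode }, connectors: [...connectorIds] };
}

// Row-handling logic written for the original single connector. Every source in
// the Mix emits the same "name", "address" and "phone" columns, so this function
// is reused unchanged for the federated query.
function formatStore(row) {
  return `${row.name} | ${row.address} | ${row.phone}`;
}

// Before: one connector.
const singleQuery = buildQuery("EC2M 4TP", ["CONNECTOR_1"]);

// After: the original connector plus the four added to the Mix.
const mixQuery = buildQuery("EC2M 4TP", [
  "CONNECTOR_1", "CONNECTOR_2", "CONNECTOR_3", "CONNECTOR_4", "CONNECTOR_5",
]);
```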

-Matthew Painter, CTO
