The web is a wonderful place for information. I can open a browser and have the answer to any question within minutes. But the web is not so great when it comes to data. Getting data from the web is difficult. And the only solution is a bit of a dirty secret for our industry, a dirty secret that we don’t like to talk about: “web scraping”.
The reality is that if you are a data owner with a data source on a website, then that data source is almost certainly being scraped, today. You have no insight into this. You have no control over it. It is just a cost for you. This is not good.
Web scraping is also not good for data users. It is costly because it requires expensive developer time. The rights that you have to use the data are uncertain: Do you need to hide the fact that you are scraping? Do you need to combine the data with other data before you can use it? If you need multiple data sources then you create a data integration problem for yourself: you have to normalise and integrate the results of multiple web scrapes. Even if you tried to pay the data owner for access to their data source, they probably wouldn’t be able to take the money off you, as they are not in the business of selling data.
In summary, getting data is a problem and web scraping is neither a good technical solution nor a good economic solution.
Import•io is a place where data users (people who want data) and data owners (people who have data) can better interact. It is a platform on which connectors to data sources can be built, together with a suite of tools that make it easy to build connectors to either API or web data sources.
The way it works is as follows. A data user can come to the platform and build a connector to a data source using the connector builder.
The connector builder allows a user to build a connector to a data source just by interacting with it. It is quick and easy. A connector to that data source is then immediately available to the data user on the platform over our single API along with all the other connectors on the platform.
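To make the "single API" idea concrete, here is a minimal sketch in Python. It is not import•io's actual API: the names Connector, Platform, register and query are all hypothetical, and the two stub connectors stand in for real data sources. It only illustrates how one query call could fan out to every connector on a platform and return rows in one common shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Connector:
    """Hypothetical connector: a name plus a function that fetches rows."""
    name: str
    fetch: Callable[[str], List[Row]]


class Platform:
    """Hypothetical platform exposing every connector through one call."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}

    def register(self, connector: Connector) -> None:
        # A newly built connector becomes available straight away.
        self._connectors[connector.name] = connector

    def query(self, text: str) -> List[Row]:
        # Fan the query out to each connector and tag rows with their source.
        results: List[Row] = []
        for connector in self._connectors.values():
            for row in connector.fetch(text):
                results.append({"source": connector.name, **row})
        return results


# Stub connectors standing in for real data sources (assumed data).
platform = Platform()
platform.register(Connector("shop_a", lambda q: [{"title": q + " A", "price": "10"}]))
platform.register(Connector("shop_b", lambda q: [{"title": q + " B", "price": "12"}]))

rows = platform.query("laptop")
for row in rows:
    print(row)
```

The point of the design is that the data user writes one query and gets consistently shaped rows back, rather than normalising the output of several separate scrapes.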
At the same time, as soon as a connector is built to a data source, the data owner will be able to see that their site is being accessed by import•io, and they can come to the platform and claim the connector.
This kicks off a validation process, and once verified the data owner gets access to a data owner’s view of the platform. Exactly what this looks like is something that we are working on at the moment with data owners. The data owner gets insight and analytics into the use of their data source. They also get control over the use of their data source, including the ability to block the connector if they choose. We will be advising data owners not to block connectors so that they can retain access to data-usage insight and analytics.
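The claim, insight and control steps can be sketched roughly as follows. This is an illustration only, under the assumption that a claimed connector records who owns it, counts usage per data user, and can be blocked by its verified owner. The class ConnectorRecord and its methods are hypothetical names, not part of the platform.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnectorRecord:
    """Hypothetical record of one connector and its data source."""
    source_url: str
    owner: Optional[str] = None
    blocked: bool = False
    usage: Counter = field(default_factory=Counter)

    def claim(self, owner: str, verified: bool) -> None:
        # Ownership is only granted once validation has succeeded.
        if not verified:
            raise PermissionError("ownership has not been verified")
        self.owner = owner

    def record_query(self, user: str) -> bool:
        # Blocked connectors serve nothing; otherwise count the usage.
        if self.blocked:
            return False
        self.usage[user] += 1
        return True

    def block(self, requester: str) -> None:
        # Only the verified owner may block the connector.
        if self.owner is None or requester != self.owner:
            raise PermissionError("only the verified owner can block")
        self.blocked = True


record = ConnectorRecord("https://example.com/products")
record.record_query("alice")
record.record_query("alice")
record.record_query("bob")

record.claim("example.com", verified=True)
print(record.usage.most_common())  # usage insight for the owner

record.block("example.com")
print(record.record_query("alice"))  # the connector no longer serves data
```

Note that in this sketch the usage counts stop growing once the connector is blocked, which mirrors the reason given above for advising owners not to block.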
One idea that we have for the future of the platform is to allow data users and data owners to buy and sell data to one another. Exactly how this would work is something that we are still working out.
This week import•io launches! We are building out the features of the platform, and we would love to have you join us and help solve the problem of getting data flowing more easily.
by Andrew Fogg, CDO