An astoundingly simple way to extract rich data from millions of web pages

Recognize the page in the screenshot above? Of course you do. It’s the Apple Watch page from Apple.com. It’s a beautiful page, but it is more than that. Underneath lies a table of data that looks a lot like a spreadsheet. Each row contains information on one watch model. Each column contains one aspect of the watches, such as price, product description, or a link to an image.

What if you want to use the data trapped in that carousel to build your own gallery, chart, or sortable table of Apple Watch data? With Import.io and Silk, now you can, in only a few seconds. All you need to do is paste the URL into the “Extract from Website” box in your Silk Dashboard. You don’t need to know any code. There is no plug-in. It just works.

Sounds too good to be true? Then check out the video below to see the new, seamless Silk-Import.io data extraction tool in action on a live Apple Watch product detail page.

And, just like you saw in the video, here is all the watch data in a Silk gallery. You can click through to see all the data on the Silk datacards. You can even build your own visualizations from the data: charts, galleries, groups or mosaics. Or you can try it yourself. It literally takes a minute. (If you are not a Silk user, you can sign up for free and extract data from the Apple Watch page or millions of other pages, too.)

OK, we hope this blew your mind. Now, we’ll answer some obvious questions.

So how does this *really* work?

Magic, quite literally. Silk has integrated with Import.io’s Magic API. Magic is the data extraction engine that looks at the contents of a Web page and automatically identifies structured data. When a Silk user pastes a URL into the “Extract from Website” box on their Dashboard, Silk sends that URL over to Magic. Magic extracts the structured data from the page and converts it into a table format. Magic then sends the table back to Silk. Silk imports the table data and converts that data into a Silk site.
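
The post doesn’t describe how Magic actually recognizes structure, so the following is only an illustration of the general idea: items in a carousel, grid or table usually appear as repeated sibling elements with the same tag and class, and each repeat can be read as one table row. This minimal Python sketch uses only the standard library; it is not Import.io’s algorithm, and the names (`extract_rows`, `TreeBuilder`) and the same-tag-and-class heuristic are illustrative assumptions.

```python
from html.parser import HTMLParser

# Illustrative sketch only: NOT Import.io's Magic algorithm. All names are hypothetical.

# Elements that never have a closing tag, so they are never pushed as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.cls = dict(attrs).get("class") or ""
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    def signature(self):
        # Siblings with the same tag and class are treated as the same kind of row.
        return (self.tag, self.cls)


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from (possibly messy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.items.append(text)


def texts(node):
    """All non-empty text fragments under node, in document order."""
    out = []
    for item in node.items:
        out.extend([item] if isinstance(item, str) else texts(item))
    return out


def walk(node):
    yield node
    for item in node.items:
        if isinstance(item, Node):
            yield from walk(item)


def extract_rows(html):
    """Return the rows of the largest group of repeated sibling elements."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best_rows, best_score = [], 0
    for node in walk(builder.root):
        groups = {}
        for item in node.items:
            if isinstance(item, Node):
                groups.setdefault(item.signature(), []).append(item)
        for group in groups.values():
            if len(group) < 2:
                continue  # a single element is not a repeating structure
            rows = [texts(child) for child in group]
            score = sum(len(row) for row in rows)  # favour many rows with many cells
            if score > best_score:
                best_rows, best_score = rows, score
    return best_rows


# Usage on a made-up carousel (placeholder models and prices, not Apple's data).
page = """
<div class="carousel">
  <div class="item"><h3>Model A</h3><span class="price">$100</span></div>
  <div class="item"><h3>Model B</h3><span class="price">$200</span></div>
  <div class="item"><h3>Model C</h3><span class="price">$300</span></div>
</div>
"""
print(extract_rows(page))
# [['Model A', '$100'], ['Model B', '$200'], ['Model C', '$300']]
```

Scoring a group by its total number of text cells, rather than by its number of elements, is what makes a real `<table>` come out as rows (`<tr>` elements) instead of the cells of a single row.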

Cool, Apple Watch. What other pages can I use this on?

This isn’t a one-trick wonder. Want to convert the CB Insights Unicorn Tracker into a live database and analytics platform for unicorns? Copy, paste, visualize, analyze. How about San Francisco real estate listings on Zillow? In fact, there are millions of pages on the Internet that are either structured entirely as tables or contain well-structured portions.

How about pagination?

The Silk-Import.io integration supports limited pagination: it extracts data from the first 5 pages. We anticipate this will handle most data extraction requests. If users demand deeper pagination, we expect it won’t be difficult to add.
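
A page limit like this can be pictured as following “next page” links and stopping after a fixed count. The post doesn’t say how Silk or Import.io find the next page, so the sketch below assumes a hypothetical `fetch_page` function that returns the rows from one page plus the URL of the next page (or `None`); only the limit of 5 comes from the post.

```python
from typing import Callable, List, Optional, Tuple

# The post states the integration follows at most the first 5 pages.
MAX_PAGES = 5

Row = List[str]
# Hypothetical page fetcher: given a URL, return (rows extracted from that page,
# URL of the next page or None). This signature is an assumption, not a real API.
FetchPage = Callable[[str], Tuple[List[Row], Optional[str]]]


def collect_rows(start_url: str, fetch_page: FetchPage,
                 max_pages: int = MAX_PAGES) -> List[Row]:
    """Follow 'next page' links from start_url, stopping after max_pages pages."""
    rows: List[Row] = []
    seen = set()  # pages already fetched; also guards against self-linking pages
    url: Optional[str] = start_url
    while url is not None and len(seen) < max_pages and url not in seen:
        seen.add(url)
        page_rows, url = fetch_page(url)
        rows.extend(page_rows)
    return rows


# Usage with a fake 8-page listing, so no network access is needed.
def fake_fetch(url: str) -> Tuple[List[Row], Optional[str]]:
    n = int(url.rsplit("=", 1)[1])
    next_url = f"https://example.com/list?page={n + 1}" if n < 8 else None
    return [[f"item {n}"]], next_url


rows = collect_rows("https://example.com/list?page=1", fake_fetch)
print(len(rows), rows[-1])  # 5 ['item 5']
```

Making the limit a parameter is one way deeper pagination could later be offered without changing the crawling logic.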

Will this work on all Web pages?

Unfortunately, no. The Silk + Import.io Magic data extraction process only works on web pages that have a clean, recognizable table structure. This could be a table hidden in a carousel (like the example above), a table lurking behind a map, or a table disguised as an image grid. But without clean structure, the extraction doesn’t work. Also, web pages that are heavy on JavaScript or load their content dynamically will probably not work. Still, that leaves millions of web pages that work very well and allow for clean data extraction.
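
One way to see why JavaScript-heavy pages are a problem: when a page builds its content in the browser, the HTML the server returns is mostly script and contains little of the data itself. The Python sketch below is a rough heuristic for spotting such pages. The post doesn’t say how (or whether) Import.io checks for this; the function names, the regular expressions and the 0.8 threshold are illustrative assumptions, not part of Silk or Import.io.

```python
import re

# Illustrative heuristic only: not Import.io's check. The 0.8 threshold is arbitrary.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def _char_count(text: str) -> int:
    """Number of non-whitespace characters in text."""
    return len("".join(text.split()))


def script_share(html: str) -> float:
    """Fraction of non-whitespace content characters that sit inside <script> elements."""
    script_chars = sum(_char_count(body) for body in SCRIPT_RE.findall(html))
    visible = TAG_RE.sub(" ", SCRIPT_RE.sub(" ", html))  # drop scripts, then tags
    visible_chars = _char_count(visible)
    total = script_chars + visible_chars
    return script_chars / total if total else 0.0


def probably_needs_javascript(html: str, threshold: float = 0.8) -> bool:
    """Guess whether the raw HTML is mostly script, i.e. rendered client-side."""
    return script_share(html) >= threshold


app_shell = ('<html><body><div id="app"></div>'
             '<script>renderProductsInTheBrowser();</script></body></html>')
static_list = '<ul><li>Model A</li><li>Model B</li></ul><script>x=1</script>'
print(probably_needs_javascript(app_shell))    # True
print(probably_needs_javascript(static_list))  # False
```

A regex pass like this is deliberately crude; it only compares how much of the raw page is script versus text, which is enough to show why an extractor reading the served HTML finds nothing to extract on an empty app shell.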

For ideas on more pages that work, check out our ideas gallery (published as a Silk, naturally).

Want to try it yourself?

  1. Sign up for Silk.
  2. Paste the Apple Watch page URL (or one of the other URLs mentioned here) into the “Extract from Website” box on your Dashboard.
  3. Build visualizations from the extracted data and publish them on your Silk home page or on your blog.
