We had another awesome joint webinar this week. Alex Salkever from Silk.co was here at the import.io webinar studios to help me show you how to create some stellar vizzes with your data. In less than an hour, we crawled Kickstarter and made a bubble map showing the difference in concentration of Technology vs. Journalism startups across the world.
Crawling for Data
As always, we start with the data. I chose Kickstarter because it presents an interesting challenge for our Crawlers – the infinite scroll! I’ve already done a blog post on this subject, but I’ll cover it super quickly here.
The key to crawling sites with infinite scrolls is to find where the URL lives inside the HTML by using the Console tab of the inspect element function. Once you’ve got that, look for the page=X part of the URL. Then, using the concatenate function in Excel or Google Sheets, you can generate a list of all the potential page URLs. Finally, paste that list of URLs into the “Where to start” box with a crawl depth of 0 and let the Crawler do its thing!
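If you'd rather script the URL list than build it in a spreadsheet, here is a minimal Python sketch of the same concatenation step. The base URL and the page range are placeholders made up for illustration, not the real Kickstarter address, so swap in the URL you found in the inspector.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return one URL per page by appending page=X to base_url."""
    # Use '&' if the URL already has a query string, otherwise start one with '?'
    separator = "&" if "?" in base_url else "?"
    return [
        f"{base_url}{separator}page={n}"
        for n in range(first_page, last_page + 1)
    ]


# Hypothetical URL and page range, for illustration only
BASE_URL = "https://example.com/discover/category?sort=popularity"
page_urls = build_page_urls(BASE_URL, 1, 5)

# One URL per line, ready to paste into the "Where to start" box
print("\n".join(page_urls))
```

How many pages you need depends on how many results the site returns per page, so check that before picking the last page number.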
For this particular data set we are only going to look at the top 100 in each category. For each category, we extract the Name, Creator, Description, Location, % Funded, Amount Pledged, Image and Category.
To get the Category (which is at the top of the page and not in our trained rows) we had to use a little Xpath magic. Right click on the category in Chrome and click “inspect element”, then right click on the element and click “copy Xpath”. Finally, paste the Xpath into the “Xpath override” feature in the advanced column settings of import.io.
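To see what an Xpath like this is doing, here is a rough Python sketch that pulls a page-level category out of a tiny, made-up page and attaches it to every row. The markup, ids and class names are invented for the example and are not Kickstarter's real HTML. Python's built-in ElementTree only understands a subset of Xpath and needs well-formed markup, so treat this as an illustration of the idea rather than a way to parse live pages.

```python
import xml.etree.ElementTree as ET

# Tiny, well-formed stand-in for a category page (hypothetical markup)
PAGE = """
<html>
  <body>
    <div id="header"><h1 class="category">Technology</h1></div>
    <div id="projects">
      <div class="project"><h2>Project A</h2></div>
      <div class="project"><h2>Project B</h2></div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Same shape as a copied Xpath such as //*[@id="header"]/h1,
# written relative to the root because ElementTree requires that.
category = root.find(".//*[@id='header']/h1").text

# Attach the page-level category to every project row
rows = [
    {"name": project.find("h2").text, "category": category}
    for project in root.findall(".//div[@class='project']")
]
print(rows)
```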
Over to you Alex
Now that we’ve got our data it’s time to do something a little bit more fun with it. Once you’ve got your data set in import.io, you need to download it as a CSV. Within Silk we can do lots of different things with the data we just crawled.
Create a new silk, give it a name and upload your CSV directly into the program. This will give you a preview version where you can change which columns from your spreadsheet are displayed where and what kind of data they bring back. You can also ignore columns you don’t need. This lets you do data cleansing right within silk instead of having to use another program. Another super cool thing about Silk is that they do the currency conversions for you.
Once you’ve done that you can set your silk column headings which you’ll use when you build your visualizations and filter your data. Then, head to the “homepage” and choose what type of viz you want to do. You can play around with all kinds of options in this step to manipulate your data in fun and interesting ways.
You can even edit silks that you don’t own by clicking on the “Explore” tab. In this view you’ll be able to edit the visualizations but not the underlying data.
Here’s the one we made in the webinar. Have a play around with it and see what you can come up with. We’d love to see them, so pop them in the comments section of this blog post or tweet them to us on @importio and @SilkDotCo.
For more on how to use silk.co, check out their tutorials page.
Does the Xpath work as well in Firefox or is Chrome recommended/optimised?
Xpaths are universal, so any Xpath you pull from Firefox should work just as well as one you get from Chrome. I just like using Chrome because I find it easy, but Firefox is a perfectly good option as well.
Is Silk free?
How detailed is the Geo data in Silk outside the US?
You can use the filters in the Silk to filter down to specific locations,
Is there a limit on the amount of data in a spreadsheet that Silk can visualize?
In general, the row limit is roughly 5 – 10 thousand rows. Any more than that becomes very difficult to visualize in a way that makes sense. The column limit depends more on the type of data: things like locations and images tend to slow the program down.
Join us next time
Next week (Oct 16) I will have another very special guest with me at webinar studios! The fabulous Jewel Loree from Tableau will be here (all the way from Seattle) to help me figure out what to be for Halloween. We’ll be pulling costume data and creating a little dashboard so you can figure out whether to go “sexy” or “spooky” this year. For a sneak peek at the viz, check out Jewel’s blog post.