So, I was playing around with our new Bulk Extract feature after my webinar on it earlier this week, and I came up with this super cool use case that I just had to share. The great thing about Bulk is that it helps you get thousands of rows of data without breaking a sweat.
Like many people, I always enjoy taking a break to check out Growth Hackers and see what's trending. That got me thinking: is there one source that gets posted more than any other?
To find out, I built an Extractor for the first page of Growth Hackers, then used the site's URL pattern to generate a list of URLs for all the subsequent pages. Finally, using Bulk, I passed all those URLs through my Extractor. A quick note here: I didn't really feel like doing the math to figure out how many URLs I would need to cover all of Growth Hackers, so I made 400 just to be safe (it turned out I only needed 247).
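If you'd rather script the URL list than build it by hand, here's a minimal Python sketch of the idea. The page pattern "https://growthhackers.com/?page={n}" is a hypothetical placeholder (the real Growth Hackers pattern isn't shown in this post), and writing the list out as a one-column CSV is just an assumption about a convenient format to paste into Bulk Extract, not a documented requirement.

```python
import csv
import io

# Hypothetical URL pattern: swap in the site's real pagination scheme.
URL_PATTERN = "https://growthhackers.com/?page={n}"


def build_page_urls(pattern: str, num_pages: int) -> list[str]:
    """Return one URL per page, numbering pages from 1 to num_pages."""
    return [pattern.format(n=n) for n in range(1, num_pages + 1)]


def urls_to_csv(urls: list[str]) -> str:
    """Serialize the URLs as a one-column CSV with a 'url' header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url"])
    for url in urls:
        writer.writerow([url])
    return buf.getvalue()


if __name__ == "__main__":
    # 400 pages, overshooting on purpose as in the post (247 were enough).
    urls = build_page_urls(URL_PATTERN, 400)
    print(urls[0])
    print(urls[-1])
```

Overshooting the page count costs little here; the trade-off is just some extra requests for pages that have nothing left to extract.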
After only about 3 minutes from start to finish, I had data on all 4,996 posts (at least, that's how many there were this morning). Then I fired up my trusty Tableau Public to do some simple analysis, which you can see in the viz below.
The bubble graph makes it pretty obvious that medium.com is an outlier when it comes to sheer volume, with 233 links posted. Medium is similar to Growth Hackers (GH) in that it's a site where people can publish content, so it's not really surprising that people also post those stories to GH. The next three are moz (an SEO blog) with 99, kissmetrics (technical marketing) with 90 and conversionxl (website conversion tactics) with 84. That speaks volumes about the type of people who post on GH.
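The per-source counts behind a chart like this come down to pulling the domain out of each posted link and tallying it. Here's a minimal Python sketch, assuming you have the extracted links as a list of URL strings; the sample links are made up, and stripping a leading "www." is my own normalization choice. Note that in this sketch subdomains such as blog.kissmetrics.com are counted separately from kissmetrics.com, so you may want to merge those by hand.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Return the lowercased host of a URL, minus any leading 'www.'."""
    host = urlparse(url).hostname or ""  # hostname drops ports and lowercases
    return host[4:] if host.startswith("www.") else host


def count_sources(links: list[str]) -> Counter:
    """Count how many posted links point at each domain."""
    return Counter(source_domain(link) for link in links)


if __name__ == "__main__":
    # Made-up sample links for illustration, not the real data set.
    sample = [
        "https://medium.com/@someone/post-1",
        "https://www.medium.com/@someone/post-2",
        "https://moz.com/blog/example",
        "https://blog.kissmetrics.com/example/",
    ]
    for domain, n in count_sources(sample).most_common():
        print(domain, n)
```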
Things get more interesting when you plot how many votes each of these sources got over time. They all vary a lot, with no real trend either way. It gets even more interesting when you plot each source's total number of posts vs its total number of upvotes. You can see how closely the points follow the line, with hardly any strong deviation. Ideally (as a source), you'd want to be below the line, because that would mean you're getting more upvotes for your efforts.
I'm no data scientist, and this is only a cursory analysis, but I'd love to see what you guys can come up with. Here's my data set if you want to do any further data manipulation. And if you're looking to get loads of data from a site, I highly recommend you check out Bulk Extract.