Because the evenings are closing in and it’s getting dark earlier, I find myself looking for indoor activities to keep me busy. Because I belong to the Data Addicts Anonymous club (support group here), I thought I would try to set a record for some kind of data insight… and this is what I came up with:
- Extract the first 20 pages of data from Growth Hackers
- Slap it into a pivot table
- Remove the outliers
- Play with the numbers
- Make a Chart/Viz
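If you would rather script the pivot-table step than do it in Excel, here is a rough sketch of the same idea in Python. It assumes the extracted CSV has one row per post with `author` and `upvotes` columns; those column names and the sample rows are made up for illustration and are not the real headers or data from the import.io export.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column names "author" and "upvotes" are
# assumptions, not the real headers of the import.io export.
SAMPLE_CSV = """author,upvotes
Author A,12
Author B,3
Author A,8
Author C,5
Author A,10
"""


def pivot_by_author(csv_text):
    """Return {author: {"posts": n, "upvotes": total}} from one-row-per-post CSV text."""
    totals = defaultdict(lambda: {"posts": 0, "upvotes": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["author"]]
        entry["posts"] += 1                      # one row is one post
        entry["upvotes"] += int(row["upvotes"])  # running upvote total
    return dict(totals)


if __name__ == "__main__":
    for author, stats in pivot_by_author(SAMPLE_CSV).items():
        print(author, stats["posts"], stats["upvotes"])
```

This is the same thing a pivot table does with "author" as the row label, a count of rows, and a sum of upvotes.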
Using import.io’s Magic feature, I created a CSV of the top authors on Growth Hackers. When I paste in the URL, I get the table below…
When I download that data and open it up in my trusty Excel, I can create this chart:
Data after it’s been extracted and put into Excel.
Before vizzing it, I removed the outliers: anyone who had posted fewer than three times (because it means they’re not a regular contributor). Otherwise it would have looked like this:
As you can see, the number of authors who only posted a single post was massive (72%).
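A figure like that single-post share is quick to check outside a chart. The sketch below assumes you already have a post count per author; the function name and the example counts are hypothetical and are not the Growth Hackers numbers.

```python
def single_post_share(post_counts):
    """Return the fraction of authors who have exactly one post."""
    if not post_counts:
        return 0.0
    singles = sum(1 for count in post_counts.values() if count == 1)
    return singles / len(post_counts)


# Made-up counts for illustration only.
example_counts = {"Author A": 1, "Author B": 1, "Author C": 4, "Author D": 1}
print(f"{single_post_share(example_counts):.0%} of authors posted once")
```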
Once I had excluded those guys, I divided the number of upvotes by the number of posts to get the average upvotes per post. All I had to do then was a simple line graph!
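For anyone who wants to reproduce the filter and the average without a spreadsheet, here is a small sketch. It assumes a dictionary of `(posts, upvotes)` per author; the function name, the `min_posts` parameter, and the sample numbers are all illustrative and not taken from the actual spreadsheet.

```python
def rank_by_upvotes_per_post(stats, min_posts=3):
    """Rank authors by average upvotes per post.

    stats maps author -> (posts, upvotes). Authors with fewer than
    min_posts posts are dropped before ranking.
    """
    ranked = [
        (author, upvotes / posts)  # average = total upvotes / number of posts
        for author, (posts, upvotes) in stats.items()
        if posts >= min_posts
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up numbers for illustration only.
example_stats = {
    "Author A": (3, 30),   # kept: 10.0 upvotes per post
    "Author B": (1, 50),   # dropped: only one post
    "Author C": (10, 60),  # kept: 6.0 upvotes per post
}
for author, average in rank_by_upvotes_per_post(example_stats):
    print(f"{author}: {average:.1f}")
```

Filtering before dividing is what stops a one-hit wonder with a single popular post from topping the ranking.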
Authors by upvotes per post: Sean is doing well, but not as well as you might expect.
You might imagine that Sean Ellis (the owner of the site) would also post the most interesting content, but it turns out that there are actually several people who (on average) get more upvotes than him.
Note: This data only goes back to October 7th, so perhaps a few posters have been missed, but I imagine a regular community member would post more than once a month. Perhaps that’s another study to do!
Conclusion: The top 10 most interesting Posters on Growth Hackers
Drum roll please… according to the average upvotes per post, your most interesting authors/contributors in the “3 posts or more” category for October/November are:
- Lyle McKeany
- Nichole Elizabeth DeMerè
- Andrew Hanelly
- Drake Ballew
- Ryan Gum
- Sean Ellis
- Shannon Byrne
This “data science” was brought to you by Dan Cave in under 30 mins, during a coffee break. Imagine what you could do given some real time and effort.
Following a good amount of interest on this, I’ve scaled up my efforts (a bit):
- Expanded my data extraction to the first 50 pages
- Modified the batch extraction Google Sheets template made by Andrew Fogg
- Deployed some spreadsheet-fu
- …and screen-grabbed a better representation of the top contributors
Sorted by upvotes per post. Sorted by comments per post.
This seems a little fairer now: authors with >= 3 posts, sorted by votes per post.