For venture capital (VC) firms, staying on top of fast-moving technology trends is critical. By leveraging Import.io, Madding King III at Camp One Ventures was able to establish a useful and objective way to track trending buzzwords in the VC sector. Read on to find out what he discovered, and how he did it.
VC buzzword rankings
With the help of Import.io, Camp One Ventures has set out to establish a way to assess the top buzzwords appearing in VentureBeat, one of the leading sources of news for the VC community. We extracted over 16,800 headlines using Import.io and then analyzed the appearance of certain buzzwords in those headlines. We counted and then ranked the most popular buzzwords and compared the changes in ranking between 2015 and 2016. We saw some significant changes in the hot areas for VCs in 2016. What follows are a few of the highlights:
- Mobile. “Mobile” was the most frequently referenced term in headlines in 2016, as it was in 2015. It is interesting to note that the terms Android and iOS both dropped slightly in 2016, while discussion of the higher-level category “mobile” remained very strong.
- VR. Between 2015 and 2016, mentions of VR increased significantly, causing the term’s ranking to rise from 31st to 4th. Clearly, VR is a sector that has seen significant growth in interest and investment from the VC community, and some of the investments being made in this area can be expected to yield big dividends in the years ahead.
- Bot/chatbot. This is a term that was unranked in 2015 but climbed to number 10 in 2016. As natural language processing tools like Siri, Alexa and Google Voice Actions make significant strides in usability, bots are appearing as a way to establish enhanced automation.
- AI. Unranked in 2015, AI (short for artificial intelligence) emerged to hit the number 24 spot in 2016. Given that AI is powering the innovations behind bots, cars and other top VC trends, I’m surprised this term hasn’t risen even further.
How we did it
VentureBeat is a great source for VC news. Import.io is a great tool for extracting content from the web. I wanted to see how the news headlines on VentureBeat have shifted since 2015, so I used Import.io to extract all of the headlines since January 1, 2015, sifted out the keywords and compared the results.
This is a high-level overview of how I did it:
- Created a new Extractor in Import.io using the first URL from the VentureBeat website from which I wanted to extract data.
- Used the URL generator functionality in Import.io to create all of the URLs for VentureBeat.
- Ran the Import.io Extractor over VentureBeat and then downloaded a spreadsheet containing all of the extracted data.
- Analyzed the results using a text mining tool.
Create a new Extractor
To get started, we need to set up a new Extractor in Import.io for the VentureBeat website. I took the first URL from the website’s article list and entered it into Import.io.
Within a few moments, the contents of the web page were transformed into a table. I could then easily modify the results to align with my objectives, for example by renaming columns and deleting columns I didn’t need.
Once I had the Extractor created, I needed URLs for all the pages from VentureBeat containing headlines published since the start of 2015.
When you go to VentureBeat, you’ll notice that it uses infinite scrolling, which can make it hard to determine the URLs you need. Once you start scrolling, however, the URL changes from http://venturebeat.com/ to http://venturebeat.com/page/2/. This is your starting point, and it shows how VentureBeat handles pagination. I had to experiment with the site for a little while to understand how pagination worked, but I was quickly able to work out that, at the time of extraction, http://venturebeat.com/page/429/ was the page I needed in order to reach headlines from January 1, 2015. The URL range I wanted therefore ran from http://venturebeat.com/page/1/ to http://venturebeat.com/page/429/, in increments of 1.
It is easy to use the URL generator in Import.io to create the entire span of URLs that is required. The first URL that was used to train the Extractor will appear as an example. Selecting the URL page number in the URL generator will set it as a parameter, enabling you to vary the page number and in doing so create a range of URLs. For example, I entered the values “2” to “429” to generate URLs for all of my pages. Once the URLs are generated, I can save them, associating them with the VentureBeat Extractor that I’ve set up.
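For readers who want to see what the URL generator is producing, the same range can be sketched in a few lines of Python. This is only an illustration of the pagination pattern described above, not part of the Import.io workflow; the function name `page_urls` and its parameters are made up for this example.

```python
def page_urls(first_page, last_page, template="http://venturebeat.com/page/{}/"):
    """Build one URL per page number, inclusive of both endpoints.

    The template follows the pagination pattern observed on VentureBeat;
    the function itself is a hypothetical helper, not an Import.io API.
    """
    return [template.format(n) for n in range(first_page, last_page + 1)]


# Pages 1 through 429 covered headlines back to January 1, 2015
# at the time of the original extraction.
urls = page_urls(1, 429)
print(len(urls))
print(urls[0])
print(urls[-1])
```

The last page number will drift as new articles are published, so it would need to be rechecked before rerunning the extraction.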
With the URLs saved to the Extractor I can now “Run” the Extractor in order to extract the data from all of the web pages. All the data will be downloaded according to the settings in the Extractor, so only the columns that I want will be saved and the columns will be labeled appropriately.
The Extractor will take about 5 minutes to run through 450 pages. Once it is complete you will have a dataset that you can download and use in a spreadsheet for analysis.
Analyze the results
There are lots of analysis tools available, so you can choose what’s best based on what you are trying to do. We used text mining software to break each headline into words and count how many times each word was used. We then grouped the results by year and developed a simple ranking based on the number of times each word appeared in headlines in a particular year. Finally, we created a simple visualization of the rankings in order to make them easier to interpret.
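The counting and ranking step can be approximated with the Python standard library. The sketch below is not the text mining tool we used; it is a minimal stand-in that assumes the extracted data has been reduced to (year, headline) pairs. The stopword list, the function names and the sample headlines are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# A tiny, illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "and", "is", "on", "with"}


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric words."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return [w for w in words if w not in STOPWORDS]


def rank_by_year(rows):
    """Given (year, headline) pairs, return {year: {word: rank}}.

    Rank 1 is the most frequent word in that year. Ties are broken
    alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for year, headline in rows:
        counts[year].update(tokenize(headline))

    rankings = {}
    for year, counter in counts.items():
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        rankings[year] = {word: i for i, (word, _) in enumerate(ordered, start=1)}
    return rankings


def rank_change(rankings, word, old_year, new_year):
    """Return (old_rank, new_rank); None means the word was unranked."""
    return rankings[old_year].get(word), rankings[new_year].get(word)


# Made-up sample headlines, only to show the shape of the input.
rows = [
    (2015, "Mobile payments startup raises funding"),
    (2015, "Android update brings mobile fixes"),
    (2016, "VR headset maker raises funding"),
    (2016, "Mobile VR app launches"),
    (2016, "VR studio opens"),
]

rankings = rank_by_year(rows)
print(rank_change(rankings, "vr", 2015, 2016))
print(rank_change(rankings, "mobile", 2015, 2016))
```

Returning `None` for a missing word mirrors how terms such as “bot” were reported as unranked in 2015.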
Using Import.io to extract data from a website is painless, and makes the process repeatable and scalable. Now that we have established this process, it will be easy for us to continue to keep tabs on these buzzword trends in the future. These insights are invaluable when working with our portfolio companies — and in assessing prospective companies to partner with.
About the user
Camp One Ventures backs great teams that are looking to bring new software or technology solutions to large markets. From its long-time base in Silicon Valley, Camp One Ventures leverages a deep network to identify great early-stage companies. The company has extensive experience in advising early-stage technology businesses and in helping them tap into their unique insights and expertise. The organization also utilizes local teams’ venture capital networks to help portfolio companies raise subsequent financing.