You may hear web scraping and shiver, thinking that it’s only for the most technical among us. However, marketers can use web scraping too, and they can get massive value from doing so.
You don’t even need to be super technical to figure it out, either.
While you’re probably aware of what web scraping is, at least on the surface level, let’s briefly discuss it just to keep an even footing.
What is Web Scraping? A Brief Primer
Web scraping is a form of data scraping used for extracting data from websites. That’s it.
If you, a human and a marketer, were to go to Yelp, search “restaurants in Austin,” and then copy and paste all of the information on that page to a spreadsheet, that would be a form of web scraping – no coding needed.
Of course, it’s easier to write a script, and it’s even easier to use Import.io. But at a fundamental level, web scraping is just gathering data from a web page.
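To make "write a script" a little more concrete, here is a minimal sketch of the copy-and-paste job done in Python with only the standard library. The HTML snippet, the class names biz-name and rating, and the helper names are all made up for illustration; a real results page has its own markup, and you would need to download it first.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<ul>
  <li><span class="biz-name">Taco Place</span> <span class="rating">4.5</span></li>
  <li><span class="biz-name">BBQ Joint</span> <span class="rating">4.0</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects [name, rating] rows from spans with known class names."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # class of the span we are currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("biz-name", "rating"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "biz-name":
            self.rows.append([data.strip(), ""])
        elif self._field == "rating" and self.rows:
            self.rows[-1][1] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html):
    """Parse the listing markup and return it as CSV text."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "rating"])
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The CSV text can be saved to a file and opened in any spreadsheet, which is the same end result as the manual copy and paste.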
Why scrape? Many reasons. Some common use cases include web indexing, data mining, ecommerce price change monitoring and comparison, market research, product review scraping, gathering sales leads, tracking online presence and reputation, and social listening.
Of course, there are now cheap and effective tools for most common forms of scraping. For instance, you don’t need to write a bit of code for online reputation tracking, because Mention is so easy to use. Anything outside fringe SEO cases is covered by something like Ahrefs. You don’t need to scrape social data to find influencers because Buzzsumo or Onalytica have you covered.
Beyond those, general-purpose web scraping toolkits exist, and Import.io is one of them. You can also build your own web scraper (it will cost you a bit, though). Essentially, what was once a pretty expensive or complicated process is now democratized and cheap, and SEOs, product managers, marketers, and people with only a bit of technical aptitude are able to do some pretty incredible stuff.
For this article, we’ll focus on use cases that aren’t incredibly commoditized and don’t have easy and cheap tools to do the job. We’ll focus on some more creative use cases.
Three Masterful Use Cases of Web Scraping for Marketers
With regard to marketing, whether you’re performing an SEO audit, doing competitive research for a product launch, monitoring product reviews and pricing, or stirring up some creative data-journalism content marketing campaigns, web scraping is super valuable.
Not only can it give you an edge in terms of data and intelligence, it can also save you a lot of time and money on tasks you’d normally have to do anyway.
With that said, let’s cover some cool uses of web scraping for marketers.
1. Intelligence Gathering for Content Marketers
This case study is a few years old, and while there are tools that exist that make some of it easier, it’s still relevant.
It was written by Matthew Barby on the Moz blog and explains how, without much technical knowledge, you can gather intelligence on targeted influencers and find data on them.
The benefits of this are many. Depending on what sites you scrape and what type of data you’re after, you could use this process to find information to target guest post opportunities, build relationships with influencers, or find freelance writers.
Or you could use the data to find out what’s working with other people’s content – word count, images, authors, etc. – and then try to replicate that in your own content.
You’ll need a few things:
- The SEO Tools for Excel Plugin
- A web scraper like Screaming Frog
Let’s start with part one: find the different authors on a blog or publication.
First, you need to scope out what publications you care about. In Matthew’s post, he gave the example of Search Engine Journal. What you’ll be doing is gathering a list of all the URLs from this domain using Screaming Frog SEO Spider (or a comparable SEO scraping tool). Here’s a tutorial for using Screaming Frog. Once the URLs are finished loading, export all of the data to an Excel spreadsheet.
You can get rid of all the extra information that Screaming Frog gives you, leaving just the list of raw URLs in the first column of your worksheet.
Next, open up Google Chrome and navigate to an article on the domain you’re analyzing. Find where the author’s byline is and right click + inspect element.
Within the developer console, the line of code associated with the author’s name that you selected will be highlighted (see the image below). Then you just right-click on the highlighted line of code and press Copy XPath.
Now, in cell B2, add the following formula:
Important Notes: //*[@id="leftCol"]/div/p/span/a is the XPath, and it is specific to the Search Engine Journal example. It will be different depending on what site you’re scraping. XPathOnUrl is a function in the SEO Tools for Excel plugin.
Essentially, what this formula does is pull the author name from the page at the URL in column A. It will return an error if there is no author name there.
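If you would rather script this than use Excel, here is a rough sketch of the same idea in Python. It is not the XPathOnUrl function. It uses the standard library’s ElementTree, which supports only a subset of XPath and needs well-formed markup, so the sample page below is invented and tidy. The function name author_from_html and the author "Jane Doe" are hypothetical. Real, messy HTML usually calls for a more forgiving third-party parser.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for an article page.
SAMPLE_PAGE = """
<html><body>
  <div id="leftCol">
    <div><p><span><a href="/author/jane-doe/">Jane Doe</a></span></p></div>
  </div>
</body></html>
"""

# Same path as in the example above, with a leading "." because
# ElementTree evaluates paths relative to the element you search from.
AUTHOR_XPATH = ".//*[@id='leftCol']/div/p/span/a"


def author_from_html(page, xpath=AUTHOR_XPATH):
    """Return the text of the first node matching xpath, or None."""
    root = ET.fromstring(page.strip())
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    return node.text.strip()


print(author_from_html(SAMPLE_PAGE))
```

Where the Excel formula shows an error for a page with no byline, this sketch returns None, which is easy to filter out afterwards.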
There are lots of other uses for this, as well. You can further explore and scrape data to find things like:
- Author pages
- Author social media profiles
- Author social media followers across profiles
- Page titles
- Number of words per post
- Date/time the post was published
Check out Barby’s article if you’re interested in learning how to pull that info.
The point is, even though tools like Buzzsumo exist, you can gather pretty much an unlimited amount of data if you do the scraping on your own. You can also target things more specifically (by publication, author, social network, etc.). By having all the data in Excel, you also get the benefit of greater data manipulation capabilities.
When you gather intelligence like this, you can begin making data-driven decisions regarding content marketing – not just going with your gut.
2. Scraping Indeed for Common Job Skills
One of my favorite use cases for web scraping is to find external data for what is usually known as “data journalism,” an increasingly popular and important form of content development.
People like data. They like facts and insights, sound bites, and especially visualizations. You can pull off all of this with some fairly simple web scraping.
The following example, while not directly using web scraping for content marketing or journalism, is super impressive (and since the author wrote a blog post about it, it arguably was used for content marketing).
There are many job websites out there – Indeed, Glassdoor, etc. These job sites have tons of data. They’re also not difficult to scrape.
One use case I found particularly inspiring was from someone learning data science. They wanted to see which skills were most in demand. You see, a lot goes into mastering data science…
Learning all of that would probably take a lifetime or two. Clearly some of it is more important than the rest, so Jesse Steinweg-Woods, the author of the post, turned to web scraping for the answer.
He also wanted to see if different cities have different skill requirements (for example, does the Silicon Valley market prefer different skills than New York City?).
He set up a program in Python to accomplish it. The basic workflow of the program would be:
- Enter the city we want to search for jobs in matching the title “data scientist” (in quotes so it is a direct match) on Indeed.com
- See the list of job postings displayed by the website
- Access the link to each job posting
- Scrape all of the HTML in the job posting
- Filter it to only include words
- Reduce the words to a set so that each word is only counted once
- Keep a running total of the words and see how often a job posting included them
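The last four steps of that workflow can be sketched in a few lines of Python. This is not the author’s program, just a minimal illustration using the standard library. The postings, the skill list, and the function names are made up, and the first three steps (searching the job site and downloading each posting) are left out so the example runs without a network connection.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of a page, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_in_posting(html):
    """Return the set of distinct lowercase words in one posting."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    # Keep letters plus + and # so terms like "c++" and "c#" survive.
    return set(re.findall(r"[a-z][a-z+#]*", text))


def skill_frequency(postings, skills):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for html in postings:
        # Using a set means a word repeated in one posting counts once.
        counts.update(words_in_posting(html) & skills)
    return counts


# Made-up postings; a real run would download these from a job site.
POSTINGS = [
    "<div><h1>Data Scientist</h1><p>Python, SQL and Hadoop. Python a must.</p></div>",
    "<div><p>We use R and SQL daily.</p></div>",
    "<div><p>Strong Python and Spark skills.</p></div>",
]
SKILLS = {"python", "r", "sql", "hadoop", "spark"}

for skill, n in skill_frequency(POSTINGS, SKILLS).most_common():
    print(f"{skill}: {n} of {len(POSTINGS)} postings")
```

Running the same count separately for each city’s postings gives you the numbers behind a per-city comparison chart.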
Now, the post gets quite technical (though I suggest you read through it, because it’s very interesting). But the end result is some interesting graphs comparing the different distributions of job skills in data science roles depending on the city. For instance, here’s Seattle:
And here’s Chicago:
You can see some quick differences right off the bat, like the swapping of Python and Hadoop.
It’s interesting by city because, if you were looking for a job in San Francisco, you’d like to know what is in demand there. But it’s also interesting to look at aggregate trends. Are data visualization skills in vogue? Doesn’t seem like it from this data.
Now, I said this isn’t specifically related to marketing, but you’re probably wondering how it could be at all.
Well, data-driven content is in high demand. If you don’t have your own massive datasets and data scientists to explore them, you can get this data from other sites. How interesting would this be if Indeed put this post out themselves? Or any company training data scientists? It would be amazing content!
This is essentially data journalism, and it’s becoming a foundational skill taught in journalism schools. Obviously the same goes for content marketing, though. There’s so much noise and so little signal in the content marketing space that doing your own data gathering and analysis like this can cut through like a knife and get attention (while providing tons of value).
Luckily NYC Data Science is smart enough to produce that caliber of content. They wrote their own post on scraping Glassdoor for data science insights. Here’s a chart they put together mapping out the best cities for data science (by count of job postings):
If you’re the masochistic type of marketer and want some more technical reading, here’s a good primer to scraping with Python.
3. Outbound, For The Win
The most common use case for web scraping for marketers is definitely outbound marketing or sales.
Advertising is expensive and sloppy. If you have a bit of precision, some technical wit, and a good outbound sales or outreach process, then you’ll be able to make this work as a channel. Especially nowadays, data enrichment, personalization, and automation tools make this an outstanding channel that very few companies are using correctly (or at all).
The gist is this: find targeted leads, scrape their data, enrich their profiles, automate the outreach sequence, personalize it, profit.
There are many ways you can go about this, many iterations, so I’ll just walk you through one hypothetical example.
Let’s say you’re in the content marketing space. You sell software for content marketers. Your ideal customer profile is a content marketer.
The hardest part of sales is gathering quality leads. Luckily, with some ingenuity, we can think about and discover where content marketers hang out.
One obvious place: LinkedIn. Especially if you have a Business account, you can very easily search LinkedIn for content marketers at companies of a specific size, location, etc.
Then you can use a tool like Contact Out to pull email addresses and put them into a targeted email list.
That’s probably the most common strategy, though. Let’s go a bit deeper and sketchier, and look inside Facebook groups. First you need to find the right group:
From there we can choose the broad category of Business:
You can then home in on the related tags to get very specific to your niche (important in sales):
You’ll have to do a bit of sifting and exploration, but you can eventually find your perfect group:
Then, of course, you need to join the group. Now how do we get contact information?
First, download a Chrome extension called Grouply. It lets you pull:
- First Name
- Last Name
- Company Position
- Company Name
- Profile URL
Once you have data like first name, company name, last name, etc., there are a million tools you can use to get their email address (and other data) from this. The best is Clearbit, but you can use cheap email gathering tools like Voila Norbert too.
Now you’ve got a ton of targeted emails, as well as other variables that you can use for personalization like company position and name. Congratulations! Now what?
Depends how aggressive you are. Most people don’t like it when you scrape their info and cold email them. But if it works and you’re okay with it, why not? You can also use a tool like Mailshake to automate and personalize these emails. At least use their first name, yeah?
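To show what that personalization amounts to, here is a small mail-merge sketch in Python. It does not use Mailshake or any other tool’s API. The CSV columns, the sample leads, the message text, and the function name build_emails are all assumptions made for the example, and it only builds the message bodies without sending anything.

```python
import csv
import io
from string import Template

# Hypothetical export: the scraped fields plus an enriched email column.
LEADS_CSV = """first_name,last_name,position,company,email
Ana,Silva,Content Lead,Acme,ana@example.com
,Lee,Editor,Globex,lee@example.com
"""

MESSAGE = Template(
    "Hi $first_name,\n\n"
    "I saw that you work as $position at $company and thought "
    "our content tool might be useful to your team."
)


def build_emails(csv_text, template=MESSAGE):
    """Return a list of (email address, personalized body) pairs."""
    emails = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = dict(row)
        # Fall back to a neutral greeting when the first name is missing.
        fields["first_name"] = row["first_name"].strip() or "there"
        emails.append((row["email"], template.substitute(fields)))
    return emails


for address, body in build_emails(LEADS_CSV):
    print(address)
    print(body)
    print("---")
```

The fallback greeting matters more than it looks: scraped data is rarely complete, and "Hi ," is a quick way to reveal that a message was automated.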
You can also play a softer approach and build custom audiences with Facebook or Google and send targeted advertisements. Sometimes, the best strategy is to use a mix of targeted ads and cold emails.
The marketing decisions are up to you. Just know, you can scrape lead data pretty darn easily.
Quick Caveat: Scrape Carefully
Don’t be a jerk with people’s data. Beware of websites’ scraping policies. Be creative in using the data you scrape (for god’s sake, don’t just send out a ridiculous, spammy email blast). Remember that some of us are in marketing for the long term, and short term spammy tactics ruin marketing for everyone.
Outside of the common sense and morality issues, realize that some sites outlaw web scraping. For instance, it’s in Twitter’s and Google’s terms of service that you can’t scrape without their permission. Other sites like Angel List disallow scraping but give you (paid) access to their API. So tread carefully.
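One quick, partial check you can automate is a site’s robots.txt file, which states which paths crawlers are asked to stay away from. The sketch below uses Python’s built-in urllib.robotparser on an invented robots.txt; the rules, the URLs, and the user agent name my-marketing-bot are assumptions for illustration. Keep in mind that robots.txt is a convention and not the terms of service, so passing this check does not mean a site permits scraping.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one is served from the site's root.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /search
Allow: /
"""


def allowed(robots_txt, url, user_agent="my-marketing-bot"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(SAMPLE_ROBOTS, "https://example.com/blog/post-1"))
print(allowed(SAMPLE_ROBOTS, "https://example.com/search?q=leads"))
```

With these sample rules, the blog URL is allowed and the search URL is not. For a live site, RobotFileParser can also download the file itself through its set_url and read methods.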
Web scraping isn’t just for engineers, data scientists, and other super technical folks. Marketers can use web scraping easily and effectively, too.
This post outlined three powerful and creative ways to use web scraping in your marketing efforts. There are many, many more. Tools are making it easier and easier to access data, so I’m sure the future will bring further opportunities, too. Learn how to use web scraping now, and beat out lazier marketers.