4 data extraction tactics to take your SEO to the next level


Data extraction has powerful uses in a number of industries. Some notable use cases are equity research, marketplace inventory, competitor price monitoring, lead generation, and reputation monitoring.

In this article, we’re going to dive into how you can use data extraction to gain traffic from Google. Or in other words, how you can use data extraction to take your SEO to the next level.

Here are four tactics used by companies big and small to dominate the SERPs.

Aggregate content for better comparison

People like to comparison shop, which is in part why aggregators rule the search results.

Typically these sites leverage user-generated content. User-generated content is SEO gold, not only because it fills up landing pages with the fresh, high-word-count content Google has historically liked, but also because it gives users a better experience, which is likely the future of SEO.

But what if you have no user-generated content? That’s where data extraction comes into play.

Back in 2004, Indeed.com shook up the recruiting market by replacing the traditional ‘job board’ with the world’s largest job search engine.

According to Business Insider, Indeed.com initially didn’t collect its own job listings. Rather, the site indexed job listings from job boards, recruitment agency websites, and employer recruitment pages, and still does to this day.

Indeed.com needed none of its own user-generated content. Rather, it aggregated all the jobs into one central place, resulting in a better experience for job seekers. Additionally, Indeed categorizes jobs by job title, company, and geo, creating SEO-optimized landing pages that offer the freshest, most up-to-date results.

Indeed.com went on to beat out its competitors, sell for a rumored $1+ billion, and become the 52nd most trafficked website in the United States.

Since then, the company has launched a variety of products, and sources many of its own jobs and job reviews.

But it was the ability to aggregate content and provide a solid comparison experience that made them the SEO giant that they are today.

Key Takeaways:

  • If you don’t have your own unique, useful content to fill out SEO landing pages, search for places around the web that you can aggregate it from.
  • Don’t think about scraped content as your end game. While Indeed’s search engine propelled it to search dominance, it also gave them access to consumers and businesses which would then generate their own content.  
  • If you’re dependent on other sites’ data sources, you can get shut down, not to mention there are legal ramifications. Be sure to weigh these risks.
  • Import.io data extraction can help you get content for aggregation on your website.
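As a rough illustration of the aggregation idea, here is a minimal Python sketch that groups already-extracted job listings into one landing page per job title and city, newest first. The field names (title, city, posted, source), the slug format, and the example.com-style sources are assumptions made for this example, not Indeed's actual method.

```python
import re
from collections import defaultdict


def slugify(text):
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_landing_pages(listings):
    """Group listings by (title, city) and sort each page newest first."""
    pages = defaultdict(list)
    for item in listings:
        # Hypothetical slug format: "<title>-jobs-in-<city>"
        key = f"{slugify(item['title'])}-jobs-in-{slugify(item['city'])}"
        pages[key].append(item)
    for items in pages.values():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        items.sort(key=lambda i: i["posted"], reverse=True)
    return dict(pages)


# Made-up sample data standing in for listings pulled from several sources.
listings = [
    {"title": "Data Analyst", "city": "Austin", "posted": "2017-03-01", "source": "board-a.example"},
    {"title": "Data Analyst", "city": "Austin", "posted": "2017-03-04", "source": "employer-b.example"},
    {"title": "Nurse", "city": "New York", "posted": "2017-03-02", "source": "agency-c.example"},
]
pages = build_landing_pages(listings)
for slug, items in pages.items():
    print(slug, len(items))
```

Each key could then back one landing page, with the freshest listings at the top.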

Make use of messy government data

The US government collects tons of really cool data. In fact, the US Census Bureau has budgeted $1.6 billion in 2017 to collect data on population alone.

Unfortunately, if you ever visit a government website, you’ll likely pull your hair out trying to find the information that you want.  

This presents a fantastic opportunity for savvy marketers and entrepreneurs.

Real estate sites Zillow and Trulia (now the same company) are fantastic examples of this.  

According to Zillow’s website:

“Zillow receives information about property sales from the municipal office responsible for recording real estate transactions in your area. The information we provide is public information, gathered from county records. Our parcel information, which outlines the lot on which your house sits, comes from various public sources, such as the county. We regularly update the information as we receive it from data providers.”

Zillow simply started out as a way for consumers to compare pricing. All the data was available; it was just buried in the muck of thousands of city and county websites. Additionally, they used all this pricing data to come up with a proprietary score, the Zestimate, a current estimate of your home’s value.

Zillow launched targeting the consumer audience – with NO for sale listings. They launched with millions of home records and the famous Zestimate.

By extracting all of this data Zillow created a unique, authoritative data source that would naturally acquire links from articles such as this one from 2006.  

When you have unique data that people are searching for that also innately acquires links, plus a scrappy content marketing / PR strategy, that’s a recipe for SEO success.

Zillow is now a household name worth $7.6 billion.

Zillow is an extreme success case, but definitely not the only company that’s scraped and repurposed government data for success.

Take Permitzone, a startup that allows contractors to easily apply for building permits. They have a much simpler approach that has proved effective.

“Contractors have to apply for permits dozens of times a year, many times in different localities, and the process is a pain”, according to co-founder and CEO Ray Antonino.  

Suppose you were on a project in Myrtle Beach and went to the municipal office’s website in search of a construction permit. You’d be greeted by a circa-2001-style site with links to a couple dozen PDFs to dig through for your information. Ugh.


Contrast that to Permit Zone’s Myrtle Beach page, which capitalizes on this government ineptness by scraping all government websites and aggregating the appropriate PDF forms into one clean, friendly interface.  

On the page is a call to action that directs visitors into their funnel. Furthermore, I suspect they don’t even need to outrank the .govs that host the original data, as there are probably enough fed-up visitors who bounce off those horrendous sites in hopes of better information.

Even if data aggregation isn’t your core business as it is in these two examples, you still may find a use for government data.

For example, lawn care marketplace LawnStarter improves the trust and uniqueness of their city pages by curating local regulations pertinent to lawns. This page about lawn care in Columbus, for instance, contains information about the long grass ordinances in the city, offering the most digestible answer possible and linking back to the original source in case users want more info.

I don’t think the government is going to be hiring a head of SEO anytime soon to make their data any easier to find, so if you can repackage the government’s data in a useful way, take full advantage. It’s not often you get to increase traffic on Uncle Sam’s dime.

Key takeaways:

  • Government agencies produce a lot of very valuable data, but aren’t great at presenting it. This is something you can capitalize on.
  • You need to be able to scale the aggregation process, which may be easier said than done. Zillow scales their process by purchasing data from several third-party vendors.
  • Try and use your scale to shape the data into something even more valuable, such as Zillow’s Zestimate.
  • Like Indeed, Zillow only used this data as a starting point. Now they collect data from homeowners and realtors as well as third party sources. I suspect that Permitzone plans to add their own content as they gain more transactions.
  • Import.io data extraction can help you get content for aggregation on your website.

Hone in on the content your audience cares about and the people they get it from

If you’re trying to grow traffic to your site, you probably know you need to create engaging content and promote it to the right influencers.

That may be intuitive if you’re in an industry like marketing. But suppose you’re in a boring industry, or perhaps one that doesn’t have a super straightforward content play.  

The key here is finding the content your customers are already consuming, and the people creating and sharing it.

Tactically, the steps are as follows:

  1. Find all of your customers on Twitter. This can be outsourced to a VA.
  2. Use Import.io to pull the list of accounts each customer follows, then export everything to one giant spreadsheet.
  3. Use a pivot table to see which accounts most of your customers have in common. Filter out mainstream news sources, celebrities etc, and see who is remaining.
  4. See what types of content those accounts are sharing
  5. Model your own content after theirs, then pitch your masterpiece to those influencers you found by whichever tactic you can
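The pivot-table step can also be done in a few lines of Python. This is a minimal sketch, assuming the exported spreadsheet has been reduced to (customer, followed account) pairs. The handles, the exclusion list, and the 50% threshold are all made up for illustration.

```python
from collections import Counter


def common_follows(rows, exclude=(), min_share=0.5):
    """Return (account, share_of_customers) pairs, most widely followed first.

    rows is an iterable of (customer, followed_account) pairs.
    """
    unique_rows = set(rows)  # drop duplicate pairs so they cannot inflate counts
    customers = {customer for customer, _ in unique_rows}
    counts = Counter(account for _, account in unique_rows)
    excluded = set(exclude)
    # Sort by count descending, then by handle, so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (account, n / len(customers))
        for account, n in ordered
        if account not in excluded and n / len(customers) >= min_share
    ]


# Hypothetical export: which accounts each customer follows.
rows = [
    ("cust1", "@lawnpro"), ("cust1", "@bignews"),
    ("cust2", "@lawnpro"), ("cust2", "@gardenblog"),
    ("cust3", "@lawnpro"), ("cust3", "@bignews"), ("cust3", "@gardenblog"),
]
# Filter out mainstream accounts by hand, as in step 3.
result = common_follows(rows, exclude={"@bignews"})
for account, share in result:
    print(f"{account}: followed by {share:.0%} of customers")
```

The accounts that survive the filter are the ones whose content is worth studying in step 4.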

Rather than spray and pray, this method:

  1. Ensures that you are creating content that is similar to the content your audience is already reading
  2. Gives you a list of influencers that already share similar content

If you’re in a B2B niche, you also might want to try the same process with LinkedIn influencers and / or interests.

To take it a step further, you can also pull the follower count of each influencer, and prioritize who you want to reach out to. The heavy hitters are probably less likely to share your content (depending on your company’s klout), and influencers with too little of a following may not be worth your time.
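One simple way to prioritize is to keep only influencers inside a follower-count band and sort what remains. The sketch below assumes you already have a follower count per influencer. The 1,000 and 100,000 cutoffs and the handles are arbitrary placeholders to tune for your own niche.

```python
def prioritize(influencers, min_followers=1_000, max_followers=100_000):
    """Keep influencers inside the follower band, largest audience first."""
    in_band = [
        i for i in influencers
        if min_followers <= i["followers"] <= max_followers
    ]
    return sorted(in_band, key=lambda i: i["followers"], reverse=True)


# Made-up handles and follower counts.
influencers = [
    {"handle": "@tinyblog", "followers": 250},
    {"handle": "@gardenblog", "followers": 12_000},
    {"handle": "@lawnpro", "followers": 48_000},
    {"handle": "@megastar", "followers": 2_000_000},
]
outreach = prioritize(influencers)
for person in outreach:
    print(person["handle"], person["followers"])
```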

Key takeaways:

  • In some industries, it may be easy to figure out your content strategy. For example, if you’re marketing email software, you probably want to write first and foremost about email marketing.  Other niches are a bit trickier.
  • If you’re uncertain about the type of content you should be producing, look first and foremost to who your existing customers follow on social media, and what content is being shared.
  • Tailor your content accordingly, and promote it to those already sharing similar pieces.

Get a leg up on your competitors with rank tracking

Yeah of course, there are plenty of tools that track your rankings compared to those of your competitors.

What I’m talking about is maintaining a competitive edge over your competitors by monitoring the SEO tests they’re running.

In this article, Pinterest growth engineer Julie Ahn describes Pinterest’s system for A/B testing onsite changes related to SEO. She explains how Pinterest turns on-page changes, title tag tests, and even JavaScript rendering into an exact science.

If you’re in a competitive niche, chances are your competitors are already running tests. Title tags are likely the first place you’ll want to look, as they are the easiest to change. Just check out the travel giants duking it out in the hotel space (10 seems to be the magic number).

Now you can only measure your own traffic, but if you have the right rank tracker you can track rank fluctuations along with title tag changes, or even on-page changes that you might want to test. This makes sure you’re always one-upping your competitors.
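To make that concrete, here is a minimal sketch of lining up title tag changes with rank movement. It assumes you already collect periodic snapshots of (date, URL, title, rank) for a keyword. The snapshot format and the example URLs are hypothetical, not the output of any particular rank tracker.

```python
def title_change_events(snapshots):
    """Return one record each time a URL's title differs from its previous snapshot."""
    last_seen = {}
    events = []
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for snap in sorted(snapshots, key=lambda s: s["date"]):
        prev = last_seen.get(snap["url"])
        if prev is not None and prev["title"] != snap["title"]:
            events.append({
                "url": snap["url"],
                "date": snap["date"],
                "old_title": prev["title"],
                "new_title": snap["title"],
                "rank_before": prev["rank"],
                "rank_after": snap["rank"],
                # Positive delta means the page moved up the results.
                "rank_delta": prev["rank"] - snap["rank"],
            })
        last_seen[snap["url"]] = snap
    return events


# Made-up weekly snapshots for one keyword.
snapshots = [
    {"date": "2017-03-01", "url": "hotels.example/nyc", "title": "New York Hotels", "rank": 4},
    {"date": "2017-03-01", "url": "travel.example/nyc", "title": "NYC Hotels", "rank": 2},
    {"date": "2017-03-08", "url": "hotels.example/nyc", "title": "10 Best New York Hotels", "rank": 2},
    {"date": "2017-03-08", "url": "travel.example/nyc", "title": "NYC Hotels", "rank": 3},
]
events = title_change_events(snapshots)
for e in events:
    print(e["url"], e["old_title"], "->", e["new_title"], "rank delta:", e["rank_delta"])
```

A rank move that follows a title change is only a hint, not proof. Rankings shift for many reasons, so treat these events as test ideas to verify on your own pages.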

Don’t have much competition? Monitor competitive niches like travel and job searches to see what’s working in other industries. And of course, don’t forget to monitor the results of not only rank changes, but traffic and conversions (you’ll likely need to build your own testing system) if you haven’t already.

Key Takeaways:

  • There’s a finite number of clicks out there for a given term, and A/B testing is one way to ensure that you’re maximizing your own traffic. Monitoring your competitors’ changes is vital to staying on top.
  • Even if you think you’ve successfully copied or one-upped your competitors, you need to measure the results for yourself.
  • Monitor websites in other industries to get test ideas.
  • Be sure to not only measure rankings and traffic; make sure you’re tracking whatever metric matters most, such as conversion rate. Often click-baity titles can lead to lower quality traffic.