Web scraping – how to use it?
Consider the amount of raw data floating around the internet: webpages made up of text, images, videos, graphics, memes, infographics, and more. It’s mind-boggling when you slow down and think about it.
The latest estimate puts the total number of websites at roughly one billion, with new ones added and old ones disappearing all the time. Each second, approximately 7,548 tweets are sent, 772 Instagram photos are posted, 2,573,338 emails are sent, and 42,943 GB of internet traffic flows… to name just a few.
According to Cisco, global internet traffic hit 1.1 zettabytes per year (a zettabyte is one sextillion bytes, or the same as 36,000 years of HD video) at the end of 2016, and will cross the 2.3 zettabytes threshold by 2020.
Overall, the internet has nearly doubled in size every year since 2012.
So, yes. It’s big. And there’s a lot of data.
That data can be harnessed to help with a wide variety of things if you know how to do it. You could navigate from site to site, from page to page, picking and choosing the details you need for whatever it is you’re doing, and copying the relevant information to another file or spreadsheet before moving on to the next website, page, or paragraph.
That’s Option A, or what we might call the “Classic Method” (classic because, for a long time, you really had no choice) – slow, laborious, and tedious. Option B is to let a web scraper do that work for you.
Let’s dig a little deeper into the art of getting data from a website.
Web Data Extraction
Also known as web harvesting, screen scraping, and web data extraction (and sometimes loosely lumped in with data mining), web scraping is the practice of pulling data from a website and saving it in a structured format to a local file, database, or spreadsheet.
It automates the process of copying and pasting selected sections of a page or an entire website to be reviewed and analyzed later from one convenient place.
Scraping tools first fetch the page (download it, as a web browser does) and then extract the chosen data, which may be copied, parsed, searched, and reformatted. Many tools allow you to collect data from hundreds or thousands of URLs at the same time (or in a scheduled sequence).
Basically, any data that can be viewed online – even behind a login wall (provided you have the proper credentials) – can be extracted.
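To make the fetch-then-extract idea concrete, here is a minimal sketch in Python using only the standard library. The names `LinkExtractor`, `fetch`, and `extract_links` are made up for this illustration, and pulling out links is just one example of "chosen data"; a real scraping tool does far more.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects (text, href) pairs for every <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def fetch(url):
    """Step 1: download the page, much as a browser would."""
    # The User-Agent string is a placeholder chosen for this example.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Step 2: pull the chosen data (here, links) out of the markup."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Calling `extract_links(fetch(url))` chains the two steps for any URL you choose, and the resulting list can be written to a file or spreadsheet from there.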
“But why?” you might well ask. What’s the big deal, and what are some of the things you can do with the data?
They say that knowledge is power, and in 2017, knowledge comes increasingly in the form of digital data sets. The more you have, the better your decisions, plans, tactics, strategies, and success.
So how can you use this data? Here are just a few of the ways.
In Your Marketing Efforts
It doesn’t matter what you’re selling, whether it’s a product or a service, whether it’s something everyone could use or designed for a very small and exclusive niche; if you want to succeed and grow your business, you need to market.
And in 2017 and beyond, that means promoting online and working with digital data. According to Hubspot’s annual State of Inbound report, 65% of marketers say generating traffic and leads is their biggest challenge.
Web data can help with both. Let’s look at the traffic issue first.
1. Search Engine Optimization
Traffic coming to a website can arrive from a variety of channels, including direct, paid, social, email, and referral.
For many, though, it’s organic search – traffic originating from a search engine inquiry – that serves up the biggest slice of the pie. And as luck would have it, this traffic often tends to be the most relevant and highest converting (because those visitors went looking for something you have or know).
There are several ways you can boost your organic search traffic, but they all ultimately have to do with your search engine optimization (SEO).
Enter web data. With it, you can get SERPs for SEO management, and take your SEO analysis up a notch (or two).
First, you can track your page ranks over time by scraping the various search engine result pages for your given keyword or query.
You can immediately see where you rank for each targeted keyword and how that rank changes over time: moving up (so keep doing whatever you’re doing), moving down (so change course), or staying the same (so you need to do something). You can also see whether some keywords should be abandoned because the top five or ten results are made up of virtually unbeatable websites (think Amazon, Apple, Walmart, and so on).
You need to rank high. Results 1-4 get around 83% of the clicks, leaving a paltry 17% for every other result.
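Once the positions are scraped, the up/down/same check is simple to automate. The sketch below assumes you have already collected each keyword's position over time into a list (oldest first, 1 being the top result); the function names and the sample keywords are hypothetical.

```python
def rank_trend(history):
    """Classify movement from the first recorded position to the latest."""
    if len(history) < 2:
        return "not enough data"
    change = history[0] - history[-1]  # positive: the position number shrank
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


def summarize(rankings):
    """Map each keyword to (latest position, trend)."""
    return {
        keyword: (positions[-1], rank_trend(positions))
        for keyword, positions in rankings.items()
    }


# Hypothetical scraped positions, oldest first.
rankings = {
    "cloud accounting": [12, 9, 7],
    "dentist software": [4, 4, 4],
    "saas billing": [3, 6],
}
report = summarize(rankings)
```

This compares only the first and latest readings; a smoother measure (say, a moving average) may suit noisy rankings better.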
Second, you can turn to your direct competition and see what keywords and phrases they’re ranking for and targeting. You might find some that you hadn’t even considered. A thorough scrape and text analysis of their site content will reveal their keyword list and strategies. If they’re generally ranking higher than you, consider switching to a few of theirs.
A scraped content data set can give you insight into the titles, keywords and their densities, descriptions, link counts, and visual elements that are working for them… because they’ll likely work for you, too.
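As a rough illustration of the keyword-density part, the following sketch counts the most frequent words in a block of scraped text and expresses each as a percentage of all words. The function name, the `min_length` cutoff for skipping short filler words, and the sample text are assumptions made for the example, not an industry-standard formula.

```python
import re
from collections import Counter


def keyword_density(text, top=5, min_length=4):
    """Return (word, count, percent of all words) for the most common words."""
    all_words = re.findall(r"[a-z']+", text.lower())
    # Skip very short words ("to", "and", "in") as a crude stop-word filter.
    candidates = [w for w in all_words if len(w) >= min_length]
    counts = Counter(candidates)
    return [
        (word, count, round(100 * count / len(all_words), 1))
        for word, count in counts.most_common(top)
    ]


# Made-up page copy standing in for scraped competitor content.
page_text = "Cheap flights to Rome. Book cheap flights and cheap hotels in Rome today."
densities = keyword_density(page_text, top=3)
```

Run the same function over your own pages and a competitor's, and the differences in emphasis become easy to spot.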
2. Market Research
Any good business owner knows that market research is part of their due diligence when launching, expanding, or changing.
Opportunities. Threats. Trends. Predictions. Collect, organize, and analyze it all.
Classic & Sports Finance – a leading classic and race car finance company – uses scraped auction, sales, and dealer pricing data to keep abreast of market trends and real-time competitive pricing structures. Web data is integral to its business model and success.
A web scraper can extract the necessary data from analytics providers, market research firms, directories, industry blogs or news sites, and collect everything in a single spot. It takes market research from time-consuming and frustrating to quick and simple.
With it, for example, you can organize an extensive list of the direct and indirect competition, or the potential customer base (based on your buyer personas) in a given area, and more.
Speaking of which…
3. Lead Generation
Depending on your business, a lead may be simply a name and contact details for an individual (who hopefully fits your buyer persona). There are many tactics and tools you can try to generate them. Social media, answering questions on Quora, speaking events, conferences, guest posting, paid ads, lead magnets…
And – you guessed it – web data. How?
At its most basic, you’re just looking for contact info that fits a profile. If you have a new cloud accounting SaaS that caters to dentists, you need a list of dentists. If you have a car seat design that’s safer than traditional ones, you need a list of parents with young children.
A scraper can collect the necessary details – names, email addresses, URLs, phone numbers – in a process that’s often called contact scraping. Your dentist SaaS? A state directory of licensed dentists would provide you with a long list of quality leads. Your improved car seat? Try the parent directory at some local schools and day care centers.
All the information you need is available online if you know where to look. Just Google “[blank] directory/association/index/register/club/guild/organization/union” and anything else that would narrow your search down, such as “dentist directory minnesota” or “PTA contact list Lo-Ellen Park High School.”
Another good source is a review site like Yelp.
You can further qualify those leads by searching or filtering the data by keywords, demographics, or any other criteria to find your exact buyer personas.
So it’s not just leads, it’s qualified leads. That’s a goldmine.
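Here is a small sketch of both halves of that process: pulling email addresses out of scraped text, then filtering a lead list by criteria. The regular expression is a deliberately simple approximation (real-world addresses can be messier), and the function names, field names, and sample records are all invented for the example.

```python
import re

# A simple pattern that covers common addresses; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique addresses (lowercased) in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def qualify(leads, **criteria):
    """Keep only the leads whose fields match every given criterion."""
    return [
        lead for lead in leads
        if all(lead.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical scraped directory text and lead records.
directory_text = (
    "Contact Dr. Lee at Lee@SmileDental.example or the front desk: "
    "info@smiledental.example. Billing: lee@smiledental.example"
)
emails = extract_emails(directory_text)

leads = [
    {"name": "Dr. Lee", "state": "MN", "profession": "dentist"},
    {"name": "Dr. Cruz", "state": "WI", "profession": "dentist"},
    {"name": "J. Park", "state": "MN", "profession": "plumber"},
]
qualified = qualify(leads, state="MN", profession="dentist")
```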
Soap bubble entertainment company Bubbly uses web scraping to monitor its competition and market, but also to generate leads and keep a steady flow of prospects heading into its sales funnel.
With contact details available on the web, Bubbly can build its customer base, make the necessary adjustments to its supply chain as demand dictates, and ultimately grow its business without the stress or hassle of having to track down leads on a one-by-one basis.
In Your Competitor Analysis
Any savvy business owner recognizes the importance of keeping an eye on the competition.
What are they doing in their marketing, what content are they pushing, what keywords and phrases are ranking for them (which we already discussed above), what pricing structure are they using, and what’s the general opinion of them from consumers? These are just a few of the questions you should be asking.
To get the answers, you can once again turn to your trusty web scraping tool.
4. Reviews and Sentiment
Scrape from Yelp, Zomato, TripAdvisor, the Better Business Bureau, Trustpilot, Google, Amazon, or some other business review site to see customer reviews and comments about them (and you).
Turn to social media platforms and search by brand or product names to get additional data, and perhaps even leverage sentiment analysis to learn how people feel about certain businesses and products.
Scrape business profiles and corresponding reviews for insight and assistance with reputation management. Profit from the competition’s weaknesses and complaints (offer a better solution), and address your own.
5. Content Approach and Followers
A competitor’s blog and social media accounts are a great place to analyze their content marketing (perhaps opening the door for you to use the skyscraper technique and build off their foundation), as well as to see who has followed or liked them (maybe giving you the opportunity to contact those followers and offer an incentive to make the switch).
6. Price Comparison
You might also scrape for the purposes of price comparison and tracking. What are competitors charging – and what have they charged over time – for the same or similar product? Consumers like to see comparisons between Brand A and Brand B. In fact, 51% of successful campaigns include a comparison or ranking.
Your pricing can obviously make or break you. You need to be competitive. Give it every advantage possible.
Three major grocery store chains – Tesco, Waitrose, and Sainsbury’s – all use web scraping as part of their pricing strategy. Every morning, they scrape prices for 33 items in the Consumer Price Index food basket, and compare those prices against other items that match the description (so apples in the CPI basket would include Granny Smith, Red Delicious, and so forth). The scraper extracts an average of 5,000 quotes related to the 33 items each day, or about 150,000 per month.
This allows them to stay competitive and within acceptable pricing standards without having to send someone to manually collect price data points, which used to be standard in the industry.
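To show what happens to those quotes once they are collected, here is a minimal sketch that averages scraped prices per item, along with the arithmetic behind the monthly figure (5,000 quotes a day over a 30-day month). The item names and prices are invented, and `average_prices` is a hypothetical helper, not how any particular retailer does it.

```python
from collections import defaultdict
from statistics import mean


def average_prices(quotes):
    """Group (item, price) quotes by item and average each group."""
    by_item = defaultdict(list)
    for item, price in quotes:
        by_item[item].append(price)
    return {item: round(mean(prices), 2) for item, prices in by_item.items()}


# Made-up quotes: several product variants can map to one basket item.
quotes = [
    ("apples", 1.80),   # e.g. one apple variety
    ("apples", 2.20),   # e.g. another variety
    ("milk", 0.95),
    ("milk", 1.05),
    ("milk", 1.00),
]
averages = average_prices(quotes)

# The volume quoted in the text: about 5,000 quotes per day.
QUOTES_PER_DAY = 5000
quotes_per_month = QUOTES_PER_DAY * 30  # 150,000 over a 30-day month
```

Comparing your own price for an item against its entry in `averages` tells you at a glance whether you sit above or below the market.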
7. Change Detection
Finally, a good scraper can be set up to detect and scrape website changes. You can keep your finger on the pulse of your competition and know immediately when they have a new product, lower their prices, begin a special promotion, publish a new blog post, or anything else.
With that kind of data at your fingertips, you can react, adjust, and respond quickly and appropriately.
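One common way to detect changes is to store a fingerprint (a hash) of each page and compare it with the fingerprint from the next scrape. The sketch below assumes the page content has already been fetched as text; the function names and URLs are placeholders.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest that changes whenever the content does."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints against freshly scraped pages.

    previous: dict of url -> fingerprint from the last run
    current:  dict of url -> page text from this run
    Returns (list of new or changed urls, updated fingerprints).
    """
    changed = []
    updated = {}
    for url, content in current.items():
        digest = fingerprint(content)
        if previous.get(url) != digest:
            changed.append(url)
        updated[url] = digest
    return changed, updated


# First run: every page is new, so every page is reported.
first_scrape = {
    "https://competitor.example/pricing": "Widget: $20",
    "https://competitor.example/blog": "Post 1",
}
changed_first, stored = detect_changes({}, first_scrape)

# Second run: only the pricing page differs.
second_scrape = dict(first_scrape)
second_scrape["https://competitor.example/pricing"] = "Widget: $18"
changed_second, stored = detect_changes(stored, second_scrape)
```

Hashing the whole page flags any difference, including trivial ones such as a rotating banner, so in practice you may prefer to fingerprint only the extracted fields you care about (price, product list, post titles).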
In Your Professional and Personal Life
Web scraping is obviously a valuable business exercise, but it’s not limited to just your professional activities. It can simplify and save you time in your personal life, too.
8. Job Hunting and Recruiting
Looking for a new job? Try scraping dozens or hundreds of the top job portals, sites, and forums. Include social media (search by company or keyword), digital bulletin boards, and classified listings.
Looking to fill a position at your company? Turn to many of the same sources, and filter results with the precise criteria you want and need in an applicant. Scrape a college or university directory of graduating seniors in a related field of study.
9. Products and Services
Whatever you do for a living, you’re also a customer who buys products and services.
As a customer, you can copy and aggregate several directories of services (dentist, lawyer, plumber, contractor) or product providers that you need. Compare reviews, prices, and more to find the best fit at the best price.
Compile a list of used cars that match your requirements from several different sites. Or school options for your kids in a new city. Or anything else that fits a set of criteria you create.
Choosing your next smartphone, for example, can be a major headache because of the massive number of choices: iPhone, BlackBerry, Windows Mobile, or the dozens of Android options.
The website Unmudlr uses web data to make it super easy for you by asking a few basic questions. It then uses its vast array of scraped data – including detailed descriptions and pricing information – to present the phones that meet your specific requirements.
10. Research
Academic. Professional. Into every life, a little research must fall. Make it quick and painless with web scraping.
Collect info and data sets about any subject from hundreds or thousands of different sources. With billions of articles, case studies, and web pages to choose from, you can expand the scope of your research and refine your search, saving time and money along the way.
11. Financial Planning
Maybe you have an investment planner, and maybe you don’t. Either way, it’s wise to have an understanding of what’s going on, and to be able to at least provide some input when it comes to decisions about your money.
Scrape data on stocks and bonds (performance over time, expert analysis and predictions), investment properties (rental prices at similar places at or near your location, neighborhood reviews and sentiment), and the companies you or your planner are interested in (sentiment analysis, reviews, industry expectations).
Will it make you a guaranteed million? No. But an informed decision is a better decision. Collect the details you need to get and stay informed.
12. Looking to Buy or Rent
To give you an example of web scraping in a specific industry, consider real estate. The opportunities for a scraping tool to improve the experience of an agent or buyer are many.
As a house hunter, you could create a complete data set of all options available to you from multiple agents and listings, aggregate details in a single location, and fill in the gaps to give you a more complete picture of a particular property, neighborhood, or agent.
13. Looking to Sell
As a real estate agent, you can collect data points on neighborhoods, cities, personal stories, and images to create powerful property listings. You can scour house-for-sale and seeking-house classifieds, contacting buyers, sellers, and renters to offer your services and make their job easier.
As a home or property owner looking to sell or rent out the place, you can scrape similar listings to see the language and key points of interest that others are choosing to highlight. What keywords are they using in the description? Which points of interest do they include for the area?
These are just some of the ways you could use a scraping tool to simplify, enhance, and boost what you’re doing at work and at home.
Ready to get started?
Creating web mashups. Monitoring weather data. Software and app development. And more.
These 13 ways to use web scraping are just the beginning, but should give you some idea as to its usefulness. With data available online by the digital truckload, you need a simple solution to collect and sift through it.
A tool like Import.io allows you to benefit from gaining access to structured web data without having to install anything or learn coding.
Sign up today for a free, full-featured, no-risk trial and see for yourself how web data can help you and your business. Sign up, log in, point, click, and scrape.
In a world where nothing is free, we’re giving you free data because we know you’ll immediately see the potential and understand its value.
Try it. You’ll wonder how you ever survived without it.