Recipe for an employment “robot”

Last week Fast Company published an article in which I described my experience applying to thousands of jobs using a contraption I modestly refer to as my “robot.” [Editor’s note: to find out how blasting hundreds of applications turned out, please read the Fast Company article].

I use the term robot very loosely, because while it is a very clear descriptor for most people, Import.io users can probably read through the lay-tech language and see a spreadsheet with arms.

No, my bot is not engineered so much as it is constructed. So how is it made? I’ve tried to put together a recipe with the data-savvy in mind.

Ingredients for a job bot:

  • 2–4 Import.io extractors
  • Lots of relevant job sites/lists
  • Google sheets
  • Knowledge of formulas like ARRAYFORMULA, and CONCATENATE
  • Gmail
  • Yet Another Mail Merge (YAMM) or other mail merge service

Import.io Data Extraction

The first part is honestly the easiest thanks to Import.io. You’ll want a handful of extractors that pull data on your behalf.

The sources you select are going to be up to your personal taste. I chose some industry job boards.

While many job boards have clean structured data on their website, I have found that the search pages are often difficult to scrape. Instead, you can setup a saved job search, which emails you results of jobs that fit specific parameters. You are effectively making the website search, sort, and filter the data on your behalf.

Don’t mind if I do! Use the URL here to feed Import.io.

The emails typically have a URL at the top that allows you to open them in a browser. Lucky for us, the HTML for most of these emails is nicely structured. Let these emails stack up in your inbox for a while and you’ll have a handful of these URLs to feed the extractor you’ve built for that particular email format.

Since the emails don’t usually have all of the information you’ll need, you can just focus on the links to the job postings. Gather those together and feed them to another extractor, which processes those URLs and sucks up all of the data from the job posts (which are infinitely more easy than many job site search pages for Import.io to digest). And since the results are already filtered by the saved search, you can dump all of the data right into a Google Sheet for processing and parsing.

Spreadsheet to Google Sheet

Now that we’ve got a spreadsheet filled with data, it may need some level of parsing. You’ll want to push the data from each job site into it’s own sheet. Here, you can decide which data is relevant to you and ignore the rest. I pass the raw data to another sheet for cleanup.

In the cleanup sheet, the columns are standardized so the data can all flow into one hopper later on. You may also need to do other types of cleanup here. The most obvious cleanup involves extracting the email address of the hiring manager from the last paragraph where it normally resides. There are a lot of smart ways to do this. I chose to use one found on stack exchange (thanks to nixda):

=iferror(Regexextract(A1;”[A-z0–9._%+-]+@[A-z0–9.-]+\.[A-z]{2,4}”);””) 

At this stage, if no more cleanup is needed, you’ll pass all of your standardized and tidy data over to the aggregator sheet. There, all of the targets are gathered together for one last tidy-up: deduplication. It would be awkward if multiple job sites had the same posting and you apply to the same job several times, right? From here, a couple of concatenations followed by a =UNIQUE(range) will ensure that only rows with a unique combination of company and title will be readied for delivery.

Google Sheet to Email

Now it is time to code jobs for customization. I decided to personalize cover letters with the hiring manager’s name, the job title, and the company. An early sample of a cover letter format can be seen here:

YAMM actually uses {{Column Header}} syntax but this was before I realized that.

You’ll save this template as a draft in gmail. The markers (use this syntax: {{Column Header}}) will be replaced by the data in your spreadsheet to send personalized emails. Because you’re using gmail, you can add inline images and attachments.

You can establish some metadata rules early on that would associate certain job titles with relevant successes, specific skills to be listed, and roles that demonstrate experience. Adding columns that search for keywords in the title will ensure that (for example) operations job post get an operations cover letter and communications job posts get a communications cover letter. If you have customized resumes you can run multiple YAMMs (use a different template with the relevant resume attached) or use a column with a URL that points to the relevant resume.

Since you’re not going to spend time customizing each and every cover letter, you should spend a lot of time on every last detail of the cover letter template and modular text tidbits. Pay attention to the flow from static text to dynamic text, making sure the spacing, verb tense, and singular/plural indicators are all consistent.

At this stage, you should probably (ok, definitely) try a few test runs. In my case, I used dummy data with a bunch of gmail aliases to do this. Nonetheless, the first time I fired it up I accidentally applied to about 1,300 jobs in the Midwest during the time it took me to get a cup of the world’s most delicious coffee. I live in New York City and had no plans to relocate, so I quickly shut it down until I could clean up the bug (turns out one of my extractors was TOO good).

Et voilà! You’ve concocted a robot that applies for jobs. With some creativity and tweaking, this can be used for all sorts of things. Since many of those things are obnoxious or unproductive I’d urge you to think long and hard before you unleash the beast.

A final note: This started out as a cool idea, turned into a fascinating project, and wound up a failure. Very early in the process I understood that this was not going to increase my chances of getting a new job but it would help me to understand some of the weaknesses in the recruiting process. By digging in and looking at the data, I’m hoping to share some insight into a flaw that doesn’t prevent employers from hiring good employees, but definitely prevents some good potential employees from being considered.

Extract data from almost any website


INSTANT ACCESS