Today’s webinar is all about my favorite thing – Football (or “Soccer” for our American viewers).
With the World Cup in full swing, I thought it would be fun to show you how Chris created this ultimate football stats web app, which updates live.
First I showed you how to build an Extractor to the BBC World Cup Results page. Extractors are great for getting data from a single page because they let you refresh it whenever you want, an important feature when you’re working with sports data, which changes all the time.
This page has multiple matches on it, so we needed to map it as a multiple rows extractor. Once we’d added our columns, it was a simple matter of uploading it to import.io, where the API is created for you automatically. From the Dataset page, you can hit refresh whenever you want and pull the latest data directly from the source.
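And if you’d rather pull that refreshed data straight into your own app, you can hit the API over HTTP. Here’s a minimal TypeScript sketch; the endpoint shape and the IDs in it are placeholders, so copy the real query URL from your API’s integration page.

```ts
// A minimal sketch of refreshing the Extractor's data over HTTP.
// EXTRACTOR_ID, API_KEY and the endpoint shape are placeholders --
// copy the real query URL from your API's integration page.
const EXTRACTOR_ID = "your-extractor-guid";
const API_KEY = "your-api-key";
const PAGE_URL = "http://www.bbc.com/sport/football/world-cup/results";

async function refreshResults(): Promise<void> {
  const params = new URLSearchParams({
    "input/webpage/url": PAGE_URL, // the page the Extractor was trained on
    _apikey: API_KEY,
  });
  const res = await fetch(
    `https://query.import.io/store/data/${EXTRACTOR_ID}/_query?${params}`
  );
  if (!res.ok) throw new Error(`Query failed: ${res.status}`);
  const data = await res.json();
  // Assuming the response carries one element per match row in `results`.
  console.log(data.results);
}

refreshResults().catch(console.error);
```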
Next, I built a Crawler to the FIFA website to get data on all of the teams participating in this year’s World Cup. A quick tip when building a crawler: have the five pages you want to train it on open in Chrome before you start, so you can just copy and paste the URLs into the import.io app.
Because each of these pages is a profile of a single team, I used the single row option, which means I could jump straight to training columns. When you’re training your columns, make sure you take advantage of our different column fields; they’re especially helpful if you plan on integrating your data later on.
Once you’ve finished mapping your columns, all you have to do is upload the Crawler to our system and run it. We set the defaults for you, but you can use the advanced settings for quicker or more targeted crawling.
Now, because FIFA lists every team that could possibly be in the World Cup, I ended up with more data than I really needed, so I decided to combine an Extractor with a Crawler. First I built a quick Extractor that grabs the links to the teams that are actually in the World Cup and downloaded its results as an Excel file. Then I copied all the links, pasted them into the “Where to Crawl” box in the Crawler I built earlier, and re-ran it. This time it only pulled data for the teams in the World Cup.
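If you’d rather not copy and paste the links by hand, a few lines of code will pull them out of the download for you. This is just a sketch: it assumes the results were saved as a CSV called teams.csv with the team link in the first column, so adjust both to match your own export.

```ts
// A rough sketch of collecting the team links from the Extractor's output,
// ready to paste into the Crawler's "Where to Crawl" box. Assumes the
// results were saved as a CSV called teams.csv with the link in the first
// column; adjust both to match your own export.
import { readFileSync } from "node:fs";

const rows = readFileSync("teams.csv", "utf8").trim().split("\n");
const links = rows
  .slice(1) // skip the header row
  .map((row) => row.split(",")[0]) // naive split; fine for plain URL columns
  .filter((url) => url.startsWith("http"));

// De-duplicate and print one URL per line.
console.log([...new Set(links)].join("\n"));
```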
Now, here at import.io we’re not just a one-sport office, so Chris showed you around his Wimbledon web app, including some cool visualizations he made from Crawlers and the live tennis stats he got with an Extractor.
To round off the webinar, I showed you how to build a Connector to some player data on the UEFA website. Connectors are handy when you want to interact with a website, like typing in a search box, to get to the data you’re interested in. You can also record clicks in addition to searches. For this Connector, I recorded myself searching for a player and clicking on the first search result to get to the player’s page.
Again, because we’re looking at a single player’s page, we need to use the single row option. Any time you use this option, no matter what you’re building, you’ll need to give us five example pages so that we can make sure we understand your data mapping.
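Once the Connector is live, you can query it from your own code with whatever search term you like. The sketch below leans on assumptions: the endpoint shape, the IDs and the input name search_input are all stand-ins for whatever your own Connector’s integration page shows.

```ts
// A sketch of querying the Connector with your own search terms.
// CONNECTOR_ID, API_KEY, the endpoint shape and the input name
// "search_input" are all placeholders -- use the details from your
// Connector's integration page.
const CONNECTOR_ID = "your-connector-guid";
const API_KEY = "your-api-key";

async function lookUpPlayer(name: string): Promise<void> {
  const params = new URLSearchParams({
    "input/search_input": name, // hypothetical name given during training
    _apikey: API_KEY,
  });
  const res = await fetch(
    `https://query.import.io/store/connector/${CONNECTOR_ID}/_query?${params}`
  );
  const data = await res.json();
  // Single row option: expect one result, the player's profile data.
  console.log(data.results?.[0]);
}

lookUpPlayer("Thomas Müller").catch(console.error);
```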
Your Turn
We’ve made all the APIs for both the World Cup and Wimbledon apps available to you, and now we want to see what you can build with them. Tweet us the link and you could win an awesome tech gadget!
Chris has also made all the code he used to create each app available on the import.io GitHub page.
Question Time
How did you get the data to display on the map?
For the maps, Chris used the Google Visualization API. You can see exactly how he built those and the other visualizations in the “How it Works” section of each one.
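If you’re curious what that looks like in code, here’s a bare-bones GeoChart sketch in the same spirit as Chris’s maps; the stats in it are made up for illustration, and it assumes the Google Charts loader script is already on the page.

```ts
// A bare-bones GeoChart sketch. The numbers are made up for illustration,
// and the page is assumed to already include the Google Charts loader:
//   <script src="https://www.gstatic.com/charts/loader.js"></script>
// plus a <div id="map_div"></div> to draw into.
declare const google: any; // provided by the loader script

google.charts.load("current", { packages: ["geochart"] });
google.charts.setOnLoadCallback(drawMap);

function drawMap(): void {
  const data = google.visualization.arrayToDataTable([
    ["Country", "Goals scored"], // header row
    ["Brazil", 10],
    ["Germany", 7],
    ["Netherlands", 12],
  ]);
  const chart = new google.visualization.GeoChart(
    document.getElementById("map_div")
  );
  // Countries shade from light to dark as the stat grows.
  chart.draw(data, { colorAxis: { colors: ["#e5f5e0", "#31a354"] } });
}
```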
How do you grab data from a page when it’s not duplicated on each row?
If you want to get data that is displayed on the page, but not within each row, you can use our manual XPath override function under “Advanced Column Settings”.
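Before pasting an XPath into the override box, it can help to test it in your browser’s console first. Here’s a quick sketch using a made-up expression for a page-level heading:

```ts
// A quick way to sanity-check an XPath in the browser console before
// pasting it into the override box. The expression below is a made-up
// example for a page-level heading you'd want repeated on every row.
const xpath = "//h1[@class='page-title']/text()";

const result = document.evaluate(
  xpath,
  document, // evaluate against the whole page
  null,
  XPathResult.STRING_TYPE,
  null
);
console.log(result.stringValue); // the value the override column would get
```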
What are the limitations of enabling JavaScript?
We try turning JavaScript off when the app first detects the optimal settings because it makes API creation much quicker. In most cases, though, turning JavaScript back on shouldn’t affect our ability to create an API for you. Even our Crawler can handle JavaScript, and it’s the only crawler we know of with this functionality!
Does import.io work on intranets?
Provided that your intranet allows you to download the app, you should be able to use it to create static data sets. Because of intranet security restrictions, though, we won’t be able to query that data over a live API.
Is there a way to automatically get data from an XML/CSV/text file linked in the HTML page?
At the moment, we’re not able to extract data from XML, CSV or text files. The best thing to do if you need data from inside one of these files is to extract the link to it and then download the file to your machine.
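From there, fetching and parsing the file yourself only takes a few lines. A rough sketch, where the URL and the simple comma-separated layout are assumptions:

```ts
// A sketch of that workaround: take the file URL you extracted, then
// fetch and parse it yourself. The URL and the plain comma-separated
// layout are assumptions for illustration.
const fileUrl = "http://example.com/stats/results.csv";

async function downloadAndParse(): Promise<void> {
  const res = await fetch(fileUrl);
  const text = await res.text();
  // Naive CSV split -- swap in a real CSV parser for quoted fields.
  const rows = text.trim().split("\n").map((line) => line.split(","));
  console.log(rows.slice(0, 5)); // preview the first few rows
}

downloadAndParse().catch(console.error);
```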
Do you support Flash websites?
Because Flash websites render their content inside a plugin rather than as HTML, we’re not able to create APIs for them.