Import.io User Guide

Handling URLs that don’t change with pagination (“infinite scroll”)


What is “infinite scroll”?

Many webpages require you to scroll the browser window or click Load More to see more content. Some webpages, such as search results, can seem to go on forever, hence the term “infinite scroll”.

What’s the problem?

Handling infinite scroll can be tricky because the URL generally remains static: it doesn’t change as more content loads, even though you are effectively on a different page. Websites implement this behavior in structurally different ways. While it is not always possible to get around the problem, the key is finding the underlying URL pattern that the site uses for pagination, even when that pattern is not visible in the browser’s address bar.

Working through an example: Staples category

The http://www.staples.com/Notebooks-Pads/cat_CG3783 webpage initially uses infinite scroll and then displays a LOAD MORE button farther down the page.

To compensate for the infinite scroll, perform the following steps:

Step 1. Opening Chrome DevTools

To find the underlying URL, Import.io recommends using Google Chrome.

  • Open Chrome and navigate to the Staples webpage.
  • Right-click on the page and select Inspect. The DevTools inspector appears.

Step 2. Clearing the Network tab

  • Click the Network tab.
  • Click the Clear icon (next to the red circle near the upper left of the inspector window) to clear any existing activity.

Step 3. Locating the second page of content

  • Click the XHR tab (under the Filter search box) to view the XHR requests.
  • Scroll down the page, displaying more content, until the LOAD MORE button appears.
  • Click LOAD MORE. XHR requests appear in the inspector.

Step 4. Viewing the XHR request header

Step 5. Identifying the page component of the URL

  • In the Request URL, notice pagenum=2. The pagenum URL parameter contains the page number of the displayed page.

Navigating to this URL directly skips straight to the second page of content in the underlying data structure.

Now you know how the website really paginates, and therefore how to build the URLs for your extractor.
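Once the page parameter is known, the URLs for the remaining pages can be generated rather than typed by hand. The following is a minimal sketch, not an Import.io feature: the function name page_urls is hypothetical, and the example.com address stands in for whatever Request URL you copied from DevTools (the actual Staples Request URL is not reproduced here). It assumes the page number lives in a query-string parameter such as pagenum.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def page_urls(request_url, first_page, last_page, param="pagenum"):
    """Return one URL per page by rewriting a single query parameter.

    request_url: the Request URL copied from the DevTools Network tab.
    param:       the query parameter that carries the page number
                 (assumed to be "pagenum", as in the example above).
    """
    parts = urlsplit(request_url)
    # Keep every existing parameter, in order, including blank ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    has_param = any(key == param for key, _ in query)

    urls = []
    for page in range(first_page, last_page + 1):
        # Replace the page parameter and leave all others untouched.
        new_query = [(k, str(page)) if k == param else (k, v) for k, v in query]
        if not has_param:
            new_query.append((param, str(page)))
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls


if __name__ == "__main__":
    # Placeholder URL; substitute the Request URL you found in step 5.
    for url in page_urls("https://example.com/search?cat=CG3783&pagenum=2", 1, 3):
        print(url)
```

The printed list can then be pasted into the extractor's URL list. Note that urlencode percent-encodes special characters in parameter values, so compare one generated URL against the original Request URL before using the whole list.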

Step 6. Creating your extractor

Step 7. Adding the rest of the URLs and running the extractor

etc…

What if I get a JSON response?

If a JSON response appears when you paste a URL in step 6 or step 7, add the following OPT parameter to the end of each URL:

#[!opt!]{"type":"json"}[/!opt!]
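When there are many URLs, appending the parameter by hand is error-prone. The sketch below adds the suffix shown above to each URL in a list; the helper name with_json_opt and the example.com URLs are hypothetical placeholders, and the suffix string is copied verbatim from this guide.

```python
# Suffix copied verbatim from the guide; it is appended to the end of each URL.
OPT_JSON_SUFFIX = '#[!opt!]{"type":"json"}[/!opt!]'


def with_json_opt(urls):
    """Append the JSON OPT parameter to every URL that lacks it."""
    return [
        url if url.endswith(OPT_JSON_SUFFIX) else url + OPT_JSON_SUFFIX
        for url in urls
    ]


if __name__ == "__main__":
    # Placeholder URLs; substitute the URLs you pasted in step 6 or step 7.
    pages = [
        "https://example.com/search?pagenum=1",
        "https://example.com/search?pagenum=2",
    ]
    for url in with_json_opt(pages):
        print(url)
```

The endswith check makes the helper safe to run twice on the same list without doubling the suffix.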

What if I run into a POST request or another issue?

If, in step 4, the request method of the XHR request is POST (or you run into another issue), contact support@import.io for assistance.