How to fix a Publish Request Failure

Ever get to the end of building your API and see the dreaded Publish Request Failure (PRF) message? We feel your pain! There are a couple of reasons for API publish request failures, but the most common by far is timeouts due to JavaScript.

API Publish Failure

Our new API docs include some exciting beta features to help get around sites with heavy JavaScript. But, we can’t let devs have all the fun – so if you don’t know your GET from your POST, here’s a quick hack to get all the same functionality without needing to know any code.

The Problem with JavaScript

JavaScript is a total pain when it comes to extracting data. Take this page for example:

JavaScript on

Looks simple, right? But underneath, this page has a whole lot of bad happening. Here’s that same page with JS turned off:

JavaScript Off

Yikes!

Even if you turn JS back on (so you can map your data), processing it takes a long time. And if it takes too long, your API will time out, causing a publish failure.

Unfortunately, there’s not much you (or even we) could do about it... until now, that is.

JavaScript Re-render

We’ve developed a way to re-render pages with JS and transform them into HTML sites. If you put render.import.io/?url= before your site URL and then use that URL when training data, we’ll try to re-render the site and create a new HTML site which we host temporarily on our servers (just long enough for you to get your data). Here’s that same site from before, once it’s been passed through render.import.io:

JavaScript off once it’s been re-rendered

Looks just the same as the JS on version, but underneath it’s all HTML. That means we can map the data we want and publish the API – we can even run it through Magic…

Note: Make sure you include the http:// or https:// as part of your URL when using render.import.io.
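
As a minimal sketch of the prefixing step described above, this Python helper builds a re-render URL by plain string concatenation. The function name `build_render_url`, the example site `http://example.com/products`, and the `http://` scheme in front of render.import.io are assumptions for illustration. The post itself only specifies the `render.import.io/?url=` prefix and that your own URL must include its scheme.

```python
# Prefix from the post; the leading "http://" is an assumption.
RENDER_PREFIX = "http://render.import.io/?url="


def build_render_url(site_url: str) -> str:
    """Return the re-render URL for site_url (hypothetical helper)."""
    # The post's note: the site URL must include http or https.
    if not site_url.startswith(("http://", "https://")):
        raise ValueError("site URL must start with http:// or https://")
    # Plain concatenation, as the post describes; no extra encoding is applied.
    return RENDER_PREFIX + site_url


# example.com is a placeholder, not the site shown in the post.
print(build_render_url("http://example.com/products"))
```

You would then paste the printed URL in place of your original site URL when training data.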

Combating Infinite Scroll

The other bad thing about this site is that it has infinite scroll. And not the nice kind of infinite scroll where there’s a button we can click and find out the URL pattern for more pages. The kind of infinite scroll that just loads more results as you scroll.

If the infinite scroll is controlled by JavaScript, you can put your URL into render.import.io like before and add &inf=X (X being the number of pages you want to scroll) after it. If you wanted 10 pages, you would add &inf=10 to the end.

This will re-render your site in HTML and programmatically scroll through the number of pages you asked for (in this case 10). At the end of it, you get one reaaallyy long page which you can put into Magic.

There’s no official limit on the number of pages you can scroll through, but each page adds about 1 second of load time. So if you ask for 50 pages, expect roughly 50 extra seconds before the page finishes rendering.

Note: &inf= only works if the infinite scroll is done with JavaScript and doesn’t require a click
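
The infinite scroll parameter can be sketched the same way. The helper names `build_scroll_url` and `estimated_scroll_seconds`, the placeholder site `http://example.com/feed`, and the `http://` scheme on render.import.io are assumptions for illustration. The 1 second per page figure is the rough estimate quoted above, not a guarantee.

```python
# Prefix from the post; the leading "http://" is an assumption.
RENDER_PREFIX = "http://render.import.io/?url="
# Approximate load time per scrolled page, as quoted in the post.
SECONDS_PER_PAGE = 1.0


def build_scroll_url(site_url: str, pages: int) -> str:
    """Return a re-render URL that scrolls `pages` pages (hypothetical helper)."""
    if not site_url.startswith(("http://", "https://")):
        raise ValueError("site URL must start with http:// or https://")
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # &inf=X goes after the site URL, X being the number of pages to scroll.
    return f"{RENDER_PREFIX}{site_url}&inf={pages}"


def estimated_scroll_seconds(pages: int) -> float:
    """Rough extra load time, assuming about 1 second per page."""
    return pages * SECONDS_PER_PAGE


# example.com is a placeholder, not the site shown in the post.
print(build_scroll_url("http://example.com/feed", 10))
print(estimated_scroll_seconds(50))
```

Since load time grows with every page, it may be worth starting with a small page count and increasing it only if you need more results.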

Learn more about JS render and infinite scroll here.

Tell us what you think

All this awesome technology is in the process of being refined and built into our product, but in the meantime we thought you guys should be able to play around with it and get data you couldn’t before. Please bear in mind that both of these beta features are being hosted on a small AWS server and are therefore less stable than our core product, but hopefully they can help with some edge cases and those pesky API publish failures caused by JavaScript.

Want to help us test new features? Sign up to get a first look at upcoming features before they’re released!

Comments

JavaScript Re-render is almost working well.
For some unknown reason, when I use the extractor tool and bulk extract URLs, the data I’m training it to extract comes out well in some cases and not in others.

On TripAdvisor, the dropdown menus do not function anymore when using the re-render tool.
