User Guide

Adding training URLs

This topic describes how to improve the accuracy of an extractor by using additional webpages to train the extractor.

Note: This topic covers adding URLs to train your extractor. A different topic covers adding multiple URLs to your extractor.

Training with just one webpage is often sufficient. However, the underlying structure of webpages on a website sometimes varies, even when the webpages look identical. Adding training URLs enables you to check your training against similar webpages and is recommended as a way to identify structural variations between webpages.
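The idea that two pages can look identical while differing in structure can be illustrated outside the editor. The following is a minimal Python sketch, not a feature of the product: it assumes you already have the HTML of two pages as strings, and the names TagPathCollector, tag_paths, and structural_difference are hypothetical. It uses only the standard library html.parser module to list the tag paths that appear in one page but not in the other.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of tag paths (for example 'html/body/div') in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, outermost first
        self.paths = set()   # every tag path seen so far

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of tag paths found in an HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structural_difference(html_a, html_b):
    """Return the tag paths present in only one of the two pages."""
    return tag_paths(html_a) ^ tag_paths(html_b)


# Two hypothetical pages that would render the same text.
page_a = "<html><body><div><span>Price</span></div></body></html>"
page_b = "<html><body><div><p><span>Price</span></p></div></body></html>"

for path in sorted(structural_difference(page_a, page_b)):
    print(path)
```

A non-empty result suggests that the second page is a useful training URL, because the same visible data sits at a different position in the page structure.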

Providing training URLs

Providing additional training URLs to the editor lets you review and refine your training, which improves the accuracy of the extraction and increases the success rate of your extractor. Typically, adding three to five URLs is sufficient, depending on how much the webpage structure varies.

To provide URLs for training, perform the following steps:

  1. In the advanced options of the editor commands bar, click Train with additional URLs. The Manage Extractor URLs dialog box appears.
  2. In the Add training URLs box, add a URL of another webpage with similar structure from the same website.
  3. Click Go. The new URL is analyzed against the existing training, and a completion message appears.
  4. Click Save and Close. A new option appears on the editor commands bar, giving you access to each of the training pages. A checkmark in the dropdown list indicates the page you are viewing.

Checking training data for correctness

To check your data for correctness, perform the following steps:

  1. Click through the column headings and check the floating column window for incorrect or missing data points.
  2. Use the point-and-click interface to correct any errors.
  3. When working with single-item pages, consider using the single row option to collapse the data into just one row for each training URL. The data appears in the floating column window with page labels.

In the data view, all the data for each URL appears simultaneously.
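The manual check for missing data points can also be sketched in code. The following Python example is an illustration only and makes several assumptions: that the extracted rows are available as a list of dictionaries, that each row carries a hypothetical url key naming its training page, and that a missing data point shows up as an absent, None, or blank value. The function name find_missing is also hypothetical.

```python
def find_missing(rows, columns):
    """Return {url: [column, ...]} for every training URL with missing values.

    rows    -- list of dicts, each with a 'url' key (an assumed layout)
    columns -- the column names that every row is expected to fill
    """
    problems = {}
    for row in rows:
        missing = []
        for column in columns:
            value = row.get(column)
            # Treat absent, None, and whitespace-only values as missing.
            if value is None or str(value).strip() == "":
                missing.append(column)
        if missing:
            problems.setdefault(row["url"], []).extend(missing)
    return problems


# Hypothetical data: one row per training URL, as with the single row option.
rows = [
    {"url": "http://example.com/a", "name": "Widget", "price": "9.99"},
    {"url": "http://example.com/b", "name": "Gadget", "price": ""},
]

for url, columns in find_missing(rows, ["name", "price"]).items():
    print(url, "is missing:", ", ".join(columns))
```

Each URL reported this way points to a training page whose columns are worth revisiting with the point-and-click interface.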


When errors occur, error messages are provided to help diagnose the problem.