API
An API is a structured interface that allows systems to programmatically request and exchange data. APIs enable automated workflows, integrations and reliable delivery of external data into internal tools. They act as the bridge between Import.io datasets and enterprise analytics environments.
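As a sketch of how such an API is typically consumed, the snippet below builds an authenticated, paginated request with the standard library. The endpoint URL, extractor ID and bearer-token scheme are hypothetical, not Import.io's actual API:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint -- substitute your real base URL, extractor ID
# and credentials. This sketches the request shape, not a specific API.
BASE_URL = "https://api.example.com/extractors"

def build_data_request(extractor_id: str, api_key: str, page: int = 1) -> Request:
    """Build an authenticated GET request for one page of extracted data."""
    query = urlencode({"page": page, "format": "json"})
    url = f"{BASE_URL}/{extractor_id}/data?{query}"
    # API keys are commonly sent as a bearer token in the Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})

req = build_data_request("prices-eu", "secret-token", page=2)
print(req.full_url)
# https://api.example.com/extractors/prices-eu/data?page=2&format=json
```

In practice the request would be executed on a schedule and its JSON response loaded straight into a warehouse or BI tool.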
Assortment
The full range of products a retailer or brand offers at a given time.
Automated Extraction
Automated extraction is the scheduled and repeatable collection of data from websites without manual intervention. It ensures structured external data is updated consistently for analytics, pricing intelligence and reporting.
Availability
Availability indicates whether a product is in stock, out of stock or available in limited quantities. Availability changes influence pricing, search ranking and conversion. Monitoring stock status across retailers helps organisations detect supply pressure and competitive opportunity.
Baseline Price
Baseline price is the regular, non-promotional price of a product. It serves as the reference point for measuring discount depth and promotional activity. Understanding the baseline price helps teams evaluate the effectiveness and frequency of promotions.
Benchmarking
Benchmarking is the comparison of prices, assortments, content or performance metrics against competitors or the wider market. It provides essential context for commercial decision making and helps organisations understand their relative position.
Breadcrumbs (Data Context)
A structured trail that shows where data came from on a page, helping with navigation and classification.
Buy Box
The buy box is the primary purchase option shown on marketplaces such as Amazon. Winning the buy box increases conversion and visibility, while losing it often signals pricing, availability or fulfilment issues.
Category Management
Category management focuses on pricing, promotions, assortment and performance within a specific category. It relies heavily on external data to understand market shifts, competitor activity and consumer behaviour.
Chaining
Chaining links multiple extractors together, using the output of one as the input for another. This enables multi-step data collection processes such as going from category pages to product detail pages.
Competitive Intelligence
Competitive intelligence is the use of external data to understand competitor pricing, availability, assortment, content and promotional activity. It helps organisations respond quickly to market movements and supports evidence-based strategy.
Compliance Filters
Compliance filters prevent the extraction of restricted or sensitive information. They enforce governance and regulatory standards across data collection workflows.
Crawl
A crawl is the automated process of loading and scanning webpages to collect data. Crawling forms the foundation of large-scale extraction and monitoring across websites and retailers.
Crawl Run
A crawl run is a single execution of an extractor across a set of URLs. Each run captures the latest state of pricing, availability, content or other structured data.
CSS Selector
A CSS selector is a pattern used to identify elements on a webpage, such as price, product title or image source. Extractors use selectors to reliably locate data in the page structure.
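The element-matching idea behind selectors such as those described under "CSS Selector" above can be sketched with the standard library's `xml.etree.ElementTree` on a well-formed fragment. This is a stand-in for a production selector engine, and the markup is illustrative:

```python
import xml.etree.ElementTree as ET

# A simplified, well-formed product page fragment (illustrative markup).
html = """
<div class="product">
  <h1 class="title">Espresso Machine</h1>
  <span class="price">199.99</span>
  <span class="currency">GBP</span>
</div>
"""

root = ET.fromstring(html)
# Equivalent in spirit to the CSS selectors ".title" and ".price".
title = root.find(".//h1[@class='title']").text
price = root.find(".//span[@class='price']").text
print(title, price)   # Espresso Machine 199.99
```

Real pages are rarely valid XML, which is why extraction tools use a browser-grade HTML parser instead.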
Data Extraction
Data extraction is the automated process of collecting structured information from websites. It supports pricing, digital shelf, competitive intelligence and market analysis by providing real-world, high-frequency data.
Data Feed
A data feed is a structured output of extracted data delivered via API, CSV, JSON or file transfer. Feeds provide clean, ready-to-use data that slots easily into downstream workflows.
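The same extracted records can be serialised as either feed format with the standard library; the field names here are illustrative:

```python
import csv
import io
import json

# Example extracted rows; field names are illustrative.
rows = [
    {"sku": "A100", "retailer": "ShopX", "price": 19.99},
    {"sku": "A100", "retailer": "ShopY", "price": 21.50},
]

# JSON feed: one self-describing document.
json_feed = json.dumps(rows, indent=2)

# CSV feed: a header row plus one line per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sku", "retailer", "price"])
writer.writeheader()
writer.writerows(rows)
csv_feed = buf.getvalue()

print(csv_feed)
```

A production feed adds delivery (SFTP, cloud storage or an API endpoint) and a schedule on top of this serialisation step.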
Data Governance
Data governance refers to the frameworks, rules and processes that ensure external data is sourced, processed and used responsibly. It supports compliance, ethics and operational integrity.
Data Pipeline
A data pipeline is the full sequence of steps that collect, transform, validate and deliver data. It ensures that external data moves predictably from source to insight without manual intervention.
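A minimal pipeline can be sketched as plain function composition; the stage names and sample records below are illustrative, and real pipelines add scheduling, retries and logging:

```python
def extract():
    # Stand-in for a crawl: raw, messy records as scraped.
    return [{"sku": "A1", "price": "£10.00"}, {"sku": "A2", "price": ""}]

def transform(records):
    # Parse price strings into floats; mark missing values as None.
    out = []
    for r in records:
        raw = r["price"].lstrip("£")
        out.append({"sku": r["sku"], "price": float(raw) if raw else None})
    return out

def validate(records):
    # Drop records that failed to parse.
    return [r for r in records if r["price"] is not None]

def deliver(records):
    # Stand-in for loading into a warehouse or feed.
    return records

result = deliver(validate(transform(extract())))
print(result)   # [{'sku': 'A1', 'price': 10.0}]
```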
Data Quality
Data quality measures the accuracy, completeness and consistency of a dataset. Poor-quality data leads to unreliable insights; high-quality data improves confidence and decision making.
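One simple, measurable facet of quality is completeness: the fraction of expected field values actually present. A sketch, with illustrative records:

```python
def completeness(records, fields):
    """Fraction of expected field values that are present and non-empty."""
    total = len(records) * len(fields)
    filled = sum(
        1 for r in records for f in fields
        if r.get(f) not in (None, "")
    )
    return filled / total if total else 0.0

records = [
    {"sku": "A1", "price": 9.99, "title": "Kettle"},
    {"sku": "A2", "price": None, "title": ""},
]
score = completeness(records, ["sku", "price", "title"])
print(round(score, 2))   # 0.67 -- 4 of the 6 expected values are present
```

Accuracy and consistency checks follow the same pattern but compare values against reference data or across sources.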
Data Report
A data report summarises the outputs of a crawl run and highlights key changes or patterns. Reports help teams quickly review market movements and share insights.
Details Page
A details page is an individual product page containing information such as title, price, attributes and stock. Extracting details pages provides granular understanding of each SKU.
Digital Shelf
The digital shelf represents how products appear across online retailers. It includes pricing, content, availability, images, search ranking and reviews. Monitoring the digital shelf helps brands understand their market presence.
DOM (Document Object Model)
The DOM is the structured representation of a webpage used by browsers. Extraction tools use the DOM to locate and extract the correct elements on a page.
ETL (Extract, Transform, Load)
ETL is the process of extracting data, transforming it into a usable structure and loading it into analytics or storage systems. It enables consistent, high quality data for reporting and modelling.
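The three ETL stages can be sketched end to end with an in-memory SQLite store; the rows and currency format are illustrative:

```python
import sqlite3

# Extract: raw rows as they might arrive from a crawl (illustrative data).
raw = [("A1", "£10.00"), ("A2", "£12.50")]

# Transform: strip currency symbols and convert to floats.
clean = [(sku, float(price.lstrip("£"))) for sku, price in raw]

# Load: insert into a queryable store (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (sku TEXT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", clean)

avg = conn.execute("SELECT AVG(price) FROM prices").fetchone()[0]
print(avg)   # 11.25
```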
External Data
External data refers to information collected from outside the organisation, such as competitor pricing, product details or availability. It provides essential context for decision making and market understanding.
Extractor
An extractor is a single Import.io web crawler with its own configuration. It defines what data to collect (a set of selectors) and how to collect it using a tailored schema and rules that produce consistent structured output.
Feed Delivery
Feed delivery is the process of delivering extracted datasets to users or systems, typically via file export, API or scheduled delivery. It ensures stakeholders have timely access to the data they need.
Fully Managed Service
Fully Managed Service means Import.io handles extraction, monitoring, maintenance and delivery on behalf of the customer. It removes technical overhead and ensures consistent, governed data at scale.
Governance
Governance defines how data is sourced, processed and used within an organisation. It ensures external data meets compliance, quality and ethical standards.
Headless Browser
A headless browser loads webpages programmatically without displaying them visually. It is used to extract dynamic or script-driven content that standard crawlers cannot access.
HTML
HTML is the markup language that structures webpage content. Understanding HTML is essential for identifying elements during extraction and ensuring accurate data capture.
Ingestion
Ingestion is the process of loading external data into internal systems such as BI tools, dashboards or data warehouses. It allows organisations to enrich internal analytics with market context.
Insights
Insights are actionable findings derived from analysing data. They translate raw data into context that informs decisions, strategy and performance improvements.
JSON (JavaScript Object Notation)
JSON is a lightweight format for storing and exchanging structured data. It is widely used for APIs, data feeds and machine-to-machine communication.
Keyword Ranking
Keyword ranking indicates where a product appears in retailer search results for a specific keyword. Search visibility is a major driver of digital shelf performance and conversion.
Lineage
Lineage documents where data comes from and how it has been processed over time. It improves transparency and supports compliance across data workflows.
Listings Page
A listings page shows multiple products, typically in category or search results. Scraping listings pages provides broad market coverage and is often the first step in competitive analysis.
MAP (Minimum Advertised Price)
MAP is the lowest price a retailer is allowed to advertise for a product. Monitoring MAP compliance helps protect brand value and margin integrity across markets.
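A MAP compliance check reduces to comparing each observed advertised price against the product's floor. The SKUs, retailers and prices below are illustrative:

```python
# Minimum advertised price per SKU (illustrative values).
map_prices = {"A1": 99.00, "A2": 49.00}

# Prices observed across retailers in a crawl run.
observed = [
    {"sku": "A1", "retailer": "ShopX", "price": 95.00},
    {"sku": "A1", "retailer": "ShopY", "price": 99.00},
    {"sku": "A2", "retailer": "ShopX", "price": 49.00},
]

# Flag any listing advertised below its MAP.
violations = [o for o in observed if o["price"] < map_prices[o["sku"]]]
print(violations)   # ShopX advertises A1 below its 99.00 MAP
```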
Market Intelligence
Market intelligence uses external data to understand competitors, category trends, promotions and pricing behaviour. It gives organisations full visibility of the market landscape needed to guide strategic planning.
Monitoring
Monitoring is the continuous tracking of changes in price, availability, content and ranking across retailers. Monitoring ensures teams detect meaningful changes quickly and respond effectively.
Normalization
Normalization is the process of standardising extracted data into a consistent structure and format. Normalization removes variations in naming, units and value types, making datasets easier to compare and combine. It is essential for building reliable data pipelines, especially when collecting data from multiple retailers or regions.
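A small example of the idea: reducing assorted price formats to one numeric representation. This simplified sketch handles symbols, currency codes and thousands separators, but not locale-specific decimal commas:

```python
import re

def normalise_price(raw: str) -> float:
    """Reduce assorted price formats to a plain float (a simplified sketch)."""
    # Drop thousands separators, then strip everything except digits and the dot.
    cleaned = re.sub(r"[^\d.]", "", raw.replace(",", ""))
    return float(cleaned)

samples = ["£1,299.00", "1299 GBP", "$1299.0"]
print([normalise_price(s) for s in samples])   # [1299.0, 1299.0, 1299.0]
```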
Offer Price
Offer price is the current selling price of a product. It may include temporary discounts, promotions or loyalty incentives. Offer price changes frequently across retailers, making it a core metric for pricing intelligence, competitive benchmarking and promotional analysis.
Out of Stock
Out of stock indicates that a product is unavailable for purchase. OOS events can impact conversion, search ranking, revenue and customer satisfaction. They can also shift demand to competitors. Monitoring OOS rates helps organisations understand market pressure and supply chain performance.
Price Intelligence
Price intelligence is the systematic tracking and analysis of competitor prices and promotional activity. It helps organisations maintain margin, respond to market shifts and plan pricing strategies. Price intelligence relies on external data to provide a clear and current view of competitive position.
Price Index
Price index compares a product’s price against a market or competitor baseline. It provides a simple measure of competitiveness across categories and regions. Tracking price index helps organisations understand if they are priced above, below or aligned with the market.
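One common formulation is own price divided by the market average, scaled to 100; the prices below are illustrative:

```python
def price_index(own_price: float, market_prices: list[float]) -> float:
    """Own price as a percentage of the market average (100 = at market)."""
    market_avg = sum(market_prices) / len(market_prices)
    return 100 * own_price / market_avg

# Own price 95 against competitors averaging 100 -> index 95 (below market).
print(round(price_index(95.0, [98.0, 100.0, 102.0]), 1))   # 95.0
```

An index above 100 means the product is priced above the market; below 100 means it undercuts it.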
Product Data
Product data includes titles, prices, descriptions, specifications, ratings and other attributes. It is used to understand competitive positioning, track content accuracy and monitor digital shelf performance. High-quality product data provides a complete view of how items appear across retailers.
Promotion
A promotion is a temporary offer designed to influence demand, such as a discount, bundle or limited-time price reduction. Promotions have a direct impact on sales, market share and competitor behaviour. Tracking promotional activity helps organisations plan more effectively and measure ROI.
Query
A query is a single request or URL processed during extraction. Queries are the operational unit used to measure usage, performance and cost. Query efficiency helps determine the scalability and stability of data pipelines.
Quality Score
Quality score evaluates the accuracy, completeness and consistency of a dataset. It is used to determine whether external data is reliable enough for reporting and analysis. High quality scores improve trust and reduce downstream issues.
Ranking
Ranking refers to where a product appears in retailer search results. Higher rankings correlate with higher visibility, click-through rates and conversion. Ranking is influenced by price, availability, content quality and reviews.
Reviews and Ratings
Reviews and ratings reflect customer sentiment and product satisfaction. They influence visibility, conversion and brand perception. Monitoring reviews helps organisations understand consumer trends and identify issues early.
Schema
A schema defines the structure and organisation of a dataset. It ensures consistency across sources and simplifies downstream use. Clear schema design is critical for automation, transformation and reporting.
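A schema can be sketched as a simple field-to-type mapping with a validator that reports missing or mistyped fields. Production systems use richer tooling (JSON Schema, for example), but the idea is the same; the fields here are illustrative:

```python
# Illustrative schema: each field name maps to its expected Python type.
SCHEMA = {"sku": str, "price": float, "in_stock": bool}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations for one record (empty = valid)."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type: {field}")
    return errors

# The price arrived as a string, so validation flags it.
print(validate({"sku": "A1", "price": "9.99", "in_stock": True}, SCHEMA))
# ['wrong type: price']
```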
Scraping
Scraping is the automated process of collecting structured information from websites. It provides real-world, high-frequency data that supports pricing intelligence, availability tracking, competitive benchmarking and digital shelf analysis.
Self Service Plan
The Self Service Plan allows customers to build, maintain and run extractors independently. It gives technical teams flexibility and control over data pipelines while benefiting from Import.io’s tools and platform reliability.
Share of Shelf
Share of shelf measures how visible a product is compared to competing products across retailers. It indicates how prominently a product is positioned within a category and influences overall performance.
Stock Level
Stock level indicates how many units of a product are available for purchase. It is used to track supply chain health, marketplace competitiveness and demand patterns.
Training
Training is the process of teaching an extractor to recognise the data elements on a webpage. It involves selecting target elements and defining rules for extraction. High-quality training improves accuracy and long-term stability.
Transformation
Transformation converts raw scraped data into structured, consistent and analysis-ready output. It includes cleaning, formatting, enrichment and normalisation.
URL
A URL is the address of a webpage used during extraction. URLs guide crawlers to the correct sources and determine the structure of a crawl or pipeline.
User Agent
A user agent identifies the type of browser or device making a request. Using controlled user agents helps ensure webpages load correctly for extraction.
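Setting an explicit user agent is a one-line header change with the standard library; the identifier string and URL below are illustrative:

```python
from urllib.request import Request

# Attach a descriptive user agent to a request (header value is illustrative).
# Note that urllib normalises header names, so the key is read back as "User-agent".
req = Request(
    "https://example.com/product/123",
    headers={"User-Agent": "MyCompanyCrawler/1.0 (+https://example.com/bot)"},
)
print(req.get_header("User-agent"))
```

Well-behaved crawlers identify themselves honestly and include a contact URL.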
Variant
A variant is a specific version of a product such as size, colour or pack count. Monitoring variants helps build a clearer picture of pricing, availability and market behaviour.
Visibility
Visibility reflects how prominently products appear across digital shelves. Higher visibility correlates with stronger performance.
Web Crawler
A web crawler automatically loads pages to discover and extract data. Crawlers are essential for large-scale data collection.
Web Scraping Service
A web scraping service provides structured external data without requiring internal scraping infrastructure. It supports large-scale data collection with reduced operational effort.
XML
XML is a structured markup language used for data exchange and sitemaps. It is widely supported across enterprise systems.
Yield
Yield measures the percentage of successful extraction results in a crawl run. It is a key indicator of extraction health.
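The metric itself is a simple ratio; the run figures below are illustrative:

```python
def extraction_yield(successes: int, attempts: int) -> float:
    """Percentage of queries in a crawl run that returned usable data."""
    return 100 * successes / attempts if attempts else 0.0

# 9,420 usable results out of 10,000 queries -> 94.2% yield.
print(extraction_yield(9420, 10000))   # 94.2
```

A falling yield is usually the first signal that a site layout has changed and an extractor needs retraining.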
Zero-Party Data
Zero-party data is information intentionally provided by users, such as preferences or profile details. It complements external data by offering direct insight into customer intent.