Web data sits at the very heart of your business. You need a single, centralized location where you can supervise, monitor and maintain all of your Web Data Integration projects. The Import.io Data Operations Center (or DOC) is a central point for the different members of your team to collaborate on web data management, transformation, distribution and monitoring. The DOC manages every single aspect of your Web Data Integration journey.
Data Operations Center
Operational management and control of all of your web data
Management & control
Extracted web data is stored in a SQL-queryable data lake. Your web data is always there, easily accessible and easy to integrate with whenever you need it.
Set the desired arrival time for a dataset and the Robot Scheduler will automatically run your queries. Queries are chunked into small batches to allow for accurate monitoring.
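To make the batching idea concrete, here is a minimal Python sketch. It is not the Robot Scheduler's actual implementation: the function names `chunk_queries` and `latest_start`, and the fixed per-batch duration estimate, are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Sequence


def chunk_queries(queries: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Split a list of queries into small batches (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(queries), batch_size):
        yield list(queries[start:start + batch_size])


def latest_start(arrival: datetime, n_batches: int, seconds_per_batch: float) -> datetime:
    """Work backwards from the desired arrival time to the latest safe start.

    Assumes every batch takes roughly the same time, which is a simplification.
    """
    return arrival - timedelta(seconds=n_batches * seconds_per_batch)
```

Small batches mean that progress and failures can be reported per batch rather than only once the whole run has finished.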
If we detect an increase in error messages or latency from a website, the Speed Governor will automatically slow down web data extraction to something more appropriate.
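One common way to build this kind of throttle is multiplicative backoff with gradual recovery. The sketch below illustrates that general technique only. The class name, thresholds and delay values are all assumed for the example and do not describe how the Speed Governor is actually built.

```python
from dataclasses import dataclass


@dataclass
class SpeedGovernor:
    """Hypothetical throttle; every default below is an assumed value."""
    delay: float = 1.0            # seconds to wait between requests
    min_delay: float = 0.5
    max_delay: float = 60.0
    max_error_rate: float = 0.05  # tolerated share of failed requests
    max_latency: float = 2.0      # tolerated mean latency in seconds

    def update(self, error_rate: float, mean_latency: float) -> float:
        """Return the new delay after observing one batch of requests."""
        if error_rate > self.max_error_rate or mean_latency > self.max_latency:
            # The website looks stressed: double the delay, up to the cap.
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            # The website looks healthy: speed up again, but slowly.
            self.delay = max(self.delay * 0.9, self.min_delay)
        return self.delay
```

Backing off quickly and recovering slowly keeps extraction polite to the target website while still returning to full speed once the site is healthy.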
Calculated fields can be added using spreadsheet-style formulae. Duplicate rows can be removed and datasets entirely transformed using jq.
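The same two operations, a calculated field and duplicate removal, can be sketched in plain Python to show what they do to a dataset. This is an illustration only: it uses neither the DOC's formula syntax nor jq, and the helper names and sample columns (`sku`, `price`, `qty`) are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_calculated_field(rows: List[Row], name: str, formula: Callable[[Row], Any]) -> List[Row]:
    """Return copies of the rows with one extra field computed from each row."""
    return [{**row, name: formula(row)} for row in rows]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row; values are assumed to be hashable."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"sku": "A", "price": 10.0, "qty": 2},
    {"sku": "A", "price": 10.0, "qty": 2},  # exact duplicate of the first row
    {"sku": "B", "price": 5.0, "qty": 1},
]
cleaned = add_calculated_field(drop_duplicates(rows), "total", lambda r: r["price"] * r["qty"])
```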
Web data can be passed between Extractors and other automated workflows in order to enable data-dependent extraction within and across websites.
Create data validation rules and tests to apply to every dataset. Validation rules can be used to control whether a dataset is pushed into production or goes back for human QA.
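As a rough illustration of rule-based routing, the Python sketch below checks every row against a set of named rules and decides where the dataset goes next. The function `route_dataset`, the rule names and the destination labels are assumptions for the example, not the DOC's actual rule format.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]


def route_dataset(rows: List[Row], rules: Dict[str, Rule]) -> Tuple[str, List[str]]:
    """Return the destination and the names of any rules that failed."""
    failures = [
        name for name, rule in rules.items()
        if not all(rule(row) for row in rows)
    ]
    destination = "production" if not failures else "human_qa"
    return destination, failures


# Hypothetical rules for a product-price dataset.
rules = {
    "price_is_positive": lambda r: r.get("price", 0) > 0,
    "sku_is_present": lambda r: bool(r.get("sku")),
}
```

A dataset in which every row passes every rule goes to production. Any failure sends it to human QA together with the list of failed rule names.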
Import.io applies unsupervised ML methods to dataset shape and other statistical dataset properties over time to detect value drift, anomalous values and other extraction failures.
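The actual models are not described here, so the sketch below substitutes a much simpler stand-in: a z-score test on one dataset property (for example, the row count per run) against its own history. The function name and the default threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies far outside the spread of past values.

    A z-score check, used here as a simple stand-in for a real
    unsupervised model.
    """
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # history was constant, so any change is notable
    return abs(latest - mu) / sigma > threshold
```

For instance, if past runs returned about 1,000 rows each, a run that returns 400 rows would be flagged, while one that returns 1,003 rows would not.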
Create automated QA workflows for human reviewers to evaluate the health of flagged datasets. QA approval processes ensure that only high quality data enters your systems.
Detailed statistics and graphs show the live performance of your Web Data Integration project. Identify issues before they become errors in your data or missed delivery deadlines.
Get a deeper understanding of patterns and trends in your web data by visualizing it. Mean, median, mode, distributions, lines of best fit… it all makes more sense when you see it.
Integrate a steady stream of high-quality web data into your applications and analysis tools using our APIs. Everything that you can do in our UI you can do via API.
We can deliver compiled, transformed and tested datasets directly into a file repository. Common dataset formats include: CSV, JSON, Parquet, Avro.
Create notification rules that alert you when new web data is available, when new web data meets certain criteria or when issues are detected.