We take customer privacy seriously at, both in our own collection of Personally Identifiable Information (PII) through use of the product and in the tools we give customers to remove PII from their data extracts when they do not want to capture it.

What is GDPR?

The General Data Protection Regulation (GDPR), Regulation (EU) 2016/679, is a regulation by which the European Parliament, the Council of the European Union and the European Commission intend to strengthen and unify data protection for all individuals within the European Union (EU). It also addresses the export of personal data outside the EU. The GDPR aims primarily to return control over personal data to citizens and residents, and to simplify the regulatory environment for international business by unifying regulation within the EU. When the GDPR takes effect, it will replace the Data Protection Directive (officially Directive 95/46/EC) of 1995. The regulation was adopted on 27 April 2016 and becomes enforceable on 25 May 2018, after a two-year transition period. Unlike a directive, it does not require national governments to pass any enabling legislation; it is directly binding and applicable.

What is personally identifiable information (PII)?

In GDPR terms, ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

Personal data is any piece of personal information that can be used to identify an individual and includes, but is not limited to, the following:

Full name, home address, email address, Social Security number, passport number, driver’s license number, credit card numbers, date of birth, telephone number, and login details.

Linkable information, on the other hand, is information that on its own may not identify a person, but that could identify, trace, or locate a person when combined with another piece of information. Examples of linkable information include: first or last name (if common); country, state, city, or postcode; gender; race; non-specific age (e.g., 30-40 instead of 30); and job position and workplace.

How does collect and treat PII?

We follow industry-standard practices for handling customer data. Access to customer PII is restricted to a limited number of employees through access-controlled mechanisms. Technical operations employees have access to the raw service data storage, and this access requires public-key or two-factor authentication. All other employees are prohibited from accessing customer data. collects and stores other PII as part of its marketing programs. For full details, please refer to our security and privacy policies.

How does the product handle the collection of PII?

Recognizing the need for privacy, we provide several capabilities that let customers control what data they collect. However, customers remain responsible for ensuring that their use of our product complies with applicable laws.

Robots.txt – The main purpose of robots.txt is to let website owners decide which pages web crawlers (e.g., search engines) should index and which they should ignore. Although robots.txt is not meant to tell web extractors which pages they may or may not extract data from, has added the option for customers to follow the robots.txt directives of the websites they extract from. This capability has been available since January 2018. Contact at for information on how to access it.
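
As a rough illustration of what honoring robots.txt means in practice, the sketch below uses Python's standard-library urllib.robotparser to drop URLs that a site's robots.txt disallows. The robots.txt content, the example.com URLs, and the "example-extractor" user agent are hypothetical, and this is not a description of how the product implements the option.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for illustration).
# A real extractor would fetch https://<site>/robots.txt, e.g. with
# RobotFileParser.set_url() followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(urls, parser, user_agent="example-extractor"):
    """Keep only the URLs that robots.txt permits for the given user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/products/123",
        "https://example.com/private/customers",
    ]
    print(allowed_urls(candidates, parser))  # ['https://example.com/products/123']
```

Python's parser applies the first matching rule in file order, which can differ from the longest-match rule in RFC 9309, so rule order in the robots.txt matters here.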

PII Redaction – is developing an optional capability to detect and redact PII. This lets customers ensure that they do not inadvertently collect PII while using on a data collection project. This capability is expected to be released by the end of February 2018, well before the GDPR becomes enforceable on 25 May 2018.
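
A minimal sketch of pattern-based redaction, assuming simple regular expressions for email addresses and North American-style phone numbers. The patterns, the redact() function, and the [EMAIL]/[PHONE] placeholders are illustrative assumptions, not the product's detector, and a regex approach like this misses many kinds of PII, such as names.

```python
import re

# Illustrative patterns only (assumptions, not the product's detector).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # North American-style numbers such as 555-123-4567 or (555) 123-4567.
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact Jane at jane.doe@example.com or 555-123-4567."))
```

Here redact("Contact Jane at jane.doe@example.com or 555-123-4567.") returns "Contact Jane at [EMAIL] or [PHONE]."; the name passes through untouched, which shows how limited pure pattern matching is.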

Include/Exclude – is also developing the capability for customers to explicitly include or exclude web domains according to their own policies. This capability is expected to be released by the end of March 2018.
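
A minimal sketch of include/exclude filtering by domain, assuming that exclusions take precedence, that an empty include list allows every domain that is not excluded, and that a rule for a domain also covers its subdomains. These precedence rules and the function names are assumptions for illustration; the product's own semantics may differ.

```python
from typing import Sequence
from urllib.parse import urlsplit


def domain_matches(host: str, domain: str) -> bool:
    """True if host is exactly the domain or one of its subdomains."""
    host, domain = host.lower().rstrip("."), domain.lower().rstrip(".")
    # The leading dot stops "notexample.com" from matching "example.com".
    return host == domain or host.endswith("." + domain)


def is_permitted(url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Assumed policy: exclusions win; an empty include list allows any other domain."""
    host = urlsplit(url).hostname or ""
    if any(domain_matches(host, d) for d in exclude):
        return False
    if include:
        return any(domain_matches(host, d) for d in include)
    return True


if __name__ == "__main__":
    include, exclude = ["example.com"], ["private.example.com"]
    for url in ["https://shop.example.com/item",
                "https://private.example.com/x",
                "https://notexample.com/"]:
        print(url, is_permitted(url, include, exclude))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accidental matches such as notexample.com or example.com appearing in a query string.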

Why doesn’t automatically detect and remove PII?

Some customers may wish to use to collect PII that they are duly authorized to collect. For instance, a company's HR team may wish to extract employee names and email addresses from an intranet and then compare them against the HR database to look for anomalies.

As the difference between personal and linkable information shows, detecting PII, and sometimes even deciding whether a piece of data is PII at all, can be subjective. For example, “Mark” in a name field is linkable information, but “Mark” used as a common noun or in a product name (e.g., Canon EOS 5D Mark IV) is neither linkable information nor PII. Similarly, US phone numbers can look like product SKUs. Whether a given piece of data is PII therefore depends on the context and source of the data. With our product, customers can clearly delineate PII from non-PII data, and with these guardrails in place they can ensure that PII is consistently removed while non-PII data is reliably extracted.
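
To illustrate why context matters, here is a minimal sketch contrasting a value-pattern approach with one in which the customer declares which fields hold PII. The record, field names, and SKU value are hypothetical, and the field-based approach is an assumption about how such a delineation could work, not a description of the product's internals.

```python
import re

# Naive shape for a 10-digit US phone number such as 555-123-4567
# (an assumption for illustration). A SKU with the same shape also matches.
PHONE_LIKE = re.compile(r"\d{3}-\d{3}-\d{4}")

# Hypothetical extracted record mixing contact data and product data.
record = {
    "contact_name": "Mark Smith",
    "contact_phone": "555-123-4567",
    "product_name": "Canon EOS 5D Mark IV",
    "product_sku": "100-200-3000",
}


def pattern_based(rec: dict) -> dict:
    """Redact any value that looks like a phone number, whatever the field."""
    return {k: "[REDACTED]" if PHONE_LIKE.fullmatch(v) else v for k, v in rec.items()}


def field_based(rec: dict, pii_fields: set) -> dict:
    """Redact only the fields the customer has declared to contain PII."""
    return {k: "[REDACTED]" if k in pii_fields else v for k, v in rec.items()}


if __name__ == "__main__":
    # Over-redacts the SKU and misses the name.
    print(pattern_based(record))
    # Redacts the name and phone, keeps both product fields intact.
    print(field_based(record, {"contact_name", "contact_phone"}))
```

The pattern-based pass both over-redacts (the SKU) and under-redacts (the name “Mark Smith”), while the field-based pass depends on someone who knows the data's source labeling the fields correctly.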