The EU’s General Data Protection Regulation (GDPR) applies to Personally Identifiable Information (PII) acquired from the public web. If you intend to collect data from the web about European Economic Area (EEA) residents, then you must design your Web Data Integration (WDI) project with GDPR compliance in mind.

If you are working with us on a managed WDI project, we can work with your project and legal teams to help you do this. However, ensuring that your processing and use of the collected PII is GDPR compliant is ultimately your responsibility. For WDI projects where the collection of PII is not the primary objective, we offer a feature that automatically redacts PII from your web data, to give you confidence that you will not inadvertently become subject to the GDPR.
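To make the idea of automatic redaction concrete, here is a minimal Python sketch. It is not the redaction feature described above; the function name `redact_pii`, the two patterns and the placeholder strings are all assumptions chosen for illustration, and they only catch e-mail addresses and phone-like numbers.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far
# broader coverage (names, postal addresses, national ID numbers, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
print(redact_pii(sample))
# Contact Jane at [REDACTED EMAIL] or [REDACTED PHONE].
```

Note that the name "Jane" survives in the output: pattern matching alone does not remove every kind of PII, which is why redacted data should still be reviewed.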

What is the GDPR?

The European Union’s (EU’s) GDPR outlines certain requirements for the collection, processing and transfer of PII about EEA residents, with the aim of protecting the privacy rights of EEA residents.

What’s the key concept?

Data relating to an identifiable natural person may only be processed and stored if either:

  • The data subject has given consent (this is rare and difficult to obtain in WDI projects).
  • Processing is necessary for one of the following: the performance of a contract, compliance with a legal obligation, the protection of life, the public interest, or the legitimate interests of the data controller or a third party.

It is possible that you have a ‘legitimate interest’ basis for collecting and processing the PII of EEA residents. The GDPR gives a wide range of examples of legitimate interests, including direct marketing. However, your collection and processing of PII must have minimal privacy impact, and it must be something that data subjects would reasonably expect.

So what counts as PII?

The GDPR’s own term for PII is ‘personal data’, and its definition is broad: “…any information relating to an identified or identifiable natural person”.

This covers obvious information such as name, residential address and social security number, but it also covers factors relating to other aspects of a person’s identity that may identify them indirectly. These include physical attributes and even identifying opinions. If indirectly identifying PII is collected from the web in an anonymized form that cannot be linked back to an identifiable person, then the GDPR may not apply. However, if several indirectly identifying factors can be combined so that a single individual could be identified from them, then the GDPR still applies to your processing of that information.
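The point about combining indirect factors can be illustrated with a short Python sketch. It counts how often each combination of attributes occurs in a dataset and reports the combinations that occur only once. The function name `unique_combinations`, the field names and the sample records are invented for illustration, and uniqueness within one dataset is only a rough proxy for identifiability, not a legal test.

```python
from collections import Counter


def unique_combinations(records, quasi_identifiers):
    """Return the attribute combinations that occur exactly once in the data."""
    keys = [tuple(r.get(q) for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [key for key, n in counts.items() if n == 1]


# Invented sample records; none of the fields is a direct identifier.
records = [
    {"city": "Lyon", "job": "baker", "height_cm": 170},
    {"city": "Lyon", "job": "baker", "height_cm": 170},
    {"city": "Lyon", "job": "notary", "height_cm": 201},
]

print(unique_combinations(records, ["city"]))
# []
print(unique_combinations(records, ["city", "job", "height_cm"]))
# [('Lyon', 'notary', 201)]
```

On its own, the city singles out nobody, but the combination of city, job and height isolates one record. A combination like that is the kind of data that could point to a single individual.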

How do I know if the GDPR may apply to my WDI project?

A good starting point is to ask yourself the following questions:

  • Am I collecting web data that relates to people (as opposed to e.g. products)?
  • Are the people that I am collecting web data on potentially EEA residents?
  • Can I extract the data in such a way (for example by not collecting all of it) that it is not possible to identify individual persons from the data?
  • Is there a ‘legitimate interests’ basis for processing PII relating to these EEA residents?
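The four questions above can be read as a rough screening order, sketched here in Python. The function name `gdpr_triage`, its parameters and the wording of the outcomes are assumptions made for illustration; the sketch only mirrors the questions in this section and is no substitute for a review by your legal team.

```python
def gdpr_triage(relates_to_people: bool,
                possibly_eea_residents: bool,
                can_avoid_identification: bool,
                legitimate_interest_basis: bool) -> str:
    """Walk through the four screening questions in order (illustrative only)."""
    if not relates_to_people or not possibly_eea_residents:
        return "GDPR unlikely to apply"
    if can_avoid_identification:
        return "GDPR may not apply if individuals cannot be identified"
    if legitimate_interest_basis:
        return "GDPR likely applies; record your legitimate interests basis"
    return "GDPR likely applies; establish a lawful basis before collecting"


# Product prices: the data does not relate to people.
print(gdpr_triage(False, False, False, False))
# Names and contact details of people who may be EEA residents.
print(gdpr_triage(True, True, False, True))
```

The first call returns "GDPR unlikely to apply" and the second returns "GDPR likely applies; record your legitimate interests basis", matching the two groups of examples in the next section.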

Can you give me some examples?

Not affected by GDPR

  • The data that you are collecting from the web does not relate to identifiable natural persons. For example, you are extracting data on the prices of products, the locations of stores, information about companies etc.
  • You are pulling product reviews written by people, but neither the usernames nor the review text allow you to identify a natural person.
  • You are pulling contact details of businesses, business names and addresses.

Potentially affected by GDPR

  • You are pulling names and contact details of identifiable natural persons.
  • Even if the personal data is on a public website and can be reasonably considered to be in the public domain, if it is possible to identify a natural person from the data, then the processing and storing of such personal data is still subject to the GDPR.

Am I a data processor, data controller or both?

You are likely both a data processor and a data controller when it comes to your WDI project. You are a data processor in so far as you are likely storing and processing PII obtained from the web via us. You are a data controller in so far as you are directing us to collect web data on your behalf. We are a data processor only, as we are collecting PII from the web on your behalf and at your express instruction and direction.