  • Coffee With Arun

Data Protection giving sleepless nights? AI can help!

Updated: Jul 24, 2020

Are you heading an IT department that deals with personal data from your customers, employees, suppliers or partners? 

Data Protection - Photo by Markus Spiske from Pexels

If the answer is yes, it is very likely that you are ill-equipped to protect that data. Additionally, if you handle the data of European Union citizens, your organization faces heavy fines for non-compliance with the new General Data Protection Regulation (GDPR), which took effect on 25 May 2018.

A recent survey from Trend Micro shows that fewer than 50% of the tech companies surveyed are prepared to handle data protection. Another study reported that one third of investment firms had not even taken the first step towards protecting personal data, and a marketing portal reported that one fourth of marketers are ill-prepared.

Most organizations find it extremely difficult even to identify where personal data occurs in their IT systems; protecting it comes much later.

The picture is similar in countries across the globe, with APAC countries near the bottom of the pack.

Why is Data Protection suddenly so important?

Several countries, including the UAE, Singapore, the EU member states and many others, already had some form of data protection regulation in place. However, the penalties for violations were not heavy, so compliance remained a good-to-have item on a Chief Compliance Officer's (CCO) agenda rather than a must-have.

Some countries, such as India, are now gearing up for stronger laws, triggered by a recent data leak from the Aadhaar national ID system.

A recent data leak at Facebook has also drawn everyone's attention to the importance of data protection.

The key factor driving the heightened activity in this space, however, is the set of stringent norms and heavy penalties that GDPR imposes on all organizations dealing with European citizens' data:

  • High penalties: non-compliance can cost an organization up to 4% of its annual global revenue (or EUR 20 million, whichever is higher)

  • Expanded Definition: personal data to be protected now includes anything that can be used to identify a person. Beyond the usual name, email, address, biometric and genetic data, it now covers types of data such as IP address, age, profession, location, gender and a host of other attributes.

  • Right to Erasure: several rights are now placed in the hands of the person whose data is being protected. These include her right to ask an organization what personal data it stores about her, for what purpose, and with whom that data has been shared. The most crucial of these is the right to erasure: a person can ask for her personal data to be permanently removed from an organization's systems.
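To make the penalty ceiling concrete, here is a small Python sketch. It assumes the upper tier of Article 83(5) GDPR, under which the maximum fine is EUR 20 million or 4% of total worldwide annual turnover, whichever is higher. The function name is hypothetical, and it computes only the legal ceiling, not the fine a regulator would actually impose.

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: float) -> float:
    """Return the ceiling for an upper-tier GDPR fine (assumed Art. 83(5) rule):
    the higher of EUR 20 million or 4% of worldwide annual turnover."""
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    fixed_cap = 20_000_000.0                          # fixed amount in EUR
    turnover_cap = 0.04 * annual_global_turnover_eur  # 4% of turnover
    return max(fixed_cap, turnover_cap)


if __name__ == "__main__":
    for turnover in (50_000_000, 2_000_000_000):
        fine = max_gdpr_fine_eur(turnover)
        print(f"turnover EUR {turnover:,} -> max fine EUR {fine:,.0f}")
```

For a company with EUR 50 million in turnover, 4% is only EUR 2 million, so the fixed EUR 20 million amount sets the ceiling. At EUR 2 billion the 4% rule dominates, giving EUR 80 million.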

The Challenge

It may not be obvious at first glance, but this is a huge task for any organization.

Personal data that needs protecting can occur not only in structured databases but also in unstructured text, such as informal communication with your partners, online conversations with your customers or documents submitted by vendors.

Combing through massive streams of structured and unstructured data and ensuring that every occurrence of personal data is identified and tagged for querying and protection is indeed a tall order. This is where everyone is struggling.
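As a rough illustration of what "identifying and tagging" personal data in unstructured text means, the Python sketch below scans free text with two hand-written regular expressions (email addresses and IPv4 addresses) and returns tagged character spans. The patterns, the `scan` function and the `Finding` type are hypothetical. Rule-based matching like this catches only well-formatted identifiers and misses names, addresses and most other personal data, which is why learned models are used in practice.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, deliberately narrow patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


@dataclass(frozen=True)
class Finding:
    label: str  # type of personal data detected
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    text: str   # the matched substring


def scan(text: str) -> list[Finding]:
    """Return every pattern match in `text`, ordered by position."""
    findings = [
        Finding(label, m.start(), m.end(), m.group())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    message = "Ping jane.doe@example.com; she logged in from 192.168.1.20 yesterday."
    for f in scan(message):
        print(f.label, f.start, f.end, f.text)
```

The offsets make it possible to tag or redact the exact span later without re-scanning the text.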

On top of that, GDPR requires an organization to respond to any data-related request within one month (roughly 30 days), extendable by up to two further months for complex requests.

How can AI help?

Machine Learning (ML) combined with Natural Language Processing (NLP) has recently been used to automate the identification and protection of sensitive personal data. Data Loss Prevention (DLP) software offered by top tech companies such as IBM, Google and Amazon is an example of such systems.
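Once personal data has been found, protecting it often means de-identifying it before it is stored or shared. The Python sketch below shows two common transformations using only the standard library: masking (replacing a value with a fixed placeholder) and keyed pseudonymization (replacing it with an HMAC-SHA256 token). It is not the API of any DLP product. The email-only detector, the function names and the 12-character token length are assumptions made for brevity.

```python
import hashlib
import hmac
import re

# Hypothetical detector: email addresses only, to keep the sketch short.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask(text: str) -> str:
    """Replace every detected email address with a fixed placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def pseudonymize(text: str, key: bytes) -> str:
    """Replace every detected email with a keyed, deterministic token.

    Addresses are lower-cased first, so case variants of the same address
    map to the same token; without the key the token cannot be recomputed."""
    def to_token(match: re.Match) -> str:
        value = match.group().lower().encode("utf-8")
        digest = hmac.new(key, value, hashlib.sha256).hexdigest()
        return f"[EMAIL:{digest[:12]}]"

    return EMAIL.sub(to_token, text)


if __name__ == "__main__":
    note = "Contact jane.doe@example.com or JANE.DOE@example.com"
    print(mask(note))
    print(pseudonymize(note, key=b"demo-secret-key"))
```

Because the token is deterministic for a given key, both spellings of the address in the usage example map to the same token, which keeps records joinable. Under GDPR, pseudonymized data still counts as personal data, so the key must be protected as well.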

General-purpose data discovery falls within the research area known as Fine-Grained Entity Typing. It deals with detecting mentions of various kinds of entities in text and assigning them fine-grained types. For instance, Apple in a tech article should be labeled as an 'IT Company' rather than as a 'Fruit' or as a coarse-grained type such as 'Organization'. Academic and industrial researchers are continually advancing the use of knowledge bases and neural architectures to detect and type entities more accurately.
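The Apple example can be made concrete with a toy Python sketch of context-based typing. It stands in for the knowledge bases and neural models mentioned above with a hypothetical hand-built dictionary and a simple keyword-overlap score. Each candidate fine-grained type carries a few cue words, the type whose cues overlap most with the sentence wins, and with no evidence the mention falls back to a coarse type. Real fine-grained entity typing systems learn these signals from data rather than from hand-picked cue words.

```python
import re

# Hypothetical mini knowledge base: each known mention has a coarse fallback
# type and several fine-grained types, each with a few context cue words.
KB = {
    "apple": {
        "coarse": "Organization",
        "fine": {
            "IT Company": {"iphone", "mac", "software", "chip", "developers"},
            "Fruit": {"eat", "juice", "orchard", "pie", "ripe", "tree"},
        },
    },
}


def type_mention(mention: str, sentence: str) -> str:
    """Assign a fine-grained type to `mention` using words in `sentence`."""
    entry = KB.get(mention.lower())
    if entry is None:
        return "Unknown"
    # Bag of lower-cased context words, excluding the mention itself.
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {mention.lower()}
    # Score each fine-grained type by how many of its cue words appear.
    scores = {t: len(cues & context) for t, cues in entry["fine"].items()}
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    # No contextual evidence: back off to the (hypothetical) coarse default.
    return best_type if best_score > 0 else entry["coarse"]


if __name__ == "__main__":
    print(type_mention("Apple", "Apple shipped a new iPhone with updated software."))
    print(type_mention("apple", "She picked a ripe apple from the tree."))
    print(type_mention("Apple", "Apple was mentioned in the report."))
```

The back-off step mirrors the distinction drawn above: a fine-grained label is emitted only when the context supports it, otherwise the system settles for the coarse type.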

Ever-new challenges

While GDPR is a huge step forward for data protection, technology never stops advancing, and we will keep facing new challenges. Recently, Amazon Echo, Amazon's personal assistant device, recorded a private conversation and sent it to someone on its owner's contact list.

It is becoming evident that even as the industry struggles to protect personal data in the textual domain, similar challenges are already emerging in other modes of communication, such as voice.

DISCLAIMER: All views expressed above are the personal views of the author.

This article was originally published at
