Data Anonymization Part 2: Differential Privacy

We introduced data anonymization in a previous article. Why does it matter? In addition to the reasons given in that article, anonymization helps a company comply with privacy regulations such as the California Consumer Privacy Act (CCPA) and the European Union's General Data Protection Regulation (GDPR). Anonymized data should have the following properties:

• It is not possible to isolate an individual in the database.

• It is not possible to link records about the same individual across several databases.

• It is not possible to make any inference or prediction about an individual's future behavior.

The last constraint is hard on a data science company, since machine learning, and therefore prediction, is the core of its business. These properties nevertheless give us a useful framework. Let us now look at the different ways to transform data:

Pseudo-anonymization

Each value is replaced by its hash, a result that cannot be reversed. Because the same input always produces the same hash, it is still possible to correlate data points.

Pseudo-anonymization example on right side
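As a minimal sketch of the hashing idea described above, the snippet below replaces each name with its SHA-256 digest. The choice of SHA-256, the function name pseudonymize, and the sample names are assumptions made for illustration; the article does not say which hash its tool uses.

```python
import hashlib


def pseudonymize(value: str) -> str:
    """Return the hex SHA-256 digest of a value (hypothetical helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample data: "Alice" appears twice.
names = ["Alice", "Bob", "Alice"]
tokens = [pseudonymize(name) for name in names]

# Equal inputs give equal tokens, so rows can still be correlated.
same_person = tokens[0] == tokens[2]
```

Here same_person is true, which is exactly the property that separates pseudo-anonymization from full anonymization.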

Anonymization

This is the most advanced case: the transformed values no longer allow any correlation between data points.

Anonymization example on right side
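One possible reading of this description, sketched below, is to replace every value with an independent random token, so that two identical inputs no longer share an output. The function name anonymize and the token length are assumptions; the article does not describe how its tool performs this step.

```python
import secrets


def anonymize(values):
    """Replace each value with an unrelated random token (hypothetical helper)."""
    # Each token is drawn independently, so nothing about the input survives.
    return [secrets.token_hex(8) for _ in values]


# Made-up sample data: "Alice" appears twice.
tokens = anonymize(["Alice", "Bob", "Alice"])
```

Unlike a hash, the two "Alice" rows receive different tokens here, so they can no longer be matched to each other.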

Suppression/Masking

Part of each value is removed or replaced by a placeholder character. The user chooses how much information to keep.

Suppression/Masking example on right side
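A small sketch of masking, assuming the user keeps a chosen number of leading characters and hides the rest. The function name mask, its parameters, and the sample phone number are hypothetical.

```python
def mask(value: str, keep: int, mask_char: str = "*") -> str:
    """Keep the first `keep` characters and mask the rest (hypothetical helper)."""
    # Clamp `keep` so that it never exceeds the length of the value.
    keep = max(0, min(keep, len(value)))
    return value[:keep] + mask_char * (len(value) - keep)


masked_phone = mask("514-555-0199", keep=3)  # only the area code is kept
fully_masked = mask("abc", keep=0)           # everything is hidden
```

Raising keep preserves more information for analysis and hides less, which is the trade-off the user controls.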

Generalization

This method replaces specific values about individuals with broader categories, for example an exact age with an age range. It is used in the k-anonymity algorithm.

Generalization example on right side
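To illustrate generalization, the sketch below maps an exact age to a range. The function name generalize_age and the default width of 10 years are assumptions chosen for the example.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range of `width` years (hypothetical helper)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


bucket = generalize_age(34)            # "30-39"
narrow_bucket = generalize_age(7, 5)   # "5-9"
```

A wider range hides more about each individual but leaves less detail for analysis.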

K-Anonymity algorithm

This algorithm tries to transform the input data, provided the underlying optimization problem has a solution, so that the following condition holds: based on the variables chosen for anonymization (called quasi-identifiers in the literature), no individual can be distinguished from at least k-1 others.

The higher the value of k, the better protected your data will be. The drawback is that the quality of the analysis decreases.
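The condition above can be checked on a table that has already been transformed. The sketch below does not perform the optimization itself. It only counts how many records share each combination of quasi-identifiers and reports the smallest group size, which is the k the table satisfies. The function name k_of and the toy records are made up for illustration.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Return the smallest group size over quasi-identifier combinations."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Made-up, already generalized records.
records = [
    {"age": "30-39", "zip": "H2X", "diagnosis": "flu"},
    {"age": "30-39", "zip": "H2X", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "H3A", "diagnosis": "flu"},
    {"age": "40-49", "zip": "H3A", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "H3A", "diagnosis": "flu"},
]

k = k_of(records, ["age", "zip"])  # smallest group has 2 records
```

With age and zip as quasi-identifiers, every individual shares those values with at least one other record, so this toy table is 2-anonymous. Adding more quasi-identifiers can only keep k the same or lower it.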

At ExplorAI, your data will be safe with us. We use an in-house tool that combines the different techniques presented in this article.

You may be wondering what impact these techniques have on analysis. We tested them on public data, and the results are the subject of an upcoming article!

If you have any questions, feel free to reach out to us!

The research done on anonymization has been made possible thanks to the support of Mitacs through their Business Strategy Internship program.