Anonymization makes it impossible to identify a person from a dataset and thus helps protect privacy. The CNIL reviews the techniques that can be used and the challenges they raise.
What is anonymization?
Anonymization is a processing operation that uses a set of techniques to make it practically impossible to identify a person by any means, and to do so irreversibly.
Anonymization must not be confused with pseudonymization.
Pseudonymization is the processing of personal data in such a way that the data can no longer be attributed to a specific natural person without the use of additional information.
In practice, pseudonymization consists of replacing the directly identifying data in a dataset (surname, first name, etc.) with indirectly identifying data (an alias, a sequential number, etc.).
Pseudonymization makes it possible to process individuals' data without being able to identify them directly. In practice, however, it is sometimes still possible to re-identify them using third-party data, so the data concerned remain personal data. Moreover, unlike anonymization, pseudonymization is reversible.
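A minimal Python sketch of the substitution described above, assuming records are plain dictionaries. The function name `pseudonymize`, the field names, and the sample data are hypothetical. The lookup table it returns is what makes the operation reversible, which is why pseudonymized data remain personal data.

```python
from itertools import count

def pseudonymize(records, id_fields=("surname", "first_name")):
    """Replace direct identifiers with sequential numbers.

    Returns the pseudonymized records and a lookup table mapping each
    pseudonym back to the original identifiers. As long as that table
    exists, the operation can be reversed.
    """
    counter = count(1)
    pseudonym_of = {}  # original identifiers -> pseudonym
    lookup = {}        # pseudonym -> original identifiers
    output = []
    for record in records:
        key = tuple(record[f] for f in id_fields)
        if key not in pseudonym_of:
            pid = next(counter)
            pseudonym_of[key] = pid
            lookup[pid] = key
        # Keep every other attribute unchanged: it may still be indirectly identifying.
        row = {k: v for k, v in record.items() if k not in id_fields}
        row["pseudonym"] = pseudonym_of[key]
        output.append(row)
    return output, lookup

# Hypothetical sample data.
people = [
    {"surname": "Martin", "first_name": "Alice", "city": "Lyon", "age": 34},
    {"surname": "Durand", "first_name": "Bruno", "city": "Lille", "age": 51},
]
data, table = pseudonymize(people)
print(data)      # [{'city': 'Lyon', 'age': 34, 'pseudonym': 1}, {'city': 'Lille', 'age': 51, 'pseudonym': 2}]
print(table[1])  # ('Martin', 'Alice')
```

The same person always receives the same number, so records about one individual can still be grouped together, which is exactly what the individualization criterion discussed later rules out for anonymous data.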
Pseudonymization is one of the measures recommended by the GDPR to limit the risks associated with processing personal data.
Why anonymize data?
The General Data Protection Regulation (GDPR) does not impose a general obligation to anonymize data. Anonymization is nevertheless one solution, among others, for using personal data while respecting people's rights and freedoms.
Anonymization opens up possibilities for reusing data that were initially prohibited because of the personal nature of the data, and thus allows organizations to exploit and share their data “pool” without infringing on people's privacy.
It also makes it possible to keep data beyond their retention period.
Once data are truly anonymous, data protection legislation no longer applies, because disseminating or reusing anonymous data has no impact on the privacy of the data subjects.
How can data be anonymized while preserving as much of the dataset's utility as possible?
Because anonymization aims to eliminate any possibility of re-identification, the future exploitation of the data is limited to certain types of use. These restrictions must be kept in mind from the start of the project.
To design a relevant anonymization process, it is advisable to:
- identify the information to be kept, based on its relevance;
- remove direct identifiers, as well as rare values that could allow people to be easily re-identified (for example, the presence of age makes it easy to re-identify centenarians);
- distinguish important information from secondary or unnecessary information (that is, information that can be deleted);
- define the ideal and acceptable level of granularity for each piece of information kept.
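The removal step in the list above could look like the following sketch, assuming dictionary records. The column names in `DIRECT_IDENTIFIERS`, the function name, and the `min_count` threshold are hypothetical choices; suppressing values seen fewer than `min_count` times is just one simple way to handle rare values such as a centenarian's age.

```python
from collections import Counter

# Hypothetical names of directly identifying columns.
DIRECT_IDENTIFIERS = {"surname", "first_name", "email"}

def drop_identifiers_and_rare_values(records, rare_field, min_count=2):
    """Drop direct identifiers and suppress rare values of one attribute.

    Any value of `rare_field` that occurs fewer than `min_count` times in the
    dataset is replaced by None, because rare values ease re-identification.
    """
    counts = Counter(r[rare_field] for r in records)
    cleaned = []
    for r in records:
        row = {k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
        if counts[r[rare_field]] < min_count:
            row[rare_field] = None  # suppressed rare value
        cleaned.append(row)
    return cleaned

rows = [
    {"surname": "A", "age": 34, "city": "Lyon"},
    {"surname": "B", "age": 34, "city": "Lille"},
    {"surname": "C", "age": 101, "city": "Nice"},  # the only centenarian
]
print(drop_identifiers_and_rare_values(rows, "age"))
# [{'age': 34, 'city': 'Lyon'}, {'age': 34, 'city': 'Lille'}, {'age': None, 'city': 'Nice'}]
```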
This preliminary work makes it possible to determine which anonymization process to apply and, therefore, which anonymization techniques to adopt.
These techniques fall into two families: randomization and generalization.
- Randomization consists of modifying the attributes in a dataset so that they are less precise, while preserving the overall distribution. This technique protects the dataset against the risk of inference (see below).
For example, the dates of birth of the individuals in a database can be permuted so as to alter the accuracy of the information it contains.
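As a sketch of randomization by permutation, assuming dictionary records with a hypothetical `birth_date` field, the values of one attribute can be shuffled across records. The function name and sample data are illustrative only.

```python
import random

def permute_attribute(records, field, seed=None):
    """Shuffle the values of `field` across records.

    Every original value is still present, so the distribution of `field` is
    preserved, but values are no longer reliably attached to the right person.
    """
    rng = random.Random(seed)
    values = [r[field] for r in records]
    rng.shuffle(values)
    # Build new dicts so the original records are left untouched.
    return [{**r, field: v} for r, v in zip(records, values)]

rows = [
    {"city": "Lyon", "birth_date": "1980-05-12"},
    {"city": "Lille", "birth_date": "1992-11-03"},
    {"city": "Nice", "birth_date": "1975-01-28"},
]
shuffled = permute_attribute(rows, "birth_date", seed=42)
# Same multiset of birth dates, cities untouched.
print(sorted(r["birth_date"] for r in shuffled) == sorted(r["birth_date"] for r in rows))  # True
```

A random shuffle may leave some values in their original place; this sketch does not prevent that, so how much distortion is sufficient still has to be assessed for the dataset at hand.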
- Generalization consists of changing the scale or order of magnitude of the attributes in a dataset so that they are shared by a number of people. This technique prevents individuals from being singled out in a dataset. It also limits the possibilities of correlating the dataset with others (see below).
For example, in a file containing people's dates of birth, this information can be replaced with the year of birth alone.
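Generalizing dates of birth to years can be sketched as follows, assuming each record holds a `datetime.date` under a hypothetical `birth_date` key. The function name and the output key `birth_year` are assumptions for illustration.

```python
from datetime import date

def generalize_birth_date(records, field="birth_date"):
    """Replace an exact date of birth with the year of birth only."""
    generalized = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        row["birth_year"] = r[field].year  # coarser scale: many people share a year
        generalized.append(row)
    return generalized

rows = [
    {"city": "Lyon", "birth_date": date(1980, 5, 12)},
    {"city": "Lille", "birth_date": date(1980, 11, 3)},
]
print(generalize_birth_date(rows))
# [{'city': 'Lyon', 'birth_year': 1980}, {'city': 'Lille', 'birth_year': 1980}]
```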
How can the effectiveness of anonymization be verified?
The European data protection authorities define three criteria for ensuring that a dataset is truly anonymous:
1. Individualization: it should not be possible to isolate an individual in a dataset.
Example: a CV database in which only each person's first name and surname are replaced by a number (unique to that person) still makes it possible to single that person out. In this case, the database is considered pseudonymized, not anonymous.
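One way to test the individualization criterion, sketched here under the assumption of dictionary records, is to group records by their quasi-identifiers (attributes that are not names but can still point to a person) and flag groups that are too small. This is the k-anonymity test, named plainly here because the source does not prescribe a method; the function name, field names, and sample data are hypothetical.

```python
from collections import Counter

def singled_out(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier values are shared by fewer than k records.

    With k=2 this flags every record that is unique in the dataset, i.e. that
    can be isolated.
    """
    def key(r):
        return tuple(r[q] for q in quasi_identifiers)

    group_sizes = Counter(key(r) for r in records)
    return [r for r in records if group_sizes[key(r)] < k]

cvs = [
    {"number": 1, "degree": "MSc", "city": "Lyon"},
    {"number": 2, "degree": "MSc", "city": "Lyon"},
    {"number": 3, "degree": "PhD", "city": "Nice"},
]
print(len(singled_out(cvs, ["number"])))     # 3: each number points to one person
print(singled_out(cvs, ["degree", "city"]))  # [{'number': 3, 'degree': 'PhD', 'city': 'Nice'}]
```

The first call mirrors the CV example: as long as the per-person number is kept, every record can be isolated.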
2. Correlation: it should not be possible to link separate datasets relating to the same person.
Example: a mapping database that provides people's home addresses cannot be considered anonymous if other databases, held elsewhere, contain the same addresses along with other data that make it possible to identify the data subjects.
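The correlation risk can be illustrated with a simple join: any record whose address also appears in an external, identifying dataset can be linked to a person. The sketch below assumes dictionary records; the function `linkable`, the field names, and the sample data are hypothetical.

```python
def linkable(records, external, join_field):
    """Return (record, external_record) pairs that share the same `join_field` value."""
    index = {}
    for e in external:
        index.setdefault(e[join_field], []).append(e)
    return [(r, e) for r in records for e in index.get(r[join_field], [])]

# Hypothetical "anonymous" map data and a public directory held elsewhere.
map_db = [
    {"address": "12 rue des Lilas", "household_size": 3},
    {"address": "5 avenue Foch", "household_size": 1},
]
directory = [{"address": "12 rue des Lilas", "name": "Alice Martin"}]

for record, match in linkable(map_db, directory, "address"):
    print(record["household_size"], "->", match["name"])  # 3 -> Alice Martin
```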
3. Inference: it should not be possible to deduce, with near certainty, new information about a person.
Example: suppose a seemingly anonymous dataset contains the answers people gave to a questionnaire about their tax status, and all men aged 20 to 25 answered that they are not taxable. Then, knowing that Mr X, a 24-year-old man, answered the questionnaire is enough to deduce that he is not taxable.
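For the inference criterion, a simple check, similar in spirit to the l-diversity criterion, is to look for groups of records in which everyone shares the same sensitive value. The sketch assumes dictionary records and hypothetical `sex`, `age_band`, and `taxable` fields; it reproduces the tax example above, where belonging to the 20-25 group reveals that a person is not taxable.

```python
from collections import defaultdict

def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Map each quasi-identifier group whose records all share one sensitive
    value to that value: membership of the group alone reveals it."""
    values_by_group = defaultdict(set)
    for r in records:
        values_by_group[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return {g: next(iter(v)) for g, v in values_by_group.items() if len(v) == 1}

answers = [
    {"sex": "M", "age_band": "20-25", "taxable": False},
    {"sex": "M", "age_band": "20-25", "taxable": False},
    {"sex": "M", "age_band": "26-30", "taxable": True},
    {"sex": "M", "age_band": "26-30", "taxable": False},
]
print(homogeneous_groups(answers, ["sex", "age_band"], "taxable"))
# {('M', '20-25'): False}
```

A group containing a single record is also flagged, since its sensitive value is trivially certain; that case overlaps with the individualization criterion.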
How can the risks of re-identification be managed?
If these three criteria cannot all be fully met, the data controller wishing to anonymize a dataset must demonstrate, through a thorough assessment of the re-identification risks, that the risk of re-identification by reasonable means is zero.
Because anonymization and re-identification techniques are bound to evolve regularly, every data controller must carry out regular monitoring in order to preserve, over time, the anonymous nature of the data it produces.
This monitoring must take into account the technical means available and the other data sources that could make it possible to lift the anonymity of the information.
If a dataset published online as “anonymous” contains personal data and none of the exceptions provided for in Article L321-1-2 of the Code of relations between the public and the administration (CRPA) applies, this may be considered a data breach. In that case, it is necessary to:
- remove the set of data as soon as possible;
- notify the CNIL if the breach is likely to result in a risk to people's rights and freedoms;
- inform the data subjects if this risk is high.
SOURCE: FRENCH DATA PROTECTION AUTHORITY – CNIL