Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
De-identification
De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws.
When applied to metadata or general data about identification, the process is also known as data anonymization. Common strategies include deleting or masking personal identifiers, such as personal name, and suppressing or generalizing quasi-identifiers, such as date of birth. The reverse process of using de-identified data to identify individuals is known as data re-identification. Successful re-identifications cast doubt on de-identification's effectiveness. A systematic review of fourteen distinct re-identification attacks found "a high re-identification rate […] dominated by small-scale studies on data that was not de-identified according to existing standards".
De-identification is adopted as one of the main approaches toward data privacy protection. It is commonly used in fields of communications, multimedia, biometrics, big data, cloud computing, data mining, internet, social networks, and audio–video surveillance.
When surveys are conducted, such as a census, they collect information about a specific group of people. To encourage participation and to protect the privacy of survey respondents, the researchers attempt to design the survey in a way that when people participate in a survey, it will not be possible to match any participant's individual response(s) with any data published.
When an online shopping website wants to know its users' preferences and shopping habits, it decides to retrieve customers' data from its database and do analysis on them. The personal data information include personal identifiers which were collected directly when customers created their accounts. The website needs to pre-handle the data through de-identification techniques before analyzing data records to avoid violating their customers' privacy.
Anonymization refers to irreversibly severing a data set from the identity of the data contributor in a study to prevent any future re-identification, even by the study organizers under any condition. De-identification may also include preserving identifying information which can only be re-linked by a trusted party in certain situations. There is a debate in the technology community on whether data that can be re-linked, even by a trusted party, should ever be considered de-identified.
Common strategies of de-identification are masking personal identifiers and generalizing quasi-identifiers. Pseudonymization is the main technique used to mask personal identifiers from data records, and k-anonymization is usually adopted for generalizing quasi-identifiers.
Pseudonymization is performed by replacing real names with a temporary ID. It deletes or masks personal identifiers to make individuals unidentified. This method makes it possible to track the individual's record over time, even though the record will be updated. However, it can not prevent the individual from being identified if some specific combinations of attributes in the data record indirectly identify the individual.
Hub AI
De-identification AI simulator
(@De-identification_simulator)
De-identification
De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws.
When applied to metadata or general data about identification, the process is also known as data anonymization. Common strategies include deleting or masking personal identifiers, such as personal name, and suppressing or generalizing quasi-identifiers, such as date of birth. The reverse process of using de-identified data to identify individuals is known as data re-identification. Successful re-identifications cast doubt on de-identification's effectiveness. A systematic review of fourteen distinct re-identification attacks found "a high re-identification rate […] dominated by small-scale studies on data that was not de-identified according to existing standards".
De-identification is adopted as one of the main approaches toward data privacy protection. It is commonly used in fields of communications, multimedia, biometrics, big data, cloud computing, data mining, internet, social networks, and audio–video surveillance.
When surveys are conducted, such as a census, they collect information about a specific group of people. To encourage participation and to protect the privacy of survey respondents, the researchers attempt to design the survey in a way that when people participate in a survey, it will not be possible to match any participant's individual response(s) with any data published.
When an online shopping website wants to know its users' preferences and shopping habits, it decides to retrieve customers' data from its database and do analysis on them. The personal data information include personal identifiers which were collected directly when customers created their accounts. The website needs to pre-handle the data through de-identification techniques before analyzing data records to avoid violating their customers' privacy.
Anonymization refers to irreversibly severing a data set from the identity of the data contributor in a study to prevent any future re-identification, even by the study organizers under any condition. De-identification may also include preserving identifying information which can only be re-linked by a trusted party in certain situations. There is a debate in the technology community on whether data that can be re-linked, even by a trusted party, should ever be considered de-identified.
Common strategies of de-identification are masking personal identifiers and generalizing quasi-identifiers. Pseudonymization is the main technique used to mask personal identifiers from data records, and k-anonymization is usually adopted for generalizing quasi-identifiers.
Pseudonymization is performed by replacing real names with a temporary ID. It deletes or masks personal identifiers to make individuals unidentified. This method makes it possible to track the individual's record over time, even though the record will be updated. However, it can not prevent the individual from being identified if some specific combinations of attributes in the data record indirectly identify the individual.