Personal data

Personal data refers to any information relating to an identified or identifiable natural person, that is, information that can be used to identify a living individual.

Direct personal data, or direct identifiers, is information that clearly identifies a person – for example, name, photograph, video, or personal identity number.

Indirect personal data, or indirect identifiers, is information that, by itself, may not be sufficient to identify someone, but which, when combined with other information, can lead to identification. Examples include residential address, membership in a specific association, IP address, or a vehicle registration number.

It is important to note that indirect personal data are not always easy to recognize. For instance, in a small town with only one person in a particular profession, information about their residence and occupation may be enough to identify them as a participant in a research project. This method of identifying individuals by combining information is called re-identification (in Swedish, bakvägsidentifiering).

To process personal data in compliance with the General Data Protection Regulation (GDPR), the processing must have a valid legal basis under Article 6 of the GDPR. For processing personal data in research, the most common legal basis is public interest.

Sensitive personal data

A special category of personal data is referred to as sensitive personal data. This includes information about:

racial or ethnic origin
political opinions
religious or philosophical beliefs
trade union membership
data concerning health or data concerning an individual’s sex life or sexual orientation
genetic data
biometric data that uniquely identify an individual
data relating to criminal convictions and offences.

Sensitive personal data are subject to additional safeguards under the GDPR. The default position is that processing of sensitive personal data is prohibited. For processing to be lawful, there must be a legal basis under Article 6, and the processing must also fall under one of the exceptions listed in Article 9 of the GDPR.

In addition, the research project must have an approved ethical review before processing may begin, and appropriate protective measures must be in place. One such measure is pseudonymizing the personal data.

Pseudonymization and anonymization

Pseudonymization means that directly identifying information – such as names and personal identity numbers – is replaced with codes. This also requires the creation of a code key that links the pseudonyms to the original personal data. The code key must be stored securely, kept separate from the research data, and accessible only to authorized individuals.
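The replacement of direct identifiers with codes can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name pseudonymize, the field names, the code format, and the example records are all hypothetical, and a real project would follow its institution's own routines for generating and storing the code key.

```python
import secrets


def pseudonymize(records, direct_identifiers):
    """Replace direct identifiers with random codes.

    Returns (pseudonymized_records, code_key). The code key maps each
    code back to the original identifying values and should be stored
    separately from the research data.
    """
    code_key = {}  # code -> original direct identifiers
    seen = {}      # identifier values -> code, so one person keeps one code
    pseudonymized = []
    for record in records:
        identity = tuple(record[field] for field in direct_identifiers)
        if identity not in seen:
            code = "P" + secrets.token_hex(4)
            while code in code_key:  # regenerate on the unlikely collision
                code = "P" + secrets.token_hex(4)
            seen[identity] = code
            code_key[code] = dict(zip(direct_identifiers, identity))
        # Keep every field except the direct identifiers.
        row = {k: v for k, v in record.items() if k not in direct_identifiers}
        row["participant_code"] = seen[identity]
        pseudonymized.append(row)
    return pseudonymized, code_key


# Hypothetical example records, invented for illustration.
records = [
    {"name": "Anna Andersson", "personal_id": "ID-0001", "age": 34, "score": 12},
    {"name": "Bo Berg", "personal_id": "ID-0002", "age": 57, "score": 9},
    {"name": "Anna Andersson", "personal_id": "ID-0001", "age": 34, "score": 15},
]

data, code_key = pseudonymize(records, ["name", "personal_id"])
```

Here data holds the research records with a participant_code in place of the name and identity number, while code_key is the separate lookup table. Random codes are used rather than codes derived from the identifiers, so that a code reveals nothing about the person without the key.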

As long as the code key exists – regardless of where or with whom – it remains possible to link the data to specific individuals. This means that pseudonymized data may be processed, but they are still considered personal data and remain subject to the GDPR.

If the code key and all other information that could directly or indirectly identify a person are deleted so that it is no longer possible to link the data to an individual, the material has been de-identified or anonymized. Once anonymized, the data are no longer classified as personal data and the GDPR no longer applies.

Under the Swedish Archives Act, researchers in Sweden are generally not allowed to delete the code key or other identifiers, since these are considered official documents (allmänna handlingar) and must be archived. However, register extracts or datasets provided by survey companies can be anonymized.

Re-identification

When preparing data for sharing, it is important to verify that they do not contain any information that could reveal the identity of study participants. Re-identification occurs when someone combines various indirect identifiers in the data, such as occupation, municipality, and age, and is thereby able to identify a specific person. These identifiers can be combined within the dataset itself, or data from the original dataset can be combined with external sources, such as public registers or information from social media.
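One simple way to look for such combinations within a dataset is to count how many records share each combination of indirect identifiers and flag the rare ones. The sketch below assumes hypothetical field names and an arbitrary threshold of three records; the function small_groups and the example data are invented for illustration. It only covers combinations inside the dataset itself and says nothing about linkage with external sources.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, k=3):
    """Return combinations of indirect identifiers shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical example records, invented for illustration.
records = [
    {"occupation": "teacher", "municipality": "Uppsala", "age": 41},
    {"occupation": "teacher", "municipality": "Uppsala", "age": 41},
    {"occupation": "teacher", "municipality": "Uppsala", "age": 41},
    {"occupation": "veterinarian", "municipality": "Arjeplog", "age": 63},
]

# The threshold k=3 is an assumption, not a recommended value.
risky = small_groups(records, ["occupation", "municipality", "age"], k=3)
```

In this example the single veterinarian record is flagged, because no other record shares that combination of occupation, municipality, and age.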

You can reduce the risk of re-identification by re-coding indirect identifiers into broader categories, for example by using age ranges instead of exact ages, or municipality instead of GPS coordinates. Which identifiers need to be re-coded varies from project to project, depending on the type of data collected and the associated risks.
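Re-coding exact ages into ranges could look like the following sketch. The ten-year interval width, the field name age, and the helper names age_to_range and recode_ages are assumptions made for the example; the appropriate level of coarsening depends on the project.

```python
def age_to_range(age, width=10):
    """Map an exact age to an interval label, e.g. 34 becomes '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def recode_ages(records, field="age", width=10):
    """Return copies of the records with the age field replaced by a range."""
    recoded = []
    for record in records:
        row = dict(record)  # copy, so the original records are left unchanged
        row[field] = age_to_range(row[field], width)
        recoded.append(row)
    return recoded


# Hypothetical example records, invented for illustration.
participants = [
    {"participant_code": "P01", "age": 34},
    {"participant_code": "P02", "age": 60},
]

shared = recode_ages(participants)
```

The function returns new records rather than modifying the input, so the original values stay available to the project while only the coarsened version is prepared for sharing.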

Storing personal data

As with other research data, data containing personal data must be stored securely, in accordance with their information classification level, which is based on the sensitivity and importance of the information. Under the GDPR, the data controller is required to implement appropriate technical and organizational security measures.

Technical measures may include encryption, antivirus software, and firewalls to prevent unauthorized access. Organizational security measures may include access controls, regular reviews of access rights (e.g., when staff leave), and mandatory training in information security. It is also important to ensure that staff follow established procedures for handling personal data.