Glossary

A glossary of terms used in this handbook. This list is not exhaustive and may be expanded.

Personal data

Personal data means any information that can directly or indirectly identify a natural person, that is, a living individual.

Directly identifying information includes names, personal identity numbers, e-mail addresses, and IP addresses. Indirectly identifying information, sometimes called quasi-identifiers (e.g., age, income, and municipality), is not sufficient on its own to identify a living individual but may do so in combination with other information. Identifying individuals by combining information in this way is known as re-identification.

Note that information about deceased individuals generally does not count as personal data under the GDPR.

Sensitive personal data

Sensitive personal data includes information relating to a person's:

Racial or ethnic origin
Political opinions
Religious or philosophical beliefs
Trade union membership
Health
Sex life or sexual orientation
Genetic data
Biometric data for the purpose of uniquely identifying an individual

Collecting sensitive personal data, or conducting research that uses it, requires ethical approval.

Personal data processing

Processing refers to any operation performed on personal data, including:

collection
storage
publication
disclosure
archiving
erasure or destruction

Examples include collecting personal data through interviews or surveys, or obtaining it from administrative registers.

Pseudonymized vs. anonymized data

Pseudonymized research data have been processed so that individuals can no longer be identified without access to additional information, such as a code key. Pseudonymization is an important and effective safeguard for protecting participant privacy in research. However, pseudonymized data still qualify as personal data under the GDPR, because it remains possible to identify individuals using additional data sources.

Anonymized research data, on the other hand, are fully de-identified and cannot be linked to an individual – not even via additional data sources – and therefore no longer count as personal data.
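As a sketch of pseudonymization, the Python example below replaces a personal identity number with a keyed hash (HMAC-SHA256) using only the standard library. This is one possible technique, not one the handbook prescribes; the function name `pseudonym`, the secret key, and the sample identity number are hypothetical. The secret key plays the role of the additional information: anyone who holds it can recompute the pseudonym for a known identity number and link it back to the person, which is why the output is still personal data.

```python
import hashlib
import hmac
import secrets


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256, hex-encoded) of a direct identifier.

    The same identifier and key always give the same pseudonym, so records
    about one person can still be linked within the dataset.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key; in practice it would be stored separately from the
# research data, with restricted access, just like a code key.
secret_key = secrets.token_bytes(32)

# Hypothetical, fictitious personal identity number.
p1 = pseudonym("19900101-0000", secret_key)
p2 = pseudonym("19900101-0000", secret_key)
assert p1 == p2  # deterministic: same person, same pseudonym
print(p1)  # 64 hexadecimal characters; differs for every new secret key
```

A keyed hash is used here rather than a plain hash because identity numbers follow a known format, so an unkeyed hash could be reversed by hashing every possible number and comparing the results.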

Code key

A code key is used to separate direct identifiers from research data: each identifier is replaced with a serial number or similar placeholder, and the mapping between identifiers and serial numbers (the code key) is stored separately. This way, data can be analyzed and interpreted without revealing identities, and the privacy of the research subjects is protected. However, as long as a code key exists, the data are considered pseudonymized, not anonymized, because the code key is an additional data source that could be used to link the data to individuals. The data are therefore still personal data and fall under data protection regulations.
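The code key procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: records are dictionaries, and the field names (`pin`, `name`), the serial-number format `P0001`, and the helper names `pseudonymize` and `reidentify` are all hypothetical. The `reidentify` function shows why the data remain personal data: anyone with the code key can reverse the mapping.

```python
from typing import Dict, List, Sequence, Tuple


def pseudonymize(
    records: Sequence[dict], key_field: str, direct_identifiers: Sequence[str]
) -> Tuple[List[dict], Dict[str, str]]:
    """Replace direct identifiers with serial numbers.

    Returns the pseudonymized records and the code key, which maps each value
    of `key_field` (e.g., a personal identity number) to its serial number.
    """
    code_key: Dict[str, str] = {}
    pseudonymized: List[dict] = []
    for record in records:
        identifier = record[key_field]
        # Give each person one serial number, reused if they appear again.
        if identifier not in code_key:
            code_key[identifier] = f"P{len(code_key) + 1:04d}"
        # Drop every direct identifier; keep the remaining research variables.
        kept = {
            field: value
            for field, value in record.items()
            if field != key_field and field not in direct_identifiers
        }
        kept["serial"] = code_key[identifier]
        pseudonymized.append(kept)
    return pseudonymized, code_key


def reidentify(serial: str, code_key: Dict[str, str]) -> str:
    """Look up the original identifier for a serial number using the code key."""
    reverse = {s: ident for ident, s in code_key.items()}
    return reverse[serial]


# Fictitious example records.
records = [
    {"pin": "19900101-0000", "name": "Anna", "age": 34, "income": 312_000},
    {"pin": "19850505-0000", "name": "Erik", "age": 39, "income": 450_000},
]
data, key = pseudonymize(records, key_field="pin", direct_identifiers=["name"])
print(data)  # research data without names or identity numbers
# The code key `key` would be stored separately, in an access-controlled location.
```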

Re-identification

Re-identification refers to the process of combining indirect identifiers (e.g., age, income, and municipality) to identify individuals. This can be done using only the variables in the original dataset, or by combining the original dataset with additional data sources such as registers or information on social media. The risk of re-identification can be reduced by recoding variables, for example by grouping ages or incomes into larger intervals (brackets) or by using larger geographic units, such as county instead of municipality.
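The recoding described above can be sketched in Python. The bracket widths (10 years for age, 100,000 for income), the small municipality-to-county lookup table, and the function names `bracket` and `recode` are illustrative assumptions; a real project would choose intervals based on a risk assessment and use a complete, authoritative lookup table.

```python
from typing import Dict

# Illustrative subset of a municipality-to-county lookup (not a complete table).
MUNICIPALITY_TO_COUNTY: Dict[str, str] = {
    "Solna": "Stockholm",
    "Sundbyberg": "Stockholm",
    "Mölndal": "Västra Götaland",
    "Kungälv": "Västra Götaland",
}


def bracket(value: int, width: int) -> str:
    """Map a non-negative value to a fixed-width interval label, e.g., 34 -> '30-39'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def recode(record: dict) -> dict:
    """Coarsen the quasi-identifiers age, income, and municipality in a copy of the record."""
    recoded = dict(record)
    recoded["age"] = bracket(record["age"], 10)
    recoded["income"] = bracket(record["income"], 100_000)
    # Replace municipality with the larger geographic unit (county).
    recoded["county"] = MUNICIPALITY_TO_COUNTY[recoded.pop("municipality")]
    return recoded


print(recode({"age": 34, "income": 312_000, "municipality": "Solna"}))
# {'age': '30-39', 'income': '300000-399999', 'county': 'Stockholm'}
```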

Official document

A document is considered official (allmän handling) if it is held by a public authority and has either been received or drawn up by that authority. Research data held by public institutions in Sweden are generally classified as official documents.

Read more about research principals.

Data controller

A data controller (personuppgiftsansvarig) is the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the personal data processing. The data controller is responsible for how research data containing personal information are processed. In Swedish publicly funded research, the data controller is typically the research principal – for example, the university.

Data processor

A data processor (personuppgiftsbiträde) processes personal data on behalf of the data controller. The data processor can be a natural or legal person, public authority, agency or other body, or a research infrastructure or company that helps collect or analyze research data on the controller's behalf. The data processor is always external to the data controller's organization and operates under a specific mandate to process personal data on the controller's behalf. This mandate is always regulated by a Data Processing Agreement (personuppgiftsbiträdesavtal).

When two universities collaborate in research, each university is typically the data controller for the data it manages.

Data Protection Officer

Public authorities and universities acting as data controllers are required to appoint a Data Protection Officer (DPO, dataskyddsombud). The DPO's role is to inform and advise on the General Data Protection Regulation (GDPR), provide guidance on personal data processing, and monitor the organization's compliance with the GDPR. The DPO must be involved in all data protection impact assessments and acts as the contact point for the Swedish Authority for Privacy Protection (IMY).

Statistical disclosure control

Statistical disclosure control refers to a set of methods used to ensure that information about individuals cannot be inferred from a dataset. It also includes techniques for assessing the risk of re-identification and the information loss that results when data are anonymized or pseudonymized.
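One common way to assess re-identification risk, used here as an illustration rather than as a method this glossary prescribes, is k-anonymity: count how many records share each combination of quasi-identifier values. A combination shared by only one record singles that person out. The function names and the minimum group size of 3 are hypothetical choices.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def group_sizes(records: Sequence[dict], quasi_identifiers: Sequence[str]) -> Counter:
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records: Sequence[dict], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest group (0 for an empty dataset)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)


def risky_groups(
    records: Sequence[dict], quasi_identifiers: Sequence[str], min_size: int = 3
) -> List[Tuple]:
    """List quasi-identifier combinations shared by fewer than `min_size` records."""
    sizes = group_sizes(records, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n < min_size]


# Fictitious, already recoded records.
records = [
    {"age": "30-39", "county": "Stockholm"},
    {"age": "30-39", "county": "Stockholm"},
    {"age": "30-39", "county": "Stockholm"},
    {"age": "50-59", "county": "Västra Götaland"},
]
qi = ["age", "county"]
print(k_anonymity(records, qi))   # 1: one person is unique on these variables
print(risky_groups(records, qi))  # [('50-59', 'Västra Götaland')]
```

Coarser recoding generally increases group sizes at the cost of more information loss, which is the trade-off these risk assessments are meant to make visible.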

Read more: Handbok i statistisk röjandekontroll (PDF) (Handbook on statistical disclosure control; in Swedish).

Encryption

Encryption is the process of converting information into a coded form (ciphertext) that cannot be read without a decryption key. Encryption algorithms range from simple ones (e.g., shifting each letter three steps ahead in the alphabet) to highly secure algorithms that cannot be broken even with supercomputers unless the key is known. For research data containing personal information, encryption is considered a form of pseudonymization: the encrypted data are pseudonymized, and the decryption key is the additional information that can be used to identify individuals in the data.
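The simple end of that range, shifting each letter three steps ahead (a Caesar cipher), can be sketched in Python. The shift value acts as the key. It is shown only to illustrate the idea of a key: with just 26 possible shifts it can be broken by trying them all, so real research data should be protected with established algorithms such as AES through well-vetted tools. The function name `caesar` is hypothetical, and only the letters A to Z are shifted.

```python
import string

ALPHABET = string.ascii_uppercase  # A-Z; other characters (e.g., Å, Ä, Ö, digits) are left unchanged


def caesar(text: str, shift: int) -> str:
    """Shift each letter A-Z by `shift` positions, wrapping around the alphabet.

    Input is converted to uppercase first. Decrypt by applying the negative shift.
    """
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
        else:
            result.append(ch)
    return "".join(result)


ciphertext = caesar("HELLO", 3)
print(ciphertext)              # KHOOR
print(caesar(ciphertext, -3))  # HELLO
```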

In research, encryption is often used for data at rest, meaning data that are not in active use.