Web-scraped EMA guidelines and European Public Assessment Reports

SND-ID: 2023-254. Version: 1. DOI: https://doi.org/10.57804/wa37-j878


Creator/Principal investigator(s)

Gabriel Westman - Uppsala University, Department of Medical Sciences

Research principal

Uppsala University


This submission consists of the data and Python code supporting the original research article "A full-document analysis of the semantic relation between European Public Assessment Reports and EMA guidelines using a BERT language model" (Bergman et al., PLOS ONE, 2023).

The database contains metadata and full text from 669 EMA scientific guidelines and 1024 EMA European Public Assessment Reports.

The data contain personal data.


Type of personal data

Authorship of assessments and studies


Method and outcome

Unit of analysis


Open regulatory data on medicinal products

Study design

Observational study

Sampling procedure

Total universe/Complete enumeration
All EMA guidelines and European Public Assessment Reports (EPARs) from the time period specified.

Time period(s) investigated

2008-01-01 – 2022-12-31

Geographic coverage

Geographic spread

Geographic location: European Union (EU)

Topic and keywords

Research area

Basic medicine (Swedish Standard Classification of Research Subjects 2011)

Published: 2023-11-27