Data table key
-----------------------------
This data set consists of two files, compiled from product information for centrally approved medicinal products within the EU.

'SmPC_export_all_220822.csv' contains a corpus of Summaries of Product Characteristics (SmPC)
'PL_export_all_220822.csv' contains a corpus of Package Leaflets (PL)

Variable descriptions
-----------------------------
GlobalIndex: Index of sentence within all documents of same type (SmPC or PL)
DocIndex: Index of sentence within current document
SecIndex: Index of sentence within current section
Sentence: Sentence as read from the PDF (with added period)
Section: Document section
Type: Document type
ProductName: Name of medicinal product
ProcedureNumber: EMA procedure number
text: Sentence after some text processing (i.e. removal of some non-alphanumeric characters and multiple spaces)
cluster: Semantic cluster assignment (Bergman et al., PLOS ONE, 2022)
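
Example usage
-----------------------------
The variables described above can be read with a few lines of Python. The following is a minimal sketch, not part of the original data set. It assumes that each file has a header row with exactly the variable names listed above, is comma-separated and UTF-8 encoded, and that the three index variables are stored as integers; none of this is stated in the key, so adjust the parameters if the files differ. The function names read_corpus and sentences_by_section are hypothetical helpers introduced for illustration.

```python
import csv
from collections import defaultdict

# Variable names as listed in the data table key.
COLUMNS = [
    "GlobalIndex", "DocIndex", "SecIndex", "Sentence", "Section",
    "Type", "ProductName", "ProcedureNumber", "text", "cluster",
]
INDEX_COLUMNS = ("GlobalIndex", "DocIndex", "SecIndex")


def read_corpus(path, delimiter=",", encoding="utf-8"):
    """Read one corpus file into a list of dicts keyed by variable name.

    Assumptions (not confirmed by the key): header row present,
    comma-separated, UTF-8 encoded, index variables stored as integers.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        missing = set(COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing expected columns: {sorted(missing)}")
        rows = []
        for row in reader:
            # Convert the sentence indices so they sort numerically.
            for key in INDEX_COLUMNS:
                row[key] = int(row[key])
            rows.append(row)
    return rows


def sentences_by_section(rows, product_name):
    """Group the sentences of one product by Section, ordered by SecIndex."""
    grouped = defaultdict(list)
    for row in rows:
        if row["ProductName"] == product_name:
            grouped[row["Section"]].append(row)
    return {
        section: [r["Sentence"] for r in sorted(items, key=lambda r: r["SecIndex"])]
        for section, items in grouped.items()
    }
```

For example, rows = read_corpus('SmPC_export_all_220822.csv') would load the SmPC corpus, and sentences_by_section(rows, name) would return the sentences of the product called name, grouped by document section. The same functions should apply to 'PL_export_all_220822.csv' if both files share the layout described above.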