Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden

SND-ID: 2023-257. Version: 1. DOI: https://doi.org/10.5878/hafd-ms27

Creator/Principal investigator(s)

Karl Löwenmark - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Fredrik Sandin - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Marcus Liwicki - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Stephan Schnabel - SKF (Sweden)

Research principal

Luleå University of Technology - Department of Computer Science, Electrical and Space Engineering

Principal's reference number

2019-02533

Description

This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, stored as a pickled pandas DataFrame. The same data is also available as a semicolon-separated .csv file. The data has two columns: the first (noteComment) holds the annotation note contents, and the second (title) holds the annotation titles. Each row corresponds to one annotation and its title. The annotations are in Swedish and have been processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish.

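Data can be accessed in Python with:
import pandas as pd

# Load the pickled DataFrame (requires pandas)
annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
# First column: annotation note contents
annotation_contents = annotations_df['noteComment']
# Second column: annotation titles
annotation_titles = annotations_df['title']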
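The semicolon-separated .csv variant can be read without pandas as well. The following is a minimal sketch using only the Python standard library. It assumes the .csv file has a header row with the same column names as the DataFrame (noteComment, title), which the description does not confirm. The sample rows and the helper names load_annotations and count_placeholder_notes are invented for illustration and are not part of the dataset. With pandas, pd.read_csv(path, sep=";") should give an equivalent result.

```python
import csv
import io

# Hypothetical sample in the documented layout: semicolon-separated,
# note contents first, titles second. These rows are invented for
# illustration and do not come from the dataset.
SAMPLE_CSV = (
    "noteComment;title\n"
    "Lagerbyte utfört av egennamn;Lagerbyte\n"
    "Förhöjd vibrationsnivå, egennamn kontrollerar;Vibration\n"
    "Ingen åtgärd krävs;Kontroll\n"
)


def load_annotations(handle):
    """Read semicolon-separated annotations into a list of dicts."""
    # delimiter=";" is needed because the csv module defaults to commas.
    return list(csv.DictReader(handle, delimiter=";"))


def count_placeholder_notes(rows, placeholder="egennamn"):
    """Count notes containing the anonymisation placeholder (case-sensitive)."""
    return sum(1 for row in rows if placeholder in row["noteComment"])


rows = load_annotations(io.StringIO(SAMPLE_CSV))
print(len(rows), count_placeholder_notes(rows))
```

To apply this to the real file, replace the io.StringIO object with an open file handle, for example open(path, newline="", encoding="utf-8"). The path and the UTF-8 encoding are assumptions to verify against the downloaded file.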

Data contains personal data

Yes

Type of personal data

Signed annotations are preserved in the raw data. As a result, the dataset contains pseudonymised personal data.

Language

Swedish

Method and outcome

Time period(s) investigated

2018 – 2022

Data format / data structure

Data collection
Language resources

Resource type

Corpus

Foreseen use

NLP application, Human use

Text corpus

  • Linguality

    Monolingual
  • Language

    • Swedish (swe)

      Technical language (jargon)

  • Modality

    Written Language
  • Size

    Entries: 2385

    Expressions: 1613

  • Annotation

    • Entity Mentions

      Automatic annotation

      Annotated elements: Other

  • Original source

    https://doi.org/10.5878/z34p-qj52
Geographic coverage

Geographic spread

Geographic location: Sweden

Geographic description: Northern Sweden

Administrative information

Responsible department/unit

Department of Computer Science, Electrical and Space Engineering

Contributor(s)

Håkan Sirkka - Smurfit Kappa

Per-Erik Larsson - SKF (Sweden)

Pär-Erik Martinsson - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Peter Wikström - SCA Munksund

Kjell Lundberg - Smurfit Kappa

Smurfit Kappa

SKF (Sweden)

Svenska Cellulosa (Sweden)

RISE Research Institutes of Sweden

Funding

  • Funding agency: VINNOVA
  • Funding agency's reference number: 2019-02533
  • Project name on the application: Kunskapsintegrering för klassificering av maskinskador
  • Funding information: https://www.vinnova.se/p/kunskapsintegrering-for-klassificering-av-maskinskador/
Publications

Löwenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 (pp. 306–314).
DOI: https://doi.org/10.36001/phme.2022.v7i1.3356
URN: urn:nbn:se:ltu:diva-95407
SwePub: oai:DiVA.org:ltu-95407

License

CC BY 4.0

Versions

Version 1. 2023-12-21

DOI: https://doi.org/10.5878/hafd-ms27

Published: 2023-12-21
Last updated: 2023-12-21