Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden

SND-ID: 2023-246. Version: 2. DOI: https://doi.org/10.5878/hxc0-bd07

Alternative title

Annotated condition monitoring data for technical language processing and supervision

Creator/Principal investigator(s)

Karl Löwenmark - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Fredrik Sandin - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Marcus Liwicki - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Stephan Schnabel - SKF (Sweden)

Research principal

Luleå University of Technology - Department of Computer Science, Electrical and Space Engineering

Principal's reference number

2019-02533

Description

Labelled industry datasets are one of the most valuable assets in prognostics and health management (PHM) research. However, creating them is both difficult and expensive, so publicly available industry datasets are rare at best, and labelled ones even more so.
Recent studies have shown that industry annotations can be used to train artificial intelligence models directly on industry data (https://doi.org/10.36001/ijphm.2022.v13i2.3137, https://doi.org/10.36001/phmconf.2023.v15i1.3507). However, although many industry datasets also contain text descriptions or logbooks in the form of annotations and maintenance work orders, few, if any, such datasets are publicly available.
Therefore, we release a dataset consisting of annotated signal data from two large (80 m × 10 m × 10 m) paper machines at a Kraftliner production company in northern Sweden. The data consist of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish by on-site Swedish experts, and the signals consist primarily of accelerometer vibration measurements from the two machines.
The dataset is structured as a Pandas dataframe and serialized both as a pickle (.pkl) file and as a JSON (.json) file. The dataframe has four columns:
  • ‘id’: the ID of each sample;
  • ‘Spectra’: the fast Fourier transform (FFT) and envelope-transformed vibration signals;
  • ‘Notes’: the associated annotations, mapped so that each annotation is associated with all signals recorded from ten days before the annotation date up to the annotation date;
  • ‘Embeddings’: pre-computed embeddings of the annotations, generated with Swedish SentenceBERT.
Each row corresponds to one vibration measurement sample; the data do not indicate which sensor or machine part a measurement comes from.
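A minimal sketch of how the released dataframe might be loaded and used, assuming Python with pandas and NumPy. The file path, the helper names (`load_dataset`, `pair_with_annotations`, `most_similar_notes`) and the per-signal and per-annotation dates are hypothetical: the released columns are only ‘id’, ‘Spectra’, ‘Notes’ and ‘Embeddings’, so `pair_with_annotations` merely illustrates the ten-day mapping rule described above and is not the authors' code. It treats the window as inclusive at both ends, which is an assumption; applied per annotation, the rule can pair one signal with several annotations. `most_similar_notes` ranks rows by cosine similarity of their ‘Embeddings’ vectors to a query vector, which only makes sense if the query was embedded with the same Swedish SentenceBERT model.

```python
# Sketch only: the path, helper names and date inputs are hypothetical.
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Column names as documented for the released dataframe.
EXPECTED_COLUMNS = ["id", "Spectra", "Notes", "Embeddings"]
# Ten-day window from the dataset description; inclusive bounds are an assumption.
WINDOW = timedelta(days=10)


def load_dataset(path: str) -> pd.DataFrame:
    """Load the pickled dataframe and check that the documented columns exist."""
    # Only unpickle files from a trusted source: unpickling can run arbitrary code.
    df = pd.read_pickle(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def pair_with_annotations(
    signals: list[tuple[str, date]],
    annotations: list[tuple[str, date]],
    window: timedelta = WINDOW,
) -> list[tuple[str, str]]:
    """Illustrate the ten-day rule: pair each annotation with every signal
    measured between (annotation date - window) and the annotation date.

    signals:     (signal_id, measured_on) tuples  -- hypothetical input
    annotations: (note_text, annotated_on) tuples -- hypothetical input
    """
    pairs = []
    for note, annotated_on in annotations:
        start = annotated_on - window
        for signal_id, measured_on in signals:
            if start <= measured_on <= annotated_on:
                pairs.append((signal_id, note))
    return pairs


def most_similar_notes(df: pd.DataFrame, query: np.ndarray, k: int = 5) -> pd.DataFrame:
    """Return the k rows whose 'Embeddings' are most cosine-similar to query."""
    # Stack the per-row embedding vectors into an (n_rows, dim) matrix.
    emb = np.vstack(df["Embeddings"].tolist()).astype(float)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    scores = emb @ q  # cosine similarity, since both sides are unit length
    order = np.argsort(-scores)[:k]
    result = df.iloc[order][["id", "Notes"]].copy()
    result["similarity"] = scores[order]
    return result
```

For example, `df = load_dataset("dataset.pkl")` (hypothetical file name) followed by `most_similar_notes(df, df["Embeddings"].iloc[0])` lists the annotations whose embeddings lie closest to that of the first row, including the first row itself.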

Data contains personal data

Yes

Type of personal data

Signed annotations are preserved in the raw data. As a result, the dataset contains pseudonymised personal data.

Language

Method and outcome

Data format / data structure

Data collection
  • Mode of collection: Recording
  • Description of the mode of collection: Vibration data collected with accelerometers (SKF IMx system with CMSS sensors)
  • Data collector: Luleå University of Technology
  • Instrument: SKF CMSS 2207 - https://www.skf.com/ph/productinfo/productid-CMSS%202207
  • Instrument: SKF CMSS 2200 - https://www.skf.com/ph/productinfo/productid-CMSS%202200
  • Source of the data: Physical objects
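The accelerometer recordings are released as the ‘Spectra’ column, described as FFT and envelope-transformed vibration signals, but the exact processing performed by the SKF IMx system (sampling rate, windowing, band-pass filtering, averaging) is not documented here. As a generic illustration only, the sketch below computes a one-sided FFT amplitude spectrum and a Hilbert-transform envelope spectrum with NumPy. The function names and the synthetic amplitude-modulated test signal are assumptions, not the vendor's algorithm, and the sketch omits the band-pass filtering usually applied before demodulation. The analytic signal is built in the frequency domain, the same construction `scipy.signal.hilbert` uses.

```python
# Generic envelope-analysis sketch; not the SKF IMx processing chain.
import numpy as np


def amplitude_spectrum(x: np.ndarray, fs: float):
    """One-sided FFT amplitude spectrum of a real signal sampled at fs Hz."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) * 2.0 / n  # scale so a unit sine gives amplitude 1
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal: zero negative frequencies, double positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0  # keep DC and Nyquist bins unchanged
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)


def envelope_spectrum(x: np.ndarray, fs: float):
    """Amplitude spectrum of the (mean-removed) signal envelope."""
    env = np.abs(analytic_signal(x))
    return amplitude_spectrum(env - env.mean(), fs)
```

For a 1000 Hz carrier amplitude-modulated at 50 Hz, the plain spectrum peaks at the carrier, while the envelope spectrum peaks at the 50 Hz modulation frequency. This is why envelope spectra are used to expose repetitive impacts, such as bearing defects, that modulate a higher-frequency structural response.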
Geographic coverage

Geographic spread

Geographic location: Sweden

Administrative information

Responsible department/unit

Department of Computer Science, Electrical and Space Engineering

Other research principals

Contributor(s)

Peter Wikström - SCA Munksund

Håkan Sirkka - Smurfit Kappa

Pär-Erik Martinsson - Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering

Per-Erik Larsson - SKF (Sweden)

Kjell Lundberg - Smurfit Kappa

SCA Munksund

SKF (Sweden)

RISE Research Institutes of Sweden

Smurfit Kappa

Funding 1

  • Funding agency: VINNOVA
  • Funding agency's reference number: 2019-02533_Vinnova
  • Project name on the application: Kunskapsintegrering för klassificering av maskinskador (Knowledge integration for classification of machine damage)
  • Funding information: Knowledge integration for fault severity estimation

Funding 2

  • Funding agency: Luleå University of Technology
Topic and keywords

Research area

Probability theory and statistics (Standard för svensk indelning av forskningsämnen 2011)

Computer and information science (Standard för svensk indelning av forskningsämnen 2011)

Language technology (computational linguistics) (Standard för svensk indelning av forskningsämnen 2011)

Other computer and information science (Standard för svensk indelning av forskningsämnen 2011)

Signal processing (Standard för svensk indelning av forskningsämnen 2011)

Other mechanical engineering (Standard för svensk indelning av forskningsämnen 2011)

Paper, pulp and fiber technology (Standard för svensk indelning av forskningsämnen 2011)

Publications

Löwenmark, K., Taal, C., Vurgaft, A., Liwicki, M., Nivre, J., & Sandin, F. (2023). Labelling of annotated condition monitoring data through technical language processing.
URN: urn:nbn:se:ltu:diva-95406
SwePub: oai:DiVA.org:ltu-95406

Löwenmark, K., Taal, C., Schnabel, S., Liwicki, M., & Sandin, F. (2022). Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry. In International Journal of Prognostics and Health Management (Vol. 13, Issue 2).
DOI: https://doi.org/10.36001/ijphm.2022.v13i2.3137
URN: urn:nbn:se:ltu:diva-93815
SwePub: oai:DiVA.org:ltu-93815

Löwenmark, K. (2023). Technical Language Supervision for Intelligent Fault Diagnosis [Licentiate thesis]. Luleå University of Technology.
ISBN: 9789180482547
URN: urn:nbn:se:ltu:diva-95414
SwePub: oai:DiVA.org:ltu-95414

Löwenmark, K., Taal, C., Nivre, J., Liwicki, M., & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 (pp. 306–314).
DOI: https://doi.org/10.36001/phme.2022.v7i1.3356
URN: urn:nbn:se:ltu:diva-95407
SwePub: oai:DiVA.org:ltu-95407

If you have published anything based on these data, please notify us with a reference to your publication(s).

Versions

Version 2: 2023-12-21

DOI: https://doi.org/10.5878/hxc0-bd07

Metadata corrected: access level updated to restricted access.

Version 1: 2023-11-29

DOI: https://doi.org/10.5878/z34p-qj52

Contact for questions about the data

Karl Löwenmark

karl.lowenmark@ltu.se

Published: 2023-12-21
Last updated: 2023-12-22