SND's subject-specific profiles and metadata standards

SND aims to ensure that data shared via our services are easy to find and described in a way that aligns as much as possible with the FAIR principles.

For data to be findable, they must be described in a standardized way. This is achieved using metadata – data about data – and through the use of metadata standards and controlled vocabularies. When researchers use established standards to describe data, descriptions become readable and interpretable by both humans and machines, which is key to meeting the FAIR principles. 

SND's metadata profiles

To make it easier for researchers from different disciplines to describe and share data, SND has developed subject-specific metadata profiles. The profiles build on SND Master, the master profile that contains all metadata elements supported by DORIS. Relevant elements are selected from the master profile and presented in subject-specific profiles. These profiles align with the top levels of the Fields of Research and Development (FORD) classification from the OECD Frascati Manual and with Sweden’s national research subject classification system from Statistics Sweden (SCB).
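The relationship between the master profile and a subject-specific profile can be sketched as a selection from one shared set of elements. The Python below is only an illustration of that idea: the element names, the profile names, and the function build_profile are hypothetical and are not taken from SND's actual profiles or from DORIS.

```python
# Hypothetical element names standing in for a master profile.
MASTER_ELEMENTS = {
    "title", "description", "creator", "keywords", "language",
    "time_period", "geographic_coverage", "sampling_procedure", "instrument",
}

# Assumed selections: each profile lists the master elements relevant to it.
PROFILES = {
    "social_science": {
        "title", "description", "creator", "keywords",
        "time_period", "sampling_procedure",
    },
    "earth_and_environment": {
        "title", "description", "creator", "keywords",
        "geographic_coverage", "instrument",
    },
}


def build_profile(name):
    """Return the sorted element names of a profile, checked against the master set."""
    selected = PROFILES[name]
    unknown = selected - MASTER_ELEMENTS
    if unknown:
        # A profile may only present elements that exist in the master profile.
        raise ValueError(f"elements not in master profile: {sorted(unknown)}")
    return sorted(selected)
```

Because every profile is a subset of the same master set, a description made with one profile stays comparable with descriptions made with another.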

SND’s metadata profiles build on domain-specific metadata standards and profiles from international infrastructures. For example, the Social Science profile meets the metadata requirements from CESSDA, the Language Resources profile is interoperable with the metadata schema used by CLARIN, and the Earth and Related Environmental Data profile fulfils requirements from both ISO 19115 and INSPIRE.

Available metadata profiles at SND:

SND also provides a general profile for data that do not fit into any of the above categories. A profile for Humanities and the Arts is currently in development. 

Documentation for the metadata profiles is available on Zenodo.

 

Metadata standards 

Metadata standards are sets of rules for how to structure and relate metadata elements. They are aimed at users within a shared domain of interest, such as a specific research field. Metadata standards enhance both human and machine readability. Machine readability in turn allows metadata to be integrated into various systems, such as catalogues, search engines, or systems that automatically transfer information.
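As a rough illustration of what "rules for how to structure metadata elements" can mean in practice, the sketch below checks a record against a small rule set and then serializes it to JSON so that another system could read it. The rule set, the element names, and the function validate are assumptions made for this example; they do not reproduce any particular metadata standard.

```python
import json

# Hypothetical rules: element name -> (is it required?, expected Python type)
RULES = {
    "title": (True, str),
    "creator": (True, list),
    "language": (False, str),
}


def validate(record):
    """Return a list of problems; an empty list means the record follows the rules."""
    problems = []
    for element, (required, expected_type) in RULES.items():
        if element not in record:
            if required:
                problems.append(f"missing required element: {element}")
        elif not isinstance(record[element], expected_type):
            problems.append(f"wrong type for element: {element}")
    for element in record:
        if element not in RULES:
            problems.append(f"unknown element: {element}")
    return problems


record = {"title": "Survey of reading habits", "creator": ["Example, Anna"], "language": "en"}
problems = validate(record)

# A record that follows the rules can be handed to other systems in a
# machine-readable form, here JSON.
machine_readable = json.dumps(record, sort_keys=True)
```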

Researchers usually learn standardized ways of handling data during their training but rarely engage directly with metadata standards. Due to the varied needs of different disciplines, there are numerous metadata standards and implementations of them. 

Controlled vocabularies

Controlled vocabularies, or CVs, are standardized and organized lists of words and phrases used to harmonize the input of metadata across metadata standards. They are used to limit what can be entered into a specific field in a data description, for example by sets of keywords or key phrases with a fixed spelling. Controlled vocabularies can be open (expandable) or closed (fixed) and are managed by specific organizations or other controlling bodies.
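The difference between closed and open vocabularies, and the way a vocabulary restricts a field to terms with a fixed spelling, can be sketched as follows. The class ControlledVocabulary and the example terms are hypothetical; real vocabularies are maintained by their controlling bodies and are typically far larger and often hierarchical.

```python
class ControlledVocabulary:
    """A flat list of permitted terms with one fixed spelling per term."""

    def __init__(self, terms, closed=True):
        # Look terms up case-insensitively, but keep the fixed spelling as the value.
        self._terms = {term.casefold(): term for term in terms}
        self.closed = closed

    def normalize(self, value):
        """Return the term in its fixed spelling, or raise ValueError if it is not permitted."""
        key = value.strip().casefold()
        if key in self._terms:
            return self._terms[key]
        raise ValueError(f"{value!r} is not a term in this vocabulary")

    def add(self, term):
        """Add a term; only open (expandable) vocabularies allow this."""
        if self.closed:
            raise PermissionError("closed vocabulary: terms cannot be added")
        self._terms[term.casefold()] = term


# Hypothetical terms for a "collection method" field.
collection_methods = ControlledVocabulary(["Interview", "Questionnaire", "Observation"])
normalized = collection_methods.normalize("  questionnaire ")
```

Input that differs only in case or surrounding whitespace is mapped to the fixed spelling, while anything outside the list is rejected rather than stored as free text.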

Two examples of controlled vocabularies are:

  • MeSH (Medical Subject Headings). Produced by the National Library of Medicine (NLM) and used in, for instance, the life sciences to index medicine-related references in the PubMed database. There is also a Swedish version of MeSH, provided by Karolinska Institutet: Svensk MeSH.
  • LCSH (Library of Congress Subject Headings). Created by the Library of Congress and used by libraries to index objects with, for example, subject and genre/form headings.

SND supports a wide range of controlled vocabularies, including: 

  • AAT, Art & Architecture Thesaurus 
  • AGROVOC, Vocabulary for Agricultural Sciences 
  • ALLFO, Allmän finländsk ontologi 
  • ELSST, The European Language Social Science Thesaurus 
  • EnvThes, Environmental Thesaurus 
  • FISH, Thesaurus of Monument Types 
  • GCMD, Global Change Master Directory, vocabulary for Earth Science 
  • GEMET, GEneral Multilingual Environmental Thesaurus 
  • ICD-10, International Classification of Diseases 
  • MeSH, Medical Subject Headings 
  • NASA STI Thesaurus. 

CVs for specific metadata elements are presented in the documentation for SND’s metadata profiles. SND uses vocabularies from standards like DDI, Dublin Core, CESSDA, and DataCite, and supplements these with others such as GeoNames for geographical information and ISO 639 for language codes.

In some cases where there are no machine-readable controlled vocabularies, SND has developed lists that are based on other established lists of keywords. For example, we use vocabularies from the Swedish National Heritage Board (Riksantikvarieämbetet) for types of remains and investigations, and terms developed in the ARIADNE collaboration, published in PeriodO.

Examples of metadata standards

For more information about metadata standards, see, for example, the List of Metadata Standards from the DCC.

DDI (Data Documentation Initiative) 

Developed by the DDI Alliance and implemented as an XML schema with a set of elements for many types of data. DDI was designed for survey and observational data but has been expanded to cover more types of data, including data from the social sciences, economics, and health studies.

“DDI is a free standard that can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving.”
(DDI Alliance) 

There are two variants of DDI: 

  • DDI Codebook (version 1.x to 2.x): The ‘light’ version, mainly intended for documenting simple survey data.  
  • DDI Lifecycle (version 3.x): The full standard, designed for documenting data from the entire data life cycle, from conceptualization to publication and beyond. 

SND’s metadata structure builds on the metadata standard DDI Lifecycle 3.3 and DataCite’s metadata recommendations. We strive to use international and well-established controlled vocabularies and keyword lists that are used by other research infrastructures. 

Dublin Core  

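Dublin Core is a metadata standard with definitions of metadata elements for describing information resources. The standard derives its name from a workshop in Dublin, Ohio, and has nothing to do with the Irish Dublin. The historical set of 15 core elements, the Dublin Core Metadata Element Set (DCMES), is:

  • Title
  • Creator
  • Subject
  • Description
  • Publisher
  • Contributor
  • Date
  • Type
  • Format
  • Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights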

The standard has been updated with more elements. Read more about the updated list of metadata terms from the Dublin Core Metadata Initiative.
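To give a sense of how a description using the 15 DCMES elements can be made machine-readable, here is a minimal sketch that writes a record as XML with Python's standard library. The namespace URI is the one published for the Dublin Core element set; the wrapper element name metadata, the function to_dublin_core, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the historical core set.
DCMES = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core(record):
    """Serialize {element name: [values]} to an XML string with dc: prefixed elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in record.items():
        if name not in DCMES:
            raise ValueError(f"{name!r} is not one of the 15 core elements")
        # Every core element is optional and may be repeated.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


xml_record = to_dublin_core({
    "title": ["Example dataset"],
    "creator": ["Example, Anna", "Sample, Bo"],
    "language": ["en"],
})
```

The resulting string contains one dc:title, two dc:creator, and one dc:language element inside the wrapper, which a catalogue or harvester that understands the Dublin Core namespace could parse.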

META-SHARE 

META-SHARE is a metadata standard that builds on Dublin Core but has been tailored to describe language resources, data, and technologies used in language research. The standard branches into nine directions, including lexical/conceptual resource, language description (e.g., lexicon or grammars), and various corpora (e.g., written/text, oral/spoken, video). 

SND’s metadata profile for language resources is based on META-SHARE.

Other common metadata standards 

  • MARC (Machine-Readable Cataloging) is a set of standards for the representation and communication of bibliographic and related information from libraries.
  • METS (Metadata Encoding & Transmission Standard) is a standard for digital publications used for cataloguing and metadata transfer. METS mainly encodes administrative metadata. 
  • PREMIS (Preservation Metadata Implementation Strategies) is used for preservation metadata in digital archiving systems and includes administrative and technical metadata for digital objects. 
  • OLAC (Open Language Archives Community) is another standard that builds on Dublin Core, with additional controlled vocabularies suitable for linguistic data, for example language codes. 
  • TEI (Text Encoding Initiative) is used for detailed encoding of literary and linguistic texts, for example with grammar, stylistics, vocabulary, and manuscripts. It also facilitates encoding locations, people, dates, and objects mentioned in the texts, and links to other sites where more information can be found.