Gold standard for English-Swedish Europarl data (GES)

SND-ID: ext0283-1.

Access to data via


Creator/Principal investigator(s)

Lars Ahrenberg - Linköping University, Department of Computer and Information Science

Maria Holmqvist - Linköping University, Department of Computer and Information Science

Research principal

Linköping University - Department of Computer and Information Science


Reference corpus for word linking, divided into training data and test data. The sentences come from the English and Swedish parts of Europarl.

The data were created from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are stated as pairs of integer token identifiers.
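
As an illustration of how such token correspondences might be read, here is a minimal Python sketch. It assumes, hypothetically, that each link is written as `source-target` with 1-based token positions and that links are separated by whitespace; the actual file layout of this resource may differ and should be checked against the distributed files. The function name `parse_links` and the example sentence pair are invented for illustration and are not taken from the corpus.

```python
from typing import List, Tuple


def parse_links(line: str) -> List[Tuple[int, int]]:
    """Parse whitespace-separated 'src-tgt' integer pairs (assumed notation)."""
    links = []
    for item in line.split():
        src, tgt = item.split("-")
        links.append((int(src), int(tgt)))
    return links


# Invented example sentence pair, not taken from the corpus.
english = ["I", "agree"]
swedish = ["Jag", "håller", "med"]

# Assumed 1-based token identifiers: "agree" links to both "håller" and "med".
links = parse_links("1-1 2-2 2-3")

# Resolve identifier pairs back to the tokens they point at.
aligned = [(english[s - 1], swedish[t - 1]) for s, t in links]
print(aligned)
```

Representing links as integer pairs rather than as word strings keeps repeated words in a sentence unambiguous, since each pair refers to a token position.
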
Method and outcome

Data format / data structure

Data collection
Language resources

Resource type


Foreseen use

NLP application

Text corpus

  • Linguality

  • Language

    • English (eng)

    • Swedish (swe)

      Sentences: 1164

  • Modality

    Written Language

  • Size

    Sentences: 1164

  • Annotation

    • Alignment

      Manual annotation

Geographic coverage
Administrative information

Responsible department/unit

Department of Computer and Information Science

Topic and keywords

Research area

Engineering and technology (Standard för svensk indelning av forskningsämnen 2011)

Language technology (computational linguistics) (Standard för svensk indelning av forskningsämnen 2011)


Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.


CC BY 4.0

Contact for questions about the data
