The Arabic E-Book Corpus

SND-ID: 2024-145. Version: 1. DOI:


Alternative title

مدونة لغوية للكتب العربية الإلكترونية ("A Linguistic Corpus of Arabic E-Books")

Creator/Principal investigator(s)

Andreas Hallberg - University of Gothenburg, Department of Languages and Literatures

Research principal

University of Gothenburg - Department of Languages and Literatures


The Arabic E-Book Corpus is a freely available collection of 1,745 books (81.5 million words) published by the Hindawi Foundation between 2008 and 2024. The books span several genres, including non-fiction, novels, children's literature, poetry, and plays. The corpus is provided in two versions: HTML and unformatted plain text. The plain-text version is suitable for most purposes.
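This record does not describe how the files in either version are laid out, so the following is only a minimal sketch under stated assumptions: it assumes a local copy in which each book is a separate UTF-8 `.txt` file (plain-text version) or `.html` file (HTML version) somewhere under one directory. The helper names (`read_book`, `html_to_text`, `count_words`, `corpus_word_count`) are hypothetical. Words are counted by splitting on whitespace; the record does not say how the 81.5 million figure was computed, so this count may not reproduce it exactly.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self._parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._parts.append(data)

    def text(self) -> str:
        # Text nodes are joined with a space, so a word split across
        # inline tags would be counted as two words.
        return " ".join(self._parts)


def html_to_text(markup: str) -> str:
    """Strip markup from one HTML book (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.text()


def read_book(path: Path) -> str:
    """Return the plain text of a book file, assuming UTF-8 encoding."""
    raw = path.read_text(encoding="utf-8")
    if path.suffix.lower() in (".html", ".htm"):
        return html_to_text(raw)
    return raw


def count_words(text: str) -> int:
    """Whitespace-delimited word count (an assumed definition of 'word')."""
    return len(text.split())


def corpus_word_count(root: Path, suffix: str = ".txt") -> tuple[int, int]:
    """Return (number of books, number of words) for all files with `suffix` under `root`."""
    n_books = 0
    n_words = 0
    for path in sorted(root.rglob(f"*{suffix}")):
        n_books += 1
        n_words += count_words(read_book(path))
    return n_books, n_words
```

For example, `corpus_word_count(Path("arabic-ebook-corpus/txt"))` would total the plain-text version, assuming that (hypothetical) directory name; passing `suffix=".html"` runs the same count over the HTML version after stripping the markup.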

Data contains personal data


Type of personal data

The data contain the names of copyright holders, such as authors and translators, as well as of historical, political, and other public figures mentioned in the works.


Method and outcome

Time period(s) investigated

2008 – 2024

Data format / data structure

Data collection
Language resources

Resource type

Text corpus

Foreseen use

NLP application, Human use

Geographic coverage

Geographic spread

Geographic location: North Africa, the Middle East

Administrative information

Responsible department/unit

Department of Languages and Literatures

Topic and keywords

Research area

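Language technology (computational linguistics) (Standard för svensk indelning av forskningsämnen 2011, the Swedish standard classification of research subjects)

Specific languages (Standard för svensk indelning av forskningsämnen 2011, the Swedish standard classification of research subjects)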



CC BY 4.0


Version 1. 2024-12-11



Contact for questions about the data

This resource has the following relations

Compiles Hindawi


Published: 2024-12-11
Last updated: 2024-05-20