SL Data Sets

Further Resources

On this page you will find overviews of sign language corpora, along with further resources and additional information.

Can't find your dataset represented or have more recent information? Feel free to get in touch with the respective contact!

The Sign Language Dataset Compendium

The Sign Language Dataset Compendium provides an overview of digital resources for signed languages that are suitable for research. It covers both corpora and lexical resources, and it describes commonly used data collection tasks together with the corpora in which they were used. For those looking for datasets for a specific language, a language index is provided.

Overview 2021

Our sister project EASIER has compiled an overview of existing data collections for European sign languages. The document describes 26 corpora and 41 lexical resources, as well as 26 commonly used data collection tasks. The DGS corpus and its type list are also represented.

Each entry includes a brief description of the resource and a table of key facts, such as languages represented, size of collection, demographics of participants, type of linguistic information, and information on access and licensing terms.

The document can be found here: https://doi.org/10.25592/uhhfdm.9561

EASIER is a Horizon 2020 project that aims to design, develop, and validate a complete multilingual machine translation system to serve as a framework for accessible communication between deaf/Deaf and hearing people.

Contact

E-mail: easier@dgs-korpus.de

Overview 2012

As part of Dr. Reiner Konrad's dissertation, an overview was created to provide comprehensive and up-to-date information about ongoing and completed sign language corpus projects. The survey is presented as a poster-sized chart. Besides general information on research objectives, project manager, and contact (e-mail), specifications of the corpus data are listed in three main columns:

raw data: amount of footage, format (digitized or not), accessibility
metadata: kind of data (e.g. spontaneous or elicited, monologues or dialogues, text genres), number and age of informants, sign language proficiency
primary data: further subdivided into transcribed, lemmatized, and annotated data; time-alignment and software (annotation tools)

Further information on the listed corpus projects, such as publications, online resources, and websites, is given under the heading “Resources and references”. References appear as short citations; the full references can be found in the PDF file.

We would like to invite you to contribute to this survey.

If your corpus project is already listed, please let us know if the information given is inaccurate or out of date.
If your project is missing, please fill out the questionnaire (see below) and send it to us.

Sign Language Corpora Survey (English)

Sign Language Corpora Survey (German)

References

Questionnaire

This survey originates from the doctoral thesis of Reiner Konrad: Die lexikalische Struktur der DGS im Spiegel empirischer Fachgebärdenlexikographie. Zur Integration der Ikonizität in ein korpusbasiertes Lexikonmodell. [The Lexical Structure of German Sign Language (DGS) in the Light of Empirical LSP Lexicography. On how to Integrate Iconicity in a Corpus-Based Lexicon Model]. University of Hamburg; published without this survey in 2011, Tübingen: Narr Verlag.

Contact

Dr. Reiner Konrad

Institute for German Sign Language and Communication of the Deaf

Gorch-Fock-Wall 7

20354 Hamburg

Germany

e-mail: reiner.konrad@sign-lang.uni-hamburg.de