
GRETIL
The Göttingen Register of Electronic Texts in Indian Languages (GRETIL) is a resource platform providing standardized machine-readable texts in Indian languages that have been contributed by various individuals and institutions.
GRETIL - Göttingen Register of Electronic Texts in Indian Languages
The Göttingen Register of Electronic Texts in Indian Languages (GRETIL) provides standardized machine-readable texts in Indian languages that have been contributed by various individuals and institutions. The corpus has a strong focus on Sanskrit texts, but also offers a number of additional South Asian languages, both modern and historical. GRETIL aims to cover a thematically wide range of diverse material and is not confined to certain literary genres or time periods.
Overview
The Sanskrit material is arranged in collections according to genres and thematic sub-corpora. The texts in the remaining languages (Pāli, Prākrit, Tibetan, and Old Javanese) are each organised in a single collection. Material in Dravidian languages (Tamil, Malayāḷam, Maṇipravāḷam) and New Indo-Aryan languages (Hindī, Marāṭhī) is likewise grouped into respective collections.
We are currently publishing the various collections. The following collections have already been published:
- Tibetan Collection (2 files)
- Prakrit Collection (8 files)
- Old Javanese Collection (9 files)
History
GRETIL was started by Reinhold Grünendahl in 2001. It was originally intended as a cumulative register of the numerous download sites for electronic texts in Indian languages, but its focus shifted to securing and documenting the efforts to encode these texts. This was achieved by providing the contributions, which vary in source and quality, in an appropriately normalized form, with the minimum requirement being that full-text search for each language is possible across the whole corpus without any additional conversion. Since its inception, the corpus has grown considerably.
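The full-text search requirement can be illustrated with a small sketch. It assumes that "normalized" includes Unicode normalization (NFC), which is an assumption of this example and not a documented detail of the GRETIL workflow; the function names `normalise` and `search` and the sample file names are hypothetical.

```python
import unicodedata


def normalise(text: str) -> str:
    """Bring text into one canonical form so equal-looking strings compare equal."""
    # NFC composes base letters and combining marks (a + U+0304 becomes U+0101);
    # casefold() makes the comparison case-insensitive.
    return unicodedata.normalize("NFC", text).casefold()


def search(corpus: dict[str, str], query: str) -> list[str]:
    """Return the names of all texts that contain the query after normalisation."""
    needle = normalise(query)
    return sorted(name for name, text in corpus.items() if needle in normalise(text))


# Two files encode the same word differently: decomposed vs. precomposed "ā".
corpus = {
    "a.txt": "a\u0304tman",   # 'a' followed by COMBINING MACRON
    "b.txt": "\u0101tman",    # precomposed LATIN SMALL LETTER A WITH MACRON
    "c.txt": "brahman",
}
hits = search(corpus, "\u0101tman")
```

Without the normalisation step, a plain substring search for the precomposed form would miss `a.txt`, which is the kind of per-file conversion the corpus-wide requirement is meant to avoid.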
In XYZ, Maximilian Mehner began converting the corpus to the standards proposed by the Text Encoding Initiative (TEI), in order to make the platform more versatile and transparent for scholars around the world while maintaining the simplicity of the original concept as a register. This also meant simplifying and streamlining the formats that had been offered before. Since 2001, some features had inevitably become outdated; the CSX and REE formats, which were based on non-Unicode (CP437) character encoding, were therefore discontinued.
New Developments in Text+ Context
The SUB library has long been interested in involving GRETIL in long-term solutions supported by research data projects. In 2022, the Text+ consortium was launched as part of the German National Research Data Infrastructure (NFDI) initiative. The main objective of the consortia in this initiative is to ensure the long-term accessibility of research data, to integrate existing solutions and, in general, to improve the FAIR status of the resources. A Text+ user story by Buchholz suggested the integration of GRETIL into the portfolio of the consortium.
In 2024, after some preliminary tests, we started the integration of the GRETIL data. In general, the steps taken during this phase aimed to respect the choices made in the original project, to correct errors and improve the quality and FAIR status of the data, and to integrate the different types of data into long-term repositories. We tried to stay in contact with the people previously involved in the project. Unfortunately, Reinhold Grünendahl passed away in 2024, leaving a personal and professional void that no one could fill. We would like to thank Maximilian Mehner for his invaluable help and support during this period.
Here is a list of the steps taken during this phase:
- The zip files available on the original GRETIL platform are published as they are in another repository.
- The GRETIL files already available in TEI were validated against a TEI schema and corrected where necessary, and their metadata was enriched.
- The GRETIL files that had not been converted to TEI were converted from HTML to TEI, with some metadata fields (such as author, title, editor, etc.) being manually annotated and other metadata fields enriched.
- We defined new collections for the TextGrid repository structure, following the original GRETIL structure.
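The HTML-to-TEI step listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the conversion pipeline actually used for GRETIL: the names `ParagraphCollector` and `html_to_tei`, the sample input, and the placeholder header texts are all hypothetical. Title and author are passed in by hand, mirroring the manual annotation of those fields. The result is only well-formed XML with the basic TEI skeleton; validating it against a TEI schema would need a separate schema validator, which is not shown here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

TEI_NS = "http://www.tei-c.org/ns/1.0"


class ParagraphCollector(HTMLParser):
    """Collects the text content of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def q(name):
    """Qualify an element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def html_to_tei(html, title, author):
    """Build a skeletal TEI document from HTML paragraphs and manual metadata."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()

    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    ET.SubElement(title_stmt, q("author")).text = author
    # Placeholder statements; real files would carry proper publication data.
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("p")).text = "Sketch only"
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = "Converted from HTML"

    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for para in collector.paragraphs:
        ET.SubElement(body, q("p")).text = para
    return tei


sample_html = "<html><body><p>first <b>verse</b></p><p> </p><p>second verse</p></body></html>"
tei = html_to_tei(sample_html, title="Sample text", author="Unknown")
xml_string = ET.tostring(tei, encoding="unicode")
```

Inline markup inside a paragraph (such as the `<b>` element in the sample) is flattened to plain text in this sketch; a real conversion would have to decide how to map such markup to TEI elements.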
Metadata
Regarding the enrichment of the metadata, the following aspects have been improved:
- The concepts (languages, genres, sub-genres, etc.) used by GRETIL to structure the files have been mapped to GND entities and integrated into the TEI files. They are also used in the TextGrid metadata files, where they serve as facets in the portal. In addition, some concepts from Indology that were not previously present in the GND were suggested and integrated into the GND vocabulary.
- Annotation of the files with the Basic Classification, both in terms of language family (main class 18.) and other classes.
- Language codes according to ISO 639-3
- TextGrid genre
- ORCID IDs for people involved in the project
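Two of the items above lend themselves to a short illustration: assigning ISO 639-3 codes to language labels and checking the form of an ORCID iD. The sketch below is hypothetical and does not reproduce the metadata tooling used in the project; the names `ISO_639_3`, `iso639_3` and `orcid_checksum_ok` are invented for this example. The language codes are the published ISO 639-3 codes for those languages, and the check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. No GND identifiers are shown, since none are given on this page.

```python
import re
import unicodedata

# ISO 639-3 codes for some of the languages named on this page.
# Prakrit and Maṇipravāḷam are left out on purpose: this sketch does not
# assume how they were coded in the GRETIL metadata.
ISO_639_3 = {
    "sanskrit": "san",
    "pali": "pli",
    "tibetan": "bod",
    "old javanese": "kaw",
    "tamil": "tam",
    "malayalam": "mal",
    "hindi": "hin",
    "marathi": "mar",
}


def iso639_3(label):
    """Look up a language label, ignoring case and diacritics (Pāli -> pali)."""
    decomposed = unicodedata.normalize("NFD", label)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return ISO_639_3.get(plain.strip().lower())  # None if the label is unknown


def orcid_checksum_ok(orcid):
    """Check the layout and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[15] == expected


code = iso639_3("Malayāḷam")
# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
valid = orcid_checksum_ok("0000-0002-1825-0097")
```

Returning `None` for unknown labels keeps unmapped languages visible for manual review instead of silently assigning a wrong code.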
Secondary Resources of GRETIL (Dictionaries and Encyclopaediae)
Together with the University of Cologne, it was decided not to integrate the secondary resources (dictionaries and encyclopaedias), since the portal Cologne Digital Sanskrit Dictionaries is already integrated in CLARIN-DE and Text+, which specialise in this type of resource. To find the dictionaries and encyclopaedias that were previously available in GRETIL, please visit the Cologne Digital Sanskrit Dictionaries portal.
GRETIL E-Library
We plan to integrate other parts of the original GRETIL portal, such as the e-library, into other repositories in the near future.