Objective and Focus of the DTA Basic Format

The structural annotation of all DTA texts follows the DTA ›basic format‹ (DTABf). The DTABf is based on the P5 Guidelines of the Text Encoding Initiative (TEI). Because the TEI Guidelines offer solutions for a very wide range of tagging requirements, they are extensive and flexible, and they are meant to be adapted to the individual needs of the projects that work with them. For the DTA, this was achieved by creating the DTABf, a proper subset of the TEI P5 tagset that fixes not only the set of elements but also the corresponding attributes and (where applicable) their values. The DTABf tagset is fully conformant with the TEI P5 Guidelines, i.e. the TEI tagset was only reduced, not extended in any way.
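The idea of a restricted tagset with fixed elements, attributes and values can be illustrated with a small sketch. The following Python example is not the DTA's own validation tooling; it only assumes a hypothetical, heavily abridged allow-list (the names ALLOWED, SAMPLE and find_violations are invented for this illustration, as are the permitted values) and reports every element, attribute or value in a TEI fragment that falls outside that list. Whether a given element appears in the allow-list here says nothing about its actual status in the DTABf.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged allow-list for illustration only (NOT the real DTABf).
# element name -> {attribute name -> set of allowed values, or None for "any value"}
ALLOWED = {
    "text": {},
    "body": {},
    "div": {"type": None, "n": None},
    "head": {},
    "p": {},
    "hi": {"rendition": {"#b", "#i"}},  # illustrative values, assumed for this sketch
    "lb": {},
}

# A small TEI-namespaced fragment containing two deliberate violations.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="chapter">
      <head>Erstes Kapitel</head>
      <p>Ein <hi rendition="#b">fetter</hi> und ein <hi rendition="#x">anderer</hi> Ausdruck.</p>
      <said>Ein Zitat.</said>
    </div>
  </body>
</text>"""


def local_name(qualified):
    """Strip a '{namespace}' prefix as produced by ElementTree."""
    return qualified.rsplit("}", 1)[-1]


def find_violations(xml_string):
    """Return messages for everything that is outside the allow-list."""
    violations = []
    for elem in ET.fromstring(xml_string).iter():
        name = local_name(elem.tag)
        if name not in ALLOWED:
            violations.append(f"element <{name}> is not in the subset")
            continue
        allowed_attrs = ALLOWED[name]
        for attr, value in elem.attrib.items():
            attr_name = local_name(attr)
            if attr_name not in allowed_attrs:
                violations.append(f"attribute @{attr_name} not allowed on <{name}>")
                continue
            allowed_values = allowed_attrs[attr_name]
            if allowed_values is not None and value not in allowed_values:
                violations.append(
                    f"value '{value}' not allowed for @{attr_name} on <{name}>"
                )
    return violations


if __name__ == "__main__":
    for message in find_violations(SAMPLE):
        print(message)
```

Because such a list only ever removes options from the TEI tagset, any document that passes a check of this kind is still a valid TEI document. In practice a restriction like this would normally be expressed as a schema rather than as hand-written code.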

The DTABf is part of the DTA Guidelines, which also comprise the General Guidelines and the Transcription Guidelines. It is intended to allow unrestricted tagging of any structural phenomena that may occur while avoiding ambiguities in the tagging of similar phenomena. In this way we want to ensure coherent text structuring across the whole DTA corpus. Given the wide temporal coverage of the DTA corpus and the diversity of its text types and genres, this aim is a considerable challenge, because the heterogeneity of the texts is accompanied by great structural variability among the original sources.

With the DTABf we propose a standardized format for the structural annotation of digitized historical texts. The advantage of such an approach is that diverse TEI texts can be analyzed not only with similar methods but also in comparison with one another. The annotation guidelines underlying the DTABf are documented extensively, which ensures that the tagging remains comprehensible. Thus, DTABf conformity facilitates not only the integration of TEI texts into the DTA infrastructure but also their re-use in other full-text archives.

The DTA basic format has been recommended by the DFG and CLARIN-D for re-use, namely in the following documents:

Handreichung: Empfehlungen zu datentechnischen Standards und Tools bei der Erhebung von Sprachkorpora. Ed. by the Fachkollegium Sprachwissenschaften of the Deutsche Forschungsgemeinschaft (DFG). Bonn 2015.
Förderkriterien für wissenschaftliche Editionen in der Literaturwissenschaft. Ed. by the Fachkollegium Literaturwissenschaft of the Deutsche Forschungsgemeinschaft (DFG). Bonn 2015.
CLARIN-D User Guide. Part II (Linguistic resources and tools), ch. 6 (Types of resources), section "Text Corpora". Ed. by CLARIN-D AP 5. Berlin 2012.