Funded by DFG - German Research Foundation

Project Start: 1 April 2011

  Phase 1: April 2011 - June 2013
  Phase 2: July 2013 - January 2017

Referring to the GECCo corpus

If you use material from the GECCo corpus and want to cite it, you may use the following references:

General corpus description (in German):

Menzel, Katrin & Ekaterina Lapshinova-Koltunski (2014). “Kontrastive Analyse deutscher und englischer Kohäsionsmittel in verschiedenen Diskurstypen”, In: tekst i dyskurs - Text und Diskurs. Zeitschrift der Abteilung für germanistische Sprachwissenschaft des Germanistischen Instituts Warschau. 247-266.

Spoken corpus data:

Lapshinova-Koltunski, E., K. Kunz and M. Amoia (2012). Compiling a Multilingual Corpus. In Heliana Mello, Massimo Pettorino and Tommaso Raso (eds). Proceedings of the VIIth GSCP-2012 International Conference: Speech and Corpora. Firenze: Firenze University Press. pp. 29-34.

Annotation of cohesive devices:

Lapshinova-Koltunski, E. and K. Kunz (2014). Annotating Cohesion for Multilingual Analysis. In Proceedings of the 10th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, Reykjavik, May 26, 2014.

Menzel, Katrin (2017). “Understanding English-German Contrasts - A Corpus-based Comparative Analysis of Ellipses as Cohesive Devices” (Ellipsen als Textkohäsionsmittel – eine kontrastive Korpusstudie für das Sprachenpaar Englisch-Deutsch), Doctoral Dissertation 2016, Saarbrücken: Saarländische Universitäts- und Landesbibliothek.

Martinez Martinez, J. M., Lapshinova-Koltunski, E. and K. A. Kunz (2016). Annotation of Lexical Cohesion in English and German: Automatic and Manual Procedures. In: Proceedings of the Conference on Natural Language Processing (Konferenz zur Verarbeitung natürlicher Sprache) - KONVENS-2016, September, Bochum, Germany.

You may also refer to the respective publications and deliverables listed on this project website.

Project results

The GECCo project produced a corpus for contrastive linguistic work on textual cohesion. The corpus covers English and German texts in a range of registers and exists in several releases. Its written registers and their lexicogrammatical annotations were imported in reorganized form from the earlier CroCo project. The corpus and its documentation can be accessed online and queried with CQPweb. Unrestricted access to the corpus texts is not possible due to property-right restrictions. However, querying the corpus, with or without GECCo’s annotations, is open to members of the research community and to students.

The corpus annotations are linguistically grounded in system-based comparisons of cohesive devices in English and German. They allow empirical tests of the frequency distributions of cohesive configurations across the two languages, across 14 registers, and across spoken and written language use. The GECCo project produced annotated versions of the GECCo corpus, although lexical chains are covered only for a subset of representative registers. The project developed and documented a combination of automatic pre-coding, human intervention and post-editing to build sub-corpora of sufficient quality. Collaboration with the Prague Discourse Treebank has led to annotations that are interoperable across theories and corpora.

In terms of empirical studies, both phases of GECCo produced statistically refined evaluations of empirical results. The project’s generalization of cohesive contrasts in terms of degree, strength, semantic type and variation of cohesive devices and chains, with particular consideration of the written-spoken dimension, is new in contrastive linguistics. This generalization appears necessary as one type of tertium comparationis for cross-linguistic comparison. Together with GECCo’s documented annotation guidelines, its statistical evaluation techniques serve as models for a working pipeline in corpus-based empirical work. Empirical results include wide-ranging overviews of English-German contrasts in cohesion, as well as focused accounts of lexical cohesion and of ellipsis, with particular reference to their function across the spoken and written modes.

GECCo has explored and documented three areas of application of its results. First, its findings feed into language teaching, supporting more discourse-oriented teaching methods that increase communicative competence. Second, they provide substantial input for the modelling and teaching of translation, where improved skills in producing texts adapted to the target culture likewise seem highly desirable. Finally, project members, jointly with collaborating co-authors, produced several publications and prototypes that integrate findings from GECCo into linguistic engineering and machine translation, showing how better control of text cohesion improves various aspects of linguistic engineering.

Here you can download an outline of the Final Report of the project.

Previous projects