A parallel subcorpus collected from the European Constitution (DE-EN) (TMX) annotated by the OpenNLP Part-of-Speech Tagger (German) and the OpenNLP Part-of-Speech Tagger (English)

EUconst subcorpus DE-EN (TMX)

A German-English (DE-EN) parallel subcorpus in TMX format, collected from the European Constitution and annotated by the OpenNLP Part-of-Speech Tagger (German) and the OpenNLP Part-of-Speech Tagger (English). It is part of EUconst, a parallel corpus collected from the European Constitution.
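Since the subcorpus is distributed as TMX, the following is a minimal sketch of how aligned DE-EN segments might be read with the Python standard library. It assumes the usual TMX layout (`tu` units containing `tuv` variants with an `xml:lang` or `lang` attribute and a `seg` child). The function name `read_tmx_pairs` and the inline sample are illustrative only; the sample text is made up, and how the part-of-speech annotation is encoded in the actual files is not specified here, so the sketch simply returns the segment text.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="de", tgt_lang="en"):
    """Yield (source_text, target_text) tuples from a TMX path or file object."""
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "de-DE" -> "de".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


# Hypothetical two-segment sample, not taken from the corpus.
SAMPLE = b"""<tmx version="1.4">
  <header srclang="de"/>
  <body>
    <tu>
      <tuv xml:lang="de"><seg>Artikel 1</seg></tuv>
      <tuv xml:lang="en"><seg>Article 1</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE)))
print(pairs)
```

To read a downloaded file instead of the sample, pass its path as the first argument.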

Full EUconst corpus: 21 languages, 210 bitexts
total number of files: 986
total number of tokens: 3.01M
total number of sentence fragments: 0.22M

Please cite the following article if you use any part of the corpus in your own work:
Jörg Tiedemann, 2009, News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov, K. Bontcheva, G. Angelova, and R. Mitkov (eds.), Recent Advances in Natural Language Processing (vol. V), pages 237-248, John Benjamins, Amsterdam/Philadelphia.

ATTENTION
Please check the important legal notice at http://europa.eu/geninfo/legal_notices_en.htm

  • OpenNLP Part-of-Speech Tagger (German)
  • OpenNLP Part-of-Speech Tagger (English)