ECDC Translation Memory subcorpus DE-PT
ECDC-TM subcorpus DE-PT
In October 2012, the European Union (EU) agency 'European Centre for Disease Prevention and Control' (ECDC) released a translation memory (TM), i.e. a collection of sentences and their professionally produced translations, in twenty-five languages. The data is distributed via the web pages of the European Commission's (EC) Joint Research Centre (JRC). Here we describe this resource, which is called the ECDC Translation Memory, or ECDC-TM for short.
Translation memories are parallel texts, i.e. texts and their manually produced translations, also referred to as bi-texts. A translation memory is a collection of small text segments together with their translations; each segment and its translations form a translation unit (TU). The segments can be sentences or parts of sentences. Translation memories support translators by ensuring that pieces of text that have already been translated do not need to be translated again.
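To make the notion of a translation unit concrete, the following minimal Python sketch extracts German-Portuguese segment pairs from a TMX document and uses them as an exact-match lookup. It assumes the memory is available in TMX, an XML exchange format for translation memories; this description does not state the file format. The sample content, the function name `extract_pairs`, and the sentences are invented for illustration and are not taken from ECDC-TM.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespace-qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX sample; the sentences are invented, not taken from ECDC-TM.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="DE"><seg>Waschen Sie Ihre Hände.</seg></tuv>
      <tuv xml:lang="PT"><seg>Lave as mãos.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="DE"><seg>Impfung schützt.</seg></tuv>
      <tuv xml:lang="EN"><seg>Vaccination protects.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def extract_pairs(tmx_text, src="de", tgt="pt"):
    """Return (source, target) pairs for every TU that has both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Reduce codes such as "DE" or "de-DE" to the primary subtag "de".
            lang = (tuv.get(XML_LANG) or "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


pairs = extract_pairs(SAMPLE_TMX)

# Exact-match lookup: a segment that was translated before is reused as-is.
memory = dict(pairs)
print(memory.get("Waschen Sie Ihre Hände."))
```

The second unit in the sample has no Portuguese segment, so it is skipped for the DE-PT pair. Real translation tools usually add fuzzy matching on top of such exact lookups; that is not shown here.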
Both translation memories and parallel texts are important linguistic resources that can be used for a variety of purposes, including:
- training automatic systems for statistical machine translation (SMT);
- producing monolingual or multilingual lexical and semantic resources such as dictionaries and ontologies;
- training and testing multilingual information extraction software;
- checking translation consistency automatically;
- testing and benchmarking alignment software (for sentences, words, etc.).
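As an illustration of the consistency-checking use case listed above, the sketch below flags source segments that occur with more than one distinct translation. It is one simple approach under stated assumptions, not a description of any particular tool: the function name `find_inconsistencies` and the segment pairs are invented for the example.

```python
from collections import defaultdict


def find_inconsistencies(pairs):
    """Return source segments that have more than one distinct translation."""
    translations = defaultdict(set)
    for source, target in pairs:
        # Collapse whitespace so that spacing differences alone are not flagged.
        key = " ".join(source.split())
        translations[key].add(" ".join(target.split()))
    return {src: sorted(tgts) for src, tgts in translations.items() if len(tgts) > 1}


# Invented German-Portuguese pairs, used only to exercise the function.
example_pairs = [
    ("Händewaschen", "Lavagem das mãos"),
    ("Händewaschen", "Higiene das mãos"),
    ("Impfstoff", "Vacina"),
    ("Impfstoff", "Vacina"),
]

report = find_inconsistencies(example_pairs)
for source, targets in report.items():
    print(source, "->", targets)
```

Different translations of the same segment are not necessarily errors, since context can justify them, so a report like this is best treated as a list of candidates for human review.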
The value of a parallel corpus grows with its size and with the number of languages for which translations exist. While parallel corpora for some languages are abundant, there are few or no parallel corpora for most language pairs. Apart from being freely available, the main advantage of the various parallel corpora offered via our web pages is the number of rare language pairs they cover (e.g. Maltese-Estonian, Slovene-Finnish).
The ECDC-TM is relatively small compared to the JRC-Acquis and to DGT-TM, but it has the advantage of covering a very different domain, namely public health. It also includes translation units for Irish (Gaeilge, GA), Norwegian (Norsk, NO) and Icelandic (Íslenska, IS).
By downloading or using the ECDC Translation Memory, you are bound by the ECDC-TM usage conditions (http://optima.jrc.it/Resources/ECDC-TM/2012_10_Terms-of-Use_ECDC-TM.pdf).