QTLP English-Greek Corpus for the MEDICAL domain


This dataset was acquired in the framework of QTLP (http://www.qt21.eu/launchpad/), an EU FP7-funded project under Grant Agreement 296347.

The dataset contains automatically detected pairs of parallel documents that were acquired from the web (i.e. from multilingual sites which contain content in the targeted languages and domain).

Most of the crawled sites were either i) websites that contain abstracts of scientific papers or ii) websites of public- or private-sector organizations related to medical/health services (e.g. medical centers, institutes, hospitals).

In addition, this dataset includes automatically aligned sentences that were extracted from these pairs of parallel documents.

The pairs of parallel documents have been classified (based on specific patterns which were detected in the URL or the title of the documents) into one of the following genre categories: "Reference", "News/Journalism", "Discussion", "Commercial" and "Information".

If you want your webpage/website to be removed from these corpora, please contact us.
