QTLP English-Portuguese Corpus for the MEDICAL domain


This dataset was acquired in the framework of QTLP (http://www.qt21.eu/launchpad/), an EU FP7-funded project under Grant Agreement 296347.

The dataset contains automatically detected pairs of parallel documents that were acquired from the web (i.e. from multilingual sites which contain content in the targeted languages and domain).

Almost all of the crawled sites were websites that contain scientific medical articles or abstracts of articles published in scientific journals.

In addition, this dataset includes automatically aligned sentences that were extracted from pairs of parallel documents.

The pairs of parallel documents have been classified (based on specific patterns which were detected in the URL or the title of the documents) into one of the following genre categories: "Reference", "News/Journalism", "Discussion", "Commercial" and "Information".
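
As an illustration only, pattern-based genre assignment of this kind could be sketched as follows. The keyword patterns, the function name classify_genre, and the fallback to "Information" are assumptions made for this example; the patterns actually used to build the corpus are not given in this description.

```python
import re

# Hypothetical keyword patterns, checked in order. These are NOT the
# patterns used for the corpus; they only illustrate the idea.
GENRE_PATTERNS = [
    ("Reference", re.compile(r"article|abstract|journal", re.IGNORECASE)),
    ("News/Journalism", re.compile(r"news|press", re.IGNORECASE)),
    ("Discussion", re.compile(r"forum|blog|comment", re.IGNORECASE)),
    ("Commercial", re.compile(r"shop|product|store", re.IGNORECASE)),
]

# Assumption: documents matching no pattern fall back to "Information".
DEFAULT_GENRE = "Information"


def classify_genre(url: str, title: str) -> str:
    """Return the first genre whose pattern occurs in the URL or the title."""
    text = f"{url} {title}"
    for genre, pattern in GENRE_PATTERNS:
        if pattern.search(text):
            return genre
    return DEFAULT_GENRE
```

For example, under these assumed patterns, classify_genre("http://example.org/forum/thread", "Help") returns "Discussion". Because the patterns are checked in order, a document matching several of them receives the first matching genre.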

If you want your webpage/website to be removed from these corpora, please contact us.
