WMT12 dataset - machine translations with human judgements and post-editions
2,254 English-Spanish source sentences with their machine translations, along with their human post-edited versions, the original reference translations, and a 1-5 quality score. For the quality score, the official version used in the WMT12 shared task on quality estimation is a weighted average of the scores given by 3 annotators, but all 3 individual annotations (and their weights) are also available for both the training and test sets.
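To illustrate how three individual annotations combine into one official score, here is a minimal sketch of a weighted mean. It assumes the weights are non-negative and are normalized by their sum. The exact weighting scheme used in WMT12 is not described here, so treat the function name `weighted_score` and the example weights as hypothetical.

```python
from typing import Sequence


def weighted_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-annotator 1-5 scores into one value with a weighted mean.

    Assumption: weights are non-negative and normalized by their sum;
    the actual WMT12 weighting may differ.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 range")
    # Weighted sum of the individual annotations, divided by the total weight.
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical example: three annotators, the third weighted twice as heavily.
print(weighted_score([3, 4, 5], [1, 1, 2]))  # → 4.25
```

Because the individual annotations and weights are distributed alongside the official score, a sketch like this can be used to experiment with alternative aggregations (for example, equal weights) on the same data.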
People who looked at this resource also viewed the following:
- WPTP12 dataset - machine translations with post-editing performed by multiple translators with different levels of expertise
- TSD13 dataset - English-Spanish WMT12 machine translations by various MT systems, post-edited by 10 translation students
- EAMT11 dataset - machine translations with human judgements and post-editions
- OpenNLP Tokenizer (Portuguese)