A collection of documents from
IMPORTANT: If you use the OpenSubtitles corpus, please add a link to the OpenSubtitles website on your own website and in any reports and publications produced with the data. I received the data under this condition!

30 languages, 361 bitexts
total number of files: 20,400
total number of tokens: 149.44M
total number of sentence fragments: 22.27M

Please cite the following article if you use any part of the corpus in your own work:
Jörg Tiedemann, 2009, News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov and K. Bontcheva and G. Angelova and R. Mitkov (eds.) Recent Advances in Natural Language Processing (vol V), pages 237-248, John Benjamins, Amsterdam/Philadelphia

Please check the disclaimer and legal information at
