A collection of documents from http://www.opensubtitles.org/.
IMPORTANT: If you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ to your website and to any reports and publications produced with the data. I promised this when I got the data from the providers of that website.
54 languages, 1,025 bitexts
total number of files: 1,390,584
total number of tokens: 8.31G
total number of sentence fragments: 1.22G
Please cite the following article if you use any part of the corpus in your own work:
Jörg Tiedemann, 2009, News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov, K. Bontcheva, G. Angelova and R. Mitkov (eds.), Recent Advances in Natural Language Processing (vol. V), pages 237-248, John Benjamins, Amsterdam/Philadelphia.
Please check the disclaimer and legal information at http://www.opensubtitles.org/en/disclaimer