The DGT-Acquis is a family of multilingual parallel corpora extracted from the Official Journal of the European Union (OJ) in Formex 4 (XML) format, consisting of documents from mid-2004 to the end of 2011 in up to 23 languages.
The original OJ data has been processed in several steps, each refining the result of the previous step to a finer granularity: (1) original data, (2) file level in Formex 4 format, (3) file level in plain text and (4) paragraph level. The result of each step is a corpus packaged as a self-contained Multilingual Dataset Format (muset) file. Although the musets are independent, they are linked to each other so that, for example, one can find the source document of any given text segment. Users can therefore choose the processing level that best suits their needs. The table in the next section (Statistics) describes the data and provides some statistics.
The original data (da1-ox) includes both the XML and the TIFF files. This makes it possible to use the data for other types of applications (e.g. work on optical character recognition). The original data also allows users to re-process the whole data set with their own tools and methods.
The file level formats (da1-fx in Formex 4 format and da1-ft in plain text format) are relevant for users who need access to the full texts, e.g. to analyse the discourse structure, to consider the context of each sentence, etc.
The paragraph level format (da1-pc) is relevant for people who do not need access to the full text, but who are mostly interested in smaller segments and their translations, e.g. to produce dictionaries or to work on (machine) translation.
Conditions for Use
I. Intellectual property and conditions of use of data
The DGT-Acquis data is the exclusive property of the European Commission. The Commission cedes its non-exclusive rights free of charge and world-wide for the entire duration of the protection of those rights to the re-user, for all kinds of use which comply with the conditions laid down in the Commission Decision of 12 December 2011 on the re-use of Commission documents, published in Official Journal of the European Union L330 of 14 December 2011, pages 39 to 42.
Any re-use of the data or of the structured elements contained in it is required to be identified by the re-user, who is under an obligation to state the source of the documents used: the website address, the date of the latest update and the fact that the European Commission retains ownership of the data.
II. Conditions for use of software
The DGT-Acquis data is distributed with the software necessary for its exploitation/extraction. Use of such software must be carried out in accordance with the conditions laid down in the EUPL licence.
The data and the accompanying software are made available without any guarantee, explicit or tacit. The Commission cannot be held responsible for any loss, injury or damage the re-user may suffer due to the re-use. Nor does the Commission guarantee the absence of any irregularities which may be present in the data, within the structured data they contain or in the software itself. The Commission does not guarantee the ongoing distribution of said data and software.
The Commission cannot be held responsible for any loss, injury or damage caused to third parties as a result of the re-use. The re-user shall bear sole responsibility for the re-use of the data collection, the structured elements it contains and the software. Re-use must not mislead third parties with respect to the contents of the data and the structured elements it contains, the source of the contents or the date of the last update thereto. This disclaimer is not intended to limit the liability of the Commission in violation of any requirements laid down in applicable national law or to exclude its liability in cases where this is not permitted by the applicable law.
Definitions of terms used by the Commission Decision of 12 December 2011 on the re-use of Commission documents, published in Official Journal of the European Union L330 of 14 December 2011, pages 39 to 42, are supplemented by the following definitions:
Re-user: Any natural or legal person who re-uses the documents, in accordance with the conditions laid down in the Commission Decision of 12 December 2011 on the re-use of Commission documents, published in Official Journal of the European Union L330 of 14 December 2011, pages 39 to 42.
Databases: A collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic means or in any other way.