CLARIN Data Formats and Archiving

CLARIN is committed to sharing efforts and, where possible, reusing data. To make this possible, the data produced in research must be well documented using CLARIN metadata (CMDI) and must be available in well-understood, CLARIN-recommended data formats. For the CMDI metadata, please see the page on CMDI metadata. This page guides you through the issues involved in choosing a suitable format for storing your data, so that you can use CLARIN services to process the data and so that you and others can reuse it after archiving.

With respect to data formats, CLARIN prefers formats that are open. An open file format has a published specification, usually maintained by a standards organization, and can therefore be used and implemented by anyone. For example, an open format can be implemented by both proprietary software and free and open source software, each under its typical software license. In contrast to open formats, closed formats are considered trade secrets. Open formats are also called free file formats if they are not encumbered by any copyrights, patents, trademarks or other restrictions (for example, if they are in the public domain), so that anyone may use them at no monetary cost for any desired purpose.

Not all open formats are suitable, though: a format also needs broad tool support and a reasonable expectation that it will remain current in the near future. These are necessary requirements for a format to become a CLARIN recommended or even required format. Such matters are discussed in the CLARIN standards committee [ref], where the CLARIN community considers all aspects.

CLARIN-compatible data must therefore be represented in one of a limited number of data formats. A browsable, searchable overview can be found here.
The official list of data formats and their status in CLARIN, together with advice, can be found on the CLARIN-NL website.

If none of these formats is suitable for a particular resource, contact the CLARIN-NL helpdesk for advice.

NOTE: parts of this text are taken from Wikipedia.