historical linguistics


VU-DNC: VU Diachronic Newspaper Corpus


VU-DNC is a unique diachronic corpus of Dutch newspaper articles from five major Dutch newspapers, published in 1950/1951 and 2002 (2 million words). The VU-DNC has been annotated for quotations, which enables the researcher to differentiate between words directly under the responsibility of the journalist and words quoted from others.


SHEBANQ: System for HEBrew Text: ANnotations for Queries and Markup


A web application that enables researchers to perform linguistic queries on the WIVU Hebrew Text Database and preserve significant results as annotations to this resource. This database contains the Hebrew text of the Old Testament enriched with many linguistic features, from the morpheme level up to the discourse level.


An online Ancient Greek-Dutch dictionary covering the letter Pi. Search functions include searches for Greek lemmata; searches for declined or conjugated Greek word-forms that lead to the correct lemma (‘lemmatizer’); searches for Dutch words that lead to different Greek lemmata; and etymological searches. The dictionary is linked to Logeion, the international website of Greek dictionaries at the University of Chicago. The developers estimate that a complete version of the dictionary will be finished by the end of 2016 and that it will be published by the end of 2017.

Huygens ING

CLARIN B centre
The Huygens Institute for Netherlands History aims to have humanities researchers collaborate closely with specialists in e-Humanities. It consciously sustains traditional humanities expertise and accommodates historians and textual scholars for all periods. It is the largest humanities research institute in the Netherlands and part of the KNAW, and it arose from the merger of the Huygens Institute and the Institute for the History of the Netherlands (ING).


In COAVA two sets of databases are made available in a standardized way: one with historical dialect data (the databases WBD and WLD, with lexical data of the Brabantish and Limburgian dialects from 1880 to 1980) and one with first language acquisition data (four databases from the CHILDES project). The databases contain linguistic information (dialect form, standardised (“Dutchified”) form, lexical meaning), geographical information (locality, dialect area, province) and information on the source (inquiry forms or monotopic dictionaries, and the date of documentation). Visualising the linguistic and geographical information yields lexical maps. The most typical way for the user to reach the data is the browsable concept taxonomy. In other words, the databases can be accessed through search tools as well as through a thematic taxonomy. This taxonomy was developed for the dialect databases and covers the general vocabulary.
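As a rough illustration of how the three kinds of information could combine into a lexical map, here is a minimal Python sketch. The record layout, the field names, the `lexical_map` helper and all sample values are assumptions made up for this example; they are not COAVA's actual schema, data or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DialectRecord:
    """Hypothetical record layout; field names are illustrative only."""
    # Linguistic information
    dialect_form: str
    standardised_form: str  # the "Dutchified" form
    lexical_meaning: str    # the concept the form expresses
    # Geographical information
    locality: str
    dialect_area: str
    province: str
    # Source information
    source: str
    year: int


def lexical_map(records, concept):
    """Group the standardised forms attested for one concept by locality."""
    forms_by_locality = defaultdict(set)
    for rec in records:
        if rec.lexical_meaning == concept:
            forms_by_locality[rec.locality].add(rec.standardised_form)
    # Sort the forms so that the result is deterministic.
    return {loc: sorted(forms) for loc, forms in forms_by_locality.items()}


# Placeholder records, invented for this sketch (not real dialect data).
EXAMPLE_RECORDS = [
    DialectRecord("form-a1", "form-a", "concept-x",
                  "Locality 1", "Area 1", "Province 1", "inquiry form", 1900),
    DialectRecord("form-a2", "form-a", "concept-x",
                  "Locality 1", "Area 1", "Province 1", "dictionary", 1950),
    DialectRecord("form-b1", "form-b", "concept-x",
                  "Locality 2", "Area 2", "Province 2", "inquiry form", 1920),
    DialectRecord("form-c1", "form-c", "concept-y",
                  "Locality 2", "Area 2", "Province 2", "inquiry form", 1920),
]

CONCEPT_X_MAP = lexical_map(EXAMPLE_RECORDS, "concept-x")
```

Each key of the resulting dictionary is a point on the map and each value lists the forms recorded there, which is the data a map renderer would need. Browsing by concept, as the taxonomy allows, corresponds to choosing the `concept` argument.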


INPOLDER (Integrated Parser and Lemmatizer of Dutch in Retrospect) provides a tool that performs morphological tagging, lemmatization, and syntactic parsing of historical Dutch texts. It is built on the Adelheid tool (tagging and lemmatization) and the Collins-Bikel statistical parser.


With this web application an end user can have historical Dutch texts tokenized, lemmatized and part-of-speech tagged, using the most appropriate resources (such as lexica) for the text in question. For each specific text, the user can select the best resources from those available in CLARIN, wherever they might reside, supplemented where necessary by the user's own lexica.