Adding one or more normalized orthographic representations, possibly with confidence scores, as properties of an orthographic form that actually occurs in the text, e.g. to correct a misspelling, to correct output from optical character recognition (OCR) or handwriting recognition, or to normalize historical spellings to modern ones.
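As an illustration only, the idea of attaching normalized forms with confidence scores to an occurring spelling could be modelled roughly as follows. The class names `Token` and `Normalization`, the method names, and the convention that a confidence is a number between 0 and 1 are all assumptions made for this sketch; they are not taken from any particular annotation format or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Normalization:
    """One candidate normalized spelling (hypothetical structure)."""
    form: str
    confidence: Optional[float] = None  # assumed range 0.0 to 1.0; None if unknown


@dataclass
class Token:
    """A word form exactly as it occurs in the source text."""
    surface: str
    normalizations: List[Normalization] = field(default_factory=list)

    def add_normalization(self, form: str, confidence: Optional[float] = None) -> None:
        # The original spelling is kept; normalized forms are attached as properties.
        self.normalizations.append(Normalization(form, confidence))

    def best(self) -> str:
        """Return the highest-confidence normalized form, or the surface form if none exists."""
        if not self.normalizations:
            return self.surface
        # Candidates without a score are treated as confidence 0.0 in this sketch.
        top = max(
            self.normalizations,
            key=lambda n: n.confidence if n.confidence is not None else 0.0,
        )
        return top.form


# Example: an OCR misreading with two candidate corrections.
token = Token("tbe")
token.add_normalization("the", 0.9)
token.add_normalization("tube", 0.1)
print(token.surface, "->", token.best())
```

Keeping the surface form untouched and storing corrections alongside it means the original reading is never lost, which matters when the same text is later re-normalized with a different method.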
TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that were developed in the STEVIN and CGN projects. The LT components are offered as web services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes. The web services are available in two separate domains: "Text" and "Speech" processing. The TTNWW services were created in a Dutch and Flemish collaboration project that builds on the results of earlier Dutch and Flemish projects. Some of the web services are deployed in the SURF-SARA BiG-Grid cloud; others run at CLARIN centres in the Netherlands and at CLARIN VL University partners.
TICCL (Text Induced Corpus Clean-up) is a system designed to detect and correct typographical errors (misprints) and optical character recognition (OCR) errors in texts. It searches a corpus for all existing variants of (potentially) all words occurring in that corpus. The corpus can be a single text or several texts, in one or more directories, located on one or more machines. TICCL creates word frequency lists that record, for each word type, how often it occurs in the corpus. The frequency of a normalized word form is the sum of the frequencies of the actual word forms found in the corpus that correspond to it.
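The frequency bookkeeping described here can be sketched in a few lines of Python. This is not TICCL's implementation: the function names, the simple tokenization with lowercasing, and the hand-written variant mapping are assumptions made purely for illustration, and the step in which TICCL itself finds the variants is not shown.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word type occurs across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        # Assumed tokenization: lowercase, split on non-word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def normalized_frequencies(counts, variant_to_normal):
    """Sum the frequencies of all observed variants under their normalized form."""
    totals = Counter()
    for word, n in counts.items():
        # Words without a known normalized form count under their own spelling.
        totals[variant_to_normal.get(word, word)] += n
    return totals


# Toy corpus with one OCR-style error ("tbe" for "the").
corpus = ["the cat and tbe dog", "The dog"]
counts = word_frequencies(corpus)

# Hypothetical mapping from observed variant to normalized form.
mapping = {"tbe": "the"}
totals = normalized_frequencies(counts, mapping)
print(counts["the"], counts["tbe"], totals["the"])
```

In this toy corpus "the" occurs twice and "tbe" once, so the normalized form "the" receives a frequency of three.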