TTNWW - TST Tools voor het Nederlands als Webservices in een Workflow
Summary
TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that have been developed in the STEVIN and CGN projects. The LT components (for text and speech) are made available as web-services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes.
The web services are available in two separate domains: "Text" and "Speech" processing. For "Text", TTNWW offers workflows for the following functionality:
- Orthographic Normalisation using TICCLops (version CLARIN-NL 1.0)
- Part-of-Speech Tagging, Lemmatisation, Chunking, limited Multiword Unit Recognition, and Grammatical Relation Assignment by Frog (version 012.012)
- Syntactic Parsing (including grammatical relation assignment, limited named entity recognition, and limited multiword unit recognition) by the Alpino Parser (version 1.3)
- Semantic Annotation
- Named Entity Recognition
- Co-reference Assignment
For "Speech", TTNWW offers workflows for the following functionality:
- Automatic Transcription of speech files using a Netherlands Dutch acoustic model
- Automatic Transcription of speech files using a Flemish Dutch acoustic model
- Conversion of the input speech file to the required sampling rate, followed by automatic transcription
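The workflows listed above can be pictured as a lookup from a task name to an ordered list of processing steps. The following is only a minimal sketch of that idea: the `RECIPES` table and the `steps_for` function are hypothetical names, and the step lists merely repeat the tools and steps named in the list, not the actual TTNWW workflow definitions.

```python
from typing import Dict, List

# Hypothetical recipe table: task name -> ordered processing steps.
# Tool names are taken from the list above; the structure is an assumption.
RECIPES: Dict[str, List[str]] = {
    "orthographic normalisation": ["TICCLops"],
    "part-of-speech tagging": ["Frog"],
    "syntactic parsing": ["Alpino"],
    # A two-step speech recipe: convert the sampling rate first, then transcribe.
    "transcription with resampling": [
        "sampling-rate conversion",
        "automatic transcription",
    ],
}


def steps_for(task: str) -> List[str]:
    """Return a copy of the ordered steps for a task, ignoring case and outer spaces."""
    key = task.strip().lower()
    if key not in RECIPES:
        raise KeyError(f"no recipe for task {task!r}; available: {sorted(RECIPES)}")
    return list(RECIPES[key])
```

A user would then only name the desired end result, for example `steps_for("Syntactic Parsing")`, and leave the choice and ordering of tools to the recipe.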
The architecture of the TTNWW portal consists of several components and follows the principles of Service Oriented Architecture (SOA). The TTNWW GUI front-end is a Flex module that communicates with the TTNWW web application, which keeps track of the different sessions and knows which LT recipes are available. TTNWW communicates assignments (workflow specifications) to the WorkflowService, which evaluates the requested workflow and asks the DeploymentService to start the required LT web services. After the LT web services have been initialized, the workflow specification is sent to the Taverna Server, which then takes care of the workflow.
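The hand-off between these components can be sketched in a few lines of Python. This is an illustrative model only: the class names echo the component names above, but every method, field, and the workflow dictionary layout are assumptions made for the example, not the real TTNWW interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DeploymentService:
    """Hypothetical stand-in: starts LT web services on request."""
    events: List[str]
    running: Set[str] = field(default_factory=set)

    def start(self, service: str) -> None:
        if service not in self.running:  # start each service only once
            self.running.add(service)
            self.events.append(f"deploy:{service}")


@dataclass
class TavernaServer:
    """Hypothetical stand-in: executes a workflow specification."""
    events: List[str]

    def run(self, workflow: Dict) -> str:
        self.events.append(f"run:{workflow['name']}")
        return f"results of {workflow['name']}"


@dataclass
class WorkflowService:
    """Evaluates a workflow, has its services started, then hands it on."""
    deployment: DeploymentService
    taverna: TavernaServer

    def submit(self, workflow: Dict) -> str:
        for service in workflow["services"]:
            self.deployment.start(service)
        return self.taverna.run(workflow)


# Shared event log so the order of the hand-offs is visible.
events: List[str] = []
workflow_service = WorkflowService(DeploymentService(events), TavernaServer(events))
result = workflow_service.submit({"name": "pos-tagging", "services": ["frog"]})
```

After the call, `events` records the deployment of the required service before the workflow run, which mirrors the order described above.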
To facilitate wrapping applications that were originally designed as standalone tools into web services, the project used CLAM (Computational Linguistics Application Mediator), wrapper software that allows for easy and transparent transformation of applications into RESTful web services. CLAM has been used extensively in the TTNWW project for both text and speech processing tools. With the exception of Alpino and MBSRL, all web services operate through CLAM wrappers.
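The general idea of wrapping a standalone command-line tool as a web service can be illustrated with the Python standard library alone. The sketch below is not CLAM and does not use CLAM's API: the `/process` path, the `run_tool`, `app`, and `call_app` names, and the toy "tool" (a Python one-liner that upper-cases its input) are all assumptions made for the example.

```python
import io
import subprocess
import sys

# Hypothetical stand-in for a standalone LT tool: reads stdin, writes it upper-cased.
TOOL_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_tool(text: str) -> str:
    """Run the command-line tool on the given text and return its output."""
    done = subprocess.run(TOOL_CMD, input=text, capture_output=True,
                          text=True, check=True)
    return done.stdout


def app(environ, start_response):
    """Minimal WSGI application: POST /process runs the tool on the request body."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/process":
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"not found"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    output = run_tool(body).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(output)))])
    return [output]


def call_app(method: str, path: str, body: bytes = b""):
    """Invoke the WSGI app in-process (no network) and return (status, text)."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    chunks = app(environ, start_response)
    return seen["status"], b"".join(chunks).decode("utf-8")
```

Calling `call_app("POST", "/process", b"een zin")` exercises the wrapper without opening a socket; to serve it over HTTP, the same `app` could be passed to `wsgiref.simple_server.make_server`.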
Given the number of web services involved in the TTNWW project and the possibilities offered by the cloud environment, the preferred method of delivering the web service installations was for the LT providers to deliver complete virtual machine images. These could be uploaded directly into the cloud environment, relieving the CLARIN centres and LT providers of the originally foreseen task of running the web services themselves. A potential advantage of this method, not yet exploited in the project, is that these images may also be delivered directly to end users, who can run them in a local configuration using virtualization software such as VMware or VirtualBox.
The workflow engine used in the project was Taverna. Built on top of it is a number of selectable task recipes, following a task-oriented approach in line with the premise that users with little or no technical expertise should be able to use the system. In this context, tasks are understood in terms of the end results of processes, such as semantic role labelling, POS tagging or syntactic analysis, and ready-made workflows are constructed that the end user can use directly.
Contacts
- Project leaders: Marc Kemps-Snijders (NL), Ineke Schuurman (VL)
- CLARIN center: Meertens Institute (portal) and others (web-services).
- Help contact: n.a.
- Web-sites: n.a.
- User scenarios (screencasts, screenshots): n.a.
- Manual: http://yago.meertens.knaw.nl/apache/TTNWW/assets/TTNWW.pdf
- Tool/Service link: http://yago.meertens.knaw.nl/apache/TTNWW/
- Publications: n.a.