TDS: Typological Database System
Summary
The Typological Database System (TDS) is a web-based service that provides integrated access to a collection of independently developed typological databases. Unified querying is supported with the help of an integrated ontology. The component databases of the TDS are cross-linguistic databases, developed for research in language typology and linguistics. Together they contain some 1200 different descriptive properties, with information about more than 1000 languages. (Because of the heterogeneous nature of the collection, most properties are filled in for only a fraction of the languages.) Most of the data is in the form of high-level "analytical" properties, but there are also a few collections of example sentences (with glosses) illustrating particular phenomena.
Background
Language typology, the study of the range of language variation and universals, is a data-intensive discipline that increasingly relies on electronic databases. Improved availability of the data collected in the TDS enhances its potential to support linguistic research.
The TDS can be used to help answer questions such as "which languages have the basic word order Verb-Object-Subject", "what kind of phonological stress systems are common", "are languages with subject-verb agreement more likely to allow null subjects than languages without it", etc. The system is not an oracle: in all cases, only partial information is returned, as collected and deposited in the system by the creators of the component databases. But this information can be invaluable to other researchers, either as a complete answer to a specific question or as the starting point for further research.
Given that the collected data represents linguistic analysis and often novel theoretical approaches, it is impossible to map it to a single "consensus" standard. While in some limited cases it is possible to completely reconcile data from different sources, the system places a premium on preserving the theoretical orientations and analyses of the component databases, which are presented side by side as alternative datasets in the same topical group.
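The side-by-side arrangement described above can be illustrated with a minimal sketch. It is not the TDS implementation: the class names, the database names (db_one, db_two), the property names, and the language identifiers are all hypothetical placeholders, and the values are invented purely to show the shape of the data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One component database's own analysis of a property (hypothetical)."""
    source: str                      # name of the component database
    prop: str                        # property name as that source defines it
    values: dict[str, str] = field(default_factory=dict)  # language -> value


@dataclass
class TopicalGroup:
    """Groups alternative datasets on one topic without merging them."""
    topic: str
    datasets: list[Dataset] = field(default_factory=list)

    def lookup(self, language: str) -> dict[str, str | None]:
        # One answer per source; None marks a language the source does not cover.
        return {d.source: d.values.get(language) for d in self.datasets}

    def languages_with(self, value: str) -> dict[str, list[str]]:
        # Results are reported per source rather than pooled into one list.
        return {
            d.source: sorted(lang for lang, v in d.values.items() if v == value)
            for d in self.datasets
        }


# Invented placeholder data: two sources that disagree on lang_a
# and each cover only part of the language sample.
word_order = TopicalGroup(
    topic="basic word order",
    datasets=[
        Dataset("db_one", "BasicOrder", {"lang_a": "VOS", "lang_b": "SOV"}),
        Dataset("db_two", "ClauseOrder", {"lang_a": "VSO", "lang_c": "VOS"}),
    ],
)

print(word_order.lookup("lang_a"))
print(word_order.languages_with("VOS"))
```

In this sketch a disagreement between sources is kept visible instead of being resolved, and a missing value is reported as missing, which mirrors the partial coverage noted earlier.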
The TDS project was carried out by a research group of the Netherlands Graduate School of Linguistics (LOT), with members representing the University of Amsterdam, Leiden University, Radboud University Nijmegen, and Utrecht University. It was developed with support from NWO (Netherlands Organization for Scientific Research) grant 380-30-004 / INV-03-12 and from participating universities. The initial phase of the project started in September 2000, and the project entered the implementation phase on 1 May 2004. Originally scheduled to run for three years, it was extended until 31 December 2007. The TDS server and data collections continued to be augmented until 2009.
While the original TDS web server is still operational, web technologies evolve rapidly. The system had begun to show its age even before the end of the project in 2009, motivating migration of the data collection to an archival platform. But due to the complexity and diversity of the component databases, the data cannot be usefully navigated without specialized supporting software; useful archiving necessitates a software access point alongside the static data. Under the "TDS Curator" project, supported by a CLARIN-NL Call 1 grant, the TDS has been migrated to a new platform, hosted by the Data Archiving and Networked Services (DANS), that conforms to CLARIN infrastructural requirements. Both versions of the system remain in operation.
- Project leader: dr. Alexis Dimitriadis (Utrecht University)
- CLARIN center: Data Archiving and Networked Services (DANS)
- Help contact: firstname.lastname@example.org
- Website: http://languagelink.wp.hum.uu.nl/typological-database-system
- User scenarios (screencasts, screenshots): n.a.
- Manual: http://languagelink.let.uu.nl/tds/main.html#tutorial%5B1%5D (for original TDS server)
- Tool/Service link:
Publications
- A. Dimitriadis, M. Windhouwer, A. Saulwick, R. Goedemans, T. Bíró. How to integrate databases without starting a typology war: The Typological Database System. In S. Musgrave, M. Everaert and A. Dimitriadis (eds.), The use of databases in cross-linguistic research, Mouton de Gruyter, March 2009.
- M. Windhouwer, A. Dimitriadis. Sustainable operability: Keeping complex resources alive. In Proceedings of the LREC workshop on Sustainability of Language Resources and Tools for Natural Language Processing (SustainableNLP08), Marrakech, Morocco, May 31, 2008.
- A. Dimitriadis. Managing Differences: The TDS Approach. In Proceedings of the E-MELD Workshop on Toward the Interoperability of Language Resources (E-MELD 2007), Stanford, CA, July 13-15, 2007. Position paper.
- A. Dimitriadis, A. Saulwick, M. Windhouwer. Semantic relations in ontology mediated linguistic data integration. In Proceedings of the E-MELD Workshop on Morphosyntactic Annotation and Terminology: Linguistic Ontologies and Data Categories for Linguistic Resources (E-MELD 2005), Cambridge, Massachusetts, July 1-3, 2005.
- A. Saulwick, M. Windhouwer, A. Dimitriadis, R. Goedemans. Distributed tasking in ontology mediated integration of typological databases for linguistic research. In J. Castro and E. Teniente (eds.), Proceedings of the CAiSE'05 Workshops (International Workshop on Data Integration and the Semantic Web (DISWeb'05) in conjunction with CAiSE'05), Volume I, pp 303-317, Porto, Portugal, June 14, 2005.
- A. Dimitriadis, P. Monachesi. Integrating Different Data Types in a Typological Database System. In P. Austin, H. Dry and P. Wittenburg (eds.), Proceedings of the International Workshop on Resources and Tools in Field Linguistics, Las Palmas, Canary Islands, Spain, 2002.
- P. Monachesi, A. Dimitriadis, R. Goedemans, A. Mineur, M. Pinto. A Unified System for Accessing Typological Databases. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 3), Las Palmas, Canary Islands, Spain, 2002.
- P. Monachesi, A. Dimitriadis, R. Goedemans, A. Mineur, M. Pinto. The Typological Database System. In S. Bird, P. Buneman and M. Liberman (eds.), Proceedings of the IRCS Workshop on Linguistic Databases, pp 181-186, Philadelphia, 2001.