@philosTEI - TICCLing Philosophy


An open-source, web-based, user-friendly workflow from digital images of text to TEI, enabling the construction of digital philosophy corpora. The workflow combines an OCRopus / Tesseract web service for text layout analysis and Optical Character Recognition (OCR) with a multilingual version of TICCL, available as the web service TICCLops.
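
As a rough illustration of the final step of such a workflow, the sketch below wraps already recognized and corrected page texts in a minimal TEI skeleton. It is not the project's actual implementation: the helper name `pages_to_tei`, the placeholder header texts, and the assumption that each page arrives as a plain-text string with blank lines between paragraphs are all hypothetical. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def pages_to_tei(title, pages):
    """Wrap a list of page texts in a minimal TEI skeleton (hypothetical helper)."""
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(f"{{{TEI_NS}}}TEI")

    # Header with the three basic fileDesc children; the texts are placeholders.
    header = ET.SubElement(tei, f"{{{TEI_NS}}}teiHeader")
    file_desc = ET.SubElement(header, f"{{{TEI_NS}}}fileDesc")
    title_stmt = ET.SubElement(file_desc, f"{{{TEI_NS}}}titleStmt")
    ET.SubElement(title_stmt, f"{{{TEI_NS}}}title").text = title
    pub_stmt = ET.SubElement(file_desc, f"{{{TEI_NS}}}publicationStmt")
    ET.SubElement(pub_stmt, f"{{{TEI_NS}}}p").text = "Unpublished OCR transcription"
    source_desc = ET.SubElement(file_desc, f"{{{TEI_NS}}}sourceDesc")
    ET.SubElement(source_desc, f"{{{TEI_NS}}}p").text = "Digitized from page images"

    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")
    for number, page_text in enumerate(pages, start=1):
        # <pb/> marks the start of each page; paragraphs are split on blank lines.
        ET.SubElement(body, f"{{{TEI_NS}}}pb", n=str(number))
        for para in page_text.split("\n\n"):
            if para.strip():
                # Collapse line breaks and runs of whitespace inside a paragraph.
                ET.SubElement(body, f"{{{TEI_NS}}}p").text = " ".join(para.split())
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(pages_to_tei("Begriffsschrift", ["Erste Zeile\nweiter\n\nZweiter Absatz"]))
```

A real pipeline would add richer header metadata and structural markup (divisions, headings, notes); the point here is only the shape of the output format.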


Philosophy and philosophy-informed intellectual history can profit immensely from teaming up with computer scientists and applying computational methods. A new empirical, data-driven, and collaborative computational methodology can scale up our research and help make it more efficient and objective. The tool builds high-quality, easily accessible, large-scale corpora of multilingual, multi-script texts from different historical periods in a sustainable format (TEI), and does so in a reliable, efficient, user-friendly, and cost-cutting way.

This is demonstrated on a multilingual, multi-script corpus of 18th- to 20th-century philosophical texts. The test-case corpus contains at least the following books and articles (more may follow):

  • Wolff (1740) (Latin)
  • Bolzano (1837) (German)
  • Frege (1879) (German)
  • Tarski (1936) (Polish)

  • Project leader: Dr. Arianna Betti (VU University Amsterdam)
  • CLARIN center: Huygens ING
  • Help contact: t.b.a.
  • Twitter: https://twitter.com/philostei
