Disaster (DISAmbiguator and STatistical chunkER) is a Python module for chunking and morphosyntactic disambiguation. The module is still under development; the following functionality is currently available:

  • corpus I/O routines, including support for the XCES format (IPI PAN dialect) and a simple extension to handle IOB chunk tags,
  • graphical chunk editor,
  • rudimentary curses-based morphosyntactic annotation editor (it can change which of a token's existing tags are marked as disambiguated, but cannot add new tags),
  • re-implementation of the TaKIPI tagger with customisable tagset,
  • re-implementation of the JOSKIPI formalism with customisable tagset and support for references to chunk annotations,
  • several simple chunking algorithms, including Christer Johansson's memory-based chunker.
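
For readers unfamiliar with the IOB chunk tags mentioned in the list above, here is a minimal sketch of how such tags map onto chunk spans. It is an illustration only and does not use Disaster's API: the function name `iob_to_spans` and the chunk labels `NP` and `VP` are assumptions made for this example.

```python
def iob_to_spans(tags):
    """Convert a list of IOB tags into (chunk_type, start, end) spans.

    Hypothetical helper for illustration; not part of Disaster.
    `end` is exclusive. A stray I- tag with no open chunk of the same
    type is treated leniently as the start of a new chunk.
    """
    spans = []
    start = None   # index where the currently open chunk began
    ctype = None   # type label of the currently open chunk
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, _, label = tag.partition("-")
        # Close the open chunk unless this tag continues it.
        if start is not None and (prefix != "I" or label != ctype):
            spans.append((ctype, start, i))
            start, ctype = None, None
        # Open a new chunk on B-, or on an I- with nothing open.
        if prefix == "B" or (prefix == "I" and start is None):
            start, ctype = i, label
    if start is not None:
        spans.append((ctype, start, len(tags)))
    return spans


tags = ["B-NP", "I-NP", "O", "B-VP", "B-NP", "I-NP", "I-NP"]
print(iob_to_spans(tags))
```

Each token carries one tag: `B-` opens a chunk, `I-` continues it, and `O` marks a token outside any chunk, so a flat tag sequence is enough to encode non-overlapping chunks.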

The software package has been released under the GNU GPL 3.0 and is available here.