Disaster (DISAmbiguator and STatistical chunkER) is a Python module for chunking and morphosyntactic disambiguation. The module is under development; the following functionality is currently available:
- corpus I/O routines, including the XCES format (IPI PAN dialect) and a simple extension to handle IOB chunk tags,
- a graphical chunk editor,
- a rudimentary curses-based morphosyntactic annotation editor (it allows changing which tags are marked as disamb, but not adding new tags),
- a re-implementation of the TaKIPI tagger with a customisable tagset,
- a re-implementation of the JOSKIPI formalism with a customisable tagset and support for references to chunk annotations,
- several simple chunking algorithms, including Christer Johansson's memory-based chunker.
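To illustrate what reading the XCES format (IPI PAN dialect) involves, here is a minimal sketch using only the standard library. It is not Disaster's own reader: the element names (`chunkList`, `chunk type="s"`, `tok`, `orth`, `lex`, `base`, `ctag`, the `disamb="1"` attribute) follow the IPI PAN XCES convention as I understand it, and `read_sentences` plus the sample sentence are hypothetical. For each token it keeps the lexeme marked `disamb="1"`, the same flag the curses editor lets you move between tags.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample in the IPI PAN XCES layout (assumed structure).
SAMPLE = """<cesAna version="1.0" type="lex disamb">
<chunkList>
<chunk type="p">
<chunk type="s">
<tok><orth>Ala</orth>
<lex disamb="1"><base>Ala</base><ctag>subst:sg:nom:f</ctag></lex>
</tok>
<tok><orth>ma</orth>
<lex disamb="1"><base>mieć</base><ctag>fin:sg:ter:imperf</ctag></lex>
<lex><base>mój</base><ctag>adj:sg:nom:f:pos</ctag></lex>
</tok>
<tok><orth>kota</orth>
<lex disamb="1"><base>kot</base><ctag>subst:sg:acc:m2</ctag></lex>
<lex><base>kot</base><ctag>subst:sg:gen:m2</ctag></lex>
</tok>
<ns/>
<tok><orth>.</orth>
<lex disamb="1"><base>.</base><ctag>interp</ctag></lex>
</tok>
</chunk>
</chunk>
</chunkList>
</cesAna>"""


def read_sentences(xml_text: str) -> List[List[Tuple[str, str, str]]]:
    """Return each sentence as a list of (orth, base, ctag) triples.

    Assumes every <tok> has at least one <lex>; the lexeme marked
    disamb="1" is chosen, falling back to the first one if none is marked.
    """
    root = ET.fromstring(xml_text)
    sentences = []
    # iter() walks all <chunk> elements, nested ones included.
    for chunk in root.iter("chunk"):
        if chunk.get("type") != "s":
            continue  # skip paragraph-level chunks
        tokens = []
        for tok in chunk.findall("tok"):
            lexes = tok.findall("lex")
            chosen = next((lx for lx in lexes if lx.get("disamb") == "1"), lexes[0])
            tokens.append((tok.findtext("orth"), chosen.findtext("base"), chosen.findtext("ctag")))
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in read_sentences(SAMPLE):
        for orth, base, ctag in sentence:
            print(orth, base, ctag, sep="\t")
```

The `<ns/>` (no-space) marker is simply ignored here; a full reader would record it so that the original spacing can be restored on output.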
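IOB chunk tags encode chunks as one tag per token: `B-X` opens a chunk of type X, `I-X` continues it and `O` marks tokens outside any chunk. The sketch below converts between chunk spans and IOB tags. It illustrates the general encoding only; the function names and the `NP` label are hypothetical, and how Disaster itself stores IOB tags in XCES files is not shown here.

```python
from typing import List, Optional, Tuple

# A chunk is a (start, end, label) triple over token indices; end is exclusive.
Chunk = Tuple[int, int, str]


def chunks_to_iob(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunks as one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode IOB tags back into (start, end, label) chunks."""
    chunks: List[Chunk] = []
    start: int = 0
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # B-, O, or an I- with a different label closes the open chunk.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                chunks.append((start, i, label))
                label = None
        # B- opens a chunk; a stray I- (no open chunk) is treated like B-.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        chunks.append((start, len(tags), label))
    return chunks


if __name__ == "__main__":
    # "Ala ma małego kota": two hypothetical NP chunks.
    tags = chunks_to_iob(4, [(0, 1, "NP"), (2, 4, "NP")])
    print(tags)
    print(iob_to_chunks(tags))
```

Treating a stray `I-X` as the start of a new chunk is a common lenient choice; it keeps the decoder usable on slightly inconsistent tagger output.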
The software package has been released under the GNU GPL 3.0 license.