WMBT is a morphosyntactic tagger combining tiered tagging and Memory-Based Learning.

The tagger is designed for positional tagsets: a separate case base is gathered for each attribute of the tagset.
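
The idea of one case base per attribute can be illustrated with a minimal sketch. This is not WMBT's actual code and does not use the TiMBL API: the tier order, the feature layout (previous word, current word), the toy English data and all function names are assumptions made for illustration. Each tier stores its training cases verbatim and classifies by nearest neighbours under a simple overlap distance; the candidate tags are then narrowed tier by tier.

```python
from collections import Counter

# Hypothetical tier order: one tagset attribute per tier.
TIERS = ["pos", "number"]


def overlap_distance(a, b):
    """Number of feature positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class CaseBase:
    """Toy memory-based classifier: keeps every training case in memory."""

    def __init__(self):
        self.cases = []  # list of (feature tuple, attribute value)

    def add(self, features, value):
        self.cases.append((tuple(features), value))

    def classify(self, features, allowed):
        """Pick the value from `allowed` voted for by the nearest cases."""
        scored = [(overlap_distance(features, f), v)
                  for f, v in self.cases if v in allowed]
        if not scored:
            return allowed[0]  # no evidence: fall back to the first candidate
        best = min(d for d, _ in scored)
        votes = Counter(v for d, v in scored if d == best)
        return votes.most_common(1)[0][0]


def train(examples):
    """Gather a separate case base for each attribute (tier)."""
    bases = {tier: CaseBase() for tier in TIERS}
    for features, gold_tag in examples:
        for tier in TIERS:
            if tier in gold_tag:
                bases[tier].add(features, gold_tag[tier])
    return bases


def disambiguate(bases, features, candidate_tags):
    """Narrow the candidate tags one tier at a time until one is left."""
    remaining = list(candidate_tags)
    for tier in TIERS:
        # Values still possible for this attribute, in candidate order.
        allowed = list(dict.fromkeys(tag.get(tier) for tag in remaining))
        if len(allowed) <= 1:
            continue  # nothing to decide at this tier
        chosen = bases[tier].classify(features, allowed)
        remaining = [tag for tag in remaining if tag.get(tier) == chosen]
    return remaining[0]


# Toy training data: features are (previous word, current word).
TRAINING = [
    (("the", "fish"), {"pos": "noun", "number": "sg"}),
    (("many", "fish"), {"pos": "noun", "number": "pl"}),
    (("they", "fish"), {"pos": "verb", "number": "pl"}),
    (("the", "dog"), {"pos": "noun", "number": "sg"}),
    (("many", "dogs"), {"pos": "noun", "number": "pl"}),
]

# Candidate tags as a morphological analyser might propose them.
CANDIDATES = [
    {"pos": "noun", "number": "sg"},
    {"pos": "noun", "number": "pl"},
    {"pos": "verb", "number": "pl"},
]

if __name__ == "__main__":
    bases = train(TRAINING)
    print(disambiguate(bases, ("many", "sheep"), CANDIDATES))
    # prints {'pos': 'noun', 'number': 'pl'}
```

In this sketch the first tier settles the grammatical class and the second tier only chooses among the tags that survived it, which is why the attribute order matters.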

WMBT is implemented in Python, while its low-level routines rely on the following C++ libraries:

  • TiMBL, a popular MBL framework,
  • WCCL, a toolkit for generation of morphosyntactic features,
  • Corpus2, a framework for dealing with annotated corpora and configurable tagsets.

WMBT itself is only a disambiguation engine: it expects input that has already been morphologically analysed. To tag plain text, process it with MACA first.

A detailed description (including how to use MACA with WMBT), a pointer to the sources (GPL), and installation instructions can be found on the project site.