Evaluation

All tests have been performed on NKJP 1.0, divided into ten folds. The tagger is run on plain text and its output is compared against the reference. Plain text is generated by converting each paragraph into a block of text terminated with two newlines (sentence boundaries are not kept in the plain-text format).
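The paragraph-to-plain-text conversion can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes a hypothetical in-memory layout in which a token is an `(orth, space_before)` pair, a sentence is a list of tokens and a paragraph is a list of sentences.

```python
from typing import List, Tuple

# Hypothetical in-memory corpus layout (not the NKJP XML format itself):
# a token is (orth, space_before); a sentence is a list of tokens;
# a paragraph is a list of sentences.
Token = Tuple[str, bool]
Sentence = List[Token]
Paragraph = List[Sentence]


def paragraph_to_text(paragraph: Paragraph) -> str:
    """Flatten one paragraph into a single block; sentence breaks are lost."""
    parts: List[str] = []
    for sentence in paragraph:
        for orth, space_before in sentence:
            # Emit a space only where the corpus records one, never at block start.
            if parts and space_before:
                parts.append(" ")
            parts.append(orth)
    return "".join(parts)


def corpus_to_plain_text(paragraphs: List[Paragraph]) -> str:
    """Terminate every paragraph block with two newlines."""
    return "".join(paragraph_to_text(p) + "\n\n" for p in paragraphs)


doc = [
    [[("Ala", True), ("ma", True), ("kota", True), (".", False)],
     [("Kot", True), ("śpi", True), (".", False)]],
    [[("Koniec", True), (".", False)]],
]
print(repr(corpus_to_plain_text(doc)))
```

Both sentences of the first paragraph end up in one block, which is why sentence division cannot be recovered from the plain text.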

Evaluation of Tilburg MBT

The evaluation has been carried out in three set-ups:
  1. naive: only the form (orth) and the whole tag
  2. simplified (poor) features: orth, class, nmb, gnd, cas
  3. full (rich) features: as in WMBT
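The difference between the naive and the poor set-up can be sketched as below. This is an illustrative assumption about how the features could be derived, not the encoding actually fed to MBT: it identifies number, gender and case by their NKJP value sets, since attribute positions in a tag vary between grammatical classes. The function names are hypothetical; the rich (WMBT) feature set is not reproduced here.

```python
from typing import Dict, Optional, Set

# Value sets of the NKJP tagset attributes used in the "poor" set-up.
NUMBER: Set[str] = {"sg", "pl"}
GENDER: Set[str] = {"m1", "m2", "m3", "f", "n"}
CASE: Set[str] = {"nom", "gen", "dat", "acc", "inst", "loc", "voc"}


def naive_features(orth: str, tag: str) -> Dict[str, str]:
    """Naive set-up: the form plus the whole tag as one atomic class."""
    return {"orth": orth, "tag": tag}


def poor_features(orth: str, tag: str) -> Dict[str, Optional[str]]:
    """Poor set-up: orth, class, nmb, gnd, cas (None where the class lacks it)."""
    grammatical_class, *values = tag.split(":")

    def pick(value_set: Set[str]) -> Optional[str]:
        # Return the first tag value belonging to the attribute's value set.
        return next((v for v in values if v in value_set), None)

    return {
        "orth": orth,
        "class": grammatical_class,
        "nmb": pick(NUMBER),
        "gnd": pick(GENDER),
        "cas": pick(CASE),
    }


print(poor_features("kota", "subst:sg:acc:m2"))
print(poor_features(".", "interp"))
```

Identifying values by set membership rather than by position keeps the sketch independent of the per-class attribute order.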
All the tests have been performed under the following assumptions:
  1. As MBT is unable to perform tokenisation, the morfeusz-nkjp-guesser Maca config is used to tokenise and analyse the input.
  2. To provide similar conditions for training and testing, the training data is reanalysed: each training fold is subjected to reanalysis with morfeusz-nkjp-guesser. If reanalysis changes the segmentation of a part, that part is taken from the original training data instead (such cases are rare, and this is the simplest solution). The reanalysed training data is stored in the reana dir.
  3. To make the tests fair, each test part is first converted to plain text and then re-analysed with morfeusz-nkjp-guesser. Since the test data contains no reference tagging (disamb lexemes), the new segmentation can always be taken as-is. The reanalysed test data is stored in the testana dir.
  4. As MBT output contains no information on the (lack of) space between tokens, this information must be restored for the tagger-eval script to run properly. Fortunately, this is easy: the no-space information is available in the tagger's test input, and the tokenisation there is identical. The corpspace script is used to copy the no-space markers from the testana test files.
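The fallback rule from assumption 2 can be sketched as follows. This is a minimal illustration under loud assumptions: the training fold is taken to be split into parts (for example paragraphs) that are aligned one-to-one between the original and the reanalysed version, and `Token` and `merge_reanalysed` are hypothetical names. Transferring the reference (disamb) tags onto the reanalysed tokens is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Token:
    """Hypothetical token: its form plus the tags proposed by the analyser."""
    orth: str
    tags: List[str] = field(default_factory=list)


def merge_reanalysed(original: Sequence[List[Token]],
                     reanalysed: Sequence[List[Token]]) -> List[List[Token]]:
    """Keep a reanalysed part only if its segmentation is unchanged."""
    if len(original) != len(reanalysed):
        raise ValueError("parts must be aligned one-to-one")
    merged: List[List[Token]] = []
    for orig_part, new_part in zip(original, reanalysed):
        same_segmentation = [t.orth for t in orig_part] == [t.orth for t in new_part]
        # Fall back to the original part whenever the tokens differ.
        merged.append(new_part if same_segmentation else orig_part)
    return merged


original = [[Token("Ala"), Token("ma")], [Token("np.")]]
reanalysed = [[Token("Ala", ["subst:sg:nom:f"]), Token("ma", ["fin:sg:ter:imperf"])],
              [Token("np"), Token(".")]]
print([[t.orth for t in part] for part in merge_reanalysed(original, reanalysed)])
```

Comparing only the sequence of forms is enough to detect a segmentation change, since any split or merge alters that sequence.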
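Restoring the no-space information (assumption 4) is possible because the tagged output and the testana file share one tokenisation, so markers can be copied token by token. The sketch below only illustrates that idea; it is not the corpspace script, and its data layout (`(orth, tag)` pairs for MBT output, `(orth, no_space_before)` pairs for the test input) and the name `copy_no_space` are assumptions.

```python
from typing import List, Tuple


def copy_no_space(tagged: List[Tuple[str, str]],
                  reference: List[Tuple[str, bool]]) -> List[Tuple[str, str, bool]]:
    """Attach no-space flags from the re-analysed test input to MBT output."""
    # The copy is only valid if both sides have exactly the same tokens.
    if [orth for orth, _ in tagged] != [orth for orth, _ in reference]:
        raise ValueError("tokenisation differs; markers cannot be copied")
    return [(orth, tag, no_space)
            for (orth, tag), (_, no_space) in zip(tagged, reference)]


tagged = [("Ala", "subst:sg:nom:f"), ("ma", "fin:sg:ter:imperf"), (".", "interp")]
reference = [("Ala", False), ("ma", False), (".", True)]
print(copy_no_space(tagged, reference))
```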

Naive (w/ remorph)

AVG weak corr lower bound 84.4177%
AVG weak corr upper bound 84.7424%

Poor

AVG weak corr lower bound 84.5917%
AVG weak corr upper bound 84.9164%

Rich

AVG weak corr lower bound 83.8511%
AVG weak corr upper bound 84.1758%
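The reported figures (presumably averages over the ten folds) can be compared with a few lines of arithmetic. The snippet only reuses the numbers above: it ranks the set-ups by lower bound and computes the gap between the bounds. The gap is 0.3247 points in every set-up, which is consistent with all three sharing the same morfeusz-nkjp-guesser test segmentation. Reading the gap as the share of tokens whose segmentation differs from the reference is an assumption about how tagger-eval computes the bounds.

```python
from typing import Dict, List, Tuple

# AVG weak correctness bounds (%) as reported above, per set-up.
RESULTS: Dict[str, Tuple[float, float]] = {
    "naive": (84.4177, 84.7424),
    "poor": (84.5917, 84.9164),
    "rich": (83.8511, 84.1758),
}


def summarise(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float, float, float]]:
    """Rank set-ups by lower bound and report the upper-lower gap."""
    rows = [(name, lo, hi, round(hi - lo, 4)) for name, (lo, hi) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, lo, hi, gap in summarise(RESULTS):
    print(f"{name:>5}: {lo:.4f}-{hi:.4f} (gap {gap:.4f})")
```

The poor set-up comes out best, about 0.17 points ahead of the naive one and about 0.74 points ahead of the rich one.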