All tests were performed on NKJP 1.0, divided into ten folds, using plain-text comparison. Plain text is generated by converting each paragraph into a block of text terminated by two newlines (sentence division is not preserved in the plain-text format).
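The plain-text conversion described above can be sketched roughly as follows. This is only an illustration, not the script actually used: the token representation (an `(orth, space_before)` pair) and the function names are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical in-memory representation, assumed for this sketch only:
# a token is (orth, space_before), a sentence is a list of tokens,
# a paragraph is a list of sentences.
Token = Tuple[str, bool]
Sentence = List[Token]
Paragraph = List[Sentence]


def paragraph_to_text(paragraph: Paragraph) -> str:
    """Flatten one paragraph into a single string, dropping sentence boundaries."""
    pieces: List[str] = []
    for sentence in paragraph:
        for orth, space_before in sentence:
            # Emit a space only where the token was preceded by one,
            # and never at the very start of the paragraph.
            if space_before and pieces:
                pieces.append(" ")
            pieces.append(orth)
    return "".join(pieces)


def corpus_to_plain_text(paragraphs: List[Paragraph]) -> str:
    """Emit every paragraph as a block of text terminated by two newlines."""
    return "".join(paragraph_to_text(p) + "\n\n" for p in paragraphs)
```

Because sentence boundaries are discarded here, any tool that consumes this output has to perform its own sentence splitting as well as tokenisation.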
Evaluation of Tilburg MBT¶
The evaluation has been carried out under different set-ups:
- naive: only form/orth and whole tag
- simplified (poor) features: orth, class, nmb, gnd, cas
- full (rich) features: as in WMBT
- As MBT cannot perform tokenisation, the morfeusz-nkjp-guesser Maca config is used for that purpose.
- To provide similar conditions for training and testing, the training data is reanalysed: each training fold is reanalysed with morfeusz-nkjp-guesser. Where reanalysis changes the segmentation, the affected part is taken from the original training data instead (such cases are rare, and this is the simplest solution). Data in
- To keep the tests fair, each test part is first converted to plain text and then reanalysed with morfeusz-nkjp-guesser. Note that the test data contains no reference tagging (disamb lexemes), so the new segmentation can always be taken without further handling. This is important for the fairness of the tests, and it is what is actually done. Data in
- As MBT output contains no information on the presence or absence of a space between tokens, this information must be restored for the tagger-eval script to run properly. Fortunately, this is easy: the no-space information is available in the test data passed to the tagger, and the tokenisation there is identical.
corpspacescript is used to copy no-space markers from
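The space-restoration step above can be illustrated with a minimal sketch. It is not the actual script mentioned in the list. It assumes a hypothetical representation in which the tagger output is a list of `(orth, tag)` pairs and the tagger's input data is a list of `(orth, space_before)` pairs, and it relies on the tokenisation being identical in both, as stated above.

```python
from typing import List, Tuple


def restore_spacing(
    tagged: List[Tuple[str, str]],
    reference: List[Tuple[str, bool]],
) -> List[Tuple[str, str, bool]]:
    """Copy space markers from the tagger input onto the tagger output.

    Both sequences must contain the same tokens in the same order;
    any difference means the tokenisation diverged and is reported.
    """
    if len(tagged) != len(reference):
        raise ValueError("token counts differ; tokenisation is not identical")
    restored: List[Tuple[str, str, bool]] = []
    for (orth, tag), (ref_orth, space_before) in zip(tagged, reference):
        if orth != ref_orth:
            raise ValueError(f"token mismatch: {orth!r} vs {ref_orth!r}")
        restored.append((orth, tag, space_before))
    return restored
```

Failing loudly on a mismatch is a deliberate choice in this sketch: silently misaligned space markers would distort a plain-text comparison without any visible error.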
Naive (w/ remorph)¶
AVG weak corr lower bound 84.4177%
AVG weak corr upper bound 84.7424%
AVG weak corr lower bound 84.5917%
AVG weak corr upper bound 84.9164%
AVG weak corr lower bound 83.8511%
AVG weak corr upper bound 84.1758%