Evaluation

The evaluation was performed following the methodology proposed in Radziszewski and Acedański, 2012. Some additional details are also described in this dissertation (in Polish).

The procedure treats the whole tagger as a black box, and the reported accuracy values include errors made at any level, including tokenisation errors, deficiencies of the morphological analyser and incorrectly tagged unknown words.
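
The lower and upper accuracy bounds reported on this page can be illustrated with a minimal sketch. It assumes, following the cited methodology, that reference tokens whose segmentation differs in the tagger output count as errors for the lower bound and as correct for the upper bound. The function name, its parameters and the counts in the usage lines are hypothetical and are not taken from the evaluation logs.

```python
def accuracy_bounds(total, correct_aligned, seg_changed):
    """Return (lower, upper) accuracy as fractions of reference tokens.

    total           -- number of tokens in the reference corpus
    correct_aligned -- tokens with matching segmentation and a correct tag
    seg_changed     -- reference tokens whose segmentation differs in the output
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if correct_aligned + seg_changed > total:
        raise ValueError("counts exceed the number of reference tokens")
    # Assumption: segmentation changes count as errors for the lower bound...
    lower = correct_aligned / total
    # ...and as correct decisions for the upper bound.
    upper = (correct_aligned + seg_changed) / total
    return lower, upper


# Hypothetical counts, for illustration only.
lower, upper = accuracy_bounds(total=1000, correct_aligned=900, seg_changed=5)
print(f"lower={lower:.2%} upper={upper:.2%}")
```

Under this reading, the gap between the two bounds is simply the share of reference tokens affected by segmentation changes.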

The evaluation was performed on the National Corpus of Polish (NKJP).

To reproduce these results, see Evaluation_procedure.

NKJP 1.0

This evaluation was performed on NKJP 1.0, with exactly the same data set and set-up as described there.

In this experiment WCRFT was evaluated in the older configuration (nkjp.ini). The newer configuration (nkjp_s2.ini) performs slightly better and, more importantly, requires less memory, so it is recommended.

Tagger         Re-analysis  Acc lower bound  Acc upper bound  Acc lower known  Acc lower unknown
PANTERA        no           88.79%           89.09%           91.08%           14.70%
PANTERA        yes          88.99%           89.28%           91.27%           14.74%
WMBT no guess  no           87.50%           87.82%           89.78%           13.57%
WMBT no guess  yes          88.75%           89.08%           91.07%           13.62%
WMBT + guess   no           88.44%           88.76%           89.89%           41.43%
WMBT + guess   yes          89.71%           90.04%           91.20%           41.45%
WCRFT          yes          90.34%           90.67%           91.89%           40.13%

PANTERA stands for the morphosyntactic tagger based on Brill's algorithm adapted for morphologically rich languages, run with a threshold of 6 (recommended by the author).
WMBT no guess corresponds to WMBT without guessing (as described in the LTC'11 paper).
WMBT + guess is the most recent version, which includes guessing of unknown words.
WCRFT is the tagger available on this site.
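
The known and unknown columns can be related to the overall lower bound with a short worked calculation. The sketch below assumes that the overall lower bound is a token-weighted average of the known-word and unknown-word lower bounds, and solves for the implied share of unknown words. The function name is hypothetical; the percentages are copied from the table above.

```python
def unknown_share(acc_all, acc_known, acc_unknown):
    """Solve acc_all = (1 - u) * acc_known + u * acc_unknown for u."""
    if acc_known == acc_unknown:
        raise ValueError("known and unknown accuracy must differ")
    return (acc_known - acc_all) / (acc_known - acc_unknown)


# (lower bound, lower known, lower unknown) in percent, copied from the table.
rows = {
    "PANTERA, no re-analysis": (88.79, 91.08, 14.70),
    "WMBT + guess, re-analysis": (89.71, 91.20, 41.45),
    "WCRFT, re-analysis": (90.34, 91.89, 40.13),
}
for name, (acc_all, known, unknown) in rows.items():
    print(f"{name}: {unknown_share(acc_all, known, unknown):.1%} unknown")
```

Under that assumption, all three rows imply that roughly 3% of the tokens are unknown words, which is why large differences in the unknown-word column move the overall figure by only about one percentage point.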

NEW: below are evaluation results for two newer configurations:
  • nkjp_s2.ini — best accuracy but large model and somewhat slow
  • nkjp_e2.ini — slightly worse accuracy but very small model and works faster
Tagger             Re-analysis  Acc lower bound  Acc upper bound  Acc lower known  Acc lower unknown  Full log
WCRFT nkjp_s2.ini  yes          90.79%           91.12%           91.95%           53.17%             r-wcrft-095-s2.txt
WCRFT nkjp_e2.ini  yes          90.26%           90.58%           91.54%           48.52%             r-wcrft-095-e2.txt
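
To put the difference between configurations into numbers, one option is the relative error reduction with respect to the older nkjp.ini configuration (90.34% accuracy lower bound on NKJP 1.0, as reported above). This measure is used here only for illustration and is not part of the evaluation procedure; the function name is hypothetical.

```python
def relative_error_reduction(acc_old, acc_new):
    """Fraction of the old error rate removed by the new system (inputs in percent)."""
    err_old = 100.0 - acc_old
    err_new = 100.0 - acc_new
    if err_old <= 0:
        raise ValueError("old accuracy must be below 100%")
    return (err_old - err_new) / err_old


baseline = 90.34  # WCRFT with nkjp.ini, accuracy lower bound
for config, acc in (("nkjp_s2.ini", 90.79), ("nkjp_e2.ini", 90.26)):
    print(f"{config}: {relative_error_reduction(baseline, acc):+.1%}")
```

A positive value means fewer errors than with nkjp.ini and a negative value means more. With these figures, nkjp_s2.ini removes a few percent of the errors, while nkjp_e2.ini gives up a little accuracy in exchange for a smaller, faster model.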

All the figures reported on this site have been obtained using Morfeusz SGJP (using Maca config morfeusz-nkjp).

NKJP 1.1

The results below were obtained using the nkjp_s2.ini tagger configuration.

Morfeusz SGJP

Version: Dane lingwistyczne <2013/04/13>
Maca config: morfeusz-nkjp

Tagger  Re-analysis  Acc lower bound  Acc upper bound  Acc lower known  Acc lower unknown
WCRFT   yes          90.79%           91.13%           91.95%           53.08%

The same tagger configuration (nkjp_s2.ini) evaluated on NKJP 1.0 yielded an accuracy lower bound of 90.80%.

Morfeusz Polimorf

Version: Polimorf inflectional dictionary <2013/07/07>
Maca config: polimorf-nkjp

Tagger  Re-analysis  Acc lower bound  Acc upper bound  Acc lower known  Acc lower unknown
WCRFT   yes          90.70%           91.04%           91.78%           55.63%

r-wcrft-095-e2.txt - Results for WCRFT 0.9.5, NKJP10, nkjp_e2.ini (7,808 KB), Adam Radziszewski, 11 Apr 2014 12:25

r-wcrft-095-s2.txt - Results for WCRFT 0.9.5, NKJP10, nkjp_s2.ini (7,808 KB), Adam Radziszewski, 11 Apr 2014 12:25