From 05 Dec 2011 to 03 Jan 2012
03 Jan 2012
- 09:44 Task #3561 (Closed): Clean-up configs & wiki description
- 09:38 Task #3584 (Rejected): Add setup.py and path seeking for configs
- There is currently no installation script, so full paths to the tagger executable and configs must be provided.
- 09:36 Task #3353 (Closed): Treatment of unknown words
- Final solution: gathering tags from tokens marked "unknown" (+ign), using this closed list first for unknown words w...
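The tag-gathering step described above can be sketched roughly as follows. This is only an illustration, not WMBT's code: the token layout (form, gold tag, set of analyser tags), the function name `unknown_word_tags`, the `min_freq` threshold and the sample data are all assumptions made for the example.

```python
from collections import Counter


def unknown_word_tags(tokens, min_freq=1):
    """Return the sorted list of gold tags seen on tokens marked 'ign'.

    tokens: iterable of (form, gold_tag, analyses) triples, where analyses
    is the set of tags proposed by the morphological analyser (hypothetical
    layout). Tags seen fewer than min_freq times are dropped.
    """
    counts = Counter(
        gold for _form, gold, analyses in tokens if "ign" in analyses
    )
    return sorted(tag for tag, n in counts.items() if n >= min_freq)


# Hypothetical training sample; tags follow a positional tagset style.
training = [
    ("kot", "subst:sg:nom:m2", {"subst:sg:nom:m2"}),
    ("Xerox", "subst:sg:nom:m3", {"ign"}),
    ("iPod", "subst:sg:nom:m3", {"ign"}),
    ("googlować", "inf:imperf", {"ign"}),
]

closed_list = unknown_word_tags(training, min_freq=2)
```

The resulting closed list could then be offered as candidate tags for tokens the analyser fails on.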
02 Jan 2012
- 11:31 WMBT is able to tag unknown words
- The previous version of WMBT could not recover tags for tokens where the morphological analyser failed. The current version (as i...
29 Dec 2011
- 11:20 Task #3561 (Closed): Clean-up configs & wiki description
- Leave only configs with affix features.
Configs for no guessing, for guessing (kipi+nkjp)
rm old configs: *-1, *-...
21 Dec 2011
- 16:33 Task #3353: Treatment of unknown words
- -A works more-or-less ok (outputs loads of possible tags, but is correct), training in progress
- 16:17 Task #3353: Treatment of unknown words
- Guessing seems done. New default config (unktagfreq=3) yields...
14 Dec 2011
- 15:23 Task #3353: Treatment of unknown words
- The first option is more universal; it makes sense to use affixes & regexes as features to get a sort of morphological analysis.
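A minimal sketch of what affix and regex features for a word form might look like. The feature names, the maximum affix length and the particular regexes are assumptions chosen for illustration; they are not taken from the WMBT configs.

```python
import re

# Hypothetical surface-shape patterns; any set of regexes could be used.
PATTERNS = {
    "has_digit": re.compile(r"\d"),
    "has_hyphen": re.compile(r"-"),
    "starts_upper": re.compile(r"^[A-ZĄĆĘŁŃÓŚŹŻ]"),
}


def affix_regex_features(form, max_len=3):
    """Build a feature dict from prefixes, suffixes and regex matches."""
    feats = {}
    for n in range(1, max_len + 1):
        # Skip affixes that would cover the whole form.
        if len(form) > n:
            feats[f"prefix{n}"] = form[:n].lower()
            feats[f"suffix{n}"] = form[-n:].lower()
    for name, pattern in PATTERNS.items():
        feats[name] = bool(pattern.search(form))
    return feats


features = affix_regex_features("Googlowanie")
```

Suffixes such as `nie` carry much of the inflectional information in Polish, which is why such features can stand in for a rough morphological analysis.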
- 15:21 Task #3353 (Closed): Treatment of unknown words
- Two options here:
1. Find hapaxes, train a model for them, and guess completely new tags instead of igns.
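Finding hapaxes (forms occurring exactly once in the training data) could be sketched as below. The function name and the decision to lowercase forms before counting are assumptions for the example, not a description of the actual implementation.

```python
from collections import Counter


def find_hapaxes(forms):
    """Return the set of lowercased forms that occur exactly once."""
    counts = Counter(form.lower() for form in forms)
    return {form for form, n in counts.items() if n == 1}


# Hypothetical token stream: "kot" occurs twice once case is ignored.
hapaxes = find_hapaxes(["kot", "pies", "Kot", "dom"])
```

Tokens whose forms are in this set could then serve as training examples that stand in for unknown words.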
- 15:19 Task #3352 (Closed): Extend with CRF++
- Add secondary implementation for CRF++. Note that it needs feature description files and different feature vectors (C...