Activity

From 05 Dec 2011 to 03 Jan 2012

03 Jan 2012

09:44 Task #3561 (Closed): Clean-up configs & wiki description
Adam Radziszewski
09:38 Task #3584 (Rejected): Add setup.py and path seeking for configs
Currently there is no installation script, so full paths to the tagger executable and configs must be provided. Adam Radziszewski
09:36 Task #3353 (Closed): Treatment of unknown words
Final solution: gathering tags from tokens marked "unknown" (+ign), using this closed list first for unknown words w... Adam Radziszewski
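A minimal sketch of the "closed list" idea in #3353, for illustration only. It assumes training tokens are available as hypothetical (form, gold tag, marked-unknown) tuples. It also assumes a minimum-frequency cut-off `min_freq`, loosely modelled on the `unktagfreq` option mentioned later in this log. Both the data layout and that reading of `unktagfreq` are assumptions, not WMBT's actual code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical token layout: (form, gold_tag, analyser_marked_unknown)
Token = Tuple[str, str, bool]


def unknown_tag_list(tokens: Iterable[Token], min_freq: int = 3) -> List[str]:
    """Collect gold tags of tokens the analyser marked as unknown (ign).

    Tags seen fewer than min_freq times are dropped (assumed analogue of
    unktagfreq). Result is ordered by descending frequency, ties alphabetical.
    """
    counts = Counter(tag for _, tag, unknown in tokens if unknown)
    kept = [(tag, n) for tag, n in counts.items() if n >= min_freq]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in kept]


def candidate_tags(analyser_tags: List[str], closed_list: List[str]) -> List[str]:
    """Use the analyser's tags when it produced any; otherwise fall back to the closed list."""
    return list(analyser_tags) if analyser_tags else list(closed_list)
```

With a frequency cut-off, rare tags that happen to appear once on unknown tokens do not inflate the candidate set the tagger must choose from.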

02 Jan 2012

11:31 WMBT is able to tag unknown words
The previous version of WMBT could not recover tags for tokens on which the morphological analyser failed. The current version (as i... Adam Radziszewski

29 Dec 2011

11:20 Task #3561 (Closed): Clean-up configs & wiki description
Leave only configs with affix features.
Configs for no guessing and for guessing (kipi+nkjp)
rm old configs: *-1, *-...
Adam Radziszewski

21 Dec 2011

16:33 Task #3353: Treatment of unknown words
-A works more or less OK (it outputs many possible tags, but they are correct); training in progress Adam Radziszewski
16:17 Task #3353: Treatment of unknown words
Guessing seems to be done. New default config (unktagfreq=3) yields... Adam Radziszewski

14 Dec 2011

15:23 Task #3353: Treatment of unknown words
The first option is more universal; it makes sense to use affixes and regexes as features to obtain a kind of morphological analysis.
Adam Radziszewski
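To illustrate "affixes & regexes as features", here is a minimal sketch of per-token orthographic features. The prefix/suffix lengths, the placeholder value and the three regex patterns are hypothetical choices for illustration; the feature set WMBT actually uses may differ.

```python
import re
from typing import Dict

# Hypothetical orthographic patterns (not WMBT's actual feature set).
PATTERNS = {
    "has_digit": re.compile(r"\d"),
    "capitalised": re.compile(r"^[A-ZĄĆĘŁŃÓŚŹŻ]"),
    "has_hyphen": re.compile(r"-"),
}


def affix_features(form: str, max_len: int = 3) -> Dict[str, str]:
    """Prefixes/suffixes of length 1..max_len plus regex flags for one word form."""
    feats: Dict[str, str] = {}
    lower = form.lower()
    for n in range(1, max_len + 1):
        # Words shorter than n get a "_" placeholder so feature names stay fixed.
        feats[f"prefix{n}"] = lower[:n] if len(lower) >= n else "_"
        feats[f"suffix{n}"] = lower[-n:] if len(lower) >= n else "_"
    for name, pattern in PATTERNS.items():
        feats[name] = "1" if pattern.search(form) else "0"
    return feats
```

For an inflected language such as Polish, suffixes like "-em" or "-tem" carry much of the case/number information, which is why affixes can stand in for a rough morphological analysis of unknown forms.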
15:21 Task #3353 (Closed): Treatment of unknown words
Two options here:
1. Find hapaxes, train a model for them, and guess completely new tags instead of igns.
2. Gather...
Adam Radziszewski
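Option 1 rests on the common practice of treating hapaxes (forms seen exactly once in the training data) as a stand-in for unknown words. A minimal sketch of selecting them, assuming the corpus is given as a flat list of word forms and that case should be folded; both are assumptions for illustration, not part of the original plan.

```python
from collections import Counter
from typing import Iterable, List, Set


def hapaxes(forms: Iterable[str], lowercase: bool = True) -> Set[str]:
    """Return the word forms that occur exactly once in the corpus."""
    norm = [f.lower() if lowercase else f for f in forms]
    counts = Counter(norm)
    return {f for f, n in counts.items() if n == 1}


def hapax_positions(forms: List[str], lowercase: bool = True) -> List[int]:
    """Indices of hapax tokens, e.g. to pick training instances for an unknown-word model."""
    singles = hapaxes(forms, lowercase)
    return [i for i, f in enumerate(forms) if (f.lower() if lowercase else f) in singles]
```

Usage: for `["Ala", "ma", "kota", "a", "kot", "ma", "Ala"]` the hapaxes are "kota", "a" and "kot", at positions 2, 3 and 4.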
15:19 Task #3352 (Closed): Extend with CRF++
Add a secondary implementation for CRF++. Note that it needs feature description files and different feature vectors (C... Adam Radziszewski
 
