TaKIPI is a tagger for Polish, that is, a tool that assigns morpho-syntactic tags to the words of a text.
The tagger uses the morpho-syntactic description defined by the IPI PAN Corpus tagset. Contextual disambiguation is carried out by a small set of hand-written rules and by a larger number of rules extracted automatically with the C4.5 decision-tree induction algorithm. Both when the tagger is trained and when it tags text, the context in which each word occurs is represented as a fixed-length feature vector. This vector is computed by hand-written functional expressions in the JOSKIPI formalism, which refer to morpho-syntactic properties of the context.
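To make the idea of a fixed-length context vector and a C4.5-style split criterion more concrete, here is a minimal Python sketch. It is not TaKIPI code: TaKIPI's real features are JOSKIPI expressions and its trees come from C4.5 itself. Everything below is an illustrative assumption. Tokens are reduced to their possible grammatical classes, `classes_at` is a hypothetical stand-in for a JOSKIPI expression, and `gain_ratio` shows only the attribute-selection criterion C4.5 uses (information gain divided by split information), not full tree induction with pruning.

```python
from collections import Counter
from math import log2
from typing import Callable

# Hypothetical simplified token: a word form plus the set of grammatical
# classes allowed for it (full IPI PAN tags such as "subst:sg:gen:m3"
# are reduced here to their class only).
Token = tuple[str, frozenset[str]]
Expr = Callable[[list[Token], int], str]


def classes_at(offset: int) -> Expr:
    """Hypothetical stand-in for a JOSKIPI expression: the possible classes
    of the token at a relative position, or "NONE" outside the sentence."""
    def expr(sentence: list[Token], i: int) -> str:
        j = i + offset
        if 0 <= j < len(sentence):
            return "|".join(sorted(sentence[j][1]))
        return "NONE"
    return expr


# A fixed list of expressions gives a vector of constant length for every word.
EXPRESSIONS: list[Expr] = [classes_at(-2), classes_at(-1), classes_at(1), classes_at(2)]


def feature_vector(sentence: list[Token], i: int) -> list[str]:
    return [expr(sentence, i) for expr in EXPRESSIONS]


def entropy(labels: list[str]) -> float:
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: list[list[str]], labels: list[str], f: int) -> float:
    """C4.5 split criterion for a nominal attribute f: gain / split information."""
    n = len(rows)
    groups: dict[str, list[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[f], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def best_feature(rows: list[list[str]], labels: list[str]) -> int:
    """Index of the attribute C4.5 would split on first (highest gain ratio)."""
    return max(range(len(rows[0])), key=lambda f: gain_ratio(rows, labels, f))


if __name__ == "__main__":
    sentence = [
        ("w", frozenset({"prep"})),
        ("domu", frozenset({"subst"})),
        ("jest", frozenset({"fin"})),
    ]
    print(feature_vector(sentence, 1))  # ['NONE', 'prep', 'fin', 'NONE']

    # Toy training data: attribute 0 separates the labels, attribute 1 does not.
    rows = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
    labels = ["P", "P", "Q", "Q"]
    print(best_feature(rows, labels))  # 0
```

Because every word gets a vector of the same length, regardless of where it stands in the sentence, a standard decision-tree learner can be applied to the vectors directly. Positions beyond the sentence boundary are filled with a placeholder value rather than being dropped.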
The software is available under the GNU GPL 3.0 licence. It is the joint property of the Institute of Informatics of Wrocław University of Technology and the Institute of Computer Science of the Polish Academy of Sciences. The tagger can be downloaded from two sources:
- A package containing TaKIPI 1.8 as source code (for Linux) and as pre-compiled binaries (for Windows) is available on this page.
- The latest tagger sources are available in the repository: svn://nlp.pwr.wroc.pl/takipi/
The tagger is developed and mainly tested under Linux, which is why we recommend using the Linux version. On Linux, we recommend trying the repository version (changes are introduced there cautiously).
A more detailed description, an online code browser and a place for reporting bugs are available on the Trac page.