Fextor is a tool for extracting features from collections of texts. It is highly flexible while maintaining performance and simplicity.
Features are extracted from text snippets, which are defined according to the type of pointer (a token, an annotation, or a pair of annotations). This allows multiple features to be generated simultaneously for a single document.
New feature types can be defined either by implementing them in Python or by describing them in the WCCL language.
Fextor supports two corpus formats: Poliqarp and CCL. The extracted features are saved in CSV format and can be converted to a matrix format for use with the LexCSD package.
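The idea of turning a CSV feature list into a matrix can be illustrated with a minimal sketch. This is not Fextor's own converter, and it does not reproduce the format expected by LexCSD. The column names (instance, feature, value), the sample rows, and the function csv_to_matrix are all assumptions made for illustration only.

```python
import csv
import io


def csv_to_matrix(csv_text):
    """Turn (instance, feature, value) rows into a dense matrix.

    The three-column layout is a hypothetical one chosen for this sketch;
    it is not the documented Fextor output format.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # One matrix row per instance, one column per distinct feature.
    instances = sorted({r["instance"] for r in rows})
    features = sorted({r["feature"] for r in rows})
    row_index = {name: i for i, name in enumerate(instances)}
    col_index = {name: j for j, name in enumerate(features)}

    # Features missing for an instance stay at 0.0.
    matrix = [[0.0] * len(features) for _ in instances]
    for r in rows:
        i = row_index[r["instance"]]
        j = col_index[r["feature"]]
        matrix[i][j] = float(r["value"])
    return instances, features, matrix


# Made-up sample data: two token pointers in one document.
SAMPLE = """instance,feature,value
doc1:tok3,base=zamek,1
doc1:tok3,pos=subst,1
doc1:tok9,pos=subst,1
"""

instances, features, matrix = csv_to_matrix(SAMPLE)
for name, row in zip(instances, matrix):
    print(name, row)
```

Each row of the result corresponds to one pointer (here, a token), and each column to one feature, so several features for the same document end up side by side in a single table.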
Extracted features can be used in:
- classification of derivational relations
- recognition of inter-chunk syntactic relations and semantic relations between named entities
- word sense disambiguation
- anaphora resolution