Fextor is a tool for extracting features from collections of texts. It is highly flexible while preserving performance and simplicity.
Features are extracted from text snippets, defined according to the type of pointer (a token, an annotation, or a pair of annotations). This allows multiple features to be generated simultaneously for a single document.
New feature types can be defined either by implementing them in Python or by describing them in the WCCL language.
Fextor supports two corpus formats: Poliqarp and CCL. The extracted features are saved in CSV format and can be converted to a matrix format for use in the LexCSD package.
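The CSV-to-matrix conversion can be illustrated with a minimal Python sketch. It is not Fextor's own converter: the column names `item`, `feature` and `value`, the sample rows, and the helper `csv_to_matrix` are all hypothetical, and the actual CSV layout produced by Fextor and the matrix format expected by LexCSD may differ.

```python
import csv
import io


def csv_to_matrix(csv_text):
    """Pivot long-format (item, feature, value) rows into a dense matrix.

    Assumption: one CSV row per extracted feature value. This layout is
    hypothetical and is not taken from the Fextor documentation.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    items = sorted({r["item"] for r in rows})
    features = sorted({r["feature"] for r in rows})
    item_idx = {name: i for i, name in enumerate(items)}
    feat_idx = {name: j for j, name in enumerate(features)}

    # Features that were not observed for an item stay at 0.0.
    matrix = [[0.0] * len(features) for _ in items]
    for r in rows:
        matrix[item_idx[r["item"]]][feat_idx[r["feature"]]] = float(r["value"])
    return items, features, matrix


# Hypothetical sample data: two token pointers in one document.
SAMPLE = """item,feature,value
doc1:tok3,base=kot,1
doc1:tok3,pos=subst,1
doc1:tok7,pos=adj,1
"""

items, features, matrix = csv_to_matrix(SAMPLE)
```

Each row of `matrix` corresponds to one pointer and each column to one feature, so the same document can contribute several rows.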
Documentation from the requirements gathering phase
Most of the documentation below is intended as reference material for the development team. The documents were written mostly in Polish.
- Usage scenarios