LexCSD (Lexicographer-Controlled Semi-automatic Sense Disambiguation) is a system for semi-automatic sense disambiguation. The algorithm clusters text snippets that contain the word in focus. Each cluster can optionally be labeled by hand, guided by automatically extracted representative examples of how each sense is used. The results, stored in a matrix format, are used to build a classifier, which can then be applied to disambiguate previously unseen text.
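
The workflow described above (cluster contexts, pick representative examples, label the clusters by hand, classify new text) can be illustrated with a small, self-contained Python sketch. This is not LexCSD code: the function names, the stop-word list, the k-means-style clustering and the nearest-centroid classifier are all assumptions chosen to keep the example short, and the sense labels are invented for the toy data.

```python
from collections import Counter
from math import sqrt

# Assumption: a tiny stop-word list, only to keep the toy example readable.
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "in", "to", "we"}

def bag_of_words(snippet, focus):
    """Count the context words of a snippet, skipping stop words and the focus word."""
    return Counter(w for w in snippet.lower().split()
                   if w != focus and w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest(vector, centroids):
    """Index of the centroid most similar to the vector."""
    return max(range(len(centroids)), key=lambda c: cosine(vector, centroids[c]))

def cluster(vectors, k, iterations=10):
    """K-means-style clustering (a stand-in algorithm), seeded with the first k vectors."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                # Cosine similarity ignores scale, so a sum works as a centroid.
                centroids[c] = sum(members, Counter())
    return assignment, centroids

focus = "bank"
snippets = [
    "the bank approved the loan and the money arrived",
    "we sat on the river bank near the water",
    "the bank charged interest on the loan",
    "fish swam near the bank of the river",
]
vectors = [bag_of_words(s, focus) for s in snippets]
assignment, centroids = cluster(vectors, k=2)

# Representative example per cluster: the snippet closest to the cluster centroid.
examples = {
    c: max((s for s, a in zip(snippets, assignment) if a == c),
           key=lambda s: cosine(bag_of_words(s, focus), centroids[c]))
    for c in range(2)
}

# Hypothetical labels a lexicographer might enter after reading the examples.
labels = {0: "financial institution", 1: "river side"}

def disambiguate(text):
    """Nearest-centroid classifier: return the label of the most similar cluster."""
    return labels[nearest(bag_of_words(text, focus), centroids)]

print(examples)
print(disambiguate("she took a loan from the bank"))
```

A real system would use richer features and a stronger classifier; the sketch only shows how the clustering output and the manual labels feed the classification step.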

The system is divided into modules:

  • ltcore - package defining the matrix format and the operations performed on it
  • ltcluster - package for clustering contexts and generating usage examples
  • ltlearn - package for classification; supports external tools (Weka, Shogun)

LexCSD provides:

  • clustering of contexts
  • automatic selection of the best clustering algorithm
  • display of usage examples for each sense
  • sense labels entered by the user
  • a wide range of classifiers
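
The description does not say how the best clustering algorithm is chosen. One common approach, shown here purely as an assumption, is to score each candidate clustering with the mean silhouette coefficient and keep the highest-scoring one. The points, the candidate names and the label lists below are hypothetical toy data, not LexCSD output.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters. Higher is better."""
    scores = []
    for i, p in enumerate(points):
        same = [distance(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(distance(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

# Hypothetical context vectors and the cluster labels three algorithms produced.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
candidates = {
    "algorithm_a": [0, 0, 0, 1, 1, 1],
    "algorithm_b": [0, 0, 1, 1, 1, 1],
    "algorithm_c": [0, 1, 2, 0, 1, 2],
}

best = max(candidates, key=lambda name: silhouette(points, candidates[name]))
print(best)
```

The silhouette score needs no gold labels, which suits a setting where senses are labeled only after clustering; other internal validity measures could be substituted in the same way.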
The system will be released soon under the GNU GPL.