SuperMatrix is a system that supports the automatic extraction of semantic relations based on the analysis of large text corpora. It was developed as a tool for expanding the Polish wordnet (Słowosieć). Expansion consists of two steps: the system suggests potential links between lexical units, and linguists then verify these suggestions and decide which of them go into the wordnet. This has sped up the work while preserving the integrity of data entry.
The system analyzes the contexts in which lexical units occur and, on this basis, computes the values of various measures of similarity between lexical units.
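To illustrate the general idea, here is a minimal sketch that is not SuperMatrix's own code. SuperMatrix describes contexts with WCCL-based features and offers several similarity measures; this sketch assumes plain window-based co-occurrence counts and cosine similarity instead. It builds a context vector for each target word from a toy tokenized corpus, then ranks the other targets by similarity. The top-ranked neighbours play the role of the candidate links that a linguist would then verify. All function names, the window size and the corpus are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, targets, window=2):
    """Count the words occurring within `window` tokens of each target word.

    Hypothetical feature choice: raw co-occurrence counts in a fixed window.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if word not in targets:
                continue
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest(vectors, word, k=3):
    """Return the k words most similar to `word` as candidate links."""
    target = vectors.get(word, Counter())  # avoid inserting into the dict
    scores = [(other, cosine(target, vectors[other]))
              for other in vectors if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy usage: "dog" shares "the" and "drinks" with "cat", "car" shares nothing.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "a car needs fuel".split(),
]
vectors = context_vectors(corpus, targets={"cat", "dog", "car"})
print(suggest(vectors, "cat"))
```

Cosine is used here only because it is a common, simple choice. A real setup would normally weight the raw counts (for example with a matrix transformation) before comparing vectors.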
Advantages of the system:
- modular structure
- handling of large matrices
- a choice of vector similarity measures and matrix transformation methods
- a built-in module for evaluating similarity measures (Wordnet-Based Synonymy Test)
- an efficient sparse matrix implementation
- the ability to write matrices in CLUTO, CCS and CRS formats
- integration with the WCCL formalism, which makes it possible to refer to morphological and syntactic features of the text
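The following sketch shows why a sparse representation matters for co-occurrence matrices, which are mostly zeros. It is an independent illustration, not SuperMatrix's implementation. Only non-zero cells are stored, in compressed row storage (CRS) arrays; CCS is the same layout applied column by column. The matrix is then serialized in CLUTO's sparse text format: a header line with the number of rows, columns and non-zero entries, followed by one line per row of 1-based column/value pairs. The function names and the toy matrix are hypothetical.

```python
def to_crs(rows, n_cols):
    """Convert a list of {column: value} dicts into CRS arrays.

    Returns (values, col_index, row_ptr): the non-zeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], with 0-based columns in col_index.
    """
    values, col_index, row_ptr = [], [], [0]
    for row in rows:
        for col in sorted(row):
            if not 0 <= col < n_cols:
                raise ValueError(f"column {col} out of range")
            if row[col] != 0:  # store non-zero cells only
                values.append(row[col])
                col_index.append(col)
        row_ptr.append(len(values))
    return values, col_index, row_ptr


def to_cluto(values, col_index, row_ptr, n_cols):
    """Serialize CRS arrays as text in CLUTO's sparse matrix format."""
    n_rows = len(row_ptr) - 1
    lines = [f"{n_rows} {n_cols} {len(values)}"]
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # CLUTO numbers columns from 1, so shift the 0-based indices.
        lines.append(" ".join(f"{col_index[k] + 1} {values[k]}"
                              for k in range(start, end)))
    return "\n".join(lines) + "\n"


# Toy 3x3 matrix; the empty middle row becomes an empty line in the output.
rows = [{0: 2, 2: 1}, {}, {1: 3}]
values, col_index, row_ptr = to_crs(rows, n_cols=3)
print(to_cluto(values, col_index, row_ptr, n_cols=3), end="")
```

Row access in CRS is a contiguous slice, which suits row-by-row comparison of context vectors. CCS would be the natural choice when columns (features) are scanned instead.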
SuperMatrix has been released under the GPL licence. You can download the system with the git clone command:
$ git clone http://nlp.pwr.wroc.pl/supermatrix.git
A description of SuperMatrix can be found in the following publications:
- Broda, Bartosz, Maciej Piasecki. 2008. SuperMatrix: a General Tool for Lexical Semantic Knowledge Acquisition. In Speech and Language Technology, 239-254. Polish Phonetics Association.
- Broda, Bartosz, Maciej Piasecki. 2011. Parallel, Massive Processing in SuperMatrix -- a General Tool for Distributional Semantic Analysis of Corpora. International Journal of Data Mining, Modelling and Management.