A wordnet of Polish
The Polish wordnet, plWordNet, is a semantic network which reflects the Polish lexical system. The nodes in plWordNet are lexical units (words paired with their senses), interconnected by semantic relations drawn from a well-defined relation set. For example, kot - 'cat' is a hyponym (subclass) of zwierzę - 'animal', pazur - 'claw' and łapa - 'paw' are related by meronymy (part/whole), while wchodzić - 'enter' and wychodzić - 'exit' are antonyms, all of them in specific senses. A lexical unit acquires its meaning from its relatedness to other lexical units within the system; we can reason about it by considering the relations in which it participates. For example, kot - 'cat' is defined as a kind of zwierzę - 'animal', łapa - 'paw' as a whole of which pazur - 'claw' is a part, and the activities of wchodzenie - 'entering' and wychodzenie - 'exiting' as opposites.
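The network described above can be pictured as a small labelled graph. The Python sketch below is only an illustration under stated assumptions: the class names (`LexicalUnit`, `MiniWordnet`), the relation labels (`hyponym_of`, `part_of`, `antonym_of` and their inverses) and the sense numbers are hypothetical and are not taken from the plWordNet source files, its relation set or its Web service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A word paired with one of its senses (sense numbers here are made up)."""
    lemma: str
    sense: int
    gloss: str


class MiniWordnet:
    """Toy semantic network: lexical units as nodes, labelled relations as edges."""

    def __init__(self):
        # (source unit, relation name) -> set of target units
        self._edges = defaultdict(set)

    def relate(self, source, relation, target, inverse=None):
        """Add a directed relation and, optionally, its named inverse."""
        self._edges[(source, relation)].add(target)
        if inverse is not None:
            self._edges[(target, inverse)].add(source)

    def related(self, unit, relation):
        """Return the units reachable from `unit` by one `relation` edge."""
        return set(self._edges.get((unit, relation), set()))

    def describe(self, unit):
        """List every (relation, target lemma) pair the unit takes part in."""
        return sorted(
            (rel, target.lemma)
            for (source, rel), targets in self._edges.items()
            if source == unit
            for target in targets
        )


# The three examples from the text, with hypothetical relation labels.
kot = LexicalUnit("kot", 1, "cat")
zwierze = LexicalUnit("zwierzę", 1, "animal")
pazur = LexicalUnit("pazur", 1, "claw")
lapa = LexicalUnit("łapa", 1, "paw")
wchodzic = LexicalUnit("wchodzić", 1, "enter")
wychodzic = LexicalUnit("wychodzić", 1, "exit")

net = MiniWordnet()
net.relate(kot, "hyponym_of", zwierze, inverse="hypernym_of")
net.relate(pazur, "part_of", lapa, inverse="has_part")
net.relate(wchodzic, "antonym_of", wychodzic, inverse="antonym_of")

# A unit is characterised by the relations it participates in.
print(net.describe(kot))
print(net.describe(lapa))
```

Keying the edges by (unit, relation) keeps lookups such as "what is kot a kind of?" to a single dictionary access, and frozen dataclasses make lexical units hashable so that two senses of the same lemma stay distinct nodes.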
A wordnet should be designed to facilitate automatic text analysis. It is, in fact, a fundamental language resource, indispensable in many types of work in Artificial Intelligence. Thanks to plWordNet, it will be easier to make computers learn to understand the Polish language.
The first ever wordnet (WordNet) was built in the late 1980s at Princeton University. In the past two decades, hundreds of research teams have followed in the footsteps of WordNet's creators. The research group G4.19 at Wrocław University of Technology is one of those teams. Notably, plWordNet is one of the few such resources built not by translating WordNet, but from the ground up, in a joint effort of lexicographers and computer scientists. In 2009 the first version, with some 27000 lexical units, was made available on the Internet. Today plWordNet describes 178000 nouns, verbs, adjectives, and adverbs, and contains nearly 259000 unique senses and over 600000 relation instances. It is by far the largest wordnet in the world. The leaders of Wrocław University of Technology have decided to make plWordNet available free of charge for any application (including commercial ones) under a licence modelled on that of Princeton WordNet.
To obtain the plWordNet source files, please fill in the registration form below. Users may browse plWordNet via the mobile version and via WordNetLoom-Viewer (an application for displaying plWN entries). Programmers may access plWordNet via a Web service.
The continued growth of plWordNet has been made possible by grants from the Polish Ministry of Science and Higher Education and from the European Union (see the complete list below). We aim to build a conceptual dictionary fully representative of contemporary Polish, comparable with the largest wordnets in the world. This means, however, that plWordNet in its present shape is a work in progress; it is neither complete nor fault-free. While there is still much to do, we have made an effort to ensure that version 3.0, now being made available, has the same high quality as the best wordnets: Princeton WordNet, EuroWordNet (a joint initiative of a dozen or so members of the European Union) and GermaNet from Tübingen University.
Projects devoted to the continued development of plWordNet:
- Automatic methods of constructing a semantic network of Polish lexemes for natural language processing (2005–2008), funded by the Polish Ministry of Science and Higher Education, No 3T11C01829,
- Construction of lexical resources with the help of recognition of semantic relations in text corpora on the basis of morpho-syntactic and semantic data (2009–2012), funded by the Polish Ministry of Science and Higher Education, No N N516 068637,
- NEKST - Adaptive system supporting solving problems on the basis of content analysis of electronic documents (2010–2013), funded by European Union Innovative Economy Programme POIG.01.01.02-14-013/09,
- SyNaT - Research Task: “Construction of an open, repository hosting and communication platform for the network knowledge resources for science, education and open knowledge society” (2010–2013), Strategic Project, funded by National Centre for Research and Development,
- CLARIN-PL - The Polish part of CLARIN ERIC Research Infrastructure: Common Language Resources & Technology Infrastructure, the completion of a construction phase (2013–2015), funded by the Polish Ministry of Science and Higher Education, No 6358/IA/119/2013.
plWordNet structure and co-ordination of the linguistic work:
Main investigators in the computational linguistics part:
Acknowledgements
We would like to thank Professor Elżbieta Hajnicz from ICS PAS for useful comments and remarks.
e-mail: plwordnet_at_pwr_dot_wroc_dot_pl (substitute @ for _at_ and . for _dot_)
Rudnicka E., Maziarz M., Piasecki M., Szpakowicz S. (2012) Mapping plWordNet onto Princeton WordNet. (doc)
Maziarz M., Piasecki M., Szpakowicz S. (2012) Approaching plWordNet 2.0. Proceedings of the 6th Global Wordnet Conference, Matsue, Japan, 9-13 January 2012 (accepted for publication). (pdf)
Piasecki M., Szpakowicz S., Broda B. (2009) A Wordnet from the Ground Up. Wrocław: Oficyna Wydawnicza Politechniki Wrocławskiej. (pdf)
Other publications
The rest of our publications can be found on the G4.19 Research Group web page.
To download the application, please fill in the form below. After you submit the form, an e-mail with a download link will be sent to the address you provided.