Named Entity Matching Method Based on the Context-Free Morphological Generator

Polish named entities are mostly out-of-vocabulary words, i.e. they are not described in morphological lexicons, and their proper analysis by Polish morphological analysers is difficult. The existing approaches to guessing the lemmas and descriptions of unknown words do not provide results at a satisfactory level. Moreover, lemmatisation of multi-word named entities cannot be solved by word-by-word lemmatisation in Polish. Multi-word named entity lemmas (e.g. those included in gazetteers) often contain word forms that differ from the lemmas of their constituents. Such multi-word lemmas can be produced only by tagger- or parser-based lemmatisation. Polish is a language with rich inflection (a rich variety of word forms), so comparing two words (even those that share the same lemma) is a difficult task. Instead of calculating the value of a form-based similarity function between the text words and gazetteer entries, we propose a method which uses a context-free morphological generator, built on top of the morphological lexicon and encoded as a set of inflection rules. The proposed solution outperforms several state-of-the-art methods that are based on word-to-word similarity functions.
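The general idea of matching by generation rather than by similarity scoring can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the single suffix rule in `RULES`, the function names `generate_forms`, `build_index` and `match`, and the one-entry gazetteer are hypothetical and are not taken from the paper, whose actual rule set is derived from a morphological lexicon.

```python
# Toy inflection rules: (lemma suffix to strip, endings to attach).
# Hypothetical example rule, not the rule set described in the paper.
RULES = [
    ("a", ["a", "y", "ie", "ę", "ą", "o"]),
]


def generate_forms(lemma):
    """Generate candidate inflected forms of a lemma without looking at context."""
    forms = {lemma}
    for strip, endings in RULES:
        if lemma.endswith(strip):
            stem = lemma[: len(lemma) - len(strip)]
            forms.update(stem + ending for ending in endings)
    return forms


def build_index(gazetteer):
    """Map every generated form (lowercased) to the gazetteer entries it may come from."""
    index = {}
    for entry in gazetteer:
        for form in generate_forms(entry):
            index.setdefault(form.lower(), set()).add(entry)
    return index


def match(token, index):
    """Return the gazetteer entries whose generated forms include the token."""
    return index.get(token.lower(), set())


index = build_index(["Warszawa"])
print(match("Warszawie", index))
print(match("Krakowie", index))
```

In this sketch a text token is matched by exact lookup among the generated forms, so no word-to-word similarity function is computed. A rule this crude overgenerates for lemmas that follow a different paradigm, which is one reason a real generator would need a far richer rule set.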
Year:
2014
Type of Publication:
In Proceedings
Keywords:
morphological generator; similarity of proper names; word similarity metric; Named Entity Recognition; information extraction
Editor:
Adam Przepiórkowski, Maciej Ogrodniczuk
Volume:
8686
Book title:
Advances in Natural Language Processing - 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014. Proceedings
Series:
Lecture Notes in Computer Science
Pages:
34-44
Organization:
Springer International Publishing Switzerland
ISBN:
978-3-319-10888-9
ISSN:
0302-9743