Named Entity Recognition in the Domain of Polish Stock Exchange Reports

The paper investigates the accuracy of a Named Entity Recognition (NER) algorithm based on a Hidden Markov Model (HMM) in the domain of Polish stock exchange reports. The NER task was limited to recognizing and classifying Named Entities that represent persons and companies. The algorithm was tested on a small Polish domain corpus of stock exchange reports and compared with two baselines: one based on the capitalization of the first letter of a word and one based on a gazetteer. The algorithm outperformed both baselines, achieving 64% precision and 93% recall for person names and 78% precision and 83% recall for company names. Adding simple hand-written post-processing rules raised the precision for person names to 87%. A cross-domain evaluation on a small corpus of police reports is also presented. We discuss the problem of method portability in light of the much worse results obtained on the second corpus, and we sketch a possible combination of different knowledge sources as a way of overcoming it.
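The abstract compares the HMM tagger with a capitalization baseline and a gazetteer baseline and reports per-class precision and recall. The sketch below shows one plausible way such baselines and the scoring could be implemented; it is not the authors' code. The span representation `(start, end_exclusive, label)`, the label names `PERSON` and `COMPANY`, skipping the sentence-initial token, greedy longest-match gazetteer lookup, exact span matching, and the example sentence are all assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's implementation):
# two simple NER baselines and exact-match per-class scoring.
# Hypothetical representation: an entity is a (start, end_exclusive, label) span.
Span = tuple[int, int, str]


def capitalization_baseline(tokens: list[str], label: str = "PERSON") -> set[Span]:
    """Tag each maximal run of capitalized tokens as one entity with `label`.

    Assumption: the sentence-initial token is skipped, since it is
    capitalized whether or not it is a name.
    """
    spans: set[Span] = set()
    i = 1  # skip the sentence-initial token
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            spans.add((i, j, label))
            i = j
        else:
            i += 1
    return spans


def gazetteer_baseline(tokens: list[str],
                       gazetteer: dict[tuple[str, ...], str]) -> set[Span]:
    """Greedy longest-match lookup of token sequences in a gazetteer."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    spans: set[Span] = set()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                spans.add((i, i + n, label))
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return spans


def precision_recall(predicted: set[Span], gold: set[Span],
                     label: str) -> tuple[float, float]:
    """Exact-match precision and recall for one class (0.0 when undefined)."""
    pred = {s for s in predicted if s[2] == label}
    ref = {s for s in gold if s[2] == label}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical sentence: "The board of Alfa SA appointed Jan Nowak."
    tokens = ["Zarząd", "spółki", "Alfa", "SA", "powołał", "Jana", "Nowaka", "."]
    gold = {(2, 4, "COMPANY"), (5, 7, "PERSON")}
    gaz = {("Alfa", "SA"): "COMPANY", ("Jan", "Nowak"): "PERSON"}

    caps = capitalization_baseline(tokens)
    print(precision_recall(caps, gold, "PERSON"))   # (0.5, 1.0)
    found = gazetteer_baseline(tokens, gaz)
    print(precision_recall(found, gold, "PERSON"))  # (0.0, 0.0)
```

On this toy sentence the capitalization rule finds the person but also tags "Alfa SA" as a person, lowering precision, while the exact-match gazetteer misses the inflected form "Jana Nowaka" because it only stores the base form "Jan Nowak". Inflection of this kind is common in Polish text.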
Year:
2010
Type of Publication:
In Collection
Keywords:
Named Entity Recognition
Book Title:
Intelligent Information Systems
Publisher:
Publishing House of University of Podlasie
Editor:
Mieczysław A. Kłopotek and Małgorzata Marciniak and Agnieszka Mykowiecka and Wojciech Penczek and Sławomir T. Wierzchoń
Address:
Siedlce
Pages:
127-140
Note:
5 points (under the regulation of the Polish Ministry of Science and Higher Education, MNiSW, in force in 2010)