Heuristic algorithm for zero subject detection in Polish

This article describes a heuristic approach to zero subject detection in Polish, treated as a crucial step in end-to-end coreference resolution. Zero subject verbs are recognized using a set of manually created rules that draw on several sources of information: a dependency parser, a shallow relational parser, and a valence dictionary. The rules were developed and evaluated on the Polish Coreference Corpus. The experimental results show that the presented method significantly outperforms MentionDetector, the only machine learning-based alternative for Polish. We also discuss and evaluate the importance of zero subject detection for existing coreference resolution tools for Polish.
Year:
2015
Type of Publication:
In Proceedings
Keywords:
Zero subject; Anaphora detection; Coreference resolution; Polish
Editor:
Pavel Král, Václav Matoušek
Volume:
9302
Book title:
Text, Speech, and Dialogue, 18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, Proceedings
Series:
Lecture Notes in Computer Science
Pages:
378-386
Month:
December
ISBN:
978-3-319-24032-9
ISSN:
0302-9743
DOI:
10.1007/978-3-319-24033-6_43