List of services
Basic tools for Polish:
Each item below gives the basic processing pipeline in LPMN notation.
- Morpho
any2txt|maca({"morfeusz2":true})
- Tagger
  - WCRFT2:
    any2txt|wcrft2({"guesser":false, "morfeusz2":true})
  - Morphodita:
    any2txt|morphoDita({"guesser":false, "allforms":false, "model":"XXI"})
- Iobber
any2txt|wcrft2|iobber
- Liner2
any2txt|wcrft2|liner2({"model":"names"})
- Serel
any2txt|wcrft2|liner2({"model":"5nam"})|serel
- Spatial
any2txt|wcrft2({"morfeusz2":false})|liner2({"model":"names","output":"tei:gz"})|spatial
- Spejd
any2txt|wcrft2({"morfeusz2":false})|iobber|liner2({"model":"n82","output":"tei:gz"})|spejd
- Parser
any2txt|wcrft2|dependpar|out("conll_")|conll2svg|out("svg_")
- WSD
any2txt|wcrft2({"morfeusz2":false})|wsd({"use_mwe":false})
- Ner DE
any2txt|spacy({"annotate_entities":true,"lang":"de"})
- Ner EN
any2txt|spacy({"annotate_entities":true})
- Ner EN NLTK
any2txt|nltk({"annotate_entities":true})
- Parser DE
any2txt|spacy({"method":"parser","lang":"de"})|out("conll_")|conll2svg|out("svg_")
- Parser EN
any2txt|spacy({"method":"parser"})|out("conll_")|conll2svg|out("svg_")
- Tagger DE
any2txt|spacy({"lang":"de"})
- Tagger EN
any2txt|spacy
- Tagger EN NLTK
any2txt|spacy
- Tagger ML
any2txt|tagger({"lang":"polish"})
- Inkluz
any2txt|inkluz
- Respa
any2txt|wcrft2|respa
- Sentiment
any2txt|wcrft2({"morfeusz2":true})|wsd|sentiment|out("senti")|sentimerge({"split_paragraphs":"False"})
- Summarize
any2txt|wcrft2({"morfeusz2":false})|liner2({"model":"names"})|summarize
- TF-IDF
any2txt|wcrft2|tfidf
- WebSty
any2txt|wcrft2|fextor2({"features":"base interp_signs bigrams","base_modification":"startlist","orth_modification":"startlist","lang":"pl","filters":{"base":[{"type":"lemma_stoplist","args":{"stoplist":"@resources/fextor/nkjp360-meaningless-no-prep-freq-above-3500.txt"}}]}})|dir|out("output_fextor")|featfilt({"similarity":"cosine","weighting":"all:tf","filter":"min_tf-1 min_df-1"})|cluto({"no_clusters":2,"analysis_type":"plottree"})
- WebStyML
any2txt|div(20000)|tagger({"lang":"polish"})|fextor2({"features":"base interp_signs bigrams","base_modification":"startlist","orth_modification":"startlist","lang":"ud","filters":{"base":[{"type":"lemma_stoplist","args":{"stoplist":"@resources/fextor/ml/polish_base_startlist.txt"}}]}})|dir|out("output_fextor")|featfilt({"similarity":"cosine","weighting":"all:tf","filter":"min_tf-1 min_df-1"})|cluto({"no_clusters":2,"analysis_type":"plottree"})
- Topic
any2txt|div(20000)|wcrft2|fextor2({"features":"base","lang":"pl","filters":{"base":[{"type":"pos_stoplist","args":{"stoplist":["subst"]},"excluding":false}]}})|dir|feature2({"filter":{"base":{"min_df":2,"max_df":1,"keep_n":1000}}})|topic3({"no_topics":20,"no_passes":2,"method":"lda_mallet", "topic_scaling": "mmds"})
- Lem
any2txt|wcrft2({"morfeusz2":false})|converter({"type":"ccl2base"})|dir|shimext({"ext":".txt"})|makezip
- MeWeX
any2txt|wcrft2({"morfeusz2":true})|mewex({"mewex_options":{"ranker_func":"vector_association_measure","num_kbest":"500","wccl_rels":["agr_noun_adj","all_burk_noun","all_ger_qub","sth_adjgen","gndnoun_adj","all_noun_noun","all_noun_self","ppron3gen_noun","all_num_noun","adj_noun_adj","noun_prep_noun"]}})
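The pipelines listed above share one visible shape: steps separated by `|`, each step being a task name optionally followed by a parenthesised argument that looks like a JSON value (an object, a number, or a string). The sketch below, written in Python, splits such a string into steps for inspection. It is inferred only from the examples on this page and is not an official LPMN parser; the function names `split_pipeline`, `parse_step`, and `parse_lpmn` are hypothetical.

```python
import json


def split_pipeline(lpmn):
    """Split an LPMN string on '|' characters outside parentheses and quotes."""
    steps, current = [], []
    depth = 0
    in_string = False
    escaped = False
    for ch in lpmn:
        if in_string:
            # Inside a quoted string: copy characters until the closing quote.
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            steps.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    steps.append("".join(current).strip())
    return steps


def parse_step(step):
    """Return (task_name, argument); argument is None when there are no parentheses."""
    name, sep, rest = step.partition("(")
    if not sep:
        return name, None
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in step: {step!r}")
    # Assumption: the argument is valid JSON, as in every example on this page.
    return name, json.loads(rest[:-1])


def parse_lpmn(lpmn):
    """Parse a whole pipeline into a list of (task_name, argument) pairs."""
    return [parse_step(step) for step in split_pipeline(lpmn)]


if __name__ == "__main__":
    example = 'any2txt|wcrft2({"morfeusz2":false})|liner2({"model":"names","output":"tei:gz"})|spatial'
    for task, argument in parse_lpmn(example):
        print(task, argument)
```

Splitting on every `|` would be simpler, but tracking parentheses and quotes keeps the sketch safe if an argument ever contains a `|` character.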
Technical:
- dir - groups data into a single directory
- makezip - packs the data into a zip archive
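In the pipelines on this page, `dir` and `makezip` appear as ordinary steps near the end of a chain (see the Lem pipeline). As an illustration only, the Python sketch below assembles an LPMN string from a list of steps; the helpers `format_step` and `build_lpmn` are hypothetical, and the compact JSON formatting simply mirrors the examples listed here rather than any documented requirement.

```python
import json


def format_step(name, arg=None):
    """Render one step; arg is a JSON-serialisable value, or None for no parentheses."""
    if arg is None:
        return name
    # Compact separators reproduce the style used in the pipelines on this page.
    return f"{name}({json.dumps(arg, separators=(',', ':'), ensure_ascii=False)})"


def build_lpmn(steps):
    """Join (name,) or (name, arg) tuples into a single LPMN pipeline string."""
    return "|".join(format_step(*step) for step in steps)


# Rebuilds the Lem pipeline: tag, convert to base forms, gather the files
# with dir, set the file extension, and pack the result with makezip.
lem_pipeline = build_lpmn([
    ("any2txt",),
    ("wcrft2", {"morfeusz2": False}),
    ("converter", {"type": "ccl2base"}),
    ("dir",),
    ("shimext", {"ext": ".txt"}),
    ("makezip",),
])

if __name__ == "__main__":
    print(lem_pipeline)
```

Building the string from structured data avoids hand-escaping quotes when option values come from user input.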
Conversions:
- any2txt
- converter
Clustering:
- cluto
- ward
Information extraction
English, German
- spacy
Contact
Please send questions and comments by email to: webserwisy ( at ) clarin-pl.eu