List of services

Basic tools for Polish:

The sub-items give the basic processing pipeline for each service in LPMN notation:

  • Morpho
    • any2txt|maca({"morfeusz2":true})
  • Tager
    • WCRFT2: any2txt|wcrft2({"guesser":false, "morfeusz2":true})
    • Morphodita: any2txt|morphoDita({"guesser":false, "allforms":false, "model":"XXI"})
  • Chunker
    • any2txt|wcrft2|iobber
  • Ner
    • any2txt|wcrft2|liner2({"model":"names"})
  • Serel
    • any2txt|wcrft2|liner2({"model":"5nam"})|serel
  • Spatial
    • any2txt|wcrft2({"morfeusz2":false})|liner2({"model":"names","output":"tei:gz"})|spatial
  • Spejd
    • any2txt|wcrft2({"morfeusz2":false})|iobber|liner2({"model":"n82","output":"tei:gz"})|spejd
  • Parser
    • any2txt|wcrft2|dependpar|out("conll_")|conll2svg|out("svg_")
  • WSD
    • any2txt|wcrft2({"morfeusz2":false})|wsd({"use_mwe":false})
  • Ner DE
    • any2txt|spacy({"annotate_entities":true,"lang":"de"})
  • Ner EN
    • any2txt|spacy({"annotate_entities":true})
  • Ner EN NLTK
    • any2txt|nltk({"annotate_entities":true})
  • Parser DE
    • any2txt|spacy({"method":"parser","lang":"de"})|out("conll_")|conll2svg|out("svg_")
  • Parser EN
    • any2txt|spacy({"method":"parser"})|out("conll_")|conll2svg|out("svg_")
  • Tager DE
    • any2txt|spacy({"lang":"de"})
  • Tager EN
    • any2txt|spacy
  • Tager EN NLTK
    • any2txt|nltk
  • Tager ML
    • any2txt|tagger({"lang":"polish"})
  • Inkluz
    • any2txt|inkluz
  • Respa
    • any2txt|wcrft2|respa
  • Sentyment
    • any2txt|wcrft2({"morfeusz2":true})|wsd|sentiment|out("senti")|sentimerge({"split_paragraphs":"False"})
  • Summarize
    • any2txt|wcrft2({"morfeusz2":false})|liner2({"model":"names"})|summarize
  • TermoPL
    • any2txt|wcrft2({"morfeusz2":false})|termopl
  • TF-IDF
    • any2txt|wcrft2|tfidf
  • WebSty
    • any2txt|wcrft2|fextor2({"features":"base interp_signs bigrams","base_modification":"startlist","orth_modification":"startlist","lang":"pl","filters":{"base":[{"type":"lemma_stoplist","args":{"stoplist":"@resources/fextor/nkjp360-meaningless-no-prep-freq-above-3500.txt"}}]}})|dir|out("output_fextor")|featfilt({"similarity":"cosine","weighting":"all:tf","filter":"min_tf-1 min_df-1"})|cluto({"no_clusters":2,"analysis_type":"plottree"})
  • WebStyML
    • any2txt|div(20000)|tagger({"lang":"polish"})|fextor2({"features":"base interp_signs bigrams","base_modification":"startlist","orth_modification":"startlist","lang":"ud","filters":{"base":[{"type":"lemma_stoplist","args":{"stoplist":"@resources/fextor/ml/polish_base_startlist.txt"}}]}})|dir|out("output_fextor")|featfilt({"similarity":"cosine","weighting":"all:tf","filter":"min_tf-1 min_df-1"})|cluto({"no_clusters":2,"analysis_type":"plottree"})
  • Topic
    • converter({"type":"topicmodel"})|dir|topicmodel({"min_df":2,"max_df":0.7,"no_topics":20,"method":"lda_gensim"})
  • Lem
    • any2txt|wcrft2({"morfeusz2":false})|converter({"type":"ccl2base"})|dir|shimext({"ext":".txt"})|makezip
  • MeWeX
    • any2txt|wcrft2({"morfeusz2":true})|mewex({"mewex_options":{"ranker_func":"vector_association_measure","num_kbest":"500","wccl_rels":["agr_noun_adj","all_burk_noun","all_ger_qub","sth_adjgen","gndnoun_adj","all_noun_noun","all_noun_self","ppron3gen_noun","all_num_noun","adj_noun_adj","noun_prep_noun"]}})
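Each pipeline above is a sequence of steps joined with `|`, where a step is either a bare service name or a name followed by one argument in parentheses (a JSON object, a string, or a number). A minimal Python sketch of assembling such strings is given below. The helper `build_lpmn` and the `Step` type are hypothetical names introduced only for illustration, and the sketch assumes that whitespace inside the JSON argument is not significant.

```python
import json
from typing import Any, List, Optional, Tuple

# Hypothetical representation of one step: (service name, argument or None).
# The argument may be any JSON-serialisable value (dict, str, int, ...).
Step = Tuple[str, Optional[Any]]


def build_lpmn(steps: List[Step]) -> str:
    """Join steps into one LPMN-style string such as 'any2txt|wcrft2|tfidf'."""
    parts = []
    for name, arg in steps:
        if arg is None:
            # Bare step with no options, e.g. 'any2txt'.
            parts.append(name)
        else:
            # Compact JSON, matching the style of most pipelines in the list.
            encoded = json.dumps(arg, separators=(",", ":"), ensure_ascii=False)
            parts.append(f"{name}({encoded})")
    return "|".join(parts)


# The TermoPL pipeline from the list above.
termopl = build_lpmn([
    ("any2txt", None),
    ("wcrft2", {"morfeusz2": False}),
    ("termopl", None),
])
print(termopl)  # any2txt|wcrft2({"morfeusz2":false})|termopl
```

String and numeric arguments are handled the same way, so `("out", "conll_")` yields `out("conll_")` and `("div", 20000)` yields `div(20000)`.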

Technical:

  • dir - groups data into a single directory
  • makezip - packs data into a zip archive
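To see where technical steps such as `dir` and `makezip` occur in a pipeline, the LPMN string can be split into its steps. The sketch below is a minimal Python illustration, not a full LPMN parser. It assumes that `|` separates steps only when it appears outside parentheses and outside double-quoted strings, which holds for every pipeline listed above. The function names `split_steps` and `step_name` are hypothetical.

```python
from typing import List


def split_steps(lpmn: str) -> List[str]:
    """Split a pipeline on '|' characters outside (...) and outside "..."."""
    steps: List[str] = []
    current: List[str] = []
    depth = 0          # nesting level of parentheses
    in_string = False  # inside a double-quoted JSON string
    escaped = False    # previous character was a backslash inside a string
    for ch in lpmn:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            # Top-level separator: close the current step.
            steps.append("".join(current))
            current = []
        else:
            current.append(ch)
    steps.append("".join(current))
    return steps


def step_name(step: str) -> str:
    """Return the service name, i.e. the text before the first '('."""
    return step.split("(", 1)[0].strip()


# The Lem pipeline from the list above.
LEM = ('any2txt|wcrft2({"morfeusz2":false})|converter({"type":"ccl2base"})'
       '|dir|shimext({"ext":".txt"})|makezip')
names = [step_name(s) for s in split_steps(LEM)]
print(names)
# ['any2txt', 'wcrft2', 'converter', 'dir', 'shimext', 'makezip']
```

In this pipeline `dir` comes before `shimext` and `makezip` is the final step.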

Conversions:

  • any2txt
  • converter

Clustering:

  • cluto
  • ward

Information extraction

English and German

  • spacy

Contact
Please send questions and comments by e-mail to: webserwisy ( at ) clarin-pl.eu