How to install MACA on Ubuntu 11.10
These instructions explain how to install MACA with all the recommended packages and plugins on Ubuntu 11.10 (including Morfeusz SGJP and the morphological guesser from the TaKIPI suite).
The same steps should probably also work on newer versions of Ubuntu, although this has not been tested.
To proceed with the installation, open a terminal window (e.g. press Alt+F2, type terminal and hit ENTER).
First, install the needed utils and libraries that are provided as Ubuntu packages:
sudo apt-get install build-essential cmake bison flex python-dev swig git subversion
sudo apt-get install libicu-dev libboost1.42-all-dev libloki-dev libxml++2.6-dev libedit-dev libreadline-dev
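Before moving on, it may be worth confirming that the build tools are actually on the PATH. The following is a minimal sketch, not part of the original instructions: the helper name missing_tools is hypothetical, and the list of command names (gcc, g++ and make from build-essential, svn from subversion) is an assumption about what those packages provide. It does not check the development libraries.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Command names assumed to be provided by the packages installed above
# (the subversion package provides the svn command).
missing_tools gcc g++ make cmake bison flex swig git svn
```

If the last command prints nothing, all the listed tools were found; any name it prints points to a package that still needs installing.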
Install Morfeusz SGJP. The simplest way is to use Bartosz Zaborowski's personal package archive (the other option is to get the sources or precompiled libraries from sgjp.pl/morfeusz):
sudo add-apt-repository ppa:bartosz-zaborowski/nlp
Confirm the action with ENTER. Let the system update its list of available packages and install Morfeusz SGJP.
sudo apt-get update
sudo apt-get install morfeusz-sgjp
Install the Corpus1 package from the TaKIPI suite, which contains the morphological guesser (this step is optional and is needed only for MACA configurations that use the guesser):
# Corpus1 library is hosted on our SVN repository
svn co svn://nlp.pwr.wroc.pl/takipi/trunk/Corpus
mkdir Corpus/bin
cd Corpus/bin
cmake ..
make
sudo make install
cd ../..
Then, use git to obtain copies of the Corpus2, Toki and MACA repositories:
git clone http://nlp.pwr.wroc.pl/corpus2.git
git clone http://nlp.pwr.wroc.pl/toki.git
git clone http://nlp.pwr.wroc.pl/maca.git
Install Corpus2, a library with all the basic data structures and I/O routines:
mkdir corpus2/bin
cd corpus2/bin
cmake ..
make # or make -j7
sudo make install
NOTE: on a multi-processor machine, consider using make -j7 or so instead of make for all the builds (7 means 7 processes will be run at a time, speeding up the compilation).
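Rather than hard-coding 7, the job count can be derived from the machine. This is only a sketch, assuming the nproc utility from GNU coreutils is available; the variable name JOBS is an illustrative choice, and the fallback to a single job is a conservative default for systems without nproc.

```shell
#!/bin/sh
# Ask for the number of available CPU cores;
# fall back to a single job if nproc is not installed.
JOBS=$(nproc 2>/dev/null || echo 1)

# Show the make invocation that would be used in a build directory.
echo "make -j$JOBS"
```

Inside any of the build directories, make -j"$JOBS" can then be used in place of plain make.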
Install Toki, our configurable tokeniser and sentence splitter:
cd ../..
mkdir toki/bin
cd toki/bin
cmake ..
make
sudo make install
Now install SFST 1.2 (Stuttgart Finite State Tools). A proper version is bundled with MACA:
cd ../../maca/third_party/SFST-1.2/SFST/src/
make
sudo make install
Now install MACA itself:
cd ../../../..
mkdir bin
cd bin
cmake ..
make
sudo make install
Update the system's cache of installed shared libraries (on Ubuntu this is done with ldconfig):
sudo ldconfig
Testing
If the installation completed without problems, you should now have a fully functional MACA with Morfeusz support. If you plan to use Morfeusz, please use one of the following MACA configurations:
morfeusz-nkjp-official: uses Morfeusz and outputs in the NKJP tagset (slightly different from the internal Morfeusz SGJP tagset: it has a reduced set of grammatical genders, as used in the National Corpus of Polish).
morfeusz-nkjp-official-guesser: as above, but forms unknown to Morfeusz will be processed with TaKIPI's guesser.
sgjp-official: uses Morfeusz and outputs in its original tagset (no guesser). This is not recommended unless you specifically want the original SGJP tagset (most tools are compatible with the NKJP tagset, not the SGJP one).
The following commands are examples that should work:
echo "Jeśli poświęcisz wolność, by zyskać bezpieczeństwo, stracisz oba." | maca-analyse -qs sgjp-official -o xces > out.xml
maca-convert morfsgjp2kipi < out.xml > out-kipi.xml
echo "Nibykotek ma nibynóżki. Wygógluj sam." | maca-analyse -qs morfeusz-nkjp-official-guesser -o xces > out-guesser.xml
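A quick way to see whether the example commands produced anything is to check that each output file exists and is non-empty. The sketch below is an addition, not part of the original instructions; the helper name check_output is hypothetical, and the check says nothing about whether the XML content is correct.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file exists and is non-empty,
# and report the result either way.
check_output() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# File names taken from the example commands above.
for f in out.xml out-kipi.xml out-guesser.xml; do
    check_output "$f" || true
done
```

An empty out-guesser.xml alongside a non-empty out.xml would suggest that the optional guesser step was skipped or failed, though that is a guess and not a documented diagnostic.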