How to install MACA on Ubuntu 11.10

These instructions explain how to install MACA with all the recommended packages and plugins on Ubuntu 11.10, including Morfeusz SGJP and the morphological guesser from the TaKIPI suite.

The same steps should also work on newer versions of Ubuntu, although this has not been tested; package names that carry a version number (such as libboost1.42-all-dev) may differ there.

Detailed instructions

To proceed with the installation, open a terminal window (e.g. press Alt+F2, type terminal and hit ENTER).

First, install the required tools and libraries that are available as Ubuntu packages:

sudo apt-get install build-essential cmake bison flex python-dev swig git subversion
sudo apt-get install libicu-dev libboost1.42-all-dev libloki-dev libxml++2.6-dev libedit-dev libreadline-dev
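
Before moving on, it can help to confirm that the build tools from the packages above are actually on the PATH. The following is a minimal sketch, not part of the official procedure: missing_tools is a hypothetical helper name, and the command names checked are the ones these packages normally provide (g++ and make from build-essential, svn from subversion).

```shell
# Print the name of every command (given as arguments) that is not found on PATH.
# missing_tools is a hypothetical helper, not something shipped with MACA.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means that all of the listed tools were found.
missing_tools g++ make cmake bison flex swig git svn
```

If a name is printed, re-run the corresponding apt-get install command before continuing.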

Install Morfeusz SGJP. The simplest way is to use Bartosz Zaborowski's personal package archive (the other option is to get the sources or precompiled libraries from sgjp.pl/morfeusz):

sudo add-apt-repository ppa:bartosz-zaborowski/nlp

Confirm the action with ENTER. Then let the system update its list of available packages and install Morfeusz SGJP:

sudo apt-get update
sudo apt-get install morfeusz-sgjp

Install the Corpus1 package from the TaKIPI suite, which contains the morphological guesser (this step is optional; it is needed only for MACA configurations that use the guesser):

# The Corpus1 library is hosted in our SVN repository
svn co svn://nlp.pwr.wroc.pl/takipi/trunk/Corpus
mkdir Corpus/bin
cd Corpus/bin
cmake ..
make
sudo make install
cd ../..

Then, use git to clone the Corpus2, Toki and MACA repositories:

git clone http://nlp.pwr.wroc.pl/corpus2.git
git clone http://nlp.pwr.wroc.pl/toki.git
git clone http://nlp.pwr.wroc.pl/maca.git

Install Corpus2, a library with all the basic data structures and I/O routines:

mkdir corpus2/bin
cd corpus2/bin
cmake ..
make # or make -j7
sudo make install

NOTE: on a multi-core machine, consider using make -j7 (or a similar value) instead of make for all the builds. The number sets how many compilation jobs run in parallel, which speeds up the compilation.
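
Rather than hard-coding the job count, it can be derived from the number of available processors. This is a small sketch under the assumption that nproc (GNU coreutils) or getconf is available; the variable name njobs is arbitrary.

```shell
# Ask the system how many processors are online; fall back to 1 if
# neither nproc nor getconf can answer.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Show the resulting build command; use it in place of plain make.
echo "Suggested build command: make -j$njobs"
```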

Install Toki, our configurable tokeniser and sentence splitter:

cd ../..
mkdir toki/bin
cd toki/bin
cmake ..
make
sudo make install

Now install SFST 1.2 (Stuttgart Finite State Tools). A suitable version is bundled with MACA:

cd ../../maca/third_party/SFST-1.2/SFST/src/
make
sudo make install

Now install MACA itself:

cd ../../../..
mkdir bin
cd bin
cmake ..
make
sudo make install
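
Corpus2, Toki and MACA all follow the same configure, build and install sequence in a separate bin directory. The sketch below wraps that sequence in a function for convenience. build_and_install, run and the DRY_RUN variable are hypothetical names introduced here for illustration, not part of the MACA sources. With DRY_RUN=1 the commands are only printed, which allows a preview before anything is changed.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Configure, build and install one CMake project in its own bin/ directory.
# The subshell keeps the caller's working directory unchanged.
build_and_install() {
    src_dir=$1
    run mkdir -p "$src_dir/bin" || return 1
    (
        run cd "$src_dir/bin" &&
        run cmake .. &&
        run make &&
        run sudo make install
    )
}

# Preview the commands for Corpus2 without running anything.
DRY_RUN=1
build_and_install corpus2
```

To perform the builds for real, set DRY_RUN=0 and call the function from the directory that holds the clones, in the order corpus2, toki, maca. SFST still has to be built with plain make between Toki and MACA, as described above.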

Refresh the system's cache of installed shared libraries:

sudo ldconfig
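
To check that the freshly installed libraries are visible to the dynamic linker, the cache can be listed with ldconfig -p and searched. The following is a sketch only: missing_libs is a hypothetical helper, and the library base names corpus2, toki and maca (i.e. libcorpus2.so and so on) are an assumption about what the builds install.

```shell
# Print each library base name (given as arguments) that does not appear
# in the `ldconfig -p` listing read from standard input.
missing_libs() {
    listing=$(cat)
    for name in "$@"; do
        printf '%s\n' "$listing" | grep -q "lib$name\.so" || echo "$name"
    done
}

# Library names below are assumptions; no output means all were found.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | missing_libs corpus2 toki maca
fi
```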

Testing

If no problems occurred during installation, you should now have a fully functional MACA with Morfeusz support. If you plan to use Morfeusz, use one of the following MACA configurations (the first two are recommended):
  1. morfeusz-nkjp-official: uses Morfeusz and outputs in the NKJP tagset, as used in the National Corpus of Polish (it differs slightly from the internal Morfeusz SGJP tagset and has a reduced set of grammatical genders).
  2. morfeusz-nkjp-official-guesser: as above, but forms unknown to Morfeusz are processed with TaKIPI's guesser.
  3. sgjp-official: uses Morfeusz and outputs in its original tagset (no guesser). This is not recommended unless you specifically need the original SGJP tagset (most tools are compatible with the NKJP tagset, not the SGJP one).

The following example commands should work:

echo "Jeśli poświęcisz wolność, by zyskać bezpieczeństwo, stracisz oba." | maca-analyse -qs sgjp-official -o xces > out.xml
maca-convert morfsgjp2kipi < out.xml > out-kipi.xml
echo "Nibykotek ma nibynóżki. Wygógluj sam." | maca-analyse -qs morfeusz-nkjp-official-guesser -o xces > out-guesser.xml
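
A quick sanity check on the results is to confirm that each output file exists, is non-empty and contains token elements. This sketch assumes that the XCES writer puts every `<tok>` start tag on its own line; count_tokens is a hypothetical helper name.

```shell
# Count the lines containing a <tok> start tag in an XCES file.
# Assumes one <tok> start tag per line (an assumption about the output layout).
count_tokens() {
    grep -c '<tok>' "$1"
}

# Report on the files produced by the example commands above.
for f in out.xml out-kipi.xml out-guesser.xml; do
    if [ -s "$f" ]; then
        echo "$f: $(count_tokens "$f") tokens"
    else
        echo "$f: missing or empty"
    fi
done
```

A count of zero for an existing file suggests that the analysis ran but produced no tokens, which is worth investigating before using the configuration further.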