Prokopis Prokopidis, 2016-02-12 03:30 PM


Sentence Alignment Setup

To generate sentence alignments from bilingual crawls, ILSP-FC integrates the Java sentence aligner provided at https://github.com/loomchild/maligna/.

Alternatively, you can use an external aligner such as hunalign. For the current version of ILSP-FC, you can:

  • download the hunalign-1.2 source code from http://mokk.bme.hu/en/resources/hunalign/
  • follow the instructions on the hunalign page for building hunalign
  • put the hunalign directory containing the hunalign executable next to the runnable ilsp-fc jar.

For example, if you run ilsp-fc from:

~/ilsp-fc/ilsp-fc-x.x-jar-with-dependencies.jar

you should run the following commands:

cd ~/ilsp-fc/
wget ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.2.tgz
tar xvfz hunalign-1.2.tgz
cd hunalign-1.2/src/hunalign/
make
# return to the ilsp-fc directory so the link is created next to the jar
cd ~/ilsp-fc/
ln -sf hunalign-1.2 hunalign

This should build the hunalign executable and make it available under the suggested hunalign directory structure, including:

~/ilsp-fc/hunalign/data/ 
~/ilsp-fc/hunalign/src/hunalign/hunalign
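
Before running the crawler, it can help to confirm that both paths exist. The following is a minimal sketch, not part of ILSP-FC or hunalign: the function name check_hunalign_layout is hypothetical, and it assumes the directory layout listed above, with ~/ilsp-fc as the default base directory.

```shell
# Hypothetical helper (not part of ILSP-FC or hunalign): verify the layout
# listed above under a base directory, defaulting to ~/ilsp-fc.
check_hunalign_layout() {
    base="${1:-$HOME/ilsp-fc}"
    exe="$base/hunalign/src/hunalign/hunalign"

    # The data directory is expected as part of the suggested structure.
    if [ ! -d "$base/hunalign/data" ]; then
        echo "missing directory: $base/hunalign/data" >&2
        return 1
    fi

    # The executable is produced by make; it must be a regular, executable file.
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "missing or non-executable file: $exe" >&2
        return 1
    fi

    echo "hunalign layout looks complete under $base"
}
```

Calling check_hunalign_layout with no argument checks ~/ilsp-fc; pass another directory as the first argument if the jar lives elsewhere. A non-zero exit status means one of the two paths is missing, for example because the hunalign link points to the wrong place.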

You are now ready to produce TMX files from bilingual crawled data using the -align, -dict, -oft and -ofth options described in the Getting Started part of the documentation.