Prokopis Prokopidis, 2016-02-05 01:37 PM
Sentence Alignment Setup
To generate sentence alignments from bilingual crawls, ILSP-FC integrates the Java sentence aligner provided at http://align.sourceforge.net/. Alternatively, you can use an external aligner such as hunalign.
For example, with the current version of ILSP-FC, you can:
- download the hunalign-1.2 source code from http://mokk.bme.hu/en/resources/hunalign/
- follow the instructions on the hunalign page for building hunalign
- put the hunalign directory containing the hunalign executable next to the runnable ilsp-fc jar.
For example, if you run ilsp-fc from:
~/ilsp-fc/ilsp-fc-2.2-jar-with-dependencies.jar
you should do the following:
cd ~/ilsp-fc/
wget ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.2.tgz
tar xvfz hunalign-1.2.tgz
cd hunalign-1.2/src/hunalign/
make
cd ~/ilsp-fc/
ln -sf hunalign-1.2 hunalign
This should create the executable hunalign/src/hunalign/hunalign inside the suggested hunalign directory structure, which includes:
~/ilsp-fc/hunalign/data/
~/ilsp-fc/hunalign/src/hunalign/hunalign
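Before running the crawler, it can help to confirm that this layout is actually in place. The following is a minimal sketch of such a check. The function name check_hunalign_layout is hypothetical and is not part of ILSP-FC or hunalign. It only tests for the two paths listed above, relative to the directory that holds the ilsp-fc jar.

```shell
# Hypothetical helper (not shipped with ILSP-FC or hunalign).
# Checks that the hunalign data directory and executable exist under the
# directory that contains the runnable ilsp-fc jar.
check_hunalign_layout() {
    base_dir="$1"
    status=0

    # The data directory that comes with the hunalign distribution.
    if [ ! -d "$base_dir/hunalign/data" ]; then
        echo "missing directory: $base_dir/hunalign/data"
        status=1
    fi

    # The executable produced by running make in src/hunalign.
    exe="$base_dir/hunalign/src/hunalign/hunalign"
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "missing executable: $exe"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "hunalign layout looks OK under $base_dir"
    fi
    return "$status"
}
```

With the setup described above, the check would be run as check_hunalign_layout ~/ilsp-fc. A non-zero exit status means that at least one of the two paths is missing, which usually points to a failed build or a symbolic link created in the wrong directory.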
You are now ready to produce TMX files from bilingual crawl data using the -align, -dict, -oft and -ofth options described in the Getting Started part of the documentation.