Version 5 (Prokopis Prokopidis, 2014-08-19 12:52 PM) → Version 6/12 (Prokopis Prokopidis, 2014-09-09 01:55 PM)
h1. Sentence Alignment Setup
(Linux only)
In order to get sentence alignments as the output of bilingual crawls, ILSP-FC integrates the Java sentence aligner provided at http://align.sourceforge.net/. Alternatively, you can use an external aligner such as hunalign. To use hunalign with the current version of ILSP-FC, you can:
* download the hunalign-1.2 source code from http://mokk.bme.hu/en/resources/hunalign/
* follow the instructions on the hunalign page for building hunalign
* put the hunalign directory containing the hunalign executable next to the runnable ilsp-fc jar.
For example, if you run ilsp-fc from:
<pre>~/ilsp-fc/ilsp-fc-2.2-jar-with-dependencies.jar</pre>
you can set up the hunalign directory next to it as follows:
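<pre>
cd ~/ilsp-fc/
# download and unpack the hunalign 1.2 sources next to the jar
wget ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.2.tgz
tar xvfz hunalign-1.2.tgz
# build the hunalign executable
cd hunalign-1.2/src/hunalign/
make
# return to the jar's directory and create the link hunalign -> hunalign-1.2
cd ~/ilsp-fc/
ln -sf hunalign-1.2 hunalign
</pre>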
This should produce the executable hunalign/src/hunalign/hunalign within the suggested hunalign directory structure, which includes:
<pre>~/ilsp-fc/hunalign/data/
~/ilsp-fc/hunalign/src/hunalign/hunalign</pre>
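Before you run a crawl with alignment enabled, you may want to confirm that this layout is in place. Below is a minimal POSIX shell sketch. It assumes that the jar and the <code>hunalign</code> link sit in the same directory, as in the example above. The function name <code>check_hunalign_layout</code> is hypothetical and is not part of ILSP-FC. The sketch also does not confirm how ILSP-FC itself locates hunalign; it only checks the two paths listed above.

```shell
# Hypothetical helper (not part of ILSP-FC): check that the hunalign
# layout described above exists next to the given ilsp-fc jar.
check_hunalign_layout() {
    jar="$1"
    # The hunalign directory (or symlink) is expected beside the jar.
    base="$(dirname "$jar")/hunalign"
    status=0
    # data/ directory of the hunalign distribution
    if [ ! -d "$base/data" ]; then
        echo "missing directory: $base/data" >&2
        status=1
    fi
    # the compiled hunalign binary must exist and be executable
    if [ ! -x "$base/src/hunalign/hunalign" ]; then
        echo "missing or not executable: $base/src/hunalign/hunalign" >&2
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "hunalign layout OK under $base"
    fi
    return "$status"
}
```

For example, after sourcing the function, run <code>check_hunalign_layout ~/ilsp-fc/ilsp-fc-2.2-jar-with-dependencies.jar</code>. The tests <code>-d</code> and <code>-x</code> follow symbolic links, so the check also works when <code>hunalign</code> is a link to <code>hunalign-1.2</code>.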
Now, you are ready to produce TMX files from bilingual crawled data using the <code>-align</code>, <code>-dict</code>, <code>-oft</code> and <code>-ofth</code> options described in the [[GettingStarted|Getting Started]] part of the documentation.