Sentence Alignment Setup » History » Version 11

Prokopis Prokopidis, 2016-02-12 03:30 PM

# Sentence Alignment Setup

For the generation of sentence alignments from bilingual crawls, ILSP-FC integrates the Java sentence aligner available at https://github.com/loomchild/maligna/.

Alternatively, you can use an external aligner such as hunalign. For example, with the current version of ILSP-FC, you can:

* download the hunalign-1.2 source code from http://mokk.bme.hu/en/resources/hunalign/
* follow the instructions on the hunalign page for building hunalign
* put the hunalign directory containing the hunalign executable next to the runnable ilsp-fc jar.

For example, if you run ilsp-fc from:
```
~/ilsp-fc/ilsp-fc-x.x-jar-with-dependencies.jar
```

you should do the following:

```
# Download and unpack hunalign next to the ilsp-fc jar
cd ~/ilsp-fc/
wget ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.2.tgz
tar xvfz hunalign-1.2.tgz
# Build the hunalign executable
cd hunalign-1.2/src/hunalign/
make
# Return to the jar's directory so that the symlink is created there
cd ~/ilsp-fc/
ln -sf hunalign-1.2 hunalign
```

This should make the executable available as hunalign/src/hunalign/hunalign, within the suggested hunalign directory structure, which includes:

```
~/ilsp-fc/hunalign/data/
~/ilsp-fc/hunalign/src/hunalign/hunalign
```

You are now ready to produce TMX files from bilingual crawled data using the <code>-align</code>, <code>-dict</code>, <code>-oft</code> and <code>-ofth</code> options described in the [[GettingStarted|Getting Started]] part of the documentation.
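
Before running an alignment, it may help to confirm that the hunalign layout described above is actually in place. The following is a minimal sketch and not part of ILSP-FC: `check_hunalign_layout` is a hypothetical helper name, and the default base directory `~/ilsp-fc` is only assumed from the example on this page.

```shell
# Hypothetical helper (not shipped with ILSP-FC or hunalign):
# checks that the hunalign data directory and executable exist
# under the directory that holds the runnable ilsp-fc jar.
check_hunalign_layout() {
    # Assumption: the jar lives in ~/ilsp-fc unless another path is given.
    base_dir="${1:-$HOME/ilsp-fc}"
    status=0

    if [ ! -d "$base_dir/hunalign/data" ]; then
        echo "missing directory: $base_dir/hunalign/data"
        status=1
    fi

    if [ ! -x "$base_dir/hunalign/src/hunalign/hunalign" ]; then
        echo "missing or non-executable: $base_dir/hunalign/src/hunalign/hunalign"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "hunalign layout looks OK under $base_dir"
    fi
    return "$status"
}
```

Calling `check_hunalign_layout` with no argument checks `~/ilsp-fc`, while `check_hunalign_layout /some/other/dir` checks a different location. A non-zero return status suggests that the build or the symlink step needs to be repeated.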