Sentence Alignment Setup » History » Version 11
Prokopis Prokopidis, 2016-02-12 03:30 PM
# Sentence Alignment Setup

For the generation of sentence alignments from bilingual crawls, ILSP-FC integrates the Java sentence aligner provided at https://github.com/loomchild/maligna/.

Alternatively, you can use an external aligner such as hunalign. For example, with the current version of ILSP-FC, you can:

* download the hunalign-1.2 source code from http://mokk.bme.hu/en/resources/hunalign/
* build hunalign following the instructions on the hunalign page
* put the hunalign directory, which contains the hunalign executable, next to the runnable ILSP-FC jar

For example, if you run ILSP-FC from:

```
~/ilsp-fc/ilsp-fc-x.x-jar-with-dependencies.jar
```

you should run the following commands:

```
cd ~/ilsp-fc/
wget ftp://ftp.mokk.bme.hu/Hunglish/src/hunalign/latest/hunalign-1.2.tgz
tar xvfz hunalign-1.2.tgz
cd hunalign-1.2/src/hunalign/
make
# go back to the directory of the jar before creating the symlink
cd ~/ilsp-fc/
ln -sf hunalign-1.2 hunalign
```

This should build the hunalign executable and make it reachable through the hunalign symlink, giving the expected directory structure, including:

```
~/ilsp-fc/hunalign/data/
~/ilsp-fc/hunalign/src/hunalign/hunalign
```

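Before running ILSP-FC, you may want to confirm that this layout is in place. The following POSIX shell function is a minimal sketch for that purpose and is not part of ILSP-FC or hunalign. The name `check_hunalign_layout` is hypothetical. It checks only the two paths listed above, relative to a base directory that defaults to `~/ilsp-fc`, the example location used on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of ILSP-FC): checks that the hunalign
# directory sits next to the ILSP-FC jar, using the layout listed above.
# The first argument is the directory holding the jar (default: ~/ilsp-fc).
check_hunalign_layout() {
    base="${1:-$HOME/ilsp-fc}"
    data_dir="$base/hunalign/data"
    exe="$base/hunalign/src/hunalign/hunalign"

    # The data directory must exist (the hunalign symlink is followed).
    if [ ! -d "$data_dir" ]; then
        echo "missing directory: $data_dir" >&2
        return 1
    fi

    # The binary produced by make must be a regular, executable file.
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "missing or not executable: $exe" >&2
        return 1
    fi

    echo "hunalign layout OK under $base"
}
```

For example, after sourcing the snippet, `check_hunalign_layout ~/ilsp-fc` prints a confirmation and returns 0 when both paths are present; otherwise it reports the first missing path on standard error and returns 1.
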
Now you are ready to produce TMX files from bilingual crawled data using the <code>-align</code>, <code>-dict</code>, <code>-oft</code> and <code>-ofth</code> options described in the [[GettingStarted|Getting Started]] part of the documentation.