TMX merging » History » Version 10

Vassilis Papavassiliou, 2021-05-07 03:36 PM

# TMX merging

This step merges the generated TMX files and creates the final TMX file, which is considered the final output (i.e. the bilingual corpus). Segment pairs can be filtered, since the targeted types of document pairs and segment alignments can be selected. The step also extracts metadata for the final corpus.

```
java -Dlog4j.configuration=file:/opt/ilsp-fc/log4j.xml -jar /opt/ilsp-fc/ilsp-fc-2.2.4-SNAPSHOT-jar-with-dependencies.jar -tmxmerge -lang "L1;L2" \
-i (input) -oxslt -pdm "aupdih" -segtypes "1:1" -bs (baseName for output files) &>"log-tmxmerge"
```
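At its core, merging means collecting the translation units (`<tu>` elements) of all input TMX files into a single TMX body. The Java sketch below illustrates only that idea with the JDK's DOM API. It is not the ILSP-FC implementation: it performs no filtering and extracts no metadata, and the class and method names (`TmxMergeSketch`, `merge`, `countTus`, `toXml`) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not ILSP-FC code: merges TMX documents by copying every <tu>
// of the 2nd..nth document into the <body> of the first. No filtering, no metadata.
public class TmxMergeSketch {

    public static Document merge(List<String> tmxDocs) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = null;
        Element body = null;
        for (String tmx : tmxDocs) {
            Document doc = builder.parse(new InputSource(new StringReader(tmx)));
            if (merged == null) {
                // The first document supplies the <header> and the <body> to append to.
                merged = doc;
                body = (Element) merged.getElementsByTagName("body").item(0);
                continue;
            }
            NodeList tus = doc.getElementsByTagName("tu");
            for (int i = 0; i < tus.getLength(); i++) {
                // importNode(deep = true) copies the TU and its TUVs into the merged DOM.
                body.appendChild(merged.importNode(tus.item(i), true));
            }
        }
        return merged; // null if tmxDocs is empty
    }

    public static int countTus(Document doc) {
        return doc.getElementsByTagName("tu").getLength();
    }

    // Serializes the merged DOM back to a TMX string.
    public static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String a = "<tmx version=\"1.4\"><header/><body>"
                + "<tu><tuv xml:lang=\"en\"><seg>Hello</seg></tuv><tuv xml:lang=\"lv\"><seg>Sveiki</seg></tuv></tu>"
                + "</body></tmx>";
        String b = "<tmx version=\"1.4\"><header/><body>"
                + "<tu><tuv xml:lang=\"en\"><seg>Thanks</seg></tuv><tuv xml:lang=\"lv\"><seg>Paldies</seg></tuv></tu>"
                + "<tu><tuv xml:lang=\"en\"><seg>Good</seg></tuv><tuv xml:lang=\"lv\"><seg>Labi</seg></tuv></tu>"
                + "</body></tmx>";
        Document merged = merge(List.of(a, b));
        System.out.println(countTus(merged)); // prints 3
    }
}
```

In a real run the inputs would come from the `-i` directory or list file rather than from strings; `DocumentBuilder.parse(File)` can replace the `StringReader` in that case.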
## Options
```
-tmxmerge   :     merges the generated TMX files (i.e. constructs the bilingual corpus).

-i          :     full path of the input file or directory. It can be either a directory containing the TMX files to be merged,
                  or a text file listing the full paths of such directories (one directory per line).

-pdm        :     types of the document pairs from which the segment pairs will be selected.
                  The recommended value is "aupidh", since pairs of type "m" and "l" (e.g. eng-1_lav-3_m.xml or eng-2_lav-8_l.xml)
                  are only used for testing or examining the tool.

-thres      :     thresholds for 0:1 alignments, one per type. It must have the same length as the types parameter.
                  If a TMX of type X contains more 0:1 segment pairs than the corresponding threshold, it is not selected.

-segtypes   :     types of segment alignments that will be selected for the final output. The suggested value is "1:1".
                  Multiple segment types are separated by ";" (e.g. "1:1;1:2;2:1").

-oxslt      :     applies an XSL transformation to generate an HTML file during export.

-cc         :     if present, only document pairs for which a license has been detected are selected for the merged TMX.

-cfg        :     full path of a configuration file that can be used to override the default parameters.

-keepdup    :     keeps duplicate TUs and annotates them.

-keepem     :     keeps TUs even if one of their TUVs contains no letters, and annotates them.

-keepiden   :     keeps TUs even if their TUVs are identical after removing non-letter characters, and annotates them.

-ksn        :     keeps only TUs whose TUVs contain the same digits.

-maxlr      :     maximum ratio of the lengths (in characters) of the TUVs in a TU.

-minlr      :     minimum ratio of the lengths (in characters) of the TUVs in a TU.

-mpa        :     minimum percentage of 0:1 alignments in a TMX for the TMX to be accepted.

-mtuvl      :     minimum length (in tokens) of an acceptable TUV.

-iso6393    :     if present, three-letter language codes are used in the generated TMX files; otherwise, two-letter codes are used.
```
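Several of the options above boil down to simple predicates over file names or TUVs. The Java sketch below shows one possible reading of `-pdm`, `-segtypes`, `-minlr`/`-maxlr`, `-mtuvl`, `-ksn`, and the checks relaxed by `-keepem` and `-keepiden`. It is not ILSP-FC code, and several details are assumptions: the class name `TmxFilterSketch`, reading the pair type from the letter before `.xml`, the length ratio as L1 characters divided by L2 characters, whitespace tokenization, and "same digits" as equal multisets of digit sequences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not ILSP-FC code: one possible reading of some -tmxmerge filters.
public class TmxFilterSketch {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // -pdm: assumes the pair type is the single letter before ".xml",
    // e.g. "eng-1_lav-3_m.xml" has type 'm'.
    public static boolean pairTypeAllowed(String fileName, String pdm) {
        int underscore = fileName.lastIndexOf('_');
        if (underscore < 0 || !fileName.endsWith(".xml") || fileName.length() != underscore + 6) {
            return false; // name does not follow the assumed pattern
        }
        return pdm.indexOf(fileName.charAt(underscore + 1)) >= 0;
    }

    // -segtypes: a TU aligning n L1 segments with m L2 segments has type "n:m".
    public static boolean segTypeAllowed(int n, int m, String segtypes) {
        Set<String> allowed = new HashSet<>(Arrays.asList(segtypes.split(";")));
        return allowed.contains(n + ":" + m);
    }

    // -minlr / -maxlr: assumes ratio = length of L1 TUV / length of L2 TUV, in chars.
    public static boolean lengthRatioOk(String l1, String l2, double minlr, double maxlr) {
        if (l2.isEmpty()) {
            return false;
        }
        double ratio = (double) l1.length() / l2.length();
        return ratio >= minlr && ratio <= maxlr;
    }

    // -mtuvl: assumes whitespace tokenization.
    public static boolean minTokensOk(String tuv, int mtuvl) {
        String trimmed = tuv.trim();
        int tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return tokens >= mtuvl;
    }

    // -ksn: assumes "same digits" means the same multiset of digit sequences.
    public static boolean sameNumbers(String l1, String l2) {
        return digitSequences(l1).equals(digitSequences(l2));
    }

    // Check relaxed by -keepem: does the TUV contain at least one letter?
    public static boolean hasLetter(String tuv) {
        return tuv.codePoints().anyMatch(Character::isLetter);
    }

    // Check relaxed by -keepiden: are the TUVs identical once non-letters are removed?
    public static boolean identicalLetters(String l1, String l2) {
        return lettersOnly(l1).equals(lettersOnly(l2));
    }

    private static List<String> digitSequences(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        found.sort(null); // natural order, so the order of numbers in the text is ignored
        return found;
    }

    private static String lettersOnly(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().filter(Character::isLetter).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pairTypeAllowed("eng-1_lav-3_m.xml", "aupidh")); // false
        System.out.println(segTypeAllowed(1, 2, "1:1;1:2;2:1"));          // true
        System.out.println(sameNumbers("Article 12 of 2020", "2020. gada 12. pants")); // true
    }
}
```

A pipeline built this way would keep a TU only if all enabled predicates hold; with `-keepdup`, `-keepem` or `-keepiden`, the corresponding TUs would be annotated instead of dropped.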