TMX merging » History » Version 2

Vassilis Papavassiliou, 2016-02-16 07:55 PM

# TMX merging

This step merges the generated TMX files and creates the final TMX, which is the final output of the process (i.e. the bilingual corpus). Segment pairs can be filtered by selecting the types of document pairs and the types of segment alignments to include. It also extracts metadata for the final corpus.

```
java -Dlog4j.configuration=file:/opt/ilsp-fc/log4j.xml -jar /opt/ilsp-fc/ilsp-fc-2.2.2-jar-with-dependencies.jar \
-tmxmerge -lang "L1;L2" -oxslt -doctypes "aupdih" -segtypes "1:1" \
-tmx (fullpath of the merged TMX to be constructed) \
&>"log-tmxmerge"
```
-tmxmerge : Merges the generated TMX files (i.e. constructs a bilingual corpus).

-doctypes : Defines the types of document pairs from which segment pairs will be selected. The proposed value is "aupidh", since pairs of type "m" and "l" (e.g. eng-1_lav-3_m.xml or eng-2_lav-8_l.xml) are only used for testing or examining the tool.

-thres : Thresholds for 0:1 alignments, one per type. The list should have the same length as the types parameter. If a TMX of type X contains more 0:1 segment pairs than the corresponding threshold, it will not be selected.

-segtypes : Types of segment alignments that will be selected for the final output. The proposed value is "1:1" (default). If omitted, segments of all types will be processed. Otherwise, list the segment types separated by ";" (e.g. "1:1;1:2;2:1").

-tmx : The full path of the merged TMX file to be constructed. It contains the filtered segment pairs of the generated TMX files and is the final output of the process (i.e. the parallel corpus).

-cc : If present, only document pairs for which a license has been detected will be selected for the merged TMX.

-metadata : Generates an XML file which contains metadata of the generated corpus.

-cfg : The full path to a configuration file that can be used to override default parameters.
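
The options described above can be combined in a small wrapper. The following is a minimal sketch, not part of the tool: it assembles a `-tmxmerge` command line from variables and only prints it for review, without running it. The language pair `eng;lav`, the output path `/tmp/eng-lav.tmx`, and the function name `build_tmxmerge_cmd` are assumptions made for illustration. The jar and log4j paths are taken from the example command, and only options documented on this page are used.

```shell
#!/usr/bin/env bash
# Sketch: build a -tmxmerge invocation and print it (dry run, java is not executed).
set -euo pipefail

JAR="/opt/ilsp-fc/ilsp-fc-2.2.2-jar-with-dependencies.jar"  # path from the example command
LOG4J="file:/opt/ilsp-fc/log4j.xml"                         # path from the example command
LANGS="eng;lav"              # hypothetical language pair
OUT_TMX="/tmp/eng-lav.tmx"   # hypothetical full path of the merged TMX

# $1: segment types separated by ";" (e.g. "1:1;1:2;2:1")
build_tmxmerge_cmd() {
  local segtypes="$1"
  local -a cmd=(
    java "-Dlog4j.configuration=${LOG4J}" -jar "$JAR"
    -tmxmerge -lang "$LANGS" -oxslt
    -doctypes "aupdih" -segtypes "$segtypes"
    -tmx "$OUT_TMX"
  )
  # %q quotes each argument so the printed line can be pasted into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

# Select 1:1, 1:2 and 2:1 segment alignments instead of 1:1 only.
build_tmxmerge_cmd "1:1;1:2;2:1"
```

Keeping the arguments in an array avoids quoting problems with the ";" separators, which the shell would otherwise treat as command terminators. To actually run the merge, the printed line can be executed with its output redirected to a log file, as in the example command.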