Prokopis Prokopidis, 2015-08-20 01:54 PM
h1. Domain-specific resources acquired with ILSP-FC

ILSP-FC [1] has been used to acquire several domain-specific datasets for training and evaluating domain-specific SMT systems. These datasets include:

* bilingual corpora in EN-EL, EN-FR, EN-IT and EN-ES for the environment and labor legislation domains, which the PANACEA consortium then used for domain adaptation SMT experiments [2] and for generating domain-specific bilingual glossaries;
* monolingual corpora in the same languages and domains, used to create domain-specific n-gram lists;
* corpora for all combinations of DE, EL, EN and PT for the automotive and medical domains in QTLaunchPad;
* EN-HR bilingual corpora for the tourism domain [3];
* EN-FI bilingual corpora used for the Abu-MaTran project submissions to WMT 2015 [4].

Additionally, experiments in crawling public administration websites for the purposes of ELRC have generated bilingual collections in several language pairs, some examples of which are available at the following links: EN-DE; EN-LV; EN-GA.

h2. References

[1] V. Papavassiliou, P. Prokopidis, G. Thurmair. A modular open-source focused crawler for mining monolingual and bilingual corpora from the web. In the 6th Workshop on Building and Using Comparable Corpora. 2013.

[2] P. Pecina, A. Toral, V. Papavassiliou, P. Prokopidis, A. Tamchyna, A. Way, J. van Genabith. Domain adaptation of statistical machine translation with domain-focused web crawling. Language Resources and Evaluation. Vol. 49(1). 2015.

[3] M. Esplà-Gomis, F. Klubička, N. Ljubešić, S. Ortiz-Rojas, V. Papavassiliou, P. Prokopidis. Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites. In LREC 2014.

[4] R. Rubino, T. Pirinen, M. Esplà-Gomis, N. Ljubešić, S. Ortiz-Rojas, V. Papavassiliou, P. Prokopidis, A. Toral. Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling. In WMT 2015 (to appear).