Getting Started » History » Version 16

Vassilis Papavassiliou, 2012-10-25 02:25 PM

h1. Getting Started

Once you [[DeveloperSetup|build]] or [[HowToGet|download]] an ilsp-fc runnable jar, you can run it as follows:

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar</code></pre>

Several settings influence the crawling process. They are defined in the configuration file (by default, crawler_config.xml) before the crawl starts. Some of them can also be set as options in the command that runs the ilsp-fc runnable jar, as follows:

-a user agent name

h2. Run a monolingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a vpapa@ilsp.gr -c 10 -f -k -lang de -t 10 -of output_test1_list.txt -ofh output_test1_list.txt.html -tc Automotive-seed-terms-de.txt -u Automotive-seed-urls.txt -type m -xslt -cfg FMC_config.xml</code></pre>

h2. Run a bilingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a test1 -c 10 -f -k -l1 de -l2 it -t 10 -of test_HS_DE-IT_output.txt -ofh test_HS_DE-IT_output.txt.html -tc HS_DE-IT_topic.txt -type p -xslt -u seed_suva.txt -cfg FBC_config.xml</code></pre>
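
The two example commands share most options and differ mainly in the language options (-lang versus -l1 and -l2) and in the -type value (m versus p). As a minimal sketch of how such a command line could be assembled programmatically, the Java class below builds the argument lists. The class and method names (CrawlCommand, monolingual, bilingual) are hypothetical and not part of ilsp-fc, and only a subset of the options shown in the examples is included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ilsp-fc: it only assembles argument lists
// that mirror a subset of the options used in the example commands.
class CrawlCommand {
    static final String JAR = "ilsp-fc-1.1-jar-with-dependencies.jar";

    // Options that appear in both example commands.
    private static List<String> base(String agent, String seedUrls, String config) {
        return new ArrayList<>(Arrays.asList(
            "java", "-jar", JAR, "crawlandexport",
            "-a", agent, "-u", seedUrls, "-cfg", config));
    }

    // Monolingual crawl: one language (-lang) and -type m.
    static List<String> monolingual(String agent, String lang,
                                    String seedUrls, String config) {
        List<String> cmd = base(agent, seedUrls, config);
        cmd.addAll(Arrays.asList("-lang", lang, "-type", "m"));
        return cmd;
    }

    // Bilingual crawl: two languages (-l1, -l2) and -type p.
    static List<String> bilingual(String agent, String l1, String l2,
                                  String seedUrls, String config) {
        List<String> cmd = base(agent, seedUrls, config);
        cmd.addAll(Arrays.asList("-l1", l1, "-l2", l2, "-type", "p"));
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
            monolingual("vpapa@ilsp.gr", "de", "Automotive-seed-urls.txt", "FMC_config.xml")));
        System.out.println(String.join(" ",
            bilingual("test1", "de", "it", "seed_suva.txt", "FBC_config.xml")));
    }
}
```

A list built this way can be passed to java.lang.ProcessBuilder to launch the crawl. The remaining options from the examples would be appended in the same manner.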

h2. Example of Java code

<pre>
<code class="java">
package gr.ilsp.fmc.classifier;

public enum ClassifierCounters {
    CLASSIFIER_DOCUMENTS_PASSED,   // successfully classified a document
    CLASSIFIER_DOCUMENTS_FAILED,   // failed to classify a document
    CLASSIFIER_DOCUMENTS_ABORTED,
    CLASSIFIER_TIME
}</code></pre>

<pre>
<code class="xml">
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <agent>
        <email>yourmail@mail.com</email>
        <web_address>www.youraddress.com</web_address>
    </agent>
</configuration></code></pre>
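
As a minimal sketch of how the agent settings in a configuration fragment like the one above could be read, the Java class below parses the XML with the JDK's standard DOM API. The class name AgentConfig and its method firstText are hypothetical. This is not a description of how ilsp-fc itself loads its configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical reader, not part of ilsp-fc.
class AgentConfig {
    // Returns the trimmed text of the first element with the given tag name,
    // or null if the document contains no such element.
    static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Wrap parser and I/O errors so callers need no checked exceptions.
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<configuration><agent>"
            + "<email>yourmail@mail.com</email>"
            + "<web_address>www.youraddress.com</web_address>"
            + "</agent></configuration>";
        System.out.println(firstText(xml, "email"));
        System.out.println(firstText(xml, "web_address"));
    }
}
```

For a configuration stored on disk, the same DocumentBuilder can parse a java.io.File instead of an in-memory string.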