Version 17 (Vassilis Papavassiliou, 2012-10-25 02:38 PM)

h1. Getting Started

Once you [[DeveloperSetup|build]] or [[HowToGet|download]] an ilsp-fc runnable jar, you can run it as follows:

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar</code></pre>

Several settings that influence the crawling process can be defined in the configuration file (crawler_config.xml by default) before the crawl starts. Some of them can also be set as options of the command that runs the ilsp-fc runnable jar:

-a: the user agent name.
-t: the crawl duration in minutes. The crawler runs in cycles: in each cycle, the links at the top of the crawler’s frontier are extracted and new links are examined. The defined time will therefore most likely expire while a cycle is running. In that case, the crawler stops only after the running cycle ends. The default value is 10 minutes.
-n: the crawl duration in cycles.
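
The effect of the cycle-based time limit can be illustrated with a small simulation. This is a hypothetical sketch, not ilsp-fc source code. The class CycleTimeLimitSketch, its simulate method and the cycle durations are made up for illustration. The sketch assumes, as the description of -t implies, that the limit is checked only before a new cycle starts.

<pre><code class="java">
/**
 * Hypothetical illustration (not ilsp-fc code) of a time limit that is only
 * checked between crawl cycles, so a running cycle always completes.
 */
class CycleTimeLimitSketch {

    /** Outcome of a simulated crawl. */
    static final class Outcome {
        final int cycles;          // number of cycles that were run
        final double minutesUsed;  // total simulated crawl time in minutes

        Outcome(int cycles, double minutesUsed) {
            this.cycles = cycles;
            this.minutesUsed = minutesUsed;
        }
    }

    /**
     * Runs simulated cycles with the given durations (in minutes) until the
     * time limit has expired or no cycles are left.
     */
    static Outcome simulate(double[] cycleMinutes, double limitMinutes) {
        int cycles = 0;
        double used = 0.0;
        for (double duration : cycleMinutes) {
            if (used >= limitMinutes) {
                break;            // time expired: do not start another cycle
            }
            used += duration;     // a cycle that has started runs to its end
            cycles++;
        }
        return new Outcome(cycles, used);
    }

    public static void main(String[] args) {
        // Default limit of 10 minutes, with cycles of 4 minutes each.
        Outcome o = simulate(new double[] {4, 4, 4, 4, 4}, 10);
        System.out.println(o.cycles + " cycles, " + o.minutesUsed + " minutes");
    }
}
</code></pre>

With 4-minute cycles and the default limit of 10 minutes, the time expires during the third cycle. The sketch therefore prints "3 cycles, 12.0 minutes": the crawl overruns the limit by 2 minutes.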

h2. Run a monolingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a vpapa@ilsp.gr -cfg FMC_config.xml -t 10 -type m -c 10 -lang de -of output_test1_list.txt -ofh  output_test1_list.txt.html -tc Automotive-seed-terms-de.txt  -u  Automotive-seed-urls.txt -xslt -f -k</code></pre>
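
The same command can also be launched from Java with ProcessBuilder. Each option and its value are passed as separate arguments, so no shell quoting is needed. This is a minimal sketch, assuming the jar and the input files of the example are in the working directory. The class MonolingualCrawlLauncher and its command method are hypothetical. The program only prints the command, and starts the crawl only when it receives the argument run.

<pre><code class="java">
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher for the monolingual crawl example above. */
class MonolingualCrawlLauncher {

    // Jar name taken from the examples on this page.
    static final String JAR = "ilsp-fc-1.1-jar-with-dependencies.jar";

    /** Builds the example command, one list element per argument. */
    static List<String> command(String userAgent) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", JAR, "crawlandexport"));
        cmd.addAll(Arrays.asList(
                "-a", userAgent,
                "-cfg", "FMC_config.xml",
                "-t", "10",
                "-type", "m",
                "-c", "10",
                "-lang", "de",
                "-of", "output_test1_list.txt",
                "-ofh", "output_test1_list.txt.html",
                "-tc", "Automotive-seed-terms-de.txt",
                "-u", "Automotive-seed-urls.txt",
                "-xslt", "-f", "-k"));
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = command("yourmail@mail.com");
        System.out.println(String.join(" ", cmd));
        if (args.length > 0 && "run".equals(args[0])) {
            // inheritIO() lets the crawler write directly to this console.
            Process crawler = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println("crawler exited with code " + crawler.waitFor());
        }
    }
}
</code></pre>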

h2. Run a bilingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a test1 -c 10 -f -k -l1 de -l2 it -t 10 -of test_HS_DE-IT_output.txt -ofh  test_HS_DE-IT_output.txt.html -tc HS_DE-IT_topic.txt -type p -xslt -u  seed_suva.txt -cfg FBC_config.xml</code></pre>
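
Compared with the monolingual example, this command uses -type p instead of -type m, and the language options -l1 and -l2 instead of -lang. The sketch below collects the bilingual options in a map and turns them into an argument list. It is hypothetical: BilingualCrawlCommand, build and exampleOptions are not part of ilsp-fc. It also assumes that option order does not matter, which the two examples suggest because they place the same options in different orders.

<pre><code class="java">
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for the bilingual crawl example above. */
class BilingualCrawlCommand {

    static final String JAR = "ilsp-fc-1.1-jar-with-dependencies.jar";

    /** Builds "java -jar JAR crawlandexport", then the options, then the flags. */
    static List<String> build(Map<String, String> options, List<String> flags) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", JAR, "crawlandexport"));
        for (Map.Entry<String, String> e : options.entrySet()) {
            cmd.add(e.getKey());    // option name, e.g. "-l1"
            cmd.add(e.getValue());  // its value, as a separate argument
        }
        cmd.addAll(flags);          // options given without a value, e.g. "-f"
        return cmd;
    }

    /** Options of the bilingual example; LinkedHashMap keeps insertion order. */
    static Map<String, String> exampleOptions() {
        Map<String, String> o = new LinkedHashMap<>();
        o.put("-a", "test1");
        o.put("-c", "10");
        o.put("-l1", "de");
        o.put("-l2", "it");
        o.put("-t", "10");
        o.put("-of", "test_HS_DE-IT_output.txt");
        o.put("-ofh", "test_HS_DE-IT_output.txt.html");
        o.put("-tc", "HS_DE-IT_topic.txt");
        o.put("-type", "p");
        o.put("-u", "seed_suva.txt");
        o.put("-cfg", "FBC_config.xml");
        return o;
    }

    public static void main(String[] args) {
        List<String> cmd = build(exampleOptions(), Arrays.asList("-f", "-k", "-xslt"));
        System.out.println(String.join(" ", cmd));
    }
}
</code></pre>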

h2. Example of Java code

<pre>
<code class="java">
package gr.ilsp.fmc.classifier;

public enum ClassifierCounters {
    CLASSIFIER_DOCUMENTS_PASSED,   // successfully classified a document
    CLASSIFIER_DOCUMENTS_FAILED,   // failed to classify a document
    CLASSIFIER_DOCUMENTS_ABORTED,  // presumably: classification of a document was aborted
    CLASSIFIER_TIME                // presumably: time spent on classification
}</code></pre>
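
Counters declared as an enum are usually tallied per constant. The sketch below shows one way to do this with an EnumMap that holds one total per counter. It is hypothetical and is not how ilsp-fc itself reports these counters. The enum is re-declared inside the example so that the example compiles on its own. Treating CLASSIFIER_TIME as milliseconds is also only an assumption.

<pre><code class="java">
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical tally of classifier counters (not ilsp-fc code). */
class ClassifierCounterDemo {

    // Local copy of the enum above, so that this example is self-contained.
    enum ClassifierCounters {
        CLASSIFIER_DOCUMENTS_PASSED,
        CLASSIFIER_DOCUMENTS_FAILED,
        CLASSIFIER_DOCUMENTS_ABORTED,
        CLASSIFIER_TIME
    }

    private final Map<ClassifierCounters, Long> counts = new EnumMap<>(ClassifierCounters.class);

    /** Adds amount to the given counter, starting from 0 if it is unset. */
    void increment(ClassifierCounters counter, long amount) {
        counts.merge(counter, amount, Long::sum);
    }

    /** Returns the current total of a counter (0 if never incremented). */
    long get(ClassifierCounters counter) {
        return counts.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        ClassifierCounterDemo demo = new ClassifierCounterDemo();
        demo.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_PASSED, 1);
        demo.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_PASSED, 1);
        demo.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_FAILED, 1);
        // Assumption for illustration: CLASSIFIER_TIME accumulates milliseconds.
        demo.increment(ClassifierCounters.CLASSIFIER_TIME, 250);
        for (ClassifierCounters c : ClassifierCounters.values()) {
            System.out.println(c + " = " + demo.get(c));
        }
    }
}
</code></pre>

An EnumMap keeps the counters in declaration order and is backed by an array, which makes it a compact choice when every key is an enum constant.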

<pre>
<code class="xml">
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
        <agent>
                <email>yourmail@mail.com</email>
                <web_address>www.youraddress.com</web_address>
        </agent>
</configuration></code></pre>
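
The agent settings in a configuration file of this form can be read with the JDK's built-in DOM parser. This is a minimal sketch: AgentConfigReader and readAgent are hypothetical names. Only the email and web_address elements under agent, as shown above, are read. Pass a file path such as crawler_config.xml as the first argument; without one, the sketch parses a copy of the sample.

<pre><code class="java">
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the agent section of a configuration file. */
class AgentConfigReader {

    /** Returns the email and web_address values of the first agent element. */
    static Map<String, String> readAgent(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> agent = new LinkedHashMap<>();
        NodeList agents = doc.getElementsByTagName("agent");
        if (agents.getLength() == 0) {
            return agent; // no agent section: return an empty map
        }
        Element agentElement = (Element) agents.item(0);
        for (String tag : new String[] {"email", "web_address"}) {
            NodeList nodes = agentElement.getElementsByTagName(tag);
            if (nodes.getLength() > 0) {
                agent.put(tag, nodes.item(0).getTextContent().trim());
            }
        }
        return agent;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<configuration><agent>"
                + "<email>yourmail@mail.com</email>"
                + "<web_address>www.youraddress.com</web_address>"
                + "</agent></configuration>";
        // Read the file given as the first argument, or fall back to the sample.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(readAgent(in));
        }
    }
}
</code></pre>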