Getting Started » History » Version 25

Vassilis Papavassiliou, 2012-10-26 03:58 PM

h1. Getting Started

Once you [[DeveloperSetup|build]] or [[HowToGet|download]] an ilsp-fc runnable jar, you can run it like this:

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar</code></pre>

Several settings influence the crawling process. They can be defined in the configuration file (by default [[crawler_config.xml]]) before a crawl starts. Some of them can also be set on the command line when running the ilsp-fc runnable jar, as follows:

<pre><code>-a    : the user agent name.
-c    : the crawl duration in minutes. Since the crawler runs in cycles
        (during which links stored at the top of the crawler’s frontier
        are extracted and new links are examined), it is very likely that
        the defined time will expire during a cycle run. In that case, the
        crawler will stop only after the end of the running cycle.
        The default value is 10 minutes.
-n    : the crawl duration in cycles.
-t    : the number of threads that will be used to fetch web pages in parallel.
-type : the type of crawl: monolingual (m) or parallel (p).
-lang : the targeted language in case of monolingual crawling.
-l1   : the first targeted language in case of bilingual crawling.
-l2   : the second targeted language in case of bilingual crawling.
</code></pre>
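
The note on @-c@ above implies that the time limit is checked between cycles rather than inside them. The following Java sketch illustrates that behaviour under stated assumptions: @CycleBudgetSketch@, @runCycles@ and the simulated clock are hypothetical names made up for this example and are not part of ilsp-fc, whose actual crawl loop may differ.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not ilsp-fc code: shows why a crawl can outlast its -c budget.
final class CycleBudgetSketch {

    /**
     * Runs whole cycles until the time budget is spent and returns how many ran.
     * The deadline is checked only between cycles, so a cycle that has already
     * started always runs to completion.
     */
    static int runCycles(long budgetMillis, LongSupplier clock, Runnable cycle) {
        long deadline = clock.getAsLong() + budgetMillis;
        int cycles = 0;
        while (clock.getAsLong() < deadline) {
            cycle.run();   // may finish after the deadline has passed
            cycles++;
        }
        return cycles;
    }

    public static void main(String[] args) {
        // Simulated clock (assumption): every cycle "takes" 4 minutes, the budget is 10 minutes.
        long[] now = {0L};
        long minute = 60_000L;
        int cycles = runCycles(10 * minute, () -> now[0], () -> now[0] += 4 * minute);
        System.out.println(cycles + " cycles, stopped at minute " + now[0] / minute);
    }
}
```

With a 10-minute budget and 4-minute cycles, three cycles run and the crawl ends at minute 12, that is, after the cycle that was running when the limit expired.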

h2. Run a monolingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a vpapa@ilsp.gr \
    -cfg FMC_config.xml -t 10 -type m -c 10 -lang de -of output_test1_list.txt \
    -ofh output_test1_list.txt.html -tc Automotive-seed-terms-de.txt \
    -u Automotive-seed-urls.txt -xslt -f -k</code></pre>

h2. Run a bilingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a test1 -c 10 -f -k \
    -l1 de -l2 it -t 10 -of test_HS_DE-IT_output.txt \
    -ofh test_HS_DE-IT_output.txt.html -tc HS_DE-IT_topic.txt \
    -type p -xslt -u seed_suva.txt -cfg FBC_config.xml</code></pre>

h2. Example of Java code

<pre>
<code class="java">
package gr.ilsp.fmc.classifier;

public enum ClassifierCounters {
    CLASSIFIER_DOCUMENTS_PASSED,   // successfully classified a document
    CLASSIFIER_DOCUMENTS_FAILED,   // failed to classify a document
    CLASSIFIER_DOCUMENTS_ABORTED,
    CLASSIFIER_TIME
}</code></pre>
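
Counters like these are typically incremented while documents are processed and reported at the end of a run. A minimal sketch of such a tally follows, assuming a plain in-memory @EnumMap@. @CounterTallySketch@ is a hypothetical helper, the reading of @CLASSIFIER_TIME@ as elapsed milliseconds is an assumption, and the way ilsp-fc actually stores its counters is not shown on this page.

```java
import java.util.EnumMap;
import java.util.Map;

// Copy of the enum above (without its package), repeated so the sketch compiles on its own.
enum ClassifierCounters {
    CLASSIFIER_DOCUMENTS_PASSED,
    CLASSIFIER_DOCUMENTS_FAILED,
    CLASSIFIER_DOCUMENTS_ABORTED,
    CLASSIFIER_TIME
}

// Hypothetical helper, not part of ilsp-fc: keeps one running total per counter.
final class CounterTallySketch {
    private final Map<ClassifierCounters, Long> totals = new EnumMap<>(ClassifierCounters.class);

    /** Adds amount to the running total of the given counter. */
    void increment(ClassifierCounters counter, long amount) {
        totals.merge(counter, amount, Long::sum);
    }

    /** Returns the current total, or 0 if the counter was never incremented. */
    long get(ClassifierCounters counter) {
        return totals.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterTallySketch tally = new CounterTallySketch();
        tally.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_PASSED, 1);
        tally.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_PASSED, 1);
        tally.increment(ClassifierCounters.CLASSIFIER_DOCUMENTS_FAILED, 1);
        tally.increment(ClassifierCounters.CLASSIFIER_TIME, 35);  // assumed unit: milliseconds
        for (ClassifierCounters counter : ClassifierCounters.values()) {
            System.out.println(counter + " = " + tally.get(counter));
        }
    }
}
```

An @EnumMap@ is used here because its keys are restricted to the enum constants and it iterates in declaration order.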

<pre>
<code class="xml">
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <agent>
        <email>yourmail@mail.com</email>
        <web_address>www.youraddress.com</web_address>
    </agent>
</configuration></code></pre>
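
To check what a configuration fragment like the one above contains, the agent fields can be read with the XML parser that ships with the JDK. This is only a sketch: @AgentConfigSketch@ and @readElement@ are hypothetical names, and ilsp-fc itself may load its configuration in a different way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of ilsp-fc: reads single elements from a config document.
final class AgentConfigSketch {

    /** Returns the trimmed text of the first element named tagName, or null if there is none. */
    static String readElement(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Parser set-up, I/O and malformed-XML errors are all reported the same way here.
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        // Same content as the example configuration, written on one line.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<configuration><agent>"
                + "<email>yourmail@mail.com</email>"
                + "<web_address>www.youraddress.com</web_address>"
                + "</agent></configuration>";
        System.out.println(readElement(xml, "email"));
        System.out.println(readElement(xml, "web_address"));
    }
}
```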