Getting Started » History » Version 23
Prokopis Prokopidis, 2012-10-26 11:22 AM
1 | 1 | Prokopis Prokopidis | h1. Getting Started |
---|---|---|---|
2 | 2 | Prokopis Prokopidis | |
3 | 2 | Prokopis Prokopidis | Once you [[DeveloperSetup|build]] or [[HowToGet|download]] an ilsp-fc runnable jar, you can run it like this: |
4 | 2 | Prokopis Prokopidis | |
5 | 11 | Prokopis Prokopidis | <pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar</code></pre> |
6 | 2 | Prokopis Prokopidis | |
7 | 23 | Prokopis Prokopidis | Several settings influence the crawling process. They can be defined in the configuration file (the default file is [[crawler_config.xml]]) before the crawl starts. Some of them can also be set on the command line when running the ilsp-fc runnable jar, as follows: |
8 | 15 | Vassilis Papavassiliou | |
9 | 22 | Prokopis Prokopidis | <pre><code>-a : the user agent name. |
10 | 18 | Vassilis Papavassiliou | -c : the crawl duration in minutes. The crawler runs in cycles (during which links stored at the top of the crawler’s frontier are extracted and new links are examined), so the defined time will very likely expire in the middle of a cycle. In that case, the crawler stops only after the running cycle ends. The default value is 10 minutes. |
11 | 20 | Vassilis Papavassiliou | -n : the crawl duration in cycles. |
12 | 20 | Vassilis Papavassiliou | -t : the number of threads used to fetch web pages in parallel. |
13 | 20 | Vassilis Papavassiliou | -type : the type of crawl: monolingual (m) or parallel (p). |
15 | 22 | Prokopis Prokopidis | </code></pre> |
16 | 1 | Prokopis Prokopidis | |
17 | 1 | Prokopis Prokopidis | h2. Run a monolingual crawl |
18 | 1 | Prokopis Prokopidis | |
19 | 22 | Prokopis Prokopidis | <pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a vpapa@ilsp.gr \ |
20 | 22 | Prokopis Prokopidis | -cfg FMC_config.xml -t 10 -type m -c 10 -lang de -of output_test1_list.txt \ |
21 | 22 | Prokopis Prokopidis | -ofh output_test1_list.txt.html -tc Automotive-seed-terms-de.txt \ |
22 | 22 | Prokopis Prokopidis | -u Automotive-seed-urls.txt -xslt -f -k</code></pre> |
23 | 2 | Prokopis Prokopidis | |
24 | 1 | Prokopis Prokopidis | h2. Run a bilingual crawl |
25 | 12 | Vassilis Papavassiliou | |
26 | 14 | Vassilis Papavassiliou | <pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a test1 -c 10 -f -k -l1 de -l2 it -t 10 -of test_HS_DE-IT_output.txt -ofh test_HS_DE-IT_output.txt.html -tc HS_DE-IT_topic.txt -type p -xslt -u seed_suva.txt -cfg FBC_config.xml</code></pre> |
27 | 12 | Vassilis Papavassiliou | |
28 | 2 | Prokopis Prokopidis | |
29 | 2 | Prokopis Prokopidis | h2. Example of Java code |
30 | 9 | Prokopis Prokopidis | |
31 | 2 | Prokopis Prokopidis | <pre> |
32 | 2 | Prokopis Prokopidis | <code class="java"> |
33 | 2 | Prokopis Prokopidis | package gr.ilsp.fmc.classifier; |
34 | 2 | Prokopis Prokopidis | |
35 | 2 | Prokopis Prokopidis | public enum ClassifierCounters { |
36 | 2 | Prokopis Prokopidis | CLASSIFIER_DOCUMENTS_PASSED, // successfully classified a document |
37 | 1 | Prokopis Prokopidis | CLASSIFIER_DOCUMENTS_FAILED, // failed to classify a document |
38 | 2 | Prokopis Prokopidis | CLASSIFIER_DOCUMENTS_ABORTED, |
39 | 2 | Prokopis Prokopidis | CLASSIFIER_TIME |
40 | 8 | Prokopis Prokopidis | }</code></pre> |
41 | 2 | Prokopis Prokopidis | |
42 | 2 | Prokopis Prokopidis | <pre> |
43 | 1 | Prokopis Prokopidis | <code class="xml"> |
44 | 2 | Prokopis Prokopidis | <?xml version="1.0" encoding="UTF-8"?> |
45 | 2 | Prokopis Prokopidis | <configuration> |
46 | 2 | Prokopis Prokopidis | <agent> |
47 | 2 | Prokopis Prokopidis | <email>yourmail@mail.com</email> |
48 | 2 | Prokopis Prokopidis | <web_address>www.youraddress.com</web_address> |
49 | 2 | Prokopis Prokopidis | </agent> |
50 | 8 | Prokopis Prokopidis | </configuration></code></pre> |
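
The description of the -c option says that the crawler checks its time limit only between cycles, so a crawl can run longer than the requested number of minutes. The following is a minimal simulation of that behaviour, not the actual ilsp-fc implementation. The class @CrawlLoopSketch@, the method @simulate@, and the fixed cycle length are hypothetical names and assumptions introduced for illustration. The @maxCycles@ cap loosely mirrors the -n option; whether ilsp-fc combines -c and -n in this way is also an assumption.

```java
public class CrawlLoopSketch {

    /**
     * Simulates a crawl whose time budget is checked only between cycles.
     * Hypothetical sketch: real cycles do not have a fixed length.
     *
     * @param budgetMinutes   requested crawl duration (cf. the -c option)
     * @param minutesPerCycle assumed fixed duration of one cycle
     * @param maxCycles       upper bound on cycles (loosely, the -n option)
     * @return an array {cyclesRun, elapsedMinutes}
     */
    public static long[] simulate(long budgetMinutes, long minutesPerCycle, int maxCycles) {
        if (minutesPerCycle <= 0) {
            throw new IllegalArgumentException("minutesPerCycle must be positive");
        }
        long elapsed = 0;
        long cycles = 0;
        // The budget is tested before a cycle starts, never during one,
        // so a cycle that has started always runs to completion.
        while (elapsed < budgetMinutes && cycles < maxCycles) {
            elapsed += minutesPerCycle; // stand-in for fetching and examining links
            cycles++;
        }
        return new long[] { cycles, elapsed };
    }

    public static void main(String[] args) {
        long[] r = simulate(10, 4, Integer.MAX_VALUE);
        // prints "3 cycles, 12 minutes"
        System.out.println(r[0] + " cycles, " + r[1] + " minutes");
    }
}
```

With a 10-minute budget and 4-minute cycles, the third cycle starts at minute 8 and finishes at minute 12, so the simulated crawl overshoots its budget by 2 minutes.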