Version 15 (Vassilis Papavassiliou, 2012-10-25 02:18 PM) → Version 16/167 (Vassilis Papavassiliou, 2012-10-25 02:25 PM)

h1. Getting Started

Once you [[DeveloperSetup|build]] or [[HowToGet|download]] an ilsp-fc runnable jar, you can run it as follows:

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar</code></pre>

Several settings influence the crawling process. Define them in the configuration file (crawler_config.xml by default) before starting a crawl. Some of them can also be passed as options on the command line that runs the ilsp-fc jar:

-a user agent name



h2. Run a monolingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a vpapa@ilsp.gr test1 -c 10 -f -k -lang de -t 10 -of output_test1_list.txt -ofh output_test1_list.txt.html -tc Automotive-seed-terms-de.txt -u Automotive-seed-urls.txt -type m -xslt -cfg FMC_config.xml</code></pre>

h2. Run a bilingual crawl

<pre><code>java -jar ilsp-fc-1.1-jar-with-dependencies.jar crawlandexport -a test1 -c 10 -f -k -l1 de -l2 it -t 10 -of test_HS_DE-IT_output.txt -ofh test_HS_DE-IT_output.txt.html -tc HS_DE-IT_topic.txt -type p -xslt -u seed_suva.txt -cfg FBC_config.xml</code></pre>
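
If the crawl has to be started from Java code rather than from a shell, the bilingual command above could be assembled as an argument list. The following is only a minimal sketch: the class BilingualCrawlLauncher and the method buildCommand are hypothetical names, not part of ilsp-fc, and every option and value is copied verbatim from the example command without assuming anything further about its meaning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of ilsp-fc: rebuilds the bilingual example command.
final class BilingualCrawlLauncher {

    // Returns the argument list; all options are taken as-is from the example above.
    static List<String> buildCommand(String jar, String lang1, String lang2,
            String outputList, String topicFile, String seedUrls, String config) {
        List<String> cmd = new ArrayList<>();
        Collections.addAll(cmd, "java", "-jar", jar, "crawlandexport");
        Collections.addAll(cmd, "-a", "test1", "-c", "10", "-f", "-k");
        Collections.addAll(cmd, "-l1", lang1, "-l2", lang2, "-t", "10");
        // The example names the HTML list after the text list plus ".html".
        Collections.addAll(cmd, "-of", outputList, "-ofh", outputList + ".html");
        Collections.addAll(cmd, "-tc", topicFile, "-type", "p", "-xslt");
        Collections.addAll(cmd, "-u", seedUrls, "-cfg", config);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("ilsp-fc-1.1-jar-with-dependencies.jar",
                "de", "it", "test_HS_DE-IT_output.txt",
                "HS_DE-IT_topic.txt", "seed_suva.txt", "FBC_config.xml");
        // Print the command instead of running it, so the sketch works without the jar.
        System.out.println(String.join(" ", cmd));
    }
}
```

To actually start the crawl, the returned list can be handed to the standard java.lang.ProcessBuilder, for example new ProcessBuilder(cmd).inheritIO().start().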

h2. Example of Java code

<pre>
<code class="java">
package gr.ilsp.fmc.classifier;

public enum ClassifierCounters {
    CLASSIFIER_DOCUMENTS_PASSED,  // successfully classified a document
    CLASSIFIER_DOCUMENTS_FAILED,  // failed to classify a document
    CLASSIFIER_DOCUMENTS_ABORTED,
    CLASSIFIER_TIME
}</code></pre>

<pre>
<code class="xml">
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <agent>
        <email>yourmail@mail.com</email>
        <web_address>www.youraddress.com</web_address>
    </agent>
</configuration></code></pre>
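
Before starting a crawl, it can be useful to check that the agent block of an edited configuration file is well-formed and contains the expected values. The sketch below does this with the standard Java DOM parser. It is an illustration only and does not show how ilsp-fc itself loads its configuration: the class AgentConfigReader and the method readAgent are hypothetical names, and the element names are the ones from the fragment above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of ilsp-fc: reads the <agent> block of a configuration file.
final class AgentConfigReader {

    // Returns the text of <email> and <web_address> inside the first <agent> element.
    static Map<String, String> readAgent(InputStream in) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("configuration could not be parsed", e);
        }
        NodeList agents = doc.getElementsByTagName("agent");
        if (agents.getLength() == 0) {
            throw new IllegalArgumentException("no <agent> element found");
        }
        Element agent = (Element) agents.item(0);
        Map<String, String> values = new LinkedHashMap<>();
        for (String tag : new String[] {"email", "web_address"}) {
            NodeList nodes = agent.getElementsByTagName(tag);
            if (nodes.getLength() > 0) {
                values.put(tag, nodes.item(0).getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Same content as the configuration fragment above, inlined for a self-contained run.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<configuration><agent>"
                + "<email>yourmail@mail.com</email>"
                + "<web_address>www.youraddress.com</web_address>"
                + "</agent></configuration>";
        Map<String, String> agent =
                readAgent(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(agent); // {email=yourmail@mail.com, web_address=www.youraddress.com}
    }
}
```

For a real file, a java.io.FileInputStream opened on the configuration file can be passed to readAgent in place of the inlined string.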