Crawler config » History » Version 1
Prokopis Prokopidis, 2012-10-26 11:24 AM
<?xml version="1.0" encoding="UTF-8"?>
yourmail@mail.com
www.youraddress.com
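Polite crawlers commonly identify themselves with contact details in the User-Agent header; assuming that is what the e-mail and web address placeholders above are for, a minimal sketch (the crawler name and version are hypothetical):

```python
def build_user_agent(crawler_name, version, address, email):
    """Compose a User-Agent string that includes contact details,
    so webmasters can reach the crawler's operator."""
    return f"{crawler_name}/{version} (+http://{address}; {email})"

# "MyCrawler"/"1.0" are invented; the address and e-mail are the
# placeholders from this configuration.
ua = build_user_agent("MyCrawler", "1.0", "www.youraddress.com", "yourmail@mail.com")
```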
2: Minimum number of terms that must exist in the clean content of each web page for it to be stored.
2: Minimum number of unique terms that must exist in the clean content.
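The two thresholds above amount to a token count on the cleaned text; a sketch assuming naive whitespace tokenization:

```python
def passes_term_filters(clean_text, min_terms=2, min_unique_terms=2):
    """Return True if the cleaned page content has enough terms
    (and enough distinct terms) to be worth storing."""
    terms = clean_text.split()  # naive whitespace tokenization
    return len(terms) >= min_terms and len(set(terms)) >= min_unique_terms
```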
10: Maximum depth to crawl before abandoning a specific path. Depth is increased every time a link is extracted from a non-relevant web page.
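One way to read the depth rule: a link inherits its parent's depth, incremented only when the parent page was judged non-relevant, and a path is abandoned once that counter exceeds the maximum. A sketch of that interpretation:

```python
def child_depth(parent_depth, parent_is_relevant):
    """Depth grows only when the link comes from a non-relevant page;
    links from relevant pages keep the parent's depth."""
    return parent_depth if parent_is_relevant else parent_depth + 1

def should_follow(parent_depth, parent_is_relevant, max_depth=10):
    """Abandon the path once the depth counter would exceed max_depth."""
    return child_depth(parent_depth, parent_is_relevant) <= max_depth
```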
512: Maximum number of URLs to fetch per run.
10000: Socket timeout in milliseconds (per URL).
10000: Connection timeout in milliseconds (per URL).
2: Maximum number of attempts to fetch a web page before giving up.
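The attempt limit can be enforced by a small wrapper around the actual fetch; a sketch with the fetch function injected, so the retry policy stays independent of the HTTP library (the socket and connection timeouts above would be applied inside fetch_fn):

```python
def fetch_with_retries(fetch_fn, url, max_attempts=2):
    """Call fetch_fn(url) up to max_attempts times; after the final
    failed attempt, give up by re-raising the last error."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch_fn(url)
        except Exception as exc:
            last_error = exc
    raise last_error
```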
0: Minimum bytes per second for fetching a web page.
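A minimum-throughput floor lets the fetcher abort transfers that are too slow; a value of 0, as above, would disable the check. A sketch of such a check:

```python
def throughput_ok(bytes_received, elapsed_seconds, min_bytes_per_second=0):
    """Return False if the transfer is slower than the configured
    floor; a floor of 0 accepts any speed."""
    if min_bytes_per_second <= 0:
        return True
    if elapsed_seconds <= 0:
        return True  # too early to judge
    return bytes_received / elapsed_seconds >= min_bytes_per_second
```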
Accepted MIME types.
1500: Delay in milliseconds between requests.
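A politeness delay like this is usually enforced per host: remember when the host was last contacted and wait out the remainder before the next request. A sketch with the clock injected for testability:

```python
import time

class PolitenessDelay:
    """Track the last request time per host and report how long
    to wait before the next request to that host."""

    def __init__(self, delay_ms=1500, clock=None):
        self.delay_s = delay_ms / 1000.0
        self.clock = clock or time.monotonic
        self.last_request = {}

    def wait_time(self, host):
        """Seconds to wait before a request to host may be sent."""
        last = self.last_request.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.delay_s - (self.clock() - last))

    def record(self, host):
        """Note that a request to host was just sent."""
        self.last_request[host] = self.clock()
```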
531072: Maximum content size (bytes) for downloading a web page.
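The content-size cap can be applied while streaming, so an oversized body is never fully buffered; a sketch that reads from any file-like object:

```python
def read_capped(stream, max_bytes=531072, chunk_size=8192):
    """Read at most max_bytes from stream.
    Return (data, truncated) where truncated tells whether
    the body was larger than the cap."""
    parts, total = [], 0
    while total < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - total))
        if not chunk:
            return b"".join(parts), False
        parts.append(chunk)
        total += len(chunk)
    # The cap was reached; one more read tells us if more data existed.
    truncated = bool(stream.read(1))
    return b"".join(parts), truncated
```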
512: Maximum fetch set size per run (sets are made of URLs from the same host).
512: Maximum number of URLs from a specific host per run.
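Fetch sets of the kind described above can be built by grouping the frontier by host and capping each group; a sketch using the standard library's URL parser:

```python
from urllib.parse import urlparse

def build_fetch_sets(urls, max_per_host=512):
    """Group URLs by host, keeping at most max_per_host per host;
    URLs beyond the cap are left for a later run."""
    fetch_sets = {}
    for url in urls:
        host = urlparse(url).netloc
        bucket = fetch_sets.setdefault(host, [])
        if len(bucket) < max_per_host:
            bucket.append(url)
    return fetch_sets
```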
32: Maximum number of fetching threads for each host.
500000: Maximum number of web pages to fetch per host.
5: Maximum number of redirects.
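A redirect cap guards against loops and endless chains; simulated here with a mapping from URL to its redirect target (None meaning a final page), so the policy can be shown without network access:

```python
def resolve_redirects(url, redirects, max_redirects=5):
    """Follow redirects until a final URL is reached, raising if the
    chain exceeds max_redirects. `redirects` maps URL -> target,
    with None marking a page that does not redirect."""
    for _ in range(max_redirects + 1):
        target = redirects.get(url)
        if target is None:
            return url
        url = target
    raise RuntimeError("too many redirects")
```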
600000: Maximum time to wait for the Fetcher to get all URLs in a run.
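A run-level time budget like this can be tracked with a simple deadline object, assuming the value is in milliseconds like the other timeouts above; again with the clock injected for testability:

```python
import time

class RunDeadline:
    """Track how much of a run's time budget remains."""

    def __init__(self, budget_ms, clock=time.monotonic):
        self.clock = clock
        self.deadline = clock() + budget_ms / 1000.0

    def remaining(self):
        """Seconds left before the run should be abandoned."""
        return max(0.0, self.deadline - self.clock())

    def expired(self):
        return self.remaining() == 0.0
```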