crawler.monitor_preferred_servers_list
Background
This parameter specifies an optional list of servers to prefer during a crawl. If the web crawler is crawling many different web servers and several of them are ready to be contacted, any server on this list will be contacted first. As a result, more content is fetched from the preferred servers than from non-preferred servers.
In addition, any server on this list will be given a larger crawler.max_files_per_area limit, which increases the number of files that can be fetched from each static directory or generator.
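The two effects described above can be pictured with a small Python sketch. This is only an illustration of the idea, not the crawler's actual scheduling code: the function names, the data structures and the multiplier of 2 are all assumptions, since the documentation says only that the limit is "larger".

```python
def pick_next_server(ready_servers, preferred):
    """Return the next server to contact, favouring preferred servers.

    ready_servers: list of server URLs that are ready to be contacted.
    preferred: set of server URLs taken from the preferred servers list.
    """
    for server in ready_servers:
        if server in preferred:
            return server
    # No preferred server is ready, so fall back to the first ready one.
    return ready_servers[0] if ready_servers else None


def files_limit(server, preferred, base_limit, multiplier=2):
    """Return the per-area file limit for a server.

    The multiplier is a hypothetical placeholder; the real increase
    applied by the crawler is not stated in this documentation.
    """
    return base_limit * multiplier if server in preferred else base_limit


preferred = {"http://site.com/", "http://example.com/"}
ready = ["http://other.com/", "http://site.com/"]
next_server = pick_next_server(ready, preferred)
```

In this sketch a preferred server jumps ahead of other ready servers, and its per-area limit is scaled up from the base value.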
Normally, this parameter should not need to be used. A scenario where it would be useful is a time-limited crawl of a large domain with lots of web servers, where it’s important to get as much content as possible from some key servers.
Examples
Specify some preferred servers:
crawler.monitor_preferred_servers_list=http://site.com/,http://example.com/
The protocol (e.g. http or https) is required in the server URL specification.
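Because a missing protocol is an easy mistake to make, it can help to check a value before adding it to the configuration. The following Python sketch is a hypothetical helper, not part of the product: it splits a comma-separated value and separates entries that start with http or https from those that do not.

```python
from urllib.parse import urlsplit


def check_preferred_servers(value):
    """Split a comma-separated server list into valid and invalid entries.

    An entry counts as valid here when it has an http or https protocol
    and a host name. This rule is an assumption made for illustration.
    """
    entries = [e.strip() for e in value.split(",") if e.strip()]
    valid, invalid = [], []
    for entry in entries:
        parts = urlsplit(entry)
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


valid, invalid = check_preferred_servers("http://site.com/,example.com/")
```

Here the second entry, example.com/, ends up in the invalid list because it lacks a protocol.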