crawler.monitor_preferred_servers_list

Background

This parameter specifies an optional list of servers to prefer during a crawl. If the web crawler is crawling many different web servers and several of them are ready to be contacted, any server on this list is preferred and contacted first. As a result, more content is fetched from preferred servers than from non-preferred servers.
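The preference rule can be pictured as a reordering of the servers that are ready to be contacted. The following is a minimal Python sketch under stated assumptions, not the web crawler's actual implementation: the function names `server_key` and `order_ready_servers` are hypothetical, and it assumes servers are matched by protocol, host and port only.

```python
from urllib.parse import urlsplit


def server_key(url: str) -> str:
    """Normalise a URL to protocol://host[:port] (hypothetical matching rule)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def order_ready_servers(ready: list[str], preferred: list[str]) -> list[str]:
    """Return the ready servers with preferred servers moved to the front.

    sorted() is stable, so servers within the preferred group and within the
    non-preferred group keep their original relative order.
    """
    preferred_keys = {server_key(p) for p in preferred}
    # False sorts before True, so preferred servers (key False) come first.
    return sorted(ready, key=lambda s: server_key(s) not in preferred_keys)


ready = ["http://a.org/", "http://example.com/", "http://b.org/", "http://site.com/"]
preferred = ["http://site.com/", "http://example.com/"]
print(order_ready_servers(ready, preferred))
```

Here `http://example.com/` and `http://site.com/` would be contacted before `http://a.org/` and `http://b.org/`, even though they appear later in the ready list.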

In addition, any server on this list is given a larger crawler.max_files_per_area limit, which increases the number of files that can be fetched from each static directory or generator on that server.

This parameter is not normally needed. It is useful, for example, in a time-limited crawl of a large domain with many web servers, where it is important to get as much content as possible from a few key servers.

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.monitor_preferred_servers_list key, and set the value. The value can be any valid string; it is given as a comma-separated list of server URLs.

Default value

(Empty)

Examples

Specify some preferred servers:

crawler.monitor_preferred_servers_list=http://site.com/,http://example.com/

The protocol (for example, http or https) must be included in each server URL.
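Before saving the key, it can help to check the value by splitting it on commas and confirming that every entry includes a protocol. The following is a minimal Python sketch; `parse_preferred_servers` is a hypothetical helper, not part of the product, and it assumes that an entry is valid when it has both a protocol and a host.

```python
from urllib.parse import urlsplit


def parse_preferred_servers(value: str) -> list[str]:
    """Split a crawler.monitor_preferred_servers_list value into server URLs.

    Raises ValueError for any entry without a protocol and host, such as
    "site.com/" (assumed validation rule, for illustration only).
    """
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            # Skip empty entries, e.g. from an empty value or a trailing comma.
            continue
        parts = urlsplit(item)
        if not parts.scheme or not parts.netloc:
            raise ValueError(f"server URL must include a protocol: {item!r}")
        servers.append(item)
    return servers


print(parse_preferred_servers("http://site.com/,http://example.com/"))
```

An empty value, which is the default, yields an empty list, meaning no servers are preferred.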
