Crawler Monitor Url Reject List (collection.cfg)

Description

This parameter can be modified while a crawl is running to tell the crawler to ignore the specified URLs for the remainder of the crawl. If you know before a crawl starts which areas to avoid, you would normally add them to the exclude_patterns parameter instead. The value is a comma-separated list of URLs.

Matching URLs gathered prior to this configuration change will not be affected.

NB: Each pattern must include a protocol (scheme), e.g. http://www.example.com/ and not www.example.com
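
As a rough pre-flight check, the format rules above (comma-separated entries, each carrying a scheme) can be tested before the value is written to collection.cfg. The following is a minimal Python sketch, not part of the product; the function name invalid_reject_entries is hypothetical, and the check only looks for a scheme and a host in each entry.

```python
from urllib.parse import urlsplit


def invalid_reject_entries(value):
    """Return the entries of a comma-separated URL list that lack a scheme or host.

    Hypothetical helper for illustration only; it does not reproduce
    the crawler's own parsing of the setting.
    """
    # Split on commas and ignore surrounding whitespace and empty entries.
    entries = [e.strip() for e in value.split(",") if e.strip()]
    bad = []
    for entry in entries:
        parts = urlsplit(entry)
        # "www.example.com" has no "//" after a scheme, so netloc is empty.
        if not parts.scheme or not parts.netloc:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    print(invalid_reject_entries("http://www.example.com/,www.example.com"))
```

An empty result means every entry has both a scheme and a host; any entries returned would need a scheme such as http:// added.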

Default value

(Empty)

Examples

Reject any URLs from the given sites or sub-sites during a running crawl:

crawler.monitor_url_reject_list=http://abc.com/,http://d.com/site/
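
To make the intended effect of this example concrete, the sketch below models the reject list in Python. It assumes that an entry rejects any URL beginning with that entry (simple prefix matching), which fits the "sites or sub-sites" wording but is not confirmed here; the function name is_rejected is hypothetical and the crawler's real matching logic may differ.

```python
def is_rejected(url, reject_list_value):
    """Return True if url starts with any entry in the comma-separated list.

    Assumption: prefix matching. This is an illustrative model only,
    not the crawler's actual implementation.
    """
    prefixes = [p.strip() for p in reject_list_value.split(",") if p.strip()]
    return any(url.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    value = "http://abc.com/,http://d.com/site/"
    for candidate in (
        "http://abc.com/page.html",
        "http://d.com/site/docs/",
        "http://d.com/other/",
    ):
        print(candidate, is_rejected(candidate, value))
```

Under this assumption, everything on http://abc.com/ and everything below http://d.com/site/ would be skipped, while other areas of http://d.com/ would still be crawled.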
