Optional parameter listing URLs to reject during a running crawl.
Can be set in: collection.cfg
This parameter can be modified during a running crawl to tell the crawler to ignore the specified list
of URLs for the remainder of the crawl. If you know before a crawl which areas to avoid, you would
normally add them to the exclude_patterns parameter instead. The format is a comma-separated list of URLs.
Matching URLs gathered before this configuration change will not be affected. Matching URLs that are already in the crawl frontier (the list of known but not-yet-crawled URLs) will not be removed until they are processed.
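The frontier behaviour described above amounts to lazy filtering: a rejected URL stays queued and is only dropped when the crawler reaches it. The following toy model in Python is a minimal sketch of that idea, not the crawler's actual implementation. The function `crawl`, the link graph, and the use of prefix matching (`str.startswith`) are all hypothetical assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(seeds: Iterable[str],
          links: Callable[[str], List[str]],
          reject_prefixes: Set[str]) -> List[str]:
    """Toy breadth-first crawl that returns URLs in the order they were fetched.

    The reject set is checked each time a URL is taken off the frontier, so
    entries added mid-crawl only affect URLs processed after the change.
    """
    frontier = deque(seeds)
    seen = set(frontier)
    fetched: List[str] = []
    while frontier:
        url = frontier.popleft()
        # Hypothetical prefix match: a queued URL is dropped only when processed.
        if any(url.startswith(prefix) for prefix in reject_prefixes):
            continue
        fetched.append(url)
        for link in links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


# Hypothetical site: the root links to a public page and a private area.
graph = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/b"],
    "http://example.com/a": [],
    "http://example.com/private/b": [],
}
reject: Set[str] = set()


def links(url: str) -> List[str]:
    # Simulate the reject list being edited right after the first fetch.
    if url == "http://example.com/":
        reject.add("http://example.com/private/")
    return graph.get(url, [])


fetched = crawl(["http://example.com/"], links, reject)
print(fetched)  # ['http://example.com/', 'http://example.com/a']
```

The root page was fetched before the change and stays in the results. The private URL was already queued when the reject entry was added, and it is skipped only once it is dequeued.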
Each URL must include a protocol (scheme), e.g. http:// or https://.
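The format rules (a comma-separated list in which every URL carries a protocol) can be checked before the value is saved. The following sketch shows one way to do that with Python's standard `urllib.parse.urlsplit`. The function name `parse_reject_list` is hypothetical, and so is the choice to ignore empty entries and require a host. This is not how the crawler itself parses the value.

```python
from typing import List
from urllib.parse import urlsplit


def parse_reject_list(value: str) -> List[str]:
    """Split a comma-separated list of URLs, requiring each to carry a scheme.

    Hypothetical validator: empty entries (e.g. from a trailing comma) are
    skipped, and each URL must have both a scheme and a host.
    """
    urls: List[str] = []
    for raw in value.split(","):
        url = raw.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # Requiring netloc as well as scheme rejects inputs like
        # "www.example.com:8080/", which urlsplit would otherwise read as
        # having the scheme "www.example.com".
        if not parts.scheme or not parts.netloc:
            raise ValueError(f"missing protocol/scheme: {url!r}")
        urls.append(url)
    return urls


print(parse_reject_list("http://example.com/private/, https://docs.example.com/"))
```

An entry such as `example.com/private/` raises `ValueError`, which reflects the requirement that each URL include a protocol.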
Reject any URLs from the given sites or sub-sites during a running crawl: