This parameter controls which revisit policy the web crawler uses. A revisit means making a network call (an HTTP HEAD and/or GET request) when processing a URL.
A revisit policy might look at a URL in the URL store and decide that, because the URL has not changed the last 5 times it was downloaded, it can be assumed to be unchanged this time as well, and so skip the revisit. The crawler then uses the copy from the previous crawl and avoids any HEAD or GET requests for that URL.
The revisit policy is only used during incremental crawls.
Set this configuration key in the search package or data source configuration.
Use the configuration key editor to add or edit the crawler.classes.RevisitPolicy key, and set the value. This can be set to any valid
Change to use a revisit policy that implements the following:
Initially, everything is crawled.
If, after crawler.revisit.num_times_unchanged_threshold crawls, the page has never changed, then the page will not be crawled for a configured number of subsequent crawls.
The URL will then have to be crawled crawler.revisit.num_times_unchanged_threshold times again without any changes before it will be skipped again.
A full crawl will force everything to be crawled, but the values recorded for revisits skipped and num_times_unchanged will be preserved.
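The behaviour described above can be illustrated with a minimal sketch. This is not the crawler's actual implementation: the UrlRecord class, the crawl_once function and the skip_threshold parameter are hypothetical names chosen for illustration, and the exact point at which the counters reset is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UrlRecord:
    """Per-URL counters, as they might be kept in the URL store (hypothetical)."""
    num_times_unchanged: int = 0
    revisits_skipped: int = 0


def crawl_once(rec, page_changed, unchanged_threshold, skip_threshold,
               full_crawl=False):
    """Process one URL in one crawl. Return True if it was fetched.

    unchanged_threshold plays the role of
    crawler.revisit.num_times_unchanged_threshold; skip_threshold is an
    assumed name for the number of crawls to skip.
    """
    if full_crawl:
        # A full crawl fetches everything and leaves the counters untouched.
        return True
    if rec.num_times_unchanged >= unchanged_threshold:
        if rec.revisits_skipped < skip_threshold:
            # Reuse the stored copy; no HEAD or GET request is made.
            rec.revisits_skipped += 1
            return False
        # Skip allowance used up: the URL must prove itself unchanged again.
        rec.num_times_unchanged = 0
        rec.revisits_skipped = 0
    # Revisit the URL and record whether its content changed.
    rec.num_times_unchanged = 0 if page_changed else rec.num_times_unchanged + 1
    return True


# A page that never changes, over eight incremental crawls.
record = UrlRecord()
history = [crawl_once(record, False, unchanged_threshold=2, skip_threshold=3)
           for _ in range(8)]
print(history)
```

With these example thresholds the page is fetched twice, skipped three times, then fetched twice more before it is skipped again.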