crawler.overall_crawl_timeout

Specifies the maximum time the crawler is allowed to run. When this limit is exceeded, the crawl stops and the update continues.

Key: crawler.overall_crawl_timeout
Type: Integer
Can be set in: collection.cfg

Description

This option specifies how many minutes or hours the crawler is allowed to run. The unit of measure is set separately with the crawler.overall_crawl_units option.

This parameter can be left empty, in which case the crawler keeps going until there are no URLs left in its frontier or it reaches another limit, e.g. crawler.max_files_stored.
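The interaction between the two options can be illustrated with a minimal Python sketch. It is not the crawler's actual implementation: the helper names parse_cfg and crawl_time_limit are hypothetical, the unit name "hr" and its use as the fallback unit are assumptions (only "min" appears on this page), and the fallback of 24 is taken from the default value listed below.

```python
from datetime import timedelta
from typing import Optional

# Seconds per unit. "min" appears in the example on this page;
# "hr" is an assumed name for the hours unit.
UNIT_SECONDS = {"min": 60, "hr": 3600}


def parse_cfg(text: str) -> dict:
    """Parse key=value lines (collection.cfg style) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def crawl_time_limit(settings: dict, default_unit: str = "hr") -> Optional[timedelta]:
    """Return the crawl time limit, or None when the timeout is left empty."""
    # A missing key falls back to the documented default of 24.
    raw = settings.get("crawler.overall_crawl_timeout", "24")
    if raw == "":
        # Empty value: no time limit; other limits still apply.
        return None
    # The fallback unit is an assumption of this sketch.
    unit = settings.get("crawler.overall_crawl_units", default_unit)
    return timedelta(seconds=int(raw) * UNIT_SECONDS[unit])
```

With this sketch, a timeout of 10 with units of min yields a ten-minute limit, while an explicitly empty timeout yields no time limit at all.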

Default Value

crawler.overall_crawl_timeout=24

Examples

If you are testing a new web data source, then it can be useful to run a short crawl, say 10 minutes:

crawler.overall_crawl_timeout=10
crawler.overall_crawl_units=min