crawler.max_dir_depth
Specifies the maximum number of sub-directories a URL may have before it will be ignored.
Key: crawler.max_dir_depth
Type: Integer
Can be set in: collection.cfg
Description
This option sets the maximum number of directories allowed in a valid URL. The crawler ignores any URL that has more than this number of directories. A URL with a very large number of directories is usually a sign of a crawler trap, so this limit should not be set too high.
This limit is not checked for dynamic URLs, e.g. those containing a '?'.
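The check described above can be sketched roughly as follows. This is only an illustration, not the crawler's actual implementation: the function names `dir_depth` and `exceeds_max_dir_depth` are hypothetical, and the way directories are counted (path segments, with the final segment treated as a file name unless the path ends in '/') is an assumption.

```python
from urllib.parse import urlsplit


def dir_depth(url: str) -> int:
    """Count the directory segments in a URL path.

    Assumption: the last path segment is a file name unless the
    path ends with '/', in which case every segment is a directory.
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if path.endswith("/"):
        return len(segments)
    return max(len(segments) - 1, 0)


def exceeds_max_dir_depth(url: str, max_dir_depth: int) -> bool:
    """Return True if the URL would be ignored under the given limit.

    Dynamic URLs (here: any URL containing '?') are never checked,
    mirroring the behaviour described in the text.
    """
    if "?" in url:
        return False
    return dir_depth(url) > max_dir_depth
```

Under these assumptions, with a limit of 3 a URL such as http://example.com/a/b/c/d/page.html would be ignored, while the same URL with a query string appended would not be subject to the limit.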