crawler.max_dir_depth

Background

This option sets the maximum number of directories allowed in a URL. The crawler will ignore any URL containing more directories than this limit. A URL with an excessive number of directories is usually a sign of a crawler trap, so this limit should not be set too high.

Note that this limit is not checked for dynamic URLs, i.e. those containing a '?'.
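As an illustration only, the check can be thought of along the lines of the following Python sketch. The function name, the treatment of the final path segment as a document rather than a directory, and the query-string handling are assumptions made for this example; they are not the crawler's actual implementation.

from urllib.parse import urlparse

def exceeds_max_dir_depth(url, max_dir_depth=15):
    """Illustrative sketch only: return True if the URL has more
    directories than max_dir_depth and would therefore be ignored."""
    # The depth check is skipped for dynamic URLs (those containing a '?').
    if '?' in url:
        return False
    parsed = urlparse(url)
    # Count directory segments in the path; the final segment is treated
    # as the document itself rather than a directory (an assumption).
    segments = [s for s in parsed.path.split('/') if s]
    dir_count = max(len(segments) - 1, 0)
    return dir_count > max_dir_depth

# With max_dir_depth=2:
# exceeds_max_dir_depth('http://host/one/two/ok', 2)           -> False (crawled)
# exceeds_max_dir_depth('http://host/one/two/three/fails', 2)  -> True  (ignored)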

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.max_dir_depth key and set its value. The value can be any valid integer.

Default value

crawler.max_dir_depth=15

Examples

crawler.max_dir_depth=2

This setting will have the following effect:

http://host/one/two/ok (two directories: crawled)
http://host/one/two/three/fails (three directories: ignored)