crawler.max_url_repeating_elements

Background

The crawler will ignore any URL that contains more than this number of repeating elements. For example, the following URL:

http://example.com/a/a/a/a/a/a/

will be ignored when the default limit of 5 is in effect, because it contains 6 repeating "a" elements (directories). This check guards against crawler traps and badly configured web servers.
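
The sketch below illustrates the idea behind such a check: count how often each path element appears and reject the URL if any element repeats more than the limit. This is an illustration only, not Funnelback's actual implementation, and the function name is hypothetical.

    from urllib.parse import urlparse
    from collections import Counter

    def has_too_many_repeating_elements(url: str, limit: int = 5) -> bool:
        """Return True if any path element repeats more than `limit` times (illustrative only)."""
        # Split the URL path into its directory/file elements, dropping empty pieces.
        segments = [s for s in urlparse(url).path.split("/") if s]
        if not segments:
            return False
        # Count how often each element appears in the path.
        counts = Counter(segments)
        return max(counts.values()) > limit

    # The example URL above has 6 repeating "a" elements, so with the default limit of 5 it is rejected.
    print(has_too_many_repeating_elements("http://example.com/a/a/a/a/a/a/"))  # True
    print(has_too_many_repeating_elements("http://example.com/a/b/c/"))        # False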

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.max_url_repeating_elements key, and set the value. The key accepts any valid integer value.
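
For example, to relax the check for a site whose URLs legitimately repeat path elements, the limit could be raised (the value 8 here is purely illustrative):

crawler.max_url_repeating_elements=8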

Default value

crawler.max_url_repeating_elements=5