crawler.max_link_distance

Background

This option limits the crawler to following links up to a specific "distance" from the start URL(s). The distance of a URL is the number of links followed to reach it from a start URL.

If this option is set, the crawler runs in single-threaded mode (that is, it makes only one web connection at a time) so that it can control which URLs are processed. This reduces performance, so the crawl will be slower.

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.max_link_distance key and set its value. The value can be any valid integer.

Default value

If this key is not defined, the link distance is unlimited.

Examples

Limit the crawler to the start URLs plus the URLs linked to from any URL listed in the crawler.start_urls_file (call these linked URLs set 1):

crawler.max_link_distance=1

Also include all pages linked to from set 1:

crawler.max_link_distance=2

Notes:

  • A distance of zero (0) limits the crawl to just the start URL, or to the URLs listed in the crawler.start_urls_file.

  • A redirect target is considered to be at the same distance as the original URL that generated the redirect.
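The rules above can be modelled as a breadth-first traversal in which following a link adds one to the distance and following a redirect adds nothing. The Python sketch below illustrates that model over a small link graph. The function name crawl_distances and the links and redirects dictionaries are hypothetical illustrations, not part of the crawler's configuration or implementation, and the real crawler may work differently internally.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def crawl_distances(
    start_urls: List[str],
    links: Dict[str, List[str]],      # hypothetical: page -> outgoing links
    redirects: Dict[str, str],        # hypothetical: URL -> redirect target
    max_link_distance: Optional[int] = None,  # None means unlimited
) -> Dict[str, int]:
    """Return every URL the sketch requests, mapped to its link distance."""
    distance: Dict[str, int] = {}
    queue: Deque[str] = deque()
    for url in start_urls:
        distance[url] = 0  # start URLs are at distance 0
        queue.append(url)

    while queue:
        url = queue.popleft()
        d = distance[url]
        target = redirects.get(url)
        if target is not None:
            # A redirect target keeps the distance of the redirecting URL,
            # so it is queued at the front (a 0-1 breadth-first search).
            if d < distance.get(target, float("inf")):
                distance[target] = d
                queue.appendleft(target)
            continue
        if max_link_distance is not None and d >= max_link_distance:
            continue  # at the limit: keep this page but do not follow its links
        for link in links.get(url, []):
            if d + 1 < distance.get(link, float("inf")):
                distance[link] = d + 1
                queue.append(link)
    return distance


# Hypothetical site: c links to e, which redirects to f.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["h"], "f": ["g"]}
REDIRECTS = {"e": "f"}

print(crawl_distances(["a"], LINKS, REDIRECTS, max_link_distance=1))
# → {'a': 0, 'b': 1, 'c': 1}
```

With max_link_distance=2 the sketch also reaches d and e (set 2) and the redirect target f, which stays at distance 2 because it was reached through a redirect rather than a link.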