crawler.allowed_redirect_pattern

Background

When the crawler is redirected to a URL, it checks that URL against the include/exclude patterns to determine whether it should continue processing it. A URL that doesn’t match the include/exclude rules usually means the crawler has wandered offsite and shouldn’t proceed any further.

However, some websites use external authentication portals. This configuration key allows the crawler to continue processing a URL even though it has been redirected offsite. The contents of the offsite pages aren’t stored, but the crawler is still allowed to proceed, for example to authenticate or interact with a form.

This check is case-sensitive.
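
As a rough illustration of the behaviour described above, the following Python sketch classifies a redirect target. It is not the crawler's actual implementation. The function name classify_redirect, the three result labels, the example.com hostname, and the use of plain substring matching for every pattern are all assumptions made for illustration.

```python
def classify_redirect(url, include, exclude, allowed_redirect_pattern=""):
    """Return 'store', 'follow_only' or 'reject' for a redirect target.

    Assumption: every pattern is a plain, case-sensitive substring.
    The real crawler may match patterns differently.
    """
    # In scope: matches at least one include pattern and no exclude pattern.
    in_scope = any(p in url for p in include) and not any(p in url for p in exclude)
    if in_scope:
        return "store"
    # Out of scope, but explicitly allowed as a redirect target:
    # the crawler may proceed, but the page content is not stored.
    # The empty default is guarded because "" is a substring of every URL.
    if allowed_redirect_pattern and allowed_redirect_pattern in url:
        return "follow_only"
    return "reject"


include = ["example.com"]   # hypothetical include pattern
exclude = []                # hypothetical exclude patterns (none)

print(classify_redirect("https://www.example.com/page", include, exclude, "gatekeeper.com"))
print(classify_redirect("https://login.gatekeeper.com/auth", include, exclude, "gatekeeper.com"))
print(classify_redirect("https://login.Gatekeeper.com/auth", include, exclude, "gatekeeper.com"))
print(classify_redirect("https://login.gatekeeper.com/auth", include, exclude, ""))
```

Under these assumptions the four calls print store, follow_only, reject and reject. The third is rejected because the check is case-sensitive, and the fourth because an empty value allows no additional redirects.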

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.allowed_redirect_pattern key and set its value. The value can be any valid String.

Default value

crawler.allowed_redirect_pattern=

Examples

The following will allow the crawler to be redirected to any URL containing gatekeeper.com (without scraping additional links from the redirected site).

crawler.allowed_redirect_pattern=gatekeeper.com