crawler.allowed_redirect_pattern
Background
When the crawler is redirected to a URL, it checks the new URL against the include/exclude patterns to determine whether it should continue processing that URL. If the URL doesn’t match the include/exclude rules, this usually means the crawler has wandered offsite and shouldn’t proceed any further.
However, some websites use external authentication portals. This variable allows the crawler to continue processing a URL even though it has been redirected offsite. The contents of the offsite pages won’t be stored, but the crawler is still allowed to proceed, e.g. for authentication or form interaction.
This check is case-sensitive.
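The decision described above can be sketched in Python. This is only an illustration, not the crawler's actual implementation: the function name `classify_redirect`, the `RedirectAction` values, the example URLs, and the use of plain substrings for include/exclude rules are all hypothetical, and treating the variable's value as a regular expression is an assumption.

```python
import re
from enum import Enum


class RedirectAction(Enum):
    STORE = "process and store"            # redirect target is in scope
    FOLLOW_ONLY = "process, do not store"  # offsite, but allowed by the pattern
    REJECT = "stop"                        # offsite and not allowed


def classify_redirect(url, include, exclude, allowed_redirect_pattern=None):
    """Decide what to do with the target URL of a redirect.

    include / exclude are lists of plain substrings (a simplification);
    allowed_redirect_pattern is treated as a regular expression (an assumption).
    """
    in_scope = any(s in url for s in include) and not any(s in url for s in exclude)
    if in_scope:
        return RedirectAction.STORE
    # No re.IGNORECASE flag is passed, so the match is case-sensitive.
    if allowed_redirect_pattern and re.search(allowed_redirect_pattern, url):
        return RedirectAction.FOLLOW_ONLY
    return RedirectAction.REJECT
```

Under these assumptions, with an include rule of `example.com` and a pattern of `^https://login\.example\.org/`, a redirect to `https://login.example.org/sso` is followed without being stored, while `https://LOGIN.example.org/sso` is rejected because the case differs.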