crawler.ignore_nofollow
Background
This setting should only be used as a last resort, as it disables the web crawler’s adherence to the robots
nofollow directive. Use it only when it is not possible for the site owner to enable the crawler’s access
using other mechanisms (such as a sitemap.xml file linked from robots.txt). Before you enable this setting you
must inform the site owner(s) and obtain their permission to circumvent the nofollow directives.
This parameter enables or disables the crawler’s adherence to page-level robots nofollow directives.
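For example, to disable nofollow handling you would set the parameter to true. This is a minimal sketch, assuming the setting is applied as a key=value pair in the crawler’s collection configuration:

# Ignore page-level robots nofollow directives (only with the site owner’s permission)
crawler.ignore_nofollow=true

The default is false, which preserves the standard behaviour described below.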
This setting is useful if you need to legitimately crawl parts of a website where the web crawler is blocked by
robots nofollow directives. By default, the web crawler checks for robots meta tags
<meta name="robots" content="nofollow"/>
and for any link containing a rel="nofollow" attribute, and will not follow those links.
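For reference, a link carrying the rel="nofollow" attribute looks like this (the URL is a placeholder):

<a href="https://example.com/page" rel="nofollow">Example link</a>

With crawler.ignore_nofollow enabled, the crawler follows such links as if the attribute were not present.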
Using this setting can also have unwanted side effects for the web crawler and for the sites you are crawling. Carefully check your web crawler logs to ensure you are not storing or accessing content that you do not wish to access, and add appropriate exclude patterns. For example, ensure that any search results pages and calendar feeds are explicitly added to your exclude patterns.
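As a sketch, assuming exclude patterns are configured as a comma-separated list via the exclude_patterns setting (the paths shown are placeholders):

# Keep the crawler out of search results pages and calendar feeds
exclude_patterns=/search,/calendar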
Ignoring robots nofollow directives could also result in the site owner blacklisting Funnelback from accessing your site.