Enables or disables the web crawler’s robots.txt support.

Key: crawler.ignore_robots_txt
Type: Boolean
Can be set in: collection.cfg


This setting should only be used as a last resort, as it disables the web crawler's adherence to the robots.txt standard. Use it only when it is not possible for the site owner to update the robots.txt file. Before enabling this setting you must inform the site owner(s) and gain their permission to circumvent any robots.txt directives.

This parameter enables or disables the crawler’s support for robots.txt.

This setting is useful if you need to legitimately crawl a website where the web crawler is blocked by robots.txt directives and the site owner is unable to update the robots.txt file to provide Funnelback with access. The web crawler's default behaviour is to check for a robots.txt file and honour any directives it contains.
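The default behaviour (checking robots.txt and honouring its directives) can be illustrated with Python's standard urllib.robotparser. This is an illustration of the general mechanism only, not Funnelback's implementation; the paths and URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content, illustrating typical directives.
robots_txt = """User-agent: *
Disallow: /search
Disallow: /calendar
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler skips disallowed paths...
print(parser.can_fetch("*", "https://example.com/search?q=test"))  # False
# ...and fetches everything else.
print(parser.can_fetch("*", "https://example.com/about"))          # True
```

Setting crawler.ignore_robots_txt to true effectively skips this check entirely, which is why all of the side effects described below follow.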

Using this setting can have unwanted side effects both for the web crawler (such as disabling support for sitemap.xml files) and for the sites you are crawling. Carefully check your web crawler logs to ensure you are not accessing or storing content that you don't wish to, and add appropriate exclude patterns. For example, ensure that any search results pages and calendar feeds are explicitly covered by your exclude patterns.
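As a sketch, exclude patterns for such pages might look like the following in collection.cfg. The exclude_patterns key and the pattern values shown are illustrative assumptions; check the documentation for your Funnelback version for the exact key name and pattern syntax:

```
# Illustrative only: skip search results pages and calendar feeds
# that robots.txt directives would otherwise have blocked.
exclude_patterns=/search,/calendar
```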

Ignoring robots.txt could also result in the site owner blocking Funnelback from accessing the site.

Disabling robots.txt support also disables the web crawler's support for reading sitemap.xml files, as these are discovered via sitemap directives within a robots.txt file.
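Sitemap discovery relies on the standard Sitemap directive that site owners place inside robots.txt, for example (example.com is a placeholder):

```
User-agent: *
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
```

If the crawler never reads robots.txt, it never sees the Sitemap line, so sitemap-based URL discovery is lost along with the directives.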

Default Value

crawler.ignore_robots_txt=false

To ignore robots.txt set:

crawler.ignore_robots_txt=true