crawler.store_empty_content_urls

Background

This parameter tells the web crawler to store URLs even if they contain no content after filtering. Storing such URLs can be useful when, for example, they are PDF documents containing only images, which can still be returned on the basis of anchor text or words in the URL alone.

When enabled, any URLs that are stored despite having no content are listed in the url_no_content.log file.

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.store_empty_content_urls key, and set its value. The value can be any valid Boolean value.
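
As a minimal sketch, assuming the key is written in the same key=value form used under "Default value" on this page, enabling the option might look like the following. The comment line is illustrative only and assumes that lines starting with # are treated as comments.

```
# Assumption: key=value syntax as shown under "Default value".
# Store URLs that have no content after filtering.
crawler.store_empty_content_urls=true
```

With this setting in place, URLs stored despite having no content should then appear in the url_no_content.log file described above.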

Default value

crawler.store_empty_content_urls=false

See also