crawler.parser.mimeTypes

Background

This is a comma-separated list of MIME types. The web crawler will attempt to parse every downloaded document whose MIME type appears in this list, in order to extract URLs for further crawling.

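You should not specify binary MIME types (for example, application/pdf) in this parameter. Text-based application types such as application/xhtml+xml and application/json are acceptable, as the default value shows.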
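
The matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the crawler's actual implementation: the function names parse_mime_list and should_parse are hypothetical, and the handling of whitespace, letter case and Content-Type parameters (such as "; charset=UTF-8") is an assumption rather than documented behaviour.

```python
def parse_mime_list(value: str) -> set[str]:
    """Split a comma-separated key value into a set of lowercase MIME types."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def should_parse(content_type: str, parse_types: set[str]) -> bool:
    """Return True if a Content-Type header names one of the configured types."""
    # Assumption: parameters such as "; charset=UTF-8" are dropped before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in parse_types


# The default value listed on this page.
DEFAULT_VALUE = (
    "text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,"
    "application/atom+xml,application/json,application/rdf+xml,application/xml"
)

parse_types = parse_mime_list(DEFAULT_VALUE)
print(should_parse("text/html; charset=UTF-8", parse_types))  # True
print(should_parse("application/pdf", parse_types))  # False
```

With the default value, an HTML page would be parsed for links, while a PDF would be downloaded but not parsed for URLs.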

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.parser.mimeTypes key, and set the value. The value can be any valid List<String>, entered as a comma-separated list of MIME types.
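
As an illustration only, a value that limits URL extraction to HTML and plain-text documents might look like the following. The choice of types here is hypothetical and should be adapted to the content being crawled.

```
crawler.parser.mimeTypes=text/html,text/plain
```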

Default value

crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml