crawler.parser.mimeTypes

Specifies the content types from which the web crawler extracts links.

Key: crawler.parser.mimeTypes
Type: List<String>
Can be set in: collection.cfg

Description

A comma-separated list of MIME types. The web crawler attempts to parse every downloaded document whose MIME type appears in this list, in order to extract URLs for further crawling.

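Do not specify MIME types for binary content in this parameter. Text-based application/* types, such as the application/json and application/xml entries in the default value, are acceptable.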

Default Value

crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml
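
Example

As an illustrative sketch rather than a recommended setting, a collection that should only extract links from HTML and XHTML documents could override the default in collection.cfg with a shorter list drawn from the default values:

crawler.parser.mimeTypes=text/html,application/xhtml+xml

With this setting, documents of the other types in the default list, such as text/plain or application/json, would no longer be parsed for links.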