crawler.parser.mimeTypes

Extract links from a list of content-types.

Key: crawler.parser.mimeTypes
Type: List<String>
Can be set in: collection.cfg

Description

This is a comma-separated list of MIME types. The web crawler attempts to parse every downloaded document whose MIME type appears in this list in order to extract URLs for further crawling.

You should not specify MIME types for binary content in this parameter. Text-based application types, such as the XML and JSON types in the default value, are acceptable.
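
Example

As an illustration only, a collection that needs links extracted from HTML and XHTML pages and nothing else might set the following in collection.cfg. The subset chosen here is an assumption made for the example, not a recommended setting.

```
crawler.parser.mimeTypes=text/html,application/xhtml+xml
```

With a setting like this, documents of any other MIME type would no longer be parsed for links.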

Default Value

crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml

© 2015- Squiz Pty Ltd