Crawler Parser Mimetypes (collection.cfg)

Description

A comma-separated list of MIME types. The web crawler attempts to parse every downloaded document whose MIME type appears in this list, in order to extract URLs for further crawling.

NB: You should not specify binary (application) MIME types in this parameter.
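
The matching described above can be illustrated with a short Python sketch. This is not the crawler's actual implementation. The function names `parse_mime_list` and `should_parse` are hypothetical, and the case-insensitive comparison and the stripping of parameters such as `charset` are assumptions made for the illustration.

```python
def parse_mime_list(value):
    """Split a comma-separated setting value into a set of lowercase MIME types."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def should_parse(content_type, allowed):
    """Return True if the MIME type of a Content-Type value is in the allowed set."""
    # Assumption: parameters such as "; charset=UTF-8" are ignored when matching.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in allowed


# Sample value taken from the default setting for crawler.parser.mimeTypes.
allowed = parse_mime_list("text/html,text/plain,text/xml")

print(should_parse("text/html; charset=UTF-8", allowed))
print(should_parse("application/pdf", allowed))
```

Under these assumptions, a document served as text/html is parsed for links. A document served as application/pdf is not parsed, because that type is not in the list.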

Default value

crawler.parser.mimeTypes=text/html,text/plain,text/xml
