crawler.non_html
Background
This option is a comma-separated list of file extensions for non-HTML files that the crawler should download, i.e. binary file types such as .pdf and .doc. These files are not parsed: the crawler does not attempt to extract hyperlinks from them.
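The exact matching rules the crawler applies are not described here. The Python sketch below only illustrates one plausible reading of a value such as `pdf,doc`. It assumes that entries are compared case-insensitively against the extension of the URL path and that a leading dot in an entry is optional. The function names `parse_extension_list` and `is_non_html` are hypothetical and are not part of the crawler.

```python
from urllib.parse import urlparse
import posixpath


def parse_extension_list(value: str) -> set[str]:
    """Split a comma-separated option value into normalised extensions.

    Assumption: whitespace and a leading dot are ignored, and matching
    is case-insensitive. Empty entries (e.g. from "pdf,,doc") are dropped.
    """
    return {
        entry.strip().lstrip(".").lower()
        for entry in value.split(",")
        if entry.strip()
    }


def is_non_html(url: str, extensions: set[str]) -> bool:
    """Return True if the URL path ends in one of the listed extensions.

    Hypothetical helper: only the path component is inspected, so query
    strings and fragments do not affect the result.
    """
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)
    return ext.lstrip(".").lower() in extensions
```

For example, `parse_extension_list("pdf, doc,.XLS")` yields `{"pdf", "doc", "xls"}`, and with that set `is_non_html("https://example.com/report.PDF", ...)` is true while a URL ending in `index.html` is not.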
If crawler.inline_filtering_enabled is set to "true", these files will be filtered. To prevent this for a specific file type, add its MIME type to the filter.ignore.mimeTypes setting.
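The interaction between the two settings can be summarised as a small decision function. This Python sketch is illustrative only. It assumes that filter.ignore.mimeTypes holds a comma-separated list of MIME types (like crawler.non_html holds extensions), that MIME types are compared case-insensitively, and that any parameters such as `; charset=...` are ignored. The function name `should_filter` is hypothetical.

```python
def should_filter(mime_type: str,
                  inline_filtering_enabled: bool,
                  ignore_mime_types: str) -> bool:
    """Decide whether a downloaded non-HTML file would be filtered.

    Assumptions (not confirmed by this reference):
    - ignore_mime_types is a comma-separated list of MIME types;
    - comparison is case-insensitive and MIME parameters are dropped.
    """
    # With inline filtering switched off, nothing is filtered.
    if not inline_filtering_enabled:
        return False
    ignored = {m.strip().lower() for m in ignore_mime_types.split(",") if m.strip()}
    # Drop parameters such as "; charset=binary" before comparing.
    base_type = mime_type.split(";")[0].strip().lower()
    # Filter unless this MIME type has been explicitly excluded.
    return base_type not in ignored
```

With inline filtering enabled and an ignore list of `application/msword`, a PDF would still be filtered, while a Word document would be left as-is.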