Which non-HTML file formats to crawl (e.g. pdf, doc, xls).
Can be set in: collection.cfg
This option is a comma-separated list of file extensions to download. It applies to non-HTML files, i.e. binary file types such as .pdf and .doc. These files are not parsed: the crawler downloads them but does not attempt to extract hyperlinks from them.
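
As an illustration only, the sketch below shows one way a comma-separated extension list like this could be interpreted when deciding whether to download a URL. It is not the crawler's actual implementation. The function names parse_extension_list and should_download are hypothetical, and the case-insensitive matching and tolerance of leading dots and surrounding spaces are assumptions, not documented behaviour.

```python
import posixpath
from urllib.parse import urlparse


def parse_extension_list(value):
    """Split a comma-separated list such as "pdf, doc, xls" into a set.

    Assumption: entries are normalized by trimming whitespace, dropping a
    leading dot, and lowercasing. Empty entries are ignored.
    """
    extensions = set()
    for item in value.split(","):
        ext = item.strip().lstrip(".").lower()
        if ext:
            extensions.add(ext)
    return extensions


def should_download(url, extensions):
    """Return True if the URL path ends in one of the listed extensions.

    Only the path component is examined, so query strings are ignored.
    """
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in extensions


# Hypothetical option value, using the extensions named in the description.
allowed = parse_extension_list("pdf, doc, xls")

print(should_download("http://example.com/reports/annual.pdf", allowed))
print(should_download("http://example.com/index.html", allowed))
```

Under these assumptions, the first URL matches the list and the second does not. A matched file would be downloaded but, as described above, not scanned for further links.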