Include binary documents obtained by the web crawler in the search index
By default, Funnelback removes from the index any binary document that it is unable to filter. This is sometimes undesirable, as you may want the document's URL to be displayed in search results even if no useful text can be extracted.
The instructions below show how to include file types in the index that could not be filtered (converted to text). If you want to add a non-default file type, first see Configuring Funnelback to index additional file types, and only attempt the steps below if that does not work.
Ensure that the extensions are listed in the crawler.non_html data source configuration option.
Remove the type from the crawler.reject_files data source configuration option.
Ensure that these documents are not filtered by adding the MIME type to the filter.ignore.mimeTypes data source configuration option.
These instructions assume that the file type in question could not be filtered successfully.
Add the -ibd indexer option. This tells the indexer to include binary documents in the index. However, when the index is built, the indexer sets a flag in the index for each of these documents that prevents them from being displayed. This flag needs to be removed.
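As a rough illustration, the configuration changes described above might look like the following for ZIP archives. This is a sketch under assumptions, not a definitive configuration: the `zip` extension, the other extensions listed, and the `application/zip` MIME type are illustrative values rather than Funnelback defaults; the comma-separated value format and the `indexer_options` key used to pass `-ibd` are also assumptions to check against the documentation for your Funnelback version. It does not cover removing the display flag set by the indexer.

```ini
# Hypothetical example: allow .zip files into the index.
# All values are illustrative; merge them with your existing settings.

# Extension listed so the crawler keeps it as a non-HTML document.
crawler.non_html=doc,docx,pdf,zip

# Reject list with "zip" removed (remaining entries are placeholders).
crawler.reject_files=exe,iso,tar

# MIME type added so these documents are not passed to the filters.
filter.ignore.mimeTypes=application/zip

# Assumed key for indexer options; -ibd includes binary documents.
indexer_options=-ibd
```

If these options already have values in your data source, append to the existing lists rather than replacing them, so other file types keep working as before.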