Built-in filters - Filter all documents as HTML (ForceHTMLMime)

The ForceHTMLMime filter forces all documents that are processed by the filter framework to present a text/html MIME type. This is useful when the data source contains only HTML content, or when the web server does not return the correct MIME type.


To enable the filter, add ForceHTMLMime to the filter chain.

The ForceHTMLMime filter must appear in the filter chain before other filters that rely on the text/html MIME type.
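
The ordering requirement can be illustrated with a small Python sketch. This is a conceptual model only, not the filter framework's actual API: the Document class, the force_html_mime and html_only_filter functions, and the run_chain helper are all hypothetical names introduced for illustration. It assumes a chain is a list of filters applied in order, each receiving the output of the previous one.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a document passing through a filter chain."""
    url: str
    mime_type: str
    content: str


# A filter takes a document and returns a (possibly modified) document.
Filter = Callable[[Document], Document]

MARKER = "<!-- processed as HTML -->"


def force_html_mime(doc: Document) -> Document:
    """Illustrative equivalent of ForceHTMLMime: always report text/html."""
    return replace(doc, mime_type="text/html")


def html_only_filter(doc: Document) -> Document:
    """Hypothetical downstream filter that only acts on text/html documents."""
    if doc.mime_type != "text/html":
        return doc  # skipped: the document does not look like HTML
    return replace(doc, content=doc.content + MARKER)


def run_chain(doc: Document, filters: List[Filter]) -> Document:
    """Apply each filter in order, passing the result to the next one."""
    for apply_filter in filters:
        doc = apply_filter(doc)
    return doc


if __name__ == "__main__":
    # A server that reports the wrong MIME type for an HTML page.
    page = Document(
        url="http://example.com/page",
        mime_type="application/octet-stream",
        content="<html><body>Hello</body></html>",
    )

    correct = run_chain(page, [force_html_mime, html_only_filter])
    wrong = run_chain(page, [html_only_filter, force_html_mime])

    print(correct.content.endswith(MARKER))  # downstream filter ran
    print(wrong.content.endswith(MARKER))    # downstream filter was skipped
```

In this sketch, placing the forcing step first lets the HTML-dependent filter see text/html and run. With the order reversed, the document still ends up labelled text/html, but the HTML-dependent filter has already skipped it.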


To force the gatherer to treat all documents as HTML and then process each document using a filter: