Built-in filters - Fix document titles (DocumentFixerFilterProvider)
This filter analyzes a document’s title and attempts to replace it when the title is not considered a good one.
The filter only processes titles of HTML documents.
Enabling and disabling the document title fixer
The document title fixer (DocumentFixerFilterProvider) is enabled by default and included in the default filter chain.
To enable the document title fixer on a custom filter chain, add DocumentFixerFilterProvider to the collection’s filter.classes after the custom filter.
Example:
filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter:DocumentFixerFilterProvider
To disable the document title fixer, remove DocumentFixerFilterProvider from the filter chain.
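As an illustration only, and assuming the custom chain shown in the enabling example (myCustomFilter is the same placeholder filter name), a chain with the title fixer removed might look like this:

```
filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter
```

The other filters keep their original order; only the DocumentFixerFilterProvider entry and its separating colon are dropped.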
Filter options
collection.cfg option | Description |
---|---|
 | Configures the maximum amount of time to spend fixing a single title. |