Built-in filters - Fix document titles (DocumentFixerFilterProvider)

This filter analyzes each document’s title and attempts to replace it when the title is not considered a good one.

The filter only processes titles of HTML documents.

Enabling and disabling the document title fixer

The document title fixer (DocumentFixerFilterProvider) is enabled by default and included in the default filter chain.

To enable the document title fixer in a custom filter chain, add DocumentFixerFilterProvider to the collection’s filter.classes setting, after the custom filter.

Example:

filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter:DocumentFixerFilterProvider

To disable the document title fixer, remove DocumentFixerFilterProvider from the filter chain.
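
As a sketch of the disabled state, the example chain above would look like this with the title fixer removed. It reuses the same placeholder custom filter (myCustomFilter), and the other entries are simply carried over from the example rather than a recommended chain:

filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter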

Filter options

collection.cfg option: filter.document_fixer.timeout_ms

Description: Configures the maximum amount of time (in milliseconds, as the _ms suffix indicates) to spend fixing a single title.
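
A minimal illustration of setting this option in collection.cfg follows. The value 5000 is an arbitrary example chosen for illustration, not the documented default, and it assumes the value is interpreted as milliseconds:

filter.document_fixer.timeout_ms=5000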
