filter.jsoup.classes

Background

This setting specifies a list of Java/Groovy classes that are run by the Jsoup filter.

The value of this setting is a comma-separated list of Jsoup filter class names, which are run in the order specified (left to right).

The names given in this configuration option should be either fully qualified Java/Groovy class names or simple class names. Simple class names are assumed to exist within the com.funnelback.common.filter.jsoup package. Groovy classes are loaded from $SEARCH_HOME/lib/java/groovy or from the data source’s @groovy directory. Where a Groovy class is declared within a package, the directory structure below the @groovy folder must reflect the package name.
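
The naming rules above can be illustrated with a short sketch. This is not Funnelback's actual loader; the class JsoupClassNames and its methods are hypothetical names used only for illustration. The sketch assumes that whitespace around names is ignored, that any name containing a dot is already fully qualified, and that Groovy source files use the .groovy extension.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the naming rules; not part of Funnelback.
public class JsoupClassNames {
    // Package assumed for simple class names, as described in the text.
    static final String DEFAULT_PACKAGE = "com.funnelback.common.filter.jsoup";

    // Splits the comma-separated value, preserving left-to-right order.
    // Assumption: surrounding whitespace and empty entries are ignored.
    static List<String> parse(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Assumption: a name without a dot is a simple name in the default package.
    static String qualify(String name) {
        return name.contains(".") ? name : DEFAULT_PACKAGE + "." + name;
    }

    // Relative path a packaged Groovy class would need below the @groovy
    // folder: each package segment becomes a directory.
    static String groovyPath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".groovy";
    }

    public static void main(String[] args) {
        String value = "ContentGeneratorUrlDetection,UndesirableText,com.example.CustomFilter";
        for (String name : parse(value)) {
            System.out.println(qualify(name));
        }
        System.out.println(groovyPath("com.example.CustomFilter"));
    }
}
```

Under these assumptions, a Groovy filter declared as com.example.CustomFilter would be expected at com/example/CustomFilter.groovy below the @groovy folder, and would be referenced in the setting by its fully qualified name.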

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the filter.jsoup.classes key, and set the value. This can be set to any valid List<String> value.

Default value

filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates

Examples

Add an extra custom Jsoup filter (com.example.CustomFilter) that processes the HTML after all the default Jsoup filters have run:

filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates,com.example.CustomFilter