filter.jsoup.classes
Specifies which Java/Groovy classes are used for filtering. These classes operate on Jsoup objects rather than byte streams.
Key: filter.jsoup.classes
Type: List<String>
Can be set in: collection.cfg
Description
This setting specifies a list of Java/Groovy classes that are run by the Jsoup filter.
The value of this setting is a comma-separated list of Jsoup filter class names, which are run in the order specified (left to right).
The names given in this configuration option should be either fully qualified Java/Groovy class names, or simple class names, which are assumed to exist within the com.funnelback.common.filter.jsoup package.
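The naming rule above can be illustrated with a minimal sketch. This is not Funnelback's own loader code: the class JsoupFilterNames and its resolve method are hypothetical, com.example.filters.MyCustomFilter is an invented class name, and trimming whitespace around each entry is an assumption. It only shows how a simple name would map to the default package while a fully qualified name is left as it is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Funnelback.
class JsoupFilterNames {
    // Default package named in the description above.
    static final String DEFAULT_PACKAGE = "com.funnelback.common.filter.jsoup";

    // Splits the comma-separated setting value and qualifies simple names,
    // preserving the left-to-right order of the original list.
    static List<String> resolve(String settingValue) {
        List<String> resolved = new ArrayList<>();
        for (String raw : settingValue.split(",")) {
            String name = raw.trim(); // assumption: surrounding whitespace is ignored
            if (name.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            // A name containing a dot is treated as fully qualified.
            resolved.add(name.contains(".") ? name : DEFAULT_PACKAGE + "." + name);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // com.example.filters.MyCustomFilter is a made-up class name.
        for (String name : resolve("TitleDuplicates,com.example.filters.MyCustomFilter")) {
            System.out.println(name);
        }
    }
}
```

Under these assumptions, TitleDuplicates would resolve to com.funnelback.common.filter.jsoup.TitleDuplicates, and the second entry would be used unchanged.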
Groovy classes are loaded from $SEARCH_HOME/lib/java/groovy or the data source’s @groovy directory. Where a class is declared within a package, the directory structure below the @groovy folder must reflect the package name.
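As a rough illustration of the package-to-directory rule, the sketch below derives where a packaged Groovy class would be expected to live. The helper GroovyFilterPath, the class name com.example.filters.MyCustomFilter and the relative "@groovy" base path are all hypothetical, and the .groovy file extension is an assumption based on the usual Groovy convention.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Funnelback.
class GroovyFilterPath {
    // Maps a fully qualified class name to a source file below the given
    // @groovy directory: each package segment becomes one sub-directory.
    static Path sourceFile(Path groovyDir, String className) {
        String relative = className.replace('.', '/') + ".groovy"; // assumed extension
        return groovyDir.resolve(relative);
    }

    public static void main(String[] args) {
        // "@groovy" stands in for the data source's actual @groovy directory.
        Path file = sourceFile(Paths.get("@groovy"), "com.example.filters.MyCustomFilter");
        System.out.println(file);
    }
}
```

With these made-up names, a class declared in package com.example.filters would sit in the com/example/filters sub-directory of the @groovy folder.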
Default Value
filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates