Built-in HTML document filters - Detect undesirable text
The undesirable text filter analyzes HTML documents for the occurrence of words from various word lists.
The content auditor requires this filter in order to produce the undesirable text report.
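As a rough illustration of the kind of analysis described above, the following Python sketch extracts the text of an HTML document and reports which words from each named word list occur in it. It is not the filter's actual implementation: the names `TextExtractor` and `find_undesirable_text`, the in-memory word lists, and the single-word matching rule are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the markup.

    Simplification: text inside <script> and <style> is collected too.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_undesirable_text(html, word_lists):
    """Return a dict mapping each list name to the words from it found in the text.

    word_lists is a hypothetical structure: {list_name: iterable of single words}.
    Matching is case-insensitive and works on whole words only.
    """
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    words_in_doc = set(re.findall(r"[a-z']+", text))
    return {
        name: sorted(words_in_doc & {w.lower() for w in words})
        for name, words in word_lists.items()
    }


if __name__ == "__main__":
    sample = "<p>We recieve <b>many</b> requests.</p>"
    lists = {
        "default-misspellings": ["recieve", "teh"],
        "weasel-words": ["many", "clearly"],
    }
    print(find_undesirable_text(sample, lists))
```

Each word list is keyed by a name, mirroring the way the configuration keys on this page name their sources (for example `default-misspellings` and `weasel-words`).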
Enabling
The undesirable text filter is enabled by default. It is listed as UndesirableText in the filter.jsoup.classes setting:
filter.jsoup.classes=<jsoup_filters>,UndesirableText,<jsoup_filters>
Configuration
filter.jsoup.undesirable_text.[key_name]=[word]
filter.jsoup.undesirable_text-source.default-misspellings=$SEARCH_HOME/conf/common-misspellings.txt.default
filter.jsoup.undesirable_text-source.weasel-words=undesirable-text.weasel-words.cfg