filter.jsoup.undesirable_text.[key_name]
Background
This setting allows you to define words or expressions as 'undesirable text' for detection in Content Auditor.
Individual words can be set directly in the data source configuration using the following format:
filter.jsoup.undesirable_text.ID=WORD
where:
- ID: A unique identifier for the word.
- WORD: Defines the word (or phrase) to detect.
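The Python sketch below illustrates how entries in this format map each ID to its word. It is only an illustration and not part of Funnelback: the helper name `parse_undesirable_words` is hypothetical. It treats only keys with a dot after `undesirable_text` as word entries, so related settings that use a hyphen, such as `filter.jsoup.undesirable_text-source.[key_name]`, are ignored.

```python
from typing import Dict

# Word entries use a dot after "undesirable_text"; related settings use a hyphen.
PREFIX = "filter.jsoup.undesirable_text."


def parse_undesirable_words(config_text: str) -> Dict[str, str]:
    """Hypothetical helper: collect ID -> WORD pairs from
    filter.jsoup.undesirable_text.ID=WORD lines."""
    words: Dict[str, str] = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and any key that is not a word entry.
        if not line.startswith(PREFIX) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        word_id = key[len(PREFIX):].strip()
        if word_id:
            words[word_id] = value.strip()
    return words
```

For example, `parse_undesirable_words("filter.jsoup.undesirable_text.1=click here")` returns `{"1": "click here"}`.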
This option is only useful if you have a small number of words to add. Larger word lists should be provided as a configuration file. See filter.jsoup.undesirable_text-source.[key_name].
If any of these words are detected when the HTML document is analyzed, the detected WORD is added as a value to one of the following metadata fields:
- X-Funnelback-Undesirable-Text, if filter.jsoup.undesirable_text-separate-lists is disabled.
- X-Funnelback-Undesirable-Text-individual-words, if filter.jsoup.undesirable_text-separate-lists is enabled.
A count of the occurrences of all undesirable words found in the page is also recorded. The count is the total number of detected words, including duplicates, and is recorded in one of the following metadata fields:
- X-Funnelback-Undesirable-Text-Count, if filter.jsoup.undesirable_text-separate-lists is disabled.
- X-Funnelback-Undesirable-Text-individual-words-Count, if filter.jsoup.undesirable_text-separate-lists is enabled.
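The counting rule can be illustrated with a small Python sketch. Every occurrence is counted, so a word that appears three times adds 3 to the total. This is a hypothetical model rather than Funnelback's code. The function name `undesirable_text_count` and the `separate_lists` flag are illustrative only. The sketch assumes case-insensitive matching that does not count matches inside longer words, and it assumes the count field is always written, even when the total is 0.

```python
import re
from typing import Dict, Iterable


def undesirable_text_count(
    text: str, words: Iterable[str], separate_lists: bool
) -> Dict[str, int]:
    """Hypothetical model: total occurrences of all words, duplicates included."""
    field = (
        "X-Funnelback-Undesirable-Text-individual-words-Count"
        if separate_lists
        else "X-Funnelback-Undesirable-Text-Count"
    )
    total = 0
    for word in words:
        # Assumption: case-insensitive, not matching inside longer words.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        # Each occurrence adds one to the total.
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return {field: total}
```

For the text "Click here. Or click here again. Lorem ipsum." and the words "click here" and "lorem ipsum", this sketch records a count of 3: two occurrences of the first word and one of the second.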