filter.jsoup.undesirable_text.[key_name]

Background

This setting defines words or expressions as 'undesirable text' to be detected by the content auditor.

The format allows individual words to be set directly in the data source configuration:

filter.jsoup.undesirable_text.ID=WORD

where:

  • ID: A unique identifier for the word. This is the [key_name] part of the key.

  • WORD: Defines the word (or phrase) to detect.
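The ID=WORD format above can be illustrated with a minimal Python sketch that collects the configured words from key=value lines. The helper name `parse_undesirable_text` and the line-splitting rules are assumptions for illustration only; this is not how the product itself reads its configuration.

```python
PREFIX = "filter.jsoup.undesirable_text."


def parse_undesirable_text(lines):
    """Collect ID -> WORD pairs from key=value configuration lines.

    Hypothetical helper: splits each line on the first '=' and keeps only
    keys that start with PREFIX. Other keys, including
    filter.jsoup.undesirable_text-source.*, do not match the prefix and
    are ignored.
    """
    words = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(PREFIX):
            word_id = key[len(PREFIX):]  # the ID part of the key
            if word_id:
                words[word_id] = value.strip()  # the WORD (or phrase)
    return words


config = [
    "filter.jsoup.undesirable_text.1=badword",
    "filter.jsoup.undesirable_text.2=bad phrase",
    "filter.jsoup.undesirable_text-source.list=words.txt",
    "some.other.key=value",
]
print(parse_undesirable_text(config))
# {'1': 'badword', '2': 'bad phrase'}
```

Because each ID must be unique, a dictionary keyed by ID is a natural fit. In this sketch a repeated ID simply overwrites the earlier word.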

This option is only useful if you have a small number of words to add. Larger word lists should be provided as a configuration file. See filter.jsoup.undesirable_text-source.[key_name].

If any of these words are detected when the HTML document is analyzed, the detected WORD is added as a value to the following metadata fields:

The total number of occurrences of undesirable words found in the page is also recorded. Duplicates are included, so a word that appears three times adds 3 to the total. The count is recorded in the following metadata field:
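The detection and counting described above can be sketched in Python as follows. This is a hypothetical illustration, not the filter's actual implementation: it works on plain text assumed to be already extracted from the HTML (the jsoup parsing step is omitted), and the case-insensitive, whole-word matching rules are assumptions that the documentation does not confirm.

```python
import re


def audit_text(text, words):
    """Return (detected_words, total_count) for the given page text.

    Assumptions (not confirmed by the documentation): matching is
    case-insensitive and only whole words or phrases are matched.
    """
    detected = []
    total = 0
    for word in words:
        # re.escape treats the configured word or phrase literally;
        # \b restricts matches to word boundaries.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            detected.append(word)  # each detected word is recorded once
            total += hits          # duplicates are included in the total
    return detected, total


page_text = "A badword here, another Badword there, and a bad phrase."
print(audit_text(page_text, ["badword", "bad phrase", "absent"]))
# (['badword', 'bad phrase'], 3)
```

Note that `\b` only behaves as a word boundary next to letters, digits or underscores, so a configured word that starts or ends with punctuation would need a different matching rule in this sketch.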

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the filter.jsoup.undesirable_text.[key_name] key, and set the value. This can be set to any valid String value.

Default value

None

Examples

Adds two offensive words to detect in HTML documents:

filter.jsoup.undesirable_text.1=nigger
filter.jsoup.undesirable_text.2=coon