filter.jsoup.undesirable_text.[key_name]

Specifies words or expressions of undesirable text to detect and present in the content auditor.

Key: filter.jsoup.undesirable_text.[key_name]
Type: String
Can be set in: collection.cfg

Description

This setting defines words or expressions as 'undesirable text' to be detected by the content auditor.

Individual words can be set directly in the data source configuration using the following format:

filter.jsoup.undesirable_text.ID=WORD

where:

  • ID: A unique identifier for the word.

  • WORD: Defines the word (or phrase) to detect.
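To illustrate the ID=WORD key format, the following Python sketch collects these entries from collection.cfg-style text into an ID-to-word mapping. The `parse_undesirable_text` helper and its regular expression are hypothetical and are not the product's own parser; the sketch assumes one key=value pair per line and tolerates optional whitespace around `=`.

```python
import re

# Hypothetical illustration only, not the product's parser.
# Matches "filter.jsoup.undesirable_text.<ID>=<WORD>"; keys such as
# "filter.jsoup.undesirable_text-source.<name>" do not match because
# "undesirable_text" must be followed by a literal dot.
KEY_PATTERN = re.compile(r"^filter\.jsoup\.undesirable_text\.([^=\s]+)\s*=\s*(.*)$")


def parse_undesirable_text(config_text: str) -> dict[str, str]:
    """Map each configured ID to its word or phrase (later IDs overwrite earlier ones)."""
    words: dict[str, str] = {}
    for line in config_text.splitlines():
        match = KEY_PATTERN.match(line.strip())
        if match:
            ident, word = match.groups()
            words[ident] = word.strip()
    return words


cfg = """
filter.jsoup.undesirable_text.1=lorem
filter.jsoup.undesirable_text.2=ipsum dolor
"""
print(parse_undesirable_text(cfg))  # {'1': 'lorem', '2': 'ipsum dolor'}
```

Because the ID is only used as a unique key, any label works; numeric IDs simply make it easy to append new words.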

This option is best suited to adding a small number of words. Larger word lists should be provided as a separate configuration file; see filter.jsoup.undesirable_text-source.[key_name].

If any of these words are detected when the HTML document is analyzed, that WORD will be added as a value to the following metadata fields:

A count of all undesirable word occurrences found in the page is also recorded. The count is the total number of detected occurrences, including duplicates. It is recorded in the following metadata field:

Default Value

None

Examples

Adds two offensive words to detect in HTML documents:

filter.jsoup.undesirable_text.1=nigger
filter.jsoup.undesirable_text.2=coon
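The detection and counting behavior described above can be sketched as follows. This is a hypothetical Python illustration, not the content auditor's implementation: `audit_text` assumes the page text has already been extracted from the HTML, matches whole words case-insensitively, and reports each detected word once while counting every occurrence, duplicates included. The sample words are neutral placeholders.

```python
import re

# Hypothetical sketch, not the content auditor's implementation.
# Assumptions: plain text already extracted from the HTML, case-insensitive
# whole-word matching, each detected word reported once, every occurrence counted.


def audit_text(text: str, words: dict[str, str]) -> tuple[list[str], int]:
    """Return the distinct words found (in configuration order) and the total occurrence count."""
    found: list[str] = []
    total = 0
    for word in words.values():
        # re.escape keeps punctuation in a configured phrase from acting as regex syntax
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            found.append(word)  # stands in for the per-word metadata value
            total += hits       # stands in for the occurrence-count metadata value
    return found, total


words = {"1": "lorem", "2": "ipsum dolor"}
page = "Lorem ipsum dolor sit amet. lorem again, and IPSUM DOLOR."
print(audit_text(page, words))  # (['lorem', 'ipsum dolor'], 4)
```

Because each configured word is matched independently in this sketch, overlapping entries (for example `ipsum` and `ipsum dolor`) would both be counted; how the real filter treats overlapping entries, case, and word boundaries is not specified here.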