Specifies words or expressions to detect as undesirable text and display in Content Auditor.
Can be set in: collection.cfg
This setting defines words or expressions as 'undesirable text' to be detected by Content Auditor.
The format allows individual words to be set directly in the data source configuration, where:
ID: A unique identifier for the word.
WORD: The word (or phrase) to detect.
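To illustrate the ID/WORD format, the following minimal Python sketch collects ID and WORD pairs from collection.cfg-style `key=value` lines. The exact configuration key is not given in this section, so the prefix `undesirable_text.` used here is hypothetical and should be replaced with the documented setting name. The words `foo` and `bar baz` are placeholders.

```python
from typing import Dict

# HYPOTHETICAL key prefix: the real collection.cfg key name is not shown in
# this section; substitute the documented setting name.
KEY_PREFIX = "undesirable_text."


def parse_undesirable_words(cfg_text: str, prefix: str = KEY_PREFIX) -> Dict[str, str]:
    """Collect ID -> WORD pairs from collection.cfg-style 'key=value' lines."""
    words: Dict[str, str] = {}
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key = key.strip()
        if not key.startswith(prefix):
            continue  # some other setting
        word_id = key[len(prefix):]
        if word_id:
            # A repeated ID overwrites the earlier WORD, which is why IDs must be unique.
            words[word_id] = value.strip()
    return words


sample_cfg = """
# two illustrative entries (placeholder words)
undesirable_text.1=foo
undesirable_text.2=bar baz
service_name=Example
"""

print(parse_undesirable_words(sample_cfg))  # {'1': 'foo', '2': 'bar baz'}
```

Keying the result by ID mirrors the requirement that each ID be unique: a second entry with the same ID replaces the first rather than adding a new word.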
This option is only practical for a small number of words. Larger word lists should be provided in
a configuration file. See
If any of these words is detected when an HTML document is analyzed, the WORD is added as a value to the following metadata fields:
The total number of occurrences of undesirable words found in the page is also recorded. This count includes every detected occurrence, so a word that appears several times is counted each time. The count is recorded in the following metadata field:
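The detection and counting behaviour described above can be sketched as follows: each detected WORD is recorded once as a metadata value, while the count totals every occurrence, duplicates included. This is an illustration under stated assumptions, not the product's implementation: matching is assumed to be case-insensitive on whole-word boundaries, the page text is assumed to be already extracted from the HTML, and the field names `undesirableText` and `undesirableTextCount` are hypothetical stand-ins for the documented fields.

```python
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL metadata field names; use the fields listed in the product docs.
WORDS_FIELD = "undesirableText"
COUNT_FIELD = "undesirableTextCount"


def detect_undesirable(text: str, words: Dict[str, str]) -> Tuple[List[str], int]:
    """Return (detected WORD values, total occurrence count).

    Assumption: case-insensitive, whole-word matching; the real rules may differ.
    """
    detected: List[str] = []
    total = 0
    for word in words.values():
        # \b keeps "foo" from matching inside "food"; re.escape treats the
        # WORD (including spaces in phrases) as literal text.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            detected.append(word)  # each detected WORD is recorded once
            total += hits          # every occurrence counts, duplicates included
    return detected, total


def to_metadata(detected: List[str], total: int) -> Dict[str, object]:
    # The count is stored as a string, on the assumption that metadata values are text.
    return {WORDS_FIELD: detected, COUNT_FIELD: str(total)}


words = {"1": "foo", "2": "bar baz", "3": "qux"}
page = "Foo fighters ate food. foo again, and bar baz twice: bar baz."
print(to_metadata(*detect_undesirable(page, words)))
# {'undesirableText': ['foo', 'bar baz'], 'undesirableTextCount': '4'}
```

Here `foo` matches twice (`Foo` and `foo`, but not `food`) and `bar baz` matches twice, so two distinct words are recorded and the total count is 4.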
The following example adds two offensive words, nigger and coon, to detect in HTML documents: