This setting controls where Content Auditor finds the lists of 'undesirable text' it should detect.
Several sources can be defined, each identified by a key name, which allows individual collections to override the defaults.
The file at the given path is expected to contain a list of undesirable word sequences, one sequence per line.
In a multi-word sequence, separate each word with a single space character. Where applicable, use the text
version of an HTML entity rather than the entity itself (e.g. the em dash character, U+2014, instead of &mdash;).
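The file format described above (one sequence per line, words separated by spaces) is simple enough to parse in a few lines. The following is a minimal sketch, not Content Auditor's actual implementation: the function names are hypothetical, and it assumes the file is UTF-8 encoded, that blank lines are skipped, and that matching is case-insensitive and whole-word with any whitespace allowed between the words of a sequence. The real matching rules may differ.

```python
import re
from pathlib import Path


def parse_undesirable_text(content: str) -> list[tuple[str, ...]]:
    """Split file content into word sequences: one sequence per line,
    words within a sequence separated by spaces."""
    sequences = []
    for line in content.splitlines():
        words = line.split()  # tolerant of stray spaces; the format asks for single spaces
        if words:  # assumption: blank lines are skipped
            sequences.append(tuple(words))
    return sequences


def load_undesirable_text(path: str) -> list[tuple[str, ...]]:
    # Assumption: the file is UTF-8 encoded.
    return parse_undesirable_text(Path(path).read_text(encoding="utf-8"))


def find_sequences(text: str, sequences: list[tuple[str, ...]]) -> list[str]:
    """Hypothetical matcher: case-insensitive, whole-word matching,
    allowing any run of whitespace between the words of a sequence."""
    found = []
    for words in sequences:
        # Lookarounds instead of \b so entries ending in punctuation still match.
        pattern = r"(?<!\w)" + r"\s+".join(re.escape(w) for w in words) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(" ".join(words))
    return found


if __name__ == "__main__":
    seqs = parse_undesirable_text("aluminum\npurple monkey\n")
    print(find_sequences("A purple  monkey on aluminum foil.", seqs))
    # prints ['aluminum', 'purple monkey']
```

Treating each line as a tuple of words keeps the multi-word structure explicit, so a matcher can decide separately how strict to be about the whitespace between words.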
Undesirable text files can be created from the administration interface file manager by selecting
from the create menu. To make use of this file, the
file_path must be set to
key_name can be any string as long as it is unique per collection.
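The key-based override behaviour can be pictured as a dictionary merge. This is a minimal sketch that assumes each source is a key_name to file_path mapping and that a collection's entries take precedence over the defaults. `resolve_sources`, the key names, and the file paths in the demo are hypothetical and are not the product's actual configuration keys.

```python
def resolve_sources(
    defaults: dict[str, str], overrides: dict[str, str]
) -> dict[str, str]:
    """Merge key_name -> file_path mappings.

    A collection entry with the same key_name replaces the default source;
    a new key_name adds an extra source alongside the defaults.
    """
    merged = dict(defaults)  # copy so the defaults are not mutated
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Hypothetical key names and paths, for illustration only.
    defaults = {"misspellings": "default-misspellings.cfg"}
    collection = {
        "misspellings": "custom-misspellings.cfg",
        "additional": "undesirable-text.additional.cfg",
    }
    for key, path in resolve_sources(defaults, collection).items():
        print(key, path)
```

Because keys must be unique per collection, reusing a default key replaces that source, while any new key simply adds another list to check.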
By default, this setting provides a list of commonly misspelled English words, based on Wikipedia's list of common misspellings for machines.
A collection can override the misspellings source with a custom file and also add a further set of undesirable text from 'undesirable-text.additional.cfg'.
For example, a file could list the single word aluminum on one line and the two-word sequence purple monkey on the next.