Specify sources of undesirable text strings to detect and present within content auditor.

Key: filter.jsoup.undesirable_text-source.[key_name]
Type: String
Can be set in: collection.cfg


This setting controls where 'undesirable text' is listed for detection in content auditor.

Several sources can be defined, each under its own key name, which allows collections to override the defaults.


The file at the given path is expected to contain a list of undesirable word sequences, one sequence per line. In multi-word sequences, separate each word with a single space character. Use text versions of HTML entities where applicable (e.g. — instead of &mdash;).
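The file format described above can be checked or cleaned up before upload. The following is a minimal Python sketch, not part of the product: the function names normalize_sequences and render are hypothetical, and dropping duplicate or blank lines is an assumption made here for tidiness rather than a documented requirement.

```python
import html


def normalize_sequences(raw: str) -> list[str]:
    """Return one cleaned word sequence per non-empty input line."""
    sequences = []
    seen = set()
    for line in raw.splitlines():
        # Replace HTML entities with their text versions (&mdash; becomes the character itself).
        text = html.unescape(line)
        # Collapse any run of whitespace so words are separated by a single space.
        cleaned = " ".join(text.split())
        # Assumption: blank lines and exact duplicates are not useful, so skip them.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            sequences.append(cleaned)
    return sequences


def render(sequences: list[str]) -> str:
    """Write the sequences back out, one per line."""
    return "".join(sequence + "\n" for sequence in sequences)
```

For example, passing the text "purple   monkey" followed by a blank line yields a single entry, "purple monkey", ready to be saved as an undesirable text file.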

Undesirable text files can be created from the administration interface file manager by selecting undesirable-text.*.cfg from the create menu. To make use of this file, the file_path must be set to $SEARCH_HOME/conf/$COLLECTION_NAME/undesirable-text.<name>.cfg.

The key_name can be any string as long as it is unique per collection.
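To illustrate how a key name and a file name combine, here is a small Python sketch that builds candidate lines for collection.cfg. It assumes collection.cfg uses a key=value syntax and that the value of each setting is the file path; both are assumptions, and the helper names source_path and setting_lines are hypothetical.

```python
def source_path(name: str) -> str:
    """Path of an undesirable text file created from the file manager."""
    return f"$SEARCH_HOME/conf/$COLLECTION_NAME/undesirable-text.{name}.cfg"


def setting_lines(sources: dict[str, str]) -> list[str]:
    """Build one line per source, mapping each key_name to its file name.

    A dict is used so that each key_name can appear only once,
    matching the requirement that key names are unique per collection.
    """
    return [
        # Assumption: collection.cfg entries take the form key=value.
        f"filter.jsoup.undesirable_text-source.{key_name}={source_path(name)}"
        for key_name, name in sources.items()
    ]
```

Calling setting_lines({"additional": "additional"}) produces one line that points the key "additional" at undesirable-text.additional.cfg in the collection's conf directory.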

Default values


This default setting provides a list of commonly misspelled words in English based on Wikipedia’s list of common misspellings for machines.


The following overrides the misspellings with a custom file, and also includes an additional set from 'undesirable-text.additional.cfg'.


more_undesirable_text.txt contains:

purple monkey


© 2015- Squiz Pty Ltd