Built-in filters - Metadata normaliser filter (MetadataNormaliser)


The metadata normaliser filter can be used to clean and normalise metadata values. Normalisation is particularly useful for faceted navigation, where it allows variant values of the same category (for example, different spellings of an author's name) to be merged into a single category.

The filter processes HTML meta tags (<meta name="key" content="value">) and tests each value against a list of regular expressions. When a value matches one of the expressions, it is replaced with the corresponding replacement value.
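As an illustration of the input the filter works on, the following minimal sketch collects name/content pairs from HTML meta tags using Python's standard html.parser module. It is not Funnelback's implementation. MetaCollector and extract_meta are hypothetical names, and skipping meta tags that lack either a name or a content attribute is an assumption made for this sketch.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects (name, content) pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Assumption: only tags with both a name and a content are candidates.
        if name is not None and content is not None:
            self.pairs.append((name, content))


def extract_meta(html):
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


page = """<html><head>
<meta name="Author" content="jsmith">
<meta name="Publisher" content="Example Press">
<meta charset="utf-8">
</head><body></body></html>"""

print(extract_meta(page))
# [('Author', 'jsmith'), ('Publisher', 'Example Press')]
```

The charset tag is skipped because it has no name attribute. Self-closing tags such as <meta ... /> are also handled, because HTMLParser's default handle_startendtag delegates to handle_starttag.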


To enable the filter, add MetadataNormaliser to the filter chain, where <default_filter_chain> is the default value:


Configuring the metadata normaliser filter

Mappings must be defined in collection.cfg using the following key:


For example, to perform metadata normalisation on <meta name="Author" ... > and <meta name="Publisher" ... >, set this key to:


Keys are case-insensitive. Any key name can be used, but the recommended practice is to use the same value as the meta tag's "name" attribute.

A corresponding mapping file must be defined for each key, in the following location:


Example filename:


The first line of the mapping file is the key expression (e.g. author). The key expression is case-insensitive and is treated as a regular expression, so expressions such as DC.Creator|Author are valid.

  • Each following line must be <regex>=<replacement>

  • Capture groups can be used (e.g. (.*)@domain.com=$1)

  • Lines starting with # are considered comments

Regular expressions are tested in the order they appear in the file, and the filter stops processing a value at the first matching regular expression.
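The mapping-file rules above can be sketched in Python as follows. This is a minimal sketch, not Funnelback's implementation. parse_mapping and normalise are hypothetical names, and several details the text does not specify are assumptions, flagged in the comments: both the key expression and each rule must match the whole string, rule matching is case-sensitive, each line is split on its first "=", and Java-style $1 group references are translated into Python's \g<1> syntax.

```python
import re


def parse_mapping(text):
    """Parse mapping-file text into (key_pattern, rules)."""
    lines = [line.strip() for line in text.splitlines()]
    # Drop blank lines and # comments (assumption: comments may appear anywhere).
    lines = [line for line in lines if line and not line.startswith("#")]
    # First line: the key expression, treated as a case-insensitive regex.
    key_pattern = re.compile(lines[0], re.IGNORECASE)
    rules = []
    for line in lines[1:]:
        # Assumption: split on the first "=", so a regex cannot itself contain "=".
        regex, sep, replacement = line.partition("=")
        if not sep:
            continue  # not a <regex>=<replacement> line; ignored in this sketch
        # Translate Java-style $1 references into Python's \g<1> template syntax.
        template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        rules.append((re.compile(regex), template))
    return key_pattern, rules


def normalise(name, value, key_pattern, rules):
    """Return the normalised value, or the original value if no rule applies."""
    # Assumption: the key expression must match the whole meta name.
    if not key_pattern.fullmatch(name):
        return value
    for pattern, template in rules:
        # Assumption: a rule applies only when it matches the whole value.
        match = pattern.fullmatch(value)
        if match:
            # First match wins: later rules are never consulted.
            return match.expand(template)
    return value


mapping = """author
# strip the mail domain from addresses
(.*)@domain.com=$1
jsmith=John Smith
"""
key, rules = parse_mapping(mapping)
print(normalise("Author", "jane@domain.com", key, rules))  # jane
print(normalise("Author", "jsmith", key, rules))           # John Smith
print(normalise("Publisher", "jsmith", key, rules))        # jsmith (key does not match)
```

Stopping at the first match means that more specific rules should be placed above more general ones. A catch-all such as .*=Unknown would shadow every rule below it.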


To normalise non-preferred variants of Shakespeare and John Smith that may appear in the Author and Creator metadata fields:

Set the following in the collection configuration:


Define the metadata normaliser mappings in md_normaliser.author.mapping:

jsmith=John Smith
jack smith=John Smith
j\. smith=John Smith
johnny smith=John Smith
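To show how the four John Smith rules above behave, the following self-contained sketch applies them to a few sample values. Only these four rules are taken from the source. The first-line key expression and the Shakespeare rules are not shown above and are left out. As in the earlier assumptions, each rule is assumed to match the whole value, case-sensitively, with the first matching rule winning. normalise_author is a hypothetical name.

```python
import re

# The four rules from md_normaliser.author.mapping, in file order.
RULES = [
    (re.compile(r"jsmith"), "John Smith"),
    (re.compile(r"jack smith"), "John Smith"),
    (re.compile(r"j\. smith"), "John Smith"),
    (re.compile(r"johnny smith"), "John Smith"),
]


def normalise_author(value):
    # Assumption: a rule fires only when it matches the whole value and
    # matching is case-sensitive; the first matching rule wins.
    for pattern, replacement in RULES:
        if pattern.fullmatch(value):
            return replacement
    return value


for value in ["jsmith", "j. smith", "jx smith", "johnny smith", "John Smith"]:
    print(f"{value!r} -> {normalise_author(value)!r}")
```

The escaped dot in j\. smith matters. Without the backslash, the dot would match any character, so "jx smith" would also be rewritten to John Smith. Values that are already in the preferred form pass through unchanged.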

© 2015- Squiz Pty Ltd