Built-in filters - Metadata normaliser filter (MetadataNormaliser)
The metadata normaliser filter can be used to clean and normalise metadata values. Normalisation is particularly useful for faceted navigation, allowing similar categories to be merged into a single category.
The filter processes HTML meta tags (<meta name="key" content="value">) and tests each value against a list of regular expressions. A value is replaced when it matches one of those expressions.
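The behaviour described above can be illustrated with a minimal Python sketch. This is not the filter's actual implementation: the function name `normalise_meta`, the `RULES` list and the simplified tag pattern are hypothetical. The sketch also assumes that a rule must match the whole value and that matching ignores case, neither of which is stated here.

```python
import re

# Hypothetical rules for illustration: (compiled pattern, replacement) pairs.
RULES = [(re.compile(r"j\. smith", re.IGNORECASE), "John Smith")]

# Simplified pattern: only handles name="..." followed by content="...".
META_TAG = re.compile(
    r'<meta\s+name="(?P<name>[^"]*)"\s+content="(?P<content>[^"]*)"\s*/?>',
    re.IGNORECASE,
)


def normalise_meta(html: str, key: str) -> str:
    """Rewrite the content of <meta name=key ...> tags whose value matches a rule."""

    def rewrite(match: re.Match) -> str:
        # Leave meta tags for other keys untouched (key comparison ignores case).
        if match.group("name").lower() != key.lower():
            return match.group(0)
        value = match.group("content")
        for pattern, replacement in RULES:
            # Assumption: the expression must match the entire value.
            if pattern.fullmatch(value):
                return f'<meta name="{match.group("name")}" content="{replacement}">'
        # No rule matched: keep the original tag.
        return match.group(0)

    return META_TAG.sub(rewrite, html)
```

Calling `normalise_meta('<meta name="Author" content="J. Smith">', "author")` rewrites the content to `John Smith`, while a tag with a different name, or a value that matches no rule, passes through unchanged.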
To enable the filter, add MetadataNormaliser to the filter chain, where <default_filter_chain> is the default value.
Mappings must be defined in collection.cfg, using the following key:
For example, to perform metadata normalisation on <meta name="Author" ... > and <meta name="Publisher" ... >, this value would be set to:
Keys are case-insensitive. Any key name can be used; recommended practice is to use the same value as the meta tag's "name" attribute.
A corresponding mapping file must be defined for each key in
The first line in the mapping file is the <key> expression, e.g. author. The key is case-insensitive and is treated as a regular expression (so expressions like DC.Creator|Author are valid).
Each following line must be
Capture groups can be used (e.g.
Lines starting with # are considered comments.
Regular expressions are executed in order. The filter terminates on the first matching regular expression.
To normalise non-preferred values of John Smith that may exist in Creator metadata fields:
Set the following in the collection configuration:
Define the metadata normaliser mappings in
Author|Creator
.*shakespeare.*=Shakespeare
[wW]\.?[sS]\.?=Shakespeare
[Ss]\.?[wW]\.?=Shakespeare
jsmith=John Smith
jack smith=John Smith
j\. smith=John Smith
johnny smith=John Smith
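To show how these mappings are evaluated, the following Python sketch parses the mapping text and applies it with first-match-wins semantics. It is an illustration only, not the filter's real code: `parse_mapping` and `normalise` are hypothetical helpers. It assumes each rule line has the form regex=replacement (split at the last "="), that an expression must match the whole value, and that value matching ignores case. Capture groups in replacements are not handled.

```python
import re

# The example mapping file from this section, one entry per line.
MAPPING_FILE = r"""Author|Creator
.*shakespeare.*=Shakespeare
[wW]\.?[sS]\.?=Shakespeare
[Ss]\.?[wW]\.?=Shakespeare
jsmith=John Smith
jack smith=John Smith
j\. smith=John Smith
johnny smith=John Smith
"""


def parse_mapping(text):
    """Return (key_pattern, rules) parsed from mapping file text."""
    # Skip blank lines and comment lines (those starting with '#').
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    # The first remaining line is the key expression, itself a case-insensitive regex.
    key_pattern = re.compile(lines[0], re.IGNORECASE)
    rules = []
    for line in lines[1:]:
        # Assumption: the last '=' separates the expression from the replacement.
        pattern, _, replacement = line.rpartition("=")
        # Assumption: value matching ignores case.
        rules.append((re.compile(pattern, re.IGNORECASE), replacement))
    return key_pattern, rules


def normalise(name, value, key_pattern, rules):
    """Return the normalised value for a metadata field, or the value unchanged."""
    if not key_pattern.fullmatch(name):
        return value  # this mapping does not apply to the field
    for pattern, replacement in rules:
        # Rules run in file order; evaluation stops at the first match.
        if pattern.fullmatch(value):
            return replacement
    return value


key_pattern, rules = parse_mapping(MAPPING_FILE)
print(normalise("Author", "William Shakespeare", key_pattern, rules))  # Shakespeare
print(normalise("Creator", "Jack Smith", key_pattern, rules))          # John Smith
print(normalise("Author", "Jane Doe", key_pattern, rules))             # Jane Doe
```

Because evaluation stops at the first match, broader expressions such as .*shakespeare.* should be ordered with care: any rule placed after one that already matches a value is never reached for that value.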