Plugin: Combine / clone metadata
Other versions of this plugin may exist. Please ensure you are viewing the documentation for the version that you are currently using. If you are not running the latest version of your plugin, we recommend upgrading. See: list of all available versions of this plugin.
Purpose
This plugin can be used to create a new metadata field with a value that combines a set of metadata values from other metadata fields, or clones a metadata field.
Enabling the plugin
- Enable the combine-metadata plugin on your data source from the Extensions screen in the administration dashboard, or add the following data source configuration to enable the plugin:
plugin.combine-metadata.enabled=true
plugin.combine-metadata.version=1.0.1
- Add CombineMetadataStringFilter to the filter chain:
filter.classes=<OTHER-FILTERS>:com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter:<OTHER-FILTERS>
The plugin will take effect after a full update of the data source.
The following options can be set in the data source configuration to configure the plugin:
- plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.ORDER.field=SOURCE_METADATA_FIELD
- plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.delimiter=DELIMITER
Where:
- NEW_METADATA_FIELD: The name of the new metadata field.
- ORDER: An identifier (usually a number) that specifies the order in which metadata values are combined. The order is determined by an alphabetic sort of the ORDER identifiers, so 10 sorts before 2; zero-padded identifiers (01, 02, ... 10) keep the intended order when a rule has more than nine sources.
- SOURCE_METADATA_FIELD: The name of the metadata field that supplies the value. This can be an HTML metadata tag name (matching the name attribute of an HTML <meta> tag), a filter-generated metadata field name (a metadata field added by an earlier filter in the filter chain), or an XPath (matching an element or property within an XML document). The filter skips any SOURCE_METADATA_FIELD that does not exist.
- DELIMITER: (optional) A string used to join the SOURCE_METADATA_FIELD values when they are combined into the NEW_METADATA_FIELD.
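To make the key structure concrete, the following Python sketch groups plugin.combine-metadata.config.* lines into one rule per NEW_METADATA_FIELD and orders the sources with a plain string sort of the ORDER identifiers. It is an illustrative model only, not the plugin's code: parse_rules and Rule are hypothetical names, and the empty default delimiter is an assumption because the documentation does not state a default. In the usage, lastname is deliberately given ORDER 10 to show that a string sort places it before middlename (ORDER 2).

```python
import re
from dataclasses import dataclass, field

# Illustrative model of the documented key patterns (not the plugin's own code):
#   plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.ORDER.field=SOURCE_METADATA_FIELD
#   plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.delimiter=DELIMITER
KEY = re.compile(
    r"^plugin\.combine-metadata\.config\.(?P<target>.+?)\.metadata\."
    r"(?:(?P<order>.+)\.field|delimiter)$"
)


@dataclass
class Rule:
    sources: dict = field(default_factory=dict)  # ORDER identifier -> SOURCE_METADATA_FIELD
    delimiter: str = ""  # assumption: an omitted delimiter joins values with nothing

    def ordered_sources(self):
        # Alphabetic (string) sort of the ORDER identifiers, so "10" sorts before "2".
        return [self.sources[order] for order in sorted(self.sources)]


def parse_rules(lines):
    """Group combine-metadata settings by NEW_METADATA_FIELD (hypothetical helper)."""
    rules = {}
    for line in lines:
        key, sep, value = line.partition("=")
        match = KEY.match(key)
        if not sep or match is None:
            continue  # not a combine-metadata rule (e.g. plugin.combine-metadata.enabled)
        rule = rules.setdefault(match["target"], Rule())
        if match["order"] is None:
            rule.delimiter = value  # kept verbatim, including leading/trailing spaces
        else:
            rule.sources[match["order"]] = value
    return rules


config = [
    "plugin.combine-metadata.enabled=true",
    "plugin.combine-metadata.config.fullname.metadata.1.field=firstname",
    "plugin.combine-metadata.config.fullname.metadata.2.field=middlename",
    "plugin.combine-metadata.config.fullname.metadata.10.field=lastname",
    "plugin.combine-metadata.config.fullname.metadata.delimiter= ",
]
rules = parse_rules(config)
print(rules["fullname"].ordered_sources())  # ['firstname', 'lastname', 'middlename']
```

The sketch keeps the delimiter value exactly as written, which matters for delimiters that are or contain spaces; how the real data source configuration treats surrounding whitespace is not covered here.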
Notes:
- For XML documents, XPath syntax is supported to identify metadata fields.
- If there are multiple values that match a rule, all the values are extracted and combined into a single field.
- When working with XML, it is advisable to split the XML document using a filter before combining any metadata.
- Metadata fields created by this plugin are added to the list of metadata sources and must be mapped to a metadata class (using the data source metadata mapping configuration) in the same manner as other metadata fields.
Examples
Create a name field from first name, middle name and last name metadata fields
To combine three HTML metadata fields (firstname, middlename and lastname) into a fourth field called fullname, add the following plugin configuration. The resulting fullname metadata combines the firstname, middlename and lastname values into a single value, with the values separated by a space.
plugin.combine-metadata.config.fullname.metadata.1.field=firstname
plugin.combine-metadata.config.fullname.metadata.2.field=middlename
plugin.combine-metadata.config.fullname.metadata.3.field=lastname
plugin.combine-metadata.config.fullname.metadata.delimiter= 
The delimiter value in the last line is a single space character.
Given an HTML document that contains the following metadata tags:
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
This produces a metadata field (fullname) containing John W. Smith and adds it to the filter metadata object. The generated field is equivalent to the HTML containing the meta tag <meta name="fullname" content="John W. Smith"/>.
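The following Python sketch models this example end to end: it collects <meta> name/content pairs with the standard library's html.parser and joins the configured sources in rule order with the delimiter. MetaCollector and combine are hypothetical names used for illustration; the plugin itself runs inside the filter chain, and this is not a description of its internals.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs (hypothetical helper).

    Repeated names keep every value, in document order.
    """

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta .../> here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name is not None and content is not None:
            self.values.setdefault(name, []).append(content)


def combine(metadata, sources, delimiter):
    # Every value of each source field, in rule order, joined with the delimiter.
    return delimiter.join(value for src in sources for value in metadata.get(src, []))


page = """
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
"""
collector = MetaCollector()
collector.feed(page)
collector.close()
fullname = combine(collector.values, ["firstname", "middlename", "lastname"], " ")
print(fullname)  # John W. Smith
```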
Missing fields
The filter skips fields that are not present. If the HTML source contained only a first name and a last name:
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>
This would produce a metadata field (fullname) that contains Fred Nerk.
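Skipping a missing field means it leaves no empty slot, so the delimiter is not doubled. The Python sketch below contrasts that behaviour with a naive join that substitutes an empty string for the missing middlename; both functions are illustrative models, not the plugin's implementation.

```python
def combine(metadata, sources, delimiter):
    # Collect values only for source fields that exist; a missing field contributes
    # nothing, so the delimiter appears only between values that are present.
    present = [value for src in sources for value in metadata.get(src, [])]
    return delimiter.join(present)


page_metadata = {"firstname": ["Fred"], "lastname": ["Nerk"]}  # no middlename tag
sources = ["firstname", "middlename", "lastname"]

result = combine(page_metadata, sources, " ")
# For contrast: treating the missing field as an empty value doubles the space.
naive = " ".join(page_metadata.get(src, [""])[0] for src in sources)

print(repr(result))  # 'Fred Nerk'
print(repr(naive))   # 'Fred  Nerk'
```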
Multiple matched fields
If the source HTML file includes multiple fields that match a rule, then these are all combined into a single field.
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>
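This produces a metadata field (fullname) that contains John Fred W. Smith Nerk. The values are grouped by rule order: both firstname values come first, followed by the middlename value and then both lastname values.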
Combine metadata from an XML document
This example creates an address field from five individual XML fields, joined with a comma and a space.
plugin.combine-metadata.config.address.metadata.1.field=//mail/address/number
plugin.combine-metadata.config.address.metadata.2.field=//mail/address/street
plugin.combine-metadata.config.address.metadata.3.field=//mail/address/state
plugin.combine-metadata.config.address.metadata.4.field=//mail/address/postcode
plugin.combine-metadata.config.address.metadata.5.field=//mail/address/country
plugin.combine-metadata.config.address.metadata.delimiter=, 
The delimiter value in the last line is a comma followed by a space.
Consider the following XML record:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<organisations>
  <org>
    <orgname>Squiz Sydney</orgname>
    <mail>
      <address>
        <unit>Level 1</unit>
        <number>435a</number>
        <street>Kent St</street>
        <state>NSW</state>
        <country>Australia</country>
        <postcode>2000</postcode>
      </address>
    </mail>
  </org>
</organisations>
The combine metadata rule creates an additional metadata field (address) containing 435a, Kent St, NSW, 2000, Australia. The unit and orgname elements are not referenced by the rule, so their values are not included.
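As a check on the expected output, the Python sketch below evaluates the five rule XPaths against the record with the standard library's xml.etree.ElementTree and joins the matches with a comma and a space. ElementTree supports only a subset of XPath and rejects absolute paths on an element, so the sketch rewrites //a/b as the relative descendant search .//a/b; it models the rule, not the plugin's own XPath engine, and combine_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RECORD = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<organisations>
  <org>
    <orgname>Squiz Sydney</orgname>
    <mail>
      <address>
        <unit>Level 1</unit>
        <number>435a</number>
        <street>Kent St</street>
        <state>NSW</state>
        <country>Australia</country>
        <postcode>2000</postcode>
      </address>
    </mail>
  </org>
</organisations>"""

SOURCES = [
    "//mail/address/number",
    "//mail/address/street",
    "//mail/address/state",
    "//mail/address/postcode",
    "//mail/address/country",
]


def combine_xml(xml_bytes, xpaths, delimiter):
    root = ET.fromstring(xml_bytes)
    values = []
    for xpath in xpaths:
        # ElementTree raises on absolute paths from an element, so "//a/b" is
        # evaluated as ".//a/b" (any descendant of the root element).
        path = "." + xpath if xpath.startswith("/") else xpath
        values.extend(el.text for el in root.findall(path) if el.text)
    return delimiter.join(values)


address = combine_xml(RECORD, SOURCES, ", ")
print(address)  # 435a, Kent St, NSW, 2000, Australia
```

Note that the rule's ORDER, not the element order in the record, decides the sequence: postcode appears after state even though country comes first in the XML.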
Combine document metadata with filter generated metadata
This example shows how to combine metadata fields extracted from an HTML or XML document with metadata added by a previous filter.
It combines a course code (coursecode), generated by a previous filter, with a course name that is specified in the HTML page metadata (<meta name="course.name">).
plugin.combine-metadata.config.course.metadata.1.field=coursecode
plugin.combine-metadata.config.course.metadata.2.field=course.name
plugin.combine-metadata.config.course.metadata.delimiter= - 
The delimiter value in the last line is a space, a hyphen and a space.
If the source metadata includes the following:
<meta name="course.name" content="Bachelor of Economics">
and the following metadata field was created in a previous filter:
coursecode=EC100
The combine metadata rule creates an additional metadata field (course) with the value EC100 - Bachelor of Economics.
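The Python sketch below models a rule whose sources come from two places: metadata added by an earlier filter and metadata read from the page's <meta> tags. It assumes both are consulted for every source name and their values concatenated; the documentation does not say what happens if the same name exists in both, and combine_sources is a hypothetical name.

```python
def combine_sources(filter_metadata, page_metadata, sources, delimiter):
    values = []
    for src in sources:
        # Assumption: both sources are checked for every name and their values are
        # concatenated; precedence when a name exists in both is not documented.
        values.extend(filter_metadata.get(src, []))
        values.extend(page_metadata.get(src, []))
    return delimiter.join(values)


filter_metadata = {"coursecode": ["EC100"]}                 # added by an earlier filter
page_metadata = {"course.name": ["Bachelor of Economics"]}  # from <meta name="course.name">

course = combine_sources(filter_metadata, page_metadata, ["coursecode", "course.name"], " - ")
print(course)  # EC100 - Bachelor of Economics
```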
Clone a metadata field
This example shows how to clone a metadata field.
In the following document, we want to clone the dc.title field as a new searchtitle field, so that we can choose which title to display in our search results while all titles are still treated as titles for ranking purposes.
plugin.combine-metadata.config.searchtitle.metadata.1.field=dc.title
If the source metadata includes the following:
<meta name="dc.title" content="This is a nice friendly title"/>
The combine metadata rule creates an additional metadata field (searchtitle) with the value This is a nice friendly title.
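Cloning is simply a rule with a single source: with one value there is nothing to join, so the new field carries the source value unchanged and the original field stays in place. The Python sketch below models this with a hypothetical combine helper and an assumed empty default delimiter.

```python
def combine(metadata, sources, delimiter=""):
    # assumption: an omitted delimiter joins values with nothing
    return delimiter.join(value for src in sources for value in metadata.get(src, []))


page = {"dc.title": ["This is a nice friendly title"]}

# The clone is added alongside the original; dc.title itself is not modified.
page["searchtitle"] = [combine(page, ["dc.title"])]
print(page["searchtitle"])  # ['This is a nice friendly title']
```

Following the note on multiple values, a page with several dc.title tags would produce one joined searchtitle value rather than one clone per tag, so a delimiter may still be worth setting on a clone rule.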