Plugin: Combine / clone metadata

Purpose

This plugin creates a new metadata field whose value combines the values of a set of other metadata fields, or clones a single metadata field.

Enabling the plugin

  1. Enable the combine-metadata plugin on your data source from the Extensions screen in the administration dashboard, or add the following data source configuration:

     plugin.combine-metadata.enabled=true
     plugin.combine-metadata.version=1.0.0

  2. Add CombineMetadataStringFilter to the filter chain:

    filter.classes=<OTHER-FILTERS>:com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter:<OTHER-FILTERS>

     The plugin will take effect after a full update of the data source.
  3. The following options can be set in the data source configuration to configure the plugin:

    • plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.ORDER.field=SOURCE_METADATA_FIELD

    • plugin.combine-metadata.config.NEW_METADATA_FIELD.metadata.delimiter=DELIMITER

Where:

  • NEW_METADATA_FIELD : The name of the new metadata field.

  • ORDER : An identifier (usually a number) that specifies the order in which metadata values are combined. The ORDER identifiers are sorted alphabetically, not numerically, so an identifier of 10 sorts before 2.

  • SOURCE_METADATA_FIELD : Name of the metadata field that will be used to source the value. This value supports an HTML metadata tag name (matching the name attribute of an HTML <META> tag), a filter generated metadata field name (a metadata field added in a previous part of the filter chain), or an XPath (matching an element or property within an XML document). The filter will skip any SOURCE_METADATA_FIELD that does not exist.

  • DELIMITER : (optional) A string that will be used to join the SOURCE_METADATA_FIELD values when they are combined into the NEW_METADATA_FIELD.
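
The way these keys fit together can be sketched in Python. This is only an illustration of the documented behaviour, not the plugin's implementation: the parse_rules helper, the suffix field and the empty default delimiter are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the two documented key shapes.
FIELD_KEY = re.compile(
    r"^plugin\.combine-metadata\.config\.(?P<target>.+)"
    r"\.metadata\.(?P<order>[^.]+)\.field$"
)
DELIM_KEY = re.compile(
    r"^plugin\.combine-metadata\.config\.(?P<target>.+)\.metadata\.delimiter$"
)


def parse_rules(config):
    """Group settings by NEW_METADATA_FIELD and order the source fields."""
    by_target = defaultdict(dict)
    delimiters = {}
    for key, value in config.items():
        match = FIELD_KEY.match(key)
        if match:
            by_target[match["target"]][match["order"]] = value
            continue
        match = DELIM_KEY.match(key)
        if match:
            delimiters[match["target"]] = value
    rules = {}
    for target, by_order in by_target.items():
        # Alphabetic (string) sort of the ORDER identifiers, as documented.
        sources = [by_order[order] for order in sorted(by_order)]
        # Assumption: an omitted delimiter is treated as an empty string.
        rules[target] = {"sources": sources, "delimiter": delimiters.get(target, "")}
    return rules


config = {
    "plugin.combine-metadata.config.fullname.metadata.1.field": "firstname",
    "plugin.combine-metadata.config.fullname.metadata.2.field": "middlename",
    "plugin.combine-metadata.config.fullname.metadata.10.field": "suffix",
    "plugin.combine-metadata.config.fullname.metadata.delimiter": " ",
}
rules = parse_rules(config)
print(rules["fullname"]["sources"])  # ['firstname', 'suffix', 'middlename']
```

Because the sort is alphabetic, the rule numbered 10 is applied before the rule numbered 2. Zero-padded identifiers such as 01, 02 and 10 keep the intended order when there are ten or more rules.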

Notes:

  • This plugin uses XSoup to process XPaths. See the XSoup readme for details on supported XPath syntax.

  • If there are multiple values that match a rule then all the values will be extracted and combined into a single field.

  • When working with XML, it is advisable to split the XML document using a filter before combining any metadata.

  • Metadata added using this plugin is added to the list of sources that must be mapped to a metadata class (using the data source metadata mapping configuration) in the same manner as other metadata fields.

Examples

Create a name field from first name, middle name and last name metadata fields

This example combines three HTML metadata fields (firstname, middlename and lastname) into a fourth field called fullname. The resulting fullname value joins the firstname, middlename and lastname values into a single value, separated by a space.

Add the following plugin configuration:

plugin.combine-metadata.config.fullname.metadata.1.field=firstname
plugin.combine-metadata.config.fullname.metadata.2.field=middlename
plugin.combine-metadata.config.fullname.metadata.3.field=lastname
plugin.combine-metadata.config.fullname.metadata.delimiter=

Given an HTML document that contains the following metadata tags:

<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>

This will produce a metadata field (fullname) that contains John W. Smith and add it to the filter metadata object. The generated field is equivalent to the HTML containing a meta tag of <meta name="fullname" content="John W. Smith"/>.

Missing fields

The filter will skip fields that are not present. If the HTML source contained only a first name and last name:

<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>

This would produce a metadata field (fullname) that contains Fred Nerk.

Multiple matched fields

If the source HTML file includes multiple fields that match a rule then these are all combined into a single field.

<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>

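This will produce a metadata field (fullname) that contains John Fred W. Smith Nerk. All values for each source field are emitted together, in rule order, rather than being paired up per person.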
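
The behaviour in the three cases above can be modelled with a short Python sketch. It is an illustration only, under the assumption that each document's metadata is held as a mapping from field name to a list of values. The combine function is hypothetical and is not part of the plugin.

```python
def combine(sources, metadata, delimiter=" "):
    """Join every value of each source field, in rule order.

    Source fields that are absent from the metadata are skipped.
    """
    values = []
    for name in sources:
        values.extend(metadata.get(name, []))
    return delimiter.join(values)


sources = ["firstname", "middlename", "lastname"]

single = {"firstname": ["John"], "middlename": ["W."], "lastname": ["Smith"]}
missing = {"firstname": ["Fred"], "lastname": ["Nerk"]}
multiple = {
    "firstname": ["John", "Fred"],
    "middlename": ["W."],
    "lastname": ["Smith", "Nerk"],
}

print(combine(sources, single))    # John W. Smith
print(combine(sources, missing))   # Fred Nerk
print(combine(sources, multiple))  # John Fred W. Smith Nerk
```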

Combine metadata from an XML document

This example creates an address field from five individual XML fields.

plugin.combine-metadata.config.address.metadata.1.field=//mail/address/number
plugin.combine-metadata.config.address.metadata.2.field=//mail/address/street
plugin.combine-metadata.config.address.metadata.3.field=//mail/address/state
plugin.combine-metadata.config.address.metadata.4.field=//mail/address/postcode
plugin.combine-metadata.config.address.metadata.5.field=//mail/address/country
plugin.combine-metadata.config.address.metadata.delimiter=,

Consider the following XML record.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<organisations>
    <org>
        <orgname>Squiz Sydney</orgname>
        <mail>
            <address>
                <unit>Level 1</unit>
                <number>435a</number>
                <street>Kent St</street>
                <state>NSW</state>
                <country>Australia</country>
                <postcode>2000</postcode>
            </address>
        </mail>
    </org>
</organisations>

The combine metadata rule will create an additional metadata field (address) containing 435a, Kent St, NSW, 2000, Australia. The unit and orgname elements are not included because no rule selects them.
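
The selection and joining of the XML values can be approximated with Python's standard library. This sketch makes several assumptions: it uses xml.etree.ElementTree as a stand-in for XSoup (the two do not support identical XPath syntax), it treats the delimiter as a comma followed by a space, and the combine_xpaths function is hypothetical.

```python
import xml.etree.ElementTree as ET

RECORD = """<organisations>
    <org>
        <orgname>Squiz Sydney</orgname>
        <mail>
            <address>
                <unit>Level 1</unit>
                <number>435a</number>
                <street>Kent St</street>
                <state>NSW</state>
                <country>Australia</country>
                <postcode>2000</postcode>
            </address>
        </mail>
    </org>
</organisations>"""

# Source XPaths in ORDER sequence, copied from the configuration.
XPATHS = [
    "//mail/address/number",
    "//mail/address/street",
    "//mail/address/state",
    "//mail/address/postcode",
    "//mail/address/country",
]


def combine_xpaths(root, xpaths, delimiter):
    """Collect the text of every element matched by each XPath, in order."""
    values = []
    for xpath in xpaths:
        # ElementTree only accepts relative paths, so "//a/b" becomes ".//a/b".
        for element in root.findall("." + xpath):
            if element.text:
                values.append(element.text.strip())
    return delimiter.join(values)


root = ET.fromstring(RECORD)
address = combine_xpaths(root, XPATHS, ", ")
print(address)  # 435a, Kent St, NSW, 2000, Australia
```

The values follow the rule order from the configuration, not the order of the elements in the document, which is why the postcode precedes the country in the result.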

Combine document metadata with filter generated metadata

This example shows how to combine metadata fields extracted from an HTML or XML document with metadata added in a previous filter.

This combines a course code (coursecode), generated in a previous filter, with a course name that is specified in the HTML page metadata (<meta name="course.name">).

plugin.combine-metadata.config.course.metadata.1.field=coursecode
plugin.combine-metadata.config.course.metadata.2.field=course.name
plugin.combine-metadata.config.course.metadata.delimiter= -

If the source metadata includes the following:

<meta name="course.name" content="Bachelor of Economics">

and the following metadata field was created in a previous filter:

coursecode=EC100

The combine metadata rule will create an additional metadata field (course) with the value EC100 - Bachelor of Economics.

Clone a metadata field

This example shows how to clone a metadata field.

In the following example we clone the dc.title field as searchtitle, so that we can choose which title to display in our search results while still having all titles treated as titles for ranking purposes.

plugin.combine-metadata.config.searchtitle.metadata.1.field=dc.title

If the source metadata includes the following:

<meta name="dc.title" content="This is a nice friendly title"/>

The combine metadata rule will create an additional metadata field (searchtitle) with the value This is a nice friendly title.

© 2015- Squiz Pty Ltd