Plugin: Combine or clone metadata fields

Purpose

Use this plugin to create a new metadata field that either combines the values of several existing metadata fields or clones a single existing metadata field.

The metadata fields for cloning or combining can be sourced from HTML metadata fields, XML fields or metadata that was generated into the internal metadata object in a previous filter.

These new fields can be mapped along with the original fields when configuring your metadata mappings.

When to use this plugin?

  • To clone a metadata field if you need to map it to more than one metadata class. Funnelback only allows a metadata field to be mapped once so cloning the field allows you to map the original field data to more than one metadata class. For example, you might want to do this if you have a title contained in a metadata field that you wish to print out in your template, but also include in the title (t) metadata for ranking purposes.

  • To combine separate X and Y coordinate metadata into a latlong geospatial metadata field, suitable for use as geospatial metadata in Funnelback.

  • To sort by more than one metadata field, by creating a special sort field that combines the fields you wish to sub-sort by. For example, if you have a category field and a sub-category field and wish to offer an option to sort by category then sub-category, create a field that combines category (first) with sub-category (second), then provide an option to sort alphabetically by this new field.

  • To create a combined metadata field where only the component values are available in separate fields (e.g. create a name field where you have firstname and lastname metadata fields).
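The sub-sorting use case can be illustrated with a short Python sketch. It is not how Funnelback sorts results; it only shows why sorting alphabetically on a combined value orders items by category and then by sub-category. The records, the categorysort field name and the sort_field helper are all hypothetical.

```python
# Hypothetical records; "category" and "subcategory" stand in for two metadata fields.
records = [
    {"title": "Doc A", "category": "Sports", "subcategory": "Tennis"},
    {"title": "Doc B", "category": "Arts", "subcategory": "Music"},
    {"title": "Doc C", "category": "Sports", "subcategory": "Cricket"},
    {"title": "Doc D", "category": "Arts", "subcategory": "Dance"},
]


def sort_field(record, delimiter=" "):
    """Combine category (first) with sub-category (second) into one value."""
    return delimiter.join([record["category"], record["subcategory"]])


# Add the combined field (hypothetical name: categorysort) to every record.
for record in records:
    record["categorysort"] = sort_field(record)

# Sorting alphabetically on the combined field sorts by category, then sub-category.
ordered = sorted(records, key=lambda r: r["categorysort"])
sorted_titles = [r["title"] for r in ordered]
print(sorted_titles)
```

The combined values here are "Arts Dance", "Arts Music", "Sports Cricket" and "Sports Tennis", so a single alphabetical sort gives the two-level ordering.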

About the plugin

XML metadata

This plugin also works with XML documents, accepting an XPath as the value when specifying your source metadata field.

When working with XML we recommend you split the XML file using the split HTML/XML plugin beforehand (if the XML file needs to be split for the search).

Missing fields

The plugin skips source fields that are not present when producing a new field that combines multiple source fields. For example, if you are producing a name metadata field that combines firstname, middlename and lastname metadata fields, and the HTML source contains only a first name and last name:

<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>

then this would produce a metadata field (fullname) that contains Fred Nerk.

Multiple matched fields

If the source HTML file includes multiple fields that match a rule, all of the matching values are combined into the single new field. Values are grouped by rule, and the values matched by each rule appear in the order in which they occur in the source document. For example, if you are producing a name metadata field that combines firstname, middlename and lastname metadata fields, and the HTML source contains:

<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>

then this will produce a metadata field (fullname) that contains John Fred W. Smith Nerk.
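The behaviour described for missing and repeated fields can be approximated with a small Python sketch. This is an illustration under stated assumptions, not the plugin's code: the combine function, the rule tuples and the list-of-pairs document model are hypothetical, and the grouping order is inferred from the two examples above.

```python
def combine(rules, document_fields, delimiter=" "):
    """Combine source values into one string.

    rules: list of (identifier, source_field) pairs.
    document_fields: list of (name, value) pairs in document order.
    """
    parts = []
    # Rules are applied in the order their identifiers sort.
    for _, source in sorted(rules, key=lambda rule: rule[0]):
        # Every value for this source field is taken in document order.
        # A source field that is absent contributes nothing.
        parts.extend(value for name, value in document_fields if name == source)
    return delimiter.join(parts)


rules = [("1", "firstname"), ("2", "middlename"), ("3", "lastname")]

# middlename is missing, so it is skipped.
missing = [("firstname", "Fred"), ("lastname", "Nerk")]

# firstname and lastname each occur twice.
repeated = [
    ("firstname", "John"),
    ("middlename", "W."),
    ("lastname", "Smith"),
    ("firstname", "Fred"),
    ("lastname", "Nerk"),
]

print(combine(rules, missing))
print(combine(rules, repeated))
```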

Modifying cloned values

The plugin supports a basic transformation that can be applied to fields that are cloned, prior to combining. The available options allow you to:

  • Convert the cloned value to lowercase or uppercase

  • Apply a regex search and replace over the copied value
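As a rough guide to what these options do to a value, the following Python sketch mirrors the four transform types listed in the configuration settings (NONE, UPPERCASE, LOWERCASE, REGEX). The transform function is hypothetical and uses Python's re module, so it shows the idea rather than the plugin's exact regex behaviour.

```python
import re


def transform(value, transform_type="NONE", pattern=None, replacement=None):
    """Apply one basic transformation to a copied metadata value."""
    if transform_type == "NONE":
        return value
    if transform_type == "UPPERCASE":
        return value.upper()
    if transform_type == "LOWERCASE":
        return value.lower()
    if transform_type == "REGEX":
        # A pattern and a replacement are both needed for a regex transform.
        if pattern is None or replacement is None:
            raise ValueError("REGEX requires both a pattern and a replacement")
        return re.sub(pattern, replacement, value)
    raise ValueError(f"Unknown transform type: {transform_type}")


print(transform("435a", "UPPERCASE"))
print(transform("Kent St", "LOWERCASE"))
print(transform("Kent    St", "REGEX", r"\s+", " "))
```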

How is the new field generated?

Metadata added using this plugin is added to an internal metadata object (allowing it to be modified in other chained filters) and will appear in the list of sources that must be mapped to a metadata class (using the data source metadata mapping configuration) in the same manner as other metadata fields.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Combine or clone metadata fields tile.

  2. From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.

The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

Metadata source field

Configuration key

plugin.combine-metadata.config.*.metadata.*.field

Data type

string

Required

This setting is optional

The source metadata field to be cloned or combined. Parameter 1 defines the new field name to which this value should be appended. Parameter 2 is a unique identifier. The metadata is combined in the order in which these unique identifiers sort, so we recommend using a number. The value is the source HTML metadata field name (e.g. dc.title), an XML field XPath (e.g. /item/title) or the name of a metadata field generated in a previous step.

String used to join metadata values

Configuration key

plugin.combine-metadata.config.*.metadata.delimiter

Data type

string

Required

This setting is optional

This string is used when joining the metadata fields, resulting in the following combined value: value 1 + join string + value 2. Parameter 1 indicates the field name that the join string is applied to.

Metadata transform type

Configuration key

plugin.combine-metadata.config.*.metadata.transform.*.type

Data type

string

Default value

NONE

Allowed values

NONE,UPPERCASE,LOWERCASE,REGEX

Required

This setting is optional

The type of transformation to apply to the metadata field. Parameter 1 defines the new field name where this value should be appended. Parameter 2 is a unique identifier and must be matched with Parameter 2 of a metadata source field.

  • NONE: No transformation is applied.

  • UPPERCASE: The copied field is converted to uppercase.

  • LOWERCASE: The copied field is converted to lowercase.

  • REGEX: The copied field is transformed using a regular expression pattern.

Metadata regex matching pattern

Configuration key

plugin.combine-metadata.config.*.metadata.transform.*.pattern

Data type

string

Required

This setting is optional

The regex pattern to match the metadata field. Parameter 1 defines the new field name where this value should be appended. Parameter 2 is a unique identifier and must be matched with Parameter 2 of a metadata source field. This only applies when the metadata transform type for the field is set to 'REGEX'.

This field is required if the Metadata transform type is set to REGEX.

Metadata regex transform replacement

Configuration key

plugin.combine-metadata.config.*.metadata.transform.*.replacement

Data type

string

Required

This setting is optional

The replacement for the part of the metadata field that matches the regex pattern. Parameter 1 defines the new field name where this value should be appended. Parameter 2 is a unique identifier and must be matched with Parameter 2 of a metadata source field. This only applies when the metadata transform type for the field is set to 'REGEX'.

This field is required if the Metadata transform type is set to REGEX.
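When configuring manually, each setting becomes one key in the data source configuration. The following Python sketch shows one way the key names might be assembled. It assumes that Parameter 1 and Parameter 2 are substituted into the parameterised positions of the key names listed above. The helper names are hypothetical and the substitution scheme is an assumption, so check any generated keys against your data source configuration.

```python
PREFIX = "plugin.combine-metadata.config"


def source_key(new_field, identifier):
    """Key for a metadata source field (assumed parameter substitution)."""
    return f"{PREFIX}.{new_field}.metadata.{identifier}.field"


def delimiter_key(new_field):
    """Key for the string used to join metadata values."""
    return f"{PREFIX}.{new_field}.metadata.delimiter"


def transform_key(new_field, identifier, setting):
    """Key for a transform setting: 'type', 'pattern' or 'replacement'."""
    return f"{PREFIX}.{new_field}.metadata.transform.{identifier}.{setting}"


# Settings for a fullname field built from firstname and lastname.
settings = {
    source_key("fullname", 1): "firstname",
    source_key("fullname", 2): "lastname",
    delimiter_key("fullname"): " ",
    transform_key("fullname", 1, "type"): "NONE",
    transform_key("fullname", 2, "type"): "NONE",
}

for key, value in settings.items():
    print(f"{key}={value}")
```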

Filter chain configuration

This plugin uses filters which are used to apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter

Drag the com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter plugin filter to where you wish it to run in the filter chain sequence.

Examples

Create a name field from first name, middle name and last name metadata fields

To combine three different HTML metadata fields (firstname, middlename and lastname) into a fourth field called fullname, set the following configuration.

Add the following plugin configuration:

Configuration key name                Parameter 1   Parameter 2   Value
Metadata source field                 fullname      1             firstname
Metadata source field                 fullname      2             middlename
Metadata source field                 fullname      3             lastname
String used to join metadata values   fullname                    (a space)
Metadata transform type               fullname      1             NONE
Metadata transform type               fullname      2             NONE
Metadata transform type               fullname      3             NONE

The resultant metadata fullname will be made by combining firstname, middlename and lastname into a single value, with the values separated with a space.

Given an HTML document that contains the following metadata tags:

<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>

This will produce a metadata field (fullname) that contains John W. Smith and add it to the filter metadata object. The generated field is equivalent to the HTML containing a meta tag of <meta name="fullname" content="John W. Smith"/>.

Combine metadata from an XML document

This example creates an address field from five individual XML fields.

Configuration key name                Parameter 1   Parameter 2   Value
Metadata source field                 address       1             //mail/address/number
Metadata source field                 address       2             //mail/address/street
Metadata source field                 address       3             //mail/address/state
Metadata source field                 address       4             //mail/address/postcode
Metadata source field                 address       5             //mail/address/country
String used to join metadata values   address                     , (comma followed by space)
Metadata transform type               address       1             UPPERCASE
Metadata transform type               address       2             NONE
Metadata transform type               address       3             NONE
Metadata transform type               address       4             NONE
Metadata transform type               address       5             NONE

Consider the following XML record.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<organizations>
    <org>
        <orgname>Squiz Sydney</orgname>
        <mail>
            <address>
                <unit>Level 1</unit>
                <number>435a</number>
                <street>Kent St</street>
                <state>NSW</state>
                <country>Australia</country>
                <postcode>2000</postcode>
            </address>
        </mail>
    </org>
</organizations>

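The combine metadata rule will create an additional metadata field (address) containing 435A, Kent St, NSW, 2000, Australia. The number is converted to uppercase by its transform, and the values appear in rule order rather than in the order of the XML elements.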
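As a rough illustration of how the five rules apply to the XML record, the following Python sketch uses the standard library's xml.etree.ElementTree. It is not the plugin's implementation. ElementTree supports only a subset of XPath, so each rule path is prefixed with "." to make it relative to the root element. The RULES list and the combine_address function are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

XML_RECORD = """<organizations>
    <org>
        <orgname>Squiz Sydney</orgname>
        <mail>
            <address>
                <unit>Level 1</unit>
                <number>435a</number>
                <street>Kent St</street>
                <state>NSW</state>
                <country>Australia</country>
                <postcode>2000</postcode>
            </address>
        </mail>
    </org>
</organizations>"""

# (identifier, source XPath, transform type), as in the configuration table.
RULES = [
    (1, "//mail/address/number", "UPPERCASE"),
    (2, "//mail/address/street", "NONE"),
    (3, "//mail/address/state", "NONE"),
    (4, "//mail/address/postcode", "NONE"),
    (5, "//mail/address/country", "NONE"),
]


def combine_address(xml_text, delimiter=", "):
    root = ET.fromstring(xml_text)
    parts = []
    for _, xpath, transform_type in sorted(RULES):
        # "." + xpath turns "//a/b" into ".//a/b", which ElementTree accepts.
        for element in root.findall("." + xpath):
            value = (element.text or "").strip()
            if not value:
                continue
            if transform_type == "UPPERCASE":
                value = value.upper()
            parts.append(value)
    return delimiter.join(parts)


address = combine_address(XML_RECORD)
print(address)
```

The unit element and the orgname element are not referenced by any rule, so they do not appear in the combined value.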

Combine document metadata with filter generated metadata

This example shows how to combine metadata fields extracted from an HTML or XML document with metadata added in a previous filter.

This combines a course code (coursecode), generated in a previous filter, with a course name that is specified in the HTML page metadata (<meta name="course.name">).

Configuration key name                 Parameter 1   Parameter 2   Value
Metadata source field                  course        1             coursecode
Metadata source field                  course        2             course.name
String used to join metadata values    course                      - (space dash space)
Metadata transform type                course        1             REGEX
Metadata transform type                course        2             NONE
Metadata regex matching pattern        course        1             ([A-Z]+)([\d]+)
Metadata regex transform replacement   course        1             $1 $2

If the source metadata includes the following:

<meta name="course.name" content="Bachelor of Economics">

and the following metadata field was created in a previous filter:

coursecode=EC100

The combine metadata rule will create an additional metadata field (course) with the value EC 100 - Bachelor of Economics.
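The regex step in this example can be checked with a few lines of Python. This is only a sketch of the transformation, not the plugin's code. Note that Python's re module writes group references as \1 and \2, whereas the replacement in the configuration uses the $1 $2 form.

```python
import re

# Values from the example: one generated by a previous filter, one from the page.
course_code = "EC100"
course_name = "Bachelor of Economics"

# Split the letters from the digits, then rejoin them with a space.
transformed_code = re.sub(r"([A-Z]+)([\d]+)", r"\1 \2", course_code)

# Join the transformed code and the untransformed name with " - ".
course = " - ".join([transformed_code, course_name])
print(course)
```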

Clone a metadata field

This example shows how to clone a metadata field.

In the following document we wish to clone the dc.title field and set this as a searchtitle so that we can choose which title to display in our search results, but still have all titles considered a title for ranking purposes.

Configuration key name   Parameter 1   Parameter 2   Value
Metadata source field    searchtitle   1             dc.title

If the source metadata includes the following:

<meta name="dc.title" content="This is a nice friendly title"/>

The combine metadata rule will create an additional metadata field (searchtitle) with the value This is a nice friendly title.

Change log

[1.2.0]

Added

  • Added basic transformation options that can be applied to individual metadata fields that are being combined or cloned.

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.

[1.0.1]

Changed

  • Use Java's native isBlank method because, in Jsoup v1.12.1, the class org.jsoup.helper.StringUtil was moved and marked for internal use only.

  • Removed the dependency on xsoup because Jsoup has supported XPath selectors since v1.14.3.
