Plugin: Combine or clone metadata fields
Purpose
Use this plugin to create a new metadata field whose value combines the values of several other metadata fields, or to clone an existing metadata field into a new field.
The metadata fields for cloning or combining can be sourced from HTML metadata fields, XML fields or metadata that was generated into the internal metadata object in a previous filter.
These new fields can be mapped along with the original fields when configuring your metadata mappings.
When to use this plugin?
- To clone a metadata field if you need to map it to more than one metadata class. Funnelback only allows a metadata field to be mapped once, so cloning the field allows you to map the original field data to more than one metadata class. For example, you might want to do this if you have a title contained in a metadata field that you wish to print out in your template, but also include in the title (t) metadata for ranking purposes.
- To combine separate X and Y coordinate metadata into a latlong geospatial metadata field, suitable for use as geospatial metadata in Funnelback.
- To enable you to sort by more than one metadata field, by creating a special sort field that combines the fields you wish to sub-sort by. For example, you have a category field and a sub-category field and wish to provide an option to sort by category, then sub-category. You can achieve this by creating a field that combines category (first) with sub-category (second) and then providing an option to sort alphabetically by this new field.
- To create a combined metadata field where only the component values are available in separate fields (e.g. create a name field where you have firstname and lastname metadata fields).
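As a rough illustration of the sort use case above, the following Python sketch mimics what a combined sort field achieves. It is not the plugin's implementation: the function name combined_sort_key, the sample records and the space join string are all assumptions made for this example.

```python
def combined_sort_key(category: str, sub_category: str, join: str = " ") -> str:
    """Build one sortable string: category first, then sub-category.

    Hypothetical helper for illustration only; the plugin builds the
    combined field during filtering, not at sort time.
    """
    return f"{category}{join}{sub_category}"


# Invented sample records with a category and a sub-category each.
records = [
    {"category": "Sport", "subcategory": "Tennis"},
    {"category": "Arts", "subcategory": "Theatre"},
    {"category": "Sport", "subcategory": "Cricket"},
    {"category": "Arts", "subcategory": "Music"},
]

# Sorting alphabetically on the combined value sorts by category,
# then by sub-category within each category.
ordered = sorted(
    records,
    key=lambda r: combined_sort_key(r["category"], r["subcategory"]),
)

for record in ordered:
    print(combined_sort_key(record["category"], record["subcategory"]))
```

Because the category comes first in the combined value, an alphabetical sort on that single field behaves like a two-level sort.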
About the plugin
XML metadata
This plugin also works with XML documents, accepting an XPath as the value when specifying your source metadata field.
When working with XML, we recommend that you split the XML file using the split HTML/XML plugin beforehand (if the XML file needs to be split for the search).
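To make the idea of an XPath source concrete, here is a minimal Python sketch that reads values from simple absolute paths such as /item/title and joins them. Everything in it is an assumption for illustration: the select_text helper, the sample record and the space join string are invented, and Python's ElementTree supports only a subset of XPath, so the plugin's own XPath handling may well differ.

```python
import xml.etree.ElementTree as ET


def select_text(root: ET.Element, xpath: str) -> list[str]:
    """Return the text of every element matching a simple absolute path.

    Hypothetical helper: handles only paths of the form /root/child/...
    """
    steps = [step for step in xpath.split("/") if step]
    # The first step must name the root element itself.
    if not steps or steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root.text or ""]
    # ElementTree paths are relative to the element they are called on.
    relative_path = "./" + "/".join(steps[1:])
    return [element.text or "" for element in root.findall(relative_path)]


# Invented sample record.
record = ET.fromstring(
    "<item><title>Annual report</title><year>2023</year></item>"
)

values = select_text(record, "/item/title") + select_text(record, "/item/year")
combined = " ".join(values)
print(combined)
```

A path that matches nothing returns an empty list, so it simply contributes no value to the combined field.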
Missing fields
The plugin skips source fields that are not present when producing a new field that combines multiple source fields. For example, if you are producing a name metadata field that combines firstname, middlename and lastname metadata fields, and the HTML source contains only a first name and last name:
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>
then this produces a metadata field (fullname) that contains Fred Nerk.
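The skipping behaviour can be sketched in a few lines of Python. This is only a model of the described behaviour, not the plugin's code: the combine function and the dictionary representation of the document metadata are assumptions for the example.

```python
def combine(source_fields: list[str], metadata: dict[str, list[str]], join: str = " ") -> str:
    """Join the values of the listed source fields, skipping absent fields.

    Hypothetical helper: metadata maps a field name to its list of values.
    """
    values: list[str] = []
    for name in source_fields:
        # A field that is not present contributes nothing to the result.
        values.extend(metadata.get(name, []))
    return join.join(values)


# Metadata equivalent to the two meta tags above (no middlename present).
document_metadata = {"firstname": ["Fred"], "lastname": ["Nerk"]}

fullname = combine(["firstname", "middlename", "lastname"], document_metadata)
print(fullname)
```

Because the missing field adds no value at all, the join string appears only between the values that exist, so no doubled space is left where the middle name would have been.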
Multiple matched fields
If the source HTML file includes multiple fields that match a rule, these are all combined into a single field. Values are grouped by source field, and within each source field they keep the order in which they appear in the source document. For example, if you are producing a name metadata field that combines firstname, middlename and lastname metadata fields, and the HTML source contains:
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>
then this produces a metadata field (fullname) that contains John Fred W. Smith Nerk.
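The ordering in this example can be reproduced with a short Python sketch that collects meta tag values in document order and then walks the source fields in rule order. It is a model under stated assumptions, not the plugin's implementation: the MetaCollector class and the space join string are invented for the example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> values in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields.setdefault(name, []).append(content)


SOURCE = """
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
<meta name="firstname" content="Fred"/>
<meta name="lastname" content="Nerk"/>
"""

collector = MetaCollector()
collector.feed(SOURCE)

# Walk the source fields in rule order; each field contributes all of its
# values in the order they appeared in the document.
parts: list[str] = []
for field in ["firstname", "middlename", "lastname"]:
    parts.extend(collector.fields.get(field, []))

fullname = " ".join(parts)
print(fullname)
```

Both first names come out before the middle name because the values are gathered per source field, not per person.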
Modifying cloned values
The plugin supports a basic transformation that can be applied to fields that are cloned, prior to combining. The available options allow you to:
- Convert the cloned value to lowercase or uppercase
- Apply a regex search and replace over the copied value
How is the new field generated?
Metadata added using this plugin is added to an internal metadata object (allowing it to be modified in other chained filters) and will appear in the list of sources that must be mapped to a metadata class (using the data source metadata mapping configuration) in the same manner as other metadata fields.
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Combine or clone metadata fields tile.
- From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.
The plugin will take effect after the setup steps are complete and an advanced > full update of the data source has completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.
Metadata source field
Configuration key | |
Data type | string |
Required | This setting is optional |
The source metadata field to be cloned or combined. Parameter 1 defines the name of the new field to which this value is appended. Parameter 2 is a unique identifier. The metadata is combined in the order in which these identifiers sort, so we recommend using a number. The value is the source HTML metadata field name (e.g. dc.title), an XML field XPath (e.g. /item/title), or the name of a metadata field generated in a previous step.
String used to join metadata values
Configuration key | |
Data type | string |
Required | This setting is optional |
This string is used when joining the metadata fields, resulting in the following combined value: value 1 + join string + value 2. Parameter 1 indicates the name of the field that the join string is applied to.
Metadata transform type
Configuration key | |
Data type | string |
Default value | |
Allowed values | NONE,UPPERCASE,LOWERCASE,REGEX |
Required | This setting is optional |
The type of transformation to apply to the metadata field. Parameter 1 defines the new field name where this value should be appended. Parameter 2 is a unique identifier and must be matched with Parameter 2 of a metadata source field.
- NONE: No transformation is applied.
- UPPERCASE: The copied field is converted to uppercase.
- LOWERCASE: The copied field is converted to lowercase.
- REGEX: The copied field is transformed using a regular expression pattern.
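The four transform types can be modelled with a small Python function. This is a sketch of the described behaviour only: the transform function and its parameter names are assumptions, and Python's regex syntax and replacement behaviour (all matches are replaced, groups are referenced as \1) may differ from what the plugin uses.

```python
import re
from typing import Optional


def transform(
    value: str,
    kind: str = "NONE",
    pattern: Optional[str] = None,
    replacement: Optional[str] = None,
) -> str:
    """Apply one of the documented transform types to a copied value.

    Hypothetical helper for illustration; not the plugin's API.
    """
    if kind == "NONE":
        return value
    if kind == "UPPERCASE":
        return value.upper()
    if kind == "LOWERCASE":
        return value.lower()
    if kind == "REGEX":
        # A pattern and a replacement are both needed for REGEX.
        if pattern is None or replacement is None:
            raise ValueError("REGEX requires both a pattern and a replacement")
        return re.sub(pattern, replacement, value)
    raise ValueError(f"Unknown transform type: {kind}")


print(transform("Kent St", "UPPERCASE"))
print(transform("Kent St", "LOWERCASE"))
print(transform("2000-01-31", "REGEX", pattern="-", replacement="/"))
```

The transform applies to each copied value before the values are joined, so the join string itself is never altered.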
Metadata regex matching pattern
Configuration key | |
Data type | string |
Required | This setting is optional |
The regex pattern to match against the metadata field value. Parameter 1 defines the name of the new field to which this value is appended. Parameter 2 is a unique identifier and must match Parameter 2 of a metadata source field. This only applies when the metadata transform type for the field is set to REGEX.
This field is required if the Metadata transform type is set to REGEX.
Metadata regex transform replacement
Configuration key | |
Data type | string |
Required | This setting is optional |
The replacement for the part of the metadata field value that matches the regex pattern. Parameter 1 defines the name of the new field to which this value is appended. Parameter 2 is a unique identifier and must match Parameter 2 of a metadata source field. This only applies when the metadata transform type for the field is set to REGEX.
This field is required if the Metadata transform type is set to REGEX.
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Filter classes
This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter
Drag the com.funnelback.plugins.combinemetadata.CombineMetadataStringFilter plugin filter to where you wish it to run in the filter chain sequence.
Examples
Create a name field from first name, middle name and last name metadata fields
To combine three different HTML metadata fields (firstname, middlename and lastname) into a fourth field called fullname, add the following plugin configuration:
Configuration key name | Parameter 1 | Parameter 2 | Value |
---|---|---|---|
Metadata source field | | | |
Metadata source field | | | |
Metadata source field | | | |
String used to join metadata values | | (a space) | |
Metadata transform type | | | |
Metadata transform type | | | |
Metadata transform type | | | |
The resultant metadata fullname will be made by combining firstname, middlename and lastname into a single value, with the values separated by a space.
Given an HTML document that contains the following metadata tags:
<meta name="firstname" content="John"/>
<meta name="middlename" content="W."/>
<meta name="lastname" content="Smith"/>
This will produce a metadata field (fullname) that contains John W. Smith and add it to the filter metadata object. The generated field is equivalent to the HTML containing a meta tag of <meta name="fullname" content="John W. Smith"/>.
Combine metadata from an XML document
This example creates an address field from five individual XML fields.
Configuration key name | Parameter 1 | Parameter 2 | Value |
---|---|---|---|
Metadata source field | | | |
Metadata source field | | | |
Metadata source field | | | |
Metadata source field | | | |
Metadata source field | | | |
String used to join metadata values | | | |
Metadata transform type | | | |
Metadata transform type | | | |
Metadata transform type | | | |
Metadata transform type | | | |
Metadata transform type | | | |
Consider the following XML record.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<organizations>
<org>
<orgname>Squiz Sydney</orgname>
<mail>
<address>
<unit>Level 1</unit>
<number>435a</number>
<street>Kent St</street>
<state>NSW</state>
<country>Australia</country>
<postcode>2000</postcode>
</address>
</mail>
</org>
</organizations>
The combine metadata rule will create an additional metadata field (address) containing 435A, Kent St, Sydney, NSW, 2000, Australia.
Combine document metadata with filter generated metadata
This example shows how to combine metadata fields extracted from an HTML or XML document with metadata added in a previous filter.
This combines a course code (coursecode), generated in a previous filter, with a course name that is specified in the HTML page metadata (<meta name="course.name">).
Configuration key name | Parameter 1 | Parameter 2 | Value |
---|---|---|---|
Metadata source field | | | |
Metadata source field | | | |
String used to join metadata values | | | |
Metadata transform type | | | |
Metadata transform type | | | |
Metadata regex matching pattern | | | |
Metadata regex transform replacement | | | |
If the source metadata includes the following:
<meta name="course.name" content="Bachelor of Economics">
and the following metadata field was created in a previous filter:
coursecode=EC100
The combine metadata rule will create an additional metadata field (course) with the value EC 100 - Bachelor of Economics.
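One way to arrive at this value is sketched below in Python. The regex pattern, the replacement and the " - " join string are assumptions chosen to reproduce the value shown in this example; they are not taken from the plugin configuration, and the plugin's regex and replacement syntax may differ from Python's.

```python
import re

# Source values from the example above.
course_code = "EC100"                  # generated in a previous filter
course_name = "Bachelor of Economics"  # from <meta name="course.name">

# Assumed REGEX transform: insert a space between the letters and the digits.
ASSUMED_PATTERN = r"^([A-Za-z]+)(\d+)$"
ASSUMED_REPLACEMENT = r"\1 \2"
spaced_code = re.sub(ASSUMED_PATTERN, ASSUMED_REPLACEMENT, course_code)

# Assumed join string: space, hyphen, space.
ASSUMED_JOIN = " - "
course = ASSUMED_JOIN.join([spaced_code, course_name])

print(course)
```

Only the course code is transformed here; the course name passes through unchanged, which corresponds to giving each source field its own transform type.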
Clone a metadata field
This example shows how to clone a metadata field.
In the following document we wish to clone the dc.title field and set this as a searchtitle so that we can choose which title to display in our search results, but still have all titles considered a title for ranking purposes.
Configuration key name | Parameter 1 | Parameter 2 | Value |
---|---|---|---|
Metadata source field | | | |
If the source metadata includes the following:
<meta name="dc.title" content="This is a nice friendly title"/>
The combine metadata rule will create an additional metadata field (searchtitle) with the value This is a nice friendly title.