Plugin: Metadata delimiters
Purpose
Use this plugin when you need to define field-specific delimiters for splitting the values of metadata fields.
When to use this plugin
Use this plugin if you have metadata fields that contain multiple values:
-
that you wish to use as categories in faceted navigation.
-
that you wish to be listed separately in your listMetadata data model element.
-
that you wish to map, along with other fields, into a single metadata class when the fields use different delimiters (e.g. combining a comma-delimited keywords field and a semicolon-delimited dc.subject field into your keywords metadata class).
Using this plugin with facet_item_sepchars
The facet_item_sepchars indexer setting defines a set of characters that split the value of a metadata field. It applies to every metadata field, and in most cases it should be removed when you use this plugin.
The facet_item_sepchars setting is handy if you want to split every metadata field using the same characters, but it can have unwanted consequences. For example, if you include a comma in the list of characters, you might end up incorrectly splitting a description field. The plugin solves this problem by letting you define the split character for each HTML metadata field.
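The difference between one global set of split characters and a per-field delimiter can be sketched in plain Java. This is not the plugin's implementation. The class, the methods `splitOnAny` and `splitPerField`, and the sample values are hypothetical. Using only a comma as the global separator is an assumption chosen to show the problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: global split characters (the idea behind facet_item_sepchars)
// versus a per-field delimiter (the idea behind this plugin). Not the plugin's code.
class GlobalVersusPerField {

    // Split on any character in sepChars, trimming pieces and dropping empty ones.
    static List<String> splitOnAny(String value, String sepChars) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (sepChars.indexOf(c) >= 0) {
                addTrimmed(parts, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addTrimmed(parts, current);
        return parts;
    }

    private static void addTrimmed(List<String> parts, StringBuilder piece) {
        String s = piece.toString().trim();
        if (!s.isEmpty()) {
            parts.add(s);
        }
    }

    // Per-field approach: only fields with a configured delimiter are split.
    static List<String> splitPerField(String field, String value, Map<String, String> delimiters) {
        String delimiter = delimiters.get(field);
        return delimiter == null ? List.of(value) : splitOnAny(value, delimiter);
    }

    public static void main(String[] args) {
        String description = "Fast, accurate and configurable search";
        // A global comma separator breaks the description into two values.
        System.out.println(splitOnAny(description, ",").size()); // 2
        // With a delimiter configured for keywords only, the description stays whole.
        Map<String, String> delimiters = Map.of("keywords", ",");
        System.out.println(splitPerField("description", description, delimiters).size()); // 1
        System.out.println(splitPerField("keywords", "search, indexing", delimiters)); // [search, indexing]
    }
}
```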
Usage
Enable the plugin
-
Select Plugins from the side navigation pane and click on the Metadata delimiters tile.
-
From the Location section, select the data source on which you would like to enable this plugin from the Select a data source list.
The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration. When setting the keys manually, you need to type in (or copy and paste) the key name and value.
Metadata delimiter
Configuration key | |
Data type | string |
Required | This setting is required |
This specifies the separator character to use when splitting the field defined in the corresponding metadata attribute rule. The metadata field name should be set as the parameter 1 value (e.g. 'dc.subject').
A key needs to be defined for each field where you want to set the field delimiter.
For example, you might wish to split your keywords metadata field using a comma, but split your dc.subject field using a semicolon.
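A minimal sketch of per-field delimiters follows. It assumes one delimiter entry per field, keyed by the parameter 1 field name. The class, the `valuesFor` method and the sample values are hypothetical. The sketch only shows how comma-delimited keywords and semicolon-delimited dc.subject values could end up as separate values in a single keywords class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: one delimiter per field (keyed by the parameter 1 field name),
// used to merge differently delimited fields into one list of values.
class PerFieldDelimiters {

    // Split value on the field's own delimiter; fields without an entry stay whole.
    static List<String> valuesFor(String field, String value, Map<String, String> delimiters) {
        String delimiter = delimiters.get(field);
        String[] parts = delimiter == null
                ? new String[] {value}
                : value.split(Pattern.quote(delimiter)); // treat the delimiter literally
        List<String> values = new ArrayList<>();
        for (String part : parts) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> delimiters = new LinkedHashMap<>();
        delimiters.put("keywords", ",");
        delimiters.put("dc.subject", ";");

        // Hypothetical page metadata; both fields feed a single keywords class.
        Map<String, String> page = new LinkedHashMap<>();
        page.put("keywords", "search, indexing");
        page.put("dc.subject", "Libraries; Information retrieval");

        List<String> keywordsClass = new ArrayList<>();
        for (Map.Entry<String, String> field : page.entrySet()) {
            keywordsClass.addAll(valuesFor(field.getKey(), field.getValue(), delimiters));
        }
        System.out.println(keywordsClass); // [search, indexing, Libraries, Information retrieval]
    }
}
```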
Metadata attribute
Configuration key | |
Data type | string |
Required | This setting is optional |
This specifies the HTML attribute within the <meta> tag that holds the field name (the default is the name attribute). The metadata field name should be set as the parameter 1 value (e.g. 'dc.subject').
Standard HTML metadata fields specify the field name using the name attribute. However, some schemes use other attributes within the <meta> tag. For example, Open Graph meta tags define the field name in the property attribute.
For example, a standard metadata field looks like <meta name="dc.title" content="Example title">. An Open Graph metadata field looks like <meta property="og:title" content="Example title">.
You need to set this key for each metadata field whose name is not held in the name attribute of the <meta> tag.
Indexer internal field separator
Configuration key | |
Data type | string |
Default value | the vertical bar character |
Required | |
This defines the internal separator used by the indexer. Only change this if you have modified the facet_item_sepchars indexer option to remove the vertical bar.
If facet_item_sepchars is set and you have removed the vertical bar from its list of separators, you need to set this to one of the characters listed in the facet_item_sepchars value.
Setting facet_item_sepchars can cause unexpected behavior, because it defines global field separators that apply to all metadata fields. It is recommended that you remove the facet_item_sepchars indexer option when using this plugin, unless you are sure you need it.
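The separator rule can be expressed as a small check. This is a hypothetical sketch and not part of the plugin. It assumes that the separator is a single character. It also assumes that, when facet_item_sepchars is not set, the indexer already splits on the vertical bar, as the default value of this setting implies.

```java
// Hypothetical sketch of the consistency rule for the Indexer internal field separator.
// Assumption: when facet_item_sepchars is not set, the indexer splits on the vertical bar.
class InternalSeparatorCheck {

    static final String DEFAULT_SEPARATOR = "|";

    /**
     * @param internalSeparator the plugin's internal separator setting
     * @param facetItemSepchars the facet_item_sepchars value, or null if it is not set
     * @return true if the indexer will split on the separator the plugin writes
     */
    static boolean isConsistent(String internalSeparator, String facetItemSepchars) {
        if (internalSeparator == null || internalSeparator.length() != 1) {
            return false; // assumed to be a single character
        }
        if (facetItemSepchars == null) {
            return internalSeparator.equals(DEFAULT_SEPARATOR);
        }
        return facetItemSepchars.contains(internalSeparator);
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("|", null)); // true: defaults on both sides
        System.out.println(isConsistent("|", ";"));  // false: vertical bar removed from facet_item_sepchars
        System.out.println(isConsistent(";", ";"));  // true: separator is listed in facet_item_sepchars
    }
}
```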
Filter chain configuration
This plugin uses filters to apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See the document filters documentation.
Jsoup filter classes
This plugin supplies a filter that needs to run in the HTML document (Jsoup) filter chain: `com.funnelback.plugin.metadatadelimiters.MetadataDelimiters`
Drag the com.funnelback.plugin.metadatadelimiters.MetadataDelimiters plugin filter to where you wish it to run in the filter chain sequence.
Examples
Consider the following HTML file:
<html>
<head>
<title>Example document</title>
<meta name="country" content="Australia, New Zealand">
<meta name="fruit.type" content="apple; banana; pear">
<meta name="colour" content="blue, green, orange, pink">
<meta property="og:type" content="web page; article">
</head>
<body>
...
</body>
</html>
After enabling the plugin on your data source, set the field delimiters for the fruit.type, colour and og:type fields.
Configure the plugin with the following configuration settings:
Configuration key name | Parameter 1 | Value |
---|---|---|
Metadata delimiter | fruit.type | `;` |
Metadata delimiter | colour | `,` |
Metadata delimiter | og:type | `;` |
Metadata attribute | og:type | property |
-
Ensure that the JsoupProcessingFilterProvider filter is in an appropriate position in the filter chain.
-
Ensure that the com.funnelback.plugin.metadatadelimiters.MetadataDelimiters filter is in an appropriate position in the Jsoup filter chain.
After saving your configuration, run a full update of your data source.
The plugin will update the HTML that is stored on disk to:
<html>
<head>
<title>Example document</title>
<meta name="country" content="Australia, New Zealand">
<meta name="fruit.type" content="apple| banana| pear">
<meta name="colour" content="blue| green| orange| pink">
<meta property="og:type" content="web page| article">
</head>
<body>
...
</body>
</html>
As a result, the indexer splits the colour, fruit.type and og:type fields when indexing. The country field is not split, because no delimiter is configured for it.
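The rewrite in this example can be approximated in plain Java to check the expected output. This is a hypothetical sketch, not the plugin's code. The real filter runs in the Jsoup filter chain, whereas this sketch uses a regular expression. The pattern only matches <meta> tags written exactly as in the example: the name or property attribute first, then content, both in double quotes. The `rewrite` method and its parameters are illustrative stand-ins for the Metadata delimiter, Metadata attribute and Indexer internal field separator settings.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the example's transformation; the real plugin is a Jsoup filter.
class MetadataDelimitersExample {

    // Matches tags written like the example: <meta ATTR="FIELD" content="VALUE">
    private static final Pattern META = Pattern.compile(
            "<meta (name|property)=\"([^\"]*)\" content=\"([^\"]*)\">");

    static String rewrite(String html,
                          Map<String, String> delimiters,  // field -> delimiter
                          Map<String, String> attributes,  // field -> attribute holding its name
                          String internalSeparator) {      // internal separator written in place of delimiters
        Matcher m = META.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String attr = m.group(1);
            String field = m.group(2);
            String content = m.group(3);
            String delimiter = delimiters.get(field);
            // Rewrite only configured fields found in the expected attribute ("name" by default).
            if (delimiter != null && attr.equals(attributes.getOrDefault(field, "name"))) {
                content = content.replace(delimiter, internalSeparator);
            }
            String tag = "<meta " + attr + "=\"" + field + "\" content=\"" + content + "\">";
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> delimiters = Map.of("fruit.type", ";", "colour", ",", "og:type", ";");
        Map<String, String> attributes = Map.of("og:type", "property");
        String head = String.join("\n",
                "<meta name=\"country\" content=\"Australia, New Zealand\">",
                "<meta name=\"fruit.type\" content=\"apple; banana; pear\">",
                "<meta name=\"colour\" content=\"blue, green, orange, pink\">",
                "<meta property=\"og:type\" content=\"web page; article\">");
        // Prints the four <meta> tags in the same form as the updated HTML shown above.
        System.out.println(rewrite(head, delimiters, attributes, "|"));
    }
}
```

Running it prints the same four <meta> lines as the updated HTML above. The country tag is unchanged, and og:type is only rewritten because its attribute is configured as property.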