Plugin: Metadata delimiters

Purpose

Use this plugin when you need to define a separate delimiter for each metadata field, so that each field's values are split on the correct character.

When to use this plugin

Use this plugin if you have metadata fields that contain multiple values:

  • that you wish to use as categories in faceted navigation.

  • that you wish to be listed separately in your listMetadata data model element.

  • that you wish to map along with other fields into a single metadata class, where the fields use different delimiters (e.g. combining a comma-delimited keywords field and a semicolon-delimited dc.subject field into your keywords metadata class).

Using this plugin with facet_item_sepchars

There is an indexer setting, facet_item_sepchars, which defines a set of characters that will split the value of a metadata field. This setting applies to every metadata field, and in most cases it should be removed when using this plugin.

The facet_item_sepchars setting is handy if you want to split every metadata field using the same characters, but be aware that this can have unwanted consequences. For example, if you include a comma in the list of characters you might end up incorrectly splitting a description field. The plugin solves this problem by allowing you to define the split character for each HTML metadata field.
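To make the difference concrete, the following is a minimal Python sketch, not Funnelback code, that contrasts one global set of split characters with per-field delimiters. The helper names split_global and split_per_field and the sample field values are hypothetical and exist only for illustration.

```python
import re

def split_global(value, sepchars):
    """Split a value on any character in sepchars, as a global setting would."""
    pattern = "[" + re.escape(sepchars) + "]"
    return [part.strip() for part in re.split(pattern, value) if part.strip()]

def split_per_field(fields, delimiters):
    """Split only the fields that have a delimiter configured."""
    result = {}
    for name, value in fields.items():
        delimiter = delimiters.get(name)
        if delimiter is None:
            result[name] = [value]  # no delimiter configured: leave untouched
        else:
            result[name] = [p.strip() for p in value.split(delimiter) if p.strip()]
    return result

# Hypothetical sample data.
fields = {
    "keywords": "search, indexing",
    "description": "A fast, flexible search platform",
}

# A global comma separator also breaks the description into fragments.
global_result = {name: split_global(value, ",") for name, value in fields.items()}

# A per-field delimiter splits the keywords field only.
per_field_result = split_per_field(fields, {"keywords": ","})
```

In this sketch the global split turns the description into two fragments, while the per-field split leaves it as a single value.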

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Metadata delimiters tile.

  2. From the Location section, use the Select a data source select list to choose the data source on which you would like to enable this plugin.

The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

Metadata delimiter

Configuration key

plugin.metadata-delimiters.config.metadata.*.delimiter

Data type

string

Required

This setting is required

This specifies the separator character to use when splitting the field defined in the corresponding metadata attribute rule. The metadata field name should be set as the parameter 1 value (e.g. 'dc.subject').

A key needs to be defined for each field where you want to set the field delimiter.

For example, you might wish to split your keywords metadata field using a comma, but split your dc.subject field using a semicolon.
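When configuring the keys manually, one key is needed per field. The short Python sketch below shows how the keys for that example might be built. It assumes that the * in the key name is replaced literally by the parameter 1 value (the metadata field name) and that settings are written as key=value pairs; both are assumptions to check against your data source configuration. The helper delimiter_key is hypothetical.

```python
# Key pattern taken from the "Configuration key" entry for this setting.
DELIMITER_KEY = "plugin.metadata-delimiters.config.metadata.*.delimiter"

def delimiter_key(field_name):
    """Substitute the field name (parameter 1) for the wildcard in the key.

    Assumption: the wildcard is replaced literally by the field name.
    """
    return DELIMITER_KEY.replace("*", field_name, 1)

# One key per field: comma for keywords, semicolon for dc.subject.
settings = {
    delimiter_key("keywords"): ",",
    delimiter_key("dc.subject"): ";",
}

for key, value in settings.items():
    print(f"{key}={value}")  # key=value layout is an assumption
```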

Metadata attribute

Configuration key

plugin.metadata-delimiters.config.metadata.*.attribute

Data type

string

Required

This setting is optional

This specifies the HTML attribute within the <meta> tag that holds the field name (default is to use the name attribute). The metadata field name should be set as the parameter 1 value (e.g. 'dc.subject').

Standard HTML metadata fields specify the field name using the name attribute. However, some schemes use other properties within the <meta> tag. For example, Open Graph meta tags define the field name in the property attribute.

For example, a standard metadata field looks like <meta name="dc.title" content="Example title">. An Open Graph metadata field looks like <meta property="og:title" content="Example title">.

You will need to set this key for each metadata field that does not have the metadata field name defined within the name attribute of the <meta> tag.
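The effect of this setting can be pictured with a small Python sketch that uses only the standard library. It is an illustration of the lookup rule described above, not the plugin's actual implementation; the MetaCollector class and its attribute_for_field mapping are hypothetical names.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect <meta> content values, keyed by the configured naming attribute."""

    def __init__(self, attribute_for_field):
        super().__init__()
        # Hypothetical mapping of field name to attribute, e.g. {"og:title": "property"}.
        self.attribute_for_field = attribute_for_field
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Check "name" (the default) plus every attribute named in the mapping.
        for attribute in {"name", *self.attribute_for_field.values()}:
            field = attributes.get(attribute)
            if field is None:
                continue
            # A field matches only through the attribute configured for it.
            if self.attribute_for_field.get(field, "name") == attribute:
                self.found[field] = attributes.get("content", "")

html = (
    '<meta name="dc.title" content="Example title">'
    '<meta property="og:title" content="Example title">'
    '<meta property="og:type" content="article">'
)

# Only og:title is configured to use the property attribute.
collector = MetaCollector({"og:title": "property"})
collector.feed(html)
```

In this sketch dc.title is found through the default name attribute and og:title through the configured property attribute, while og:type is skipped because no attribute has been set for it.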

Indexer internal field separator

Configuration key

plugin.metadata-delimiters.config.separator

Data type

string

Default value

|

Required

This setting is optional

This defines the internal separator used by the indexer. Only change this if you have modified the facet_item_sepchars indexer option to remove the vertical bar.

If facet_item_sepchars is set and you have removed the vertical bar from its list of separators, you need to ensure that this setting is set to one of the characters listed in the facet_item_sepchars value.

If you set facet_item_sepchars you might get unexpected behavior because this option defines global field separators that will be applied to all metadata fields. It is recommended that you remove the facet_item_sepchars indexer option if using this plugin, unless you really know what you are doing.
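The consistency rule for a data source that does set facet_item_sepchars can be expressed as a one-line check. The Python sketch below is illustrative only: separator_is_listed is a hypothetical helper, and it assumes the internal separator is a single character.

```python
def separator_is_listed(separator, facet_item_sepchars):
    """Return True if the internal separator is one of the facet_item_sepchars characters.

    Assumption: the separator is a single character.
    """
    return len(separator) == 1 and separator in facet_item_sepchars

# The vertical bar is still listed, so it remains a valid internal separator.
ok_with_bar = separator_is_listed("|", ",;|")

# The vertical bar was removed, so the separator must be changed to "," or ";".
ok_without_bar = separator_is_listed("|", ",;")
ok_after_change = separator_is_listed(";", ",;")
```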

Filter chain configuration

This plugin uses filters which are used to apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The filters supplied by the plugin (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Jsoup filter classes

This plugin supplies a filter that needs to run in the HTML document (Jsoup) filter chain: com.funnelback.plugin.metadatadelimiters.MetadataDelimiters

Drag the com.funnelback.plugin.metadatadelimiters.MetadataDelimiters plugin filter to where you wish it to run in the filter chain sequence.

Examples

Consider the following HTML file:

<html>
    <head>
        <title>Example document</title>
        <meta name="country" content="Australia, New Zealand">
        <meta name="fruit.type" content="apple; banana; pear">
        <meta name="colour" content="blue, green, orange, pink">
        <meta property="og:type" content="web page; article">
    </head>
    <body>
        ...
    </body>
</html>

After enabling the plugin on your data source, configure it with the following settings to set the field delimiters for the fruit.type, colour and og:type fields:

Configuration key name    Parameter 1    Value

Metadata delimiter        colour         ,

Metadata delimiter        fruit.type     ;

Metadata delimiter        og:type        ;

Metadata attribute        og:type        property

  1. Ensure that the JsoupProcessingFilterProvider filter is in an appropriate position in the filter chain.

  2. Ensure that the com.funnelback.plugin.metadatadelimiters.MetadataDelimiters filter is in an appropriate position in the Jsoup filter chain.

After saving your configuration, run a full update of your data source.

The plugin will update the HTML that is stored on disk to:

<html>
    <head>
        <title>Example document</title>
        <meta name="country" content="Australia, New Zealand">
        <meta name="fruit.type" content="apple| banana| pear">
        <meta name="colour" content="blue| green| orange| pink">
        <meta property="og:type" content="web page| article">
    </head>
    <body>
        ...
    </body>
</html>

This will result in the indexer splitting the colour, fruit.type and og:type fields when indexing. The country field will not get split.
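The transformation in this example can be reproduced with a short Python sketch. It is a regex-based approximation for illustration, not the plugin's Jsoup filter; the function rewrite_meta and its parameters are hypothetical, and the "|" separator is taken from the rewritten HTML shown above.

```python
import re

def rewrite_meta(html, delimiters, attributes=None, separator="|"):
    """Replace each configured field's delimiter with the internal separator."""
    attributes = attributes or {}

    def replace(match):
        tag = match.group(0)
        for field, delimiter in delimiters.items():
            # Fields use the name attribute unless another one is configured.
            attribute = attributes.get(field, "name")
            if f'{attribute}="{field}"' in tag:
                content = re.search(r'content="([^"]*)"', tag)
                if content:
                    new_value = content.group(1).replace(delimiter, separator)
                    return tag.replace(content.group(0), f'content="{new_value}"')
        return tag  # no delimiter configured for this tag: leave it unchanged

    return re.sub(r"<meta\b[^>]*>", replace, html)

page = '''<meta name="country" content="Australia, New Zealand">
<meta name="fruit.type" content="apple; banana; pear">
<meta name="colour" content="blue, green, orange, pink">
<meta property="og:type" content="web page; article">'''

rewritten = rewrite_meta(
    page,
    delimiters={"colour": ",", "fruit.type": ";", "og:type": ";"},
    attributes={"og:type": "property"},
)
print(rewritten)
```

The country tag has no delimiter configured, so the sketch leaves its comma in place, which mirrors the behaviour described for the plugin.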

Change log

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.