Plugin: Metadata delimiters

Other versions of this plugin may exist. Ensure that you are viewing the documentation for the version you are currently using. If you are not running the latest version of the plugin, we recommend upgrading. See: list of all available versions of this plugin.

This filter can be used to replace the delimiters used in metadata fields on a per-field basis.

The delimiters are replaced with the Funnelback standard delimiter (a vertical bar character).

Usage

Enable the plugin

Enable the metadata-delimiters plugin on your data source from the Extensions screen in the administration dashboard, or add the following data source configuration:

plugin.metadata-delimiters.enabled=true
plugin.metadata-delimiters.version=1.0.0

The plugin takes effect after a full update of the data source.

Configuring the plugin

The MetadataDelimiters filter must be added to the jsoup filter chain for the plugin to work.

Add the filter to the filter.jsoup.classes key in the data source configuration.

For example:

filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,com.funnelback.plugin.metadatadelimiters.MetadataDelimiters

The following options must be set in the data source configuration to configure the plugin:

  • plugin.metadata-delimiters.config.metadata.<METADATA-FIELD-NAME>.delimiter=<CHARACTER-TO-REPLACE>: Defines the delimiter character <CHARACTER-TO-REPLACE> that applies to the specified metadata field. Define one key for each field that needs a delimiter. Only a single delimiter is supported per field. e.g. plugin.metadata-delimiters.config.metadata.keywords.delimiter=, sets the delimiter for the <meta name="keywords"> field to a comma.

Additional configuration settings:

  • plugin.metadata-delimiters.config.metadata.<METADATA-FIELD-NAME>.attribute=<META-FIELD-ATTRIBUTE-CONTAINING-NAME>: Changes the <meta> tag attribute that holds the metadata field name. This is normally name, but for some <meta> tags, such as Open Graph meta tags, it needs to be set to property. e.g. a standard metadata field looks like <meta name="dc.title" content="Example title">, whereas an Open Graph metadata field looks like <meta property="og:title" content="Example title">. Set this key for each metadata field whose name is not defined in the name attribute of the <meta> tag.

  • plugin.metadata-delimiters.config.separator=<DELIMITER-TO-USE>: Defines the padre separator that replaces the delimiter. The default is the vertical bar character (|). Change this only if the facet_item_sepchars indexer option has been set to a value that removes the vertical bar from the list of separators.
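The key patterns above can be read as one group of settings per metadata field. The following Python sketch shows one way to group them. It is an illustration only, not the plugin's actual parsing logic: the function name parse_field_settings and the dictionary layout are hypothetical. It assumes that the part of the key between the common prefix and the final .delimiter or .attribute suffix is the field name, which may itself contain dots (such as fruit.type).

```python
PREFIX = "plugin.metadata-delimiters.config.metadata."


def parse_field_settings(config: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group per-field plugin keys into {field: {"attribute": ..., "delimiter": ...}}."""
    fields: dict[str, dict[str, str]] = {}
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue  # e.g. the global separator key is not a per-field key
        # Field names may contain dots, so split on the last dot only.
        field, _, option = key[len(PREFIX):].rpartition(".")
        if field and option in ("delimiter", "attribute"):
            # Assumption: the naming attribute defaults to "name" when not set.
            fields.setdefault(field, {"attribute": "name"})[option] = value
    return fields


config = {
    "plugin.metadata-delimiters.config.metadata.fruit.type.delimiter": ";",
    "plugin.metadata-delimiters.config.metadata.og:type.delimiter": ";",
    "plugin.metadata-delimiters.config.metadata.og:type.attribute": "property",
    "plugin.metadata-delimiters.config.separator": "|",
}
settings = parse_field_settings(config)
print(settings)
```

Splitting on the last dot keeps field names such as fruit.type intact, which a naive split on every dot would break.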

Example

Consider the following HTML file:

<html>
    <head>
        <title>Example document</title>
        <meta name="country" content="Australia, New Zealand">
        <meta name="fruit.type" content="apple; banana; pear">
        <meta name="colour" content="blue, green, orange, pink">
        <meta property="og:type" content="web page; article">
    </head>
    <content>
        ...
    </content>
</html>

To set the field delimiters for the fruit.type, colour and og:type fields:

  1. Enable the metadata-delimiters plugin.

  2. Add the metadata delimiters filter to the jsoup filter chain and ensure jsoup filtering is enabled in the main filter chain.

  3. Add the following data source configuration options to configure the plugin:

    plugin.metadata-delimiters.config.metadata.colour.delimiter=,
    plugin.metadata-delimiters.config.metadata.fruit.type.delimiter=;
    plugin.metadata-delimiters.config.metadata.og:type.delimiter=;
    plugin.metadata-delimiters.config.metadata.og:type.attribute=property

  4. Run a full update of your data source.

The plugin will update the HTML that is stored on disk to:

<html>
    <head>
        <title>Example document</title>
        <meta name="country" content="Australia, New Zealand">
        <meta name="fruit.type" content="apple| banana| pear">
        <meta name="colour" content="blue| green| orange| pink">
        <meta property="og:type" content="web page| article">
    </head>
    <content>
        ...
    </content>
</html>

As a result, the Funnelback indexer splits the colour, fruit.type and og:type fields when indexing. The country field is not split because no delimiter is configured for it.
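The rewrite shown in this example can be approximated with a short Python sketch. This is a simplified stand-in for the plugin's jsoup filter, not its implementation: the FIELDS table, the rewrite_meta function and the regular expression are all hypothetical, and the pattern assumes double-quoted attributes with the naming attribute appearing before content.

```python
import re

# Hypothetical per-field settings mirroring the configuration in the example.
FIELDS = {
    "colour": {"attribute": "name", "delimiter": ","},
    "fruit.type": {"attribute": "name", "delimiter": ";"},
    "og:type": {"attribute": "property", "delimiter": ";"},
}
SEPARATOR = "|"  # the default separator described in this document

# Simplifying assumption: <meta ATTR="FIELD" content="VALUE"> in exactly this order.
META = re.compile(r'<meta (\w+)="([^"]*)" content="([^"]*)">')


def rewrite_meta(html: str) -> str:
    """Replace configured delimiters in matching <meta> content values."""
    def fix(match: re.Match) -> str:
        attribute, field, content = match.groups()
        setting = FIELDS.get(field)
        if setting is None or setting["attribute"] != attribute:
            return match.group(0)  # no delimiter configured: leave the tag untouched
        content = content.replace(setting["delimiter"], SEPARATOR)
        return f'<meta {attribute}="{field}" content="{content}">'
    return META.sub(fix, html)


html = """<meta name="country" content="Australia, New Zealand">
<meta name="fruit.type" content="apple; banana; pear">
<meta name="colour" content="blue, green, orange, pink">
<meta property="og:type" content="web page; article">"""

result = rewrite_meta(html)
print(result)
```

The country tag passes through unchanged because it has no entry in the settings table, which mirrors the behaviour described for unconfigured fields.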
