Plugin: Add metadata to URL

Purpose

This plugin can be used to add extra metadata to documents during the filter step of a data source update.

The plugin extends the functionality provided by external metadata by allowing for more flexible and complex matching against document URLs.

Usage

Enabling the plugin

  1. Enable the add-metadata-to-url plugin on your data source from the Extensions screen in the administration dashboard, or add the following data source configuration to enable the plugin.

plugin.add-metadata-to-url.enabled=true
plugin.add-metadata-to-url.version=1.0.0

  2. Add the AddMetadataToUrlStringFilter filter to the filter chain. The plugin does not work without it. To do this, add the filter to filter.classes in the data source configuration.

filter.classes=<OTHER-FILTERS>:com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter:<OTHER-FILTERS>

The filter should be placed at an appropriate position in the filter chain. In most circumstances, this is towards the end of the chain.

  3. Configure the plugin (see the plugin configuration settings section below).

  4. Add metadata mappings for the metadata added via the plugin using the configure metadata mappings screen in the data source configuration.

  5. Run a full update of the data source. Note: a full update is required because all of your documents must be re-gathered and filtered for any changes to take effect. If you are using the plugin with a push data source, you will need to resubmit every document to which the new filter should be applied.

Plugin configuration settings

The plugin configuration must be provided in a configuration file named external-metadata.json, saved within the conf/plugin-configuration/add-metadata-to-url directory using the data source configuration files editor, or via WebDAV.

The content of the file must be valid JSON with a top-level array, where each element of the array is an object that defines one rule. The JSON fields of a rule are:

  • name: Name to assign to the rule.

  • description: Description to assign to the rule (optional).

  • patternType: Defines the type of match that will be applied to the document’s URL. Acceptable values are:

    • REGEX_PATTERN: The URL must match (case-insensitively) the pattern, expressed as a Java regular expression.

    • LEFT_MATCH: The URL must start with the pattern.

    • SUBSTRING: The URL must include the pattern somewhere in the URL.

  • pattern: The URL is compared with this value using the method defined by patternType.

  • metadata: A set of key-value pairs. If the URL matches the pattern, this metadata is attached to the document.

The file has the following structure:

[
  {
    "name": "<rule name>",
    "description": "<rule description>",
    "patternType": "<rule pattern type>",
    "pattern": "<URL pattern>",
    "metadata": {
      "<metadata name1>": "<metadata value1>",
      "<metadata name2>": "<metadata value2>"
    }
  }
]
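Because the file must be valid JSON in the shape described above, it can be worth checking it before uploading. The following Python sketch is not part of the plugin; the function name validate_rules and the choice of required fields (every field except description, which the list above marks as optional) are assumptions made for illustration.

```python
import json

# Values listed as acceptable for patternType in the field descriptions above.
ALLOWED_PATTERN_TYPES = ("REGEX_PATTERN", "LEFT_MATCH", "SUBSTRING")
# Assumption: every field except the optional description is required.
REQUIRED_FIELDS = ("name", "patternType", "pattern", "metadata")


def validate_rules(text):
    """Return a list of problems found in the configuration text (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(rules, list):
        return ["top-level value must be an array"]

    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in rule:
                problems.append(f"rule {i}: missing '{field}'")
        if "patternType" in rule and rule["patternType"] not in ALLOWED_PATTERN_TYPES:
            problems.append(f"rule {i}: unknown patternType {rule['patternType']!r}")
        if "metadata" in rule and not isinstance(rule["metadata"], dict):
            problems.append(f"rule {i}: 'metadata' must be an object of key-value pairs")
    return problems


sample = (
    '[{"name": "Media pages", "patternType": "SUBSTRING", '
    '"pattern": "/media/", "metadata": {"type": "media"}}]'
)
print(validate_rules(sample))  # prints []
```

An empty list means the sketch found no structural problems. It does not check that a REGEX_PATTERN value compiles as a Java regular expression.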

Example

Consider the following external-metadata.json plugin configuration file:

[{
  "name":"All docs",
  "description":"Add this metadata to all documents",
  "patternType":"REGEX_PATTERN",
  "pattern":".*",
  "metadata":{
    "author":"John Smith",
    "date":"2019"
  }
}, {
  "name":"Publications",
  "description":"Add this metadata to urls beginning with http://example.com/publications",
  "patternType":"LEFT_MATCH",
  "pattern":"http://example.com/publications",
  "metadata":{
    "type":"publications",
    "department":"Example department"
  }
}, {
  "name":"Media pages",
  "description":"Add this metadata to urls containing /media/",
  "patternType":"SUBSTRING",
  "pattern":"/media/",
  "metadata":{
    "type":"media"
  }
}]

With this configuration:

  • The All docs rule matches every document and adds two metadata values: author: John Smith and date: 2019.

  • The Publications rule matches URLs beginning with http://example.com/publications and adds two metadata values: type: publications and department: Example department.

  • The Media pages rule matches URLs containing /media/ anywhere in the URL and adds one metadata value: type: media.
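To make the effect of these rules concrete, the following Python sketch applies them to a sample URL. It is an illustration only, not the plugin's implementation, and rests on several assumptions: Python's re module stands in for Java regular expressions (the two dialects differ in places), REGEX_PATTERN is treated as a match against the whole URL, LEFT_MATCH and SUBSTRING are treated as case-sensitive, and values from several matching rules are collected per key. The names url_matches and metadata_for and the sample URL are hypothetical.

```python
import re

# The three rules from the example configuration (descriptions omitted).
RULES = [
    {"name": "All docs", "patternType": "REGEX_PATTERN", "pattern": ".*",
     "metadata": {"author": "John Smith", "date": "2019"}},
    {"name": "Publications", "patternType": "LEFT_MATCH",
     "pattern": "http://example.com/publications",
     "metadata": {"type": "publications", "department": "Example department"}},
    {"name": "Media pages", "patternType": "SUBSTRING", "pattern": "/media/",
     "metadata": {"type": "media"}},
]


def url_matches(url, pattern_type, pattern):
    """Return True if the URL satisfies the rule's patternType and pattern."""
    if pattern_type == "REGEX_PATTERN":
        # Assumption: the whole URL must match, ignoring case.
        return re.fullmatch(pattern, url, flags=re.IGNORECASE) is not None
    if pattern_type == "LEFT_MATCH":
        # Assumption: the prefix comparison is case-sensitive.
        return url.startswith(pattern)
    if pattern_type == "SUBSTRING":
        # Assumption: the substring comparison is case-sensitive.
        return pattern in url
    raise ValueError(f"Unknown patternType: {pattern_type}")


def metadata_for(url, rules=RULES):
    """Collect metadata from every rule whose pattern matches the URL."""
    collected = {}
    for rule in rules:
        if url_matches(url, rule["patternType"], rule["pattern"]):
            for key, value in rule["metadata"].items():
                # Assumption: values from several matching rules accumulate per key.
                collected.setdefault(key, []).append(value)
    return collected


print(metadata_for("http://example.com/media/launch.html"))
# prints {'author': ['John Smith'], 'date': ['2019'], 'type': ['media']}
```

Here the sample URL matches the All docs and Media pages rules but not the Publications rule, so it receives author, date and type: media.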

Metadata fields added by this plugin are treated as metadata 'sources'. You will need to map each source to an appropriate metadata class, depending on your use case for the metadata. In the example above, you might map the 'type' metadata source to the 'type' metadata class.

© 2015- Squiz Pty Ltd