Plugin: Add metadata to URL

Purpose

Use this plugin to tag your content based on the URL, adding metadata to enhance your search results or faceted navigation.

This plugin extends the built-in external metadata functionality with more flexible and complex matching against document URLs when associating your additional metadata.

When to use this plugin

Use this plugin:

  • when you need to tag your content based on the URL, for functionality such as faceted navigation.

  • to define some additional metadata that needs to be attached to URLs in the search index, when the built-in external metadata functionality doesn’t provide enough control over your URL matching.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Add metadata to URL tile.

  2. In the Location section, use the Select a data source select list to choose the data source on which you would like to enable this plugin.

The plugin takes effect once the setup steps are complete and an advanced > full update of the data source has finished.

Filter chain configuration

This plugin uses filters, which apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter

Drag the com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter plugin filter to where you wish it to run in the filter chain sequence.

Configuration files

This plugin also uses the following configuration file.

external-metadata.json

Description

Use this configuration file to define rules that add metadata to a URL.

Configuration file format

JSON

Configuration file example: external-metadata.json
[
  {
    "name": "<rule name>",
    "description": "<rule description>",
    "patternType": "<rule pattern type>",
    "pattern": "<URL pattern>",
    "metadata": {
      "<metadata name1>": "<metadata value1>",
      "<metadata name2>": "<metadata value2>"
    }
  }
]
The contents of this file must be valid JSON, starting with a top-level array. You should check your file with a JSON validator before uploading. If you upload a malformed JSON file your data source update will fail.
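
A quick local check can catch a malformed file before it is uploaded. The following is a minimal sketch in Python and is not part of the plugin: the function name check_rules is hypothetical, and the checks it performs (a top-level array of objects, a patternType from the values listed in this document, and metadata given as an object) are assumptions drawn from the format described here rather than the plugin's own validation.

```python
import json

# Pattern types listed in this document.
ALLOWED_PATTERN_TYPES = {"REGEX_PATTERN", "LEFT_MATCH", "SUBSTRING"}


def check_rules(text):
    """Return a list of problems found in the JSON text (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]

    if not isinstance(rules, list):
        return ["top-level value must be an array"]

    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be an object")
            continue
        # Assumption: patternType must be one of the documented values.
        if rule.get("patternType") not in ALLOWED_PATTERN_TYPES:
            problems.append(f"rule {i}: unknown patternType {rule.get('patternType')!r}")
        # Assumption: metadata must be a JSON object of key-value pairs.
        if not isinstance(rule.get("metadata"), dict):
            problems.append(f"rule {i}: metadata must be an object of key-value pairs")
    return problems
```

To use it, read external-metadata.json into a string and pass it to check_rules; an empty list means none of these checks failed.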

The JSON fields are:

  • name: Name to assign to the rule.

  • description: Description to assign to the rule (optional).

  • patternType: Defines the type of match that will be applied to the document’s URL. Acceptable values are:

    • REGEX_PATTERN: The URL must match (case-insensitively) the pattern, expressed as a Java regular expression.

    • LEFT_MATCH: The URL must start with the pattern.

    • SUBSTRING: The URL must include the pattern somewhere in the URL.

  • pattern: The URL is compared with this value using the method defined by patternType.

  • metadata: If the URL matches the pattern then the metadata listed as a set of key-value pairs is attached to the document.
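
The match types above can be sketched in Python as follows. This is an illustration only, not the plugin's implementation: the function name url_matches is hypothetical, Python's re module stands in for Java regular expressions (the two syntaxes differ in places), and the sketch assumes that REGEX_PATTERN must match the whole URL and that LEFT_MATCH and SUBSTRING compare case-sensitively, neither of which is stated in this document.

```python
import re


def url_matches(pattern_type, pattern, url):
    """Sketch of the three match types; Python's re stands in for Java regex."""
    if pattern_type == "REGEX_PATTERN":
        # Assumption: the whole URL must match, case-insensitively.
        return re.fullmatch(pattern, url, flags=re.IGNORECASE) is not None
    if pattern_type == "LEFT_MATCH":
        # Assumption: the prefix comparison is case-sensitive.
        return url.startswith(pattern)
    if pattern_type == "SUBSTRING":
        # Assumption: the substring comparison is case-sensitive.
        return pattern in url
    raise ValueError(f"unknown patternType: {pattern_type}")
```

Under these assumptions, a REGEX_PATTERN rule that should match only part of a URL needs leading and trailing wildcards such as .* around the pattern, whereas SUBSTRING needs none.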

If you upload this file via WebDAV, save it within the conf/plugin-configuration/add-metadata-to-url directory of the data source.

Define your metadata mappings

The configuration above results in additional metadata being attached to the gathered content, but this metadata is not automatically included in the search index.

You need to configure the metadata mappings for your data source so that the fields you have defined here are incorporated into the search index.

Examples

Consider the following plugin configuration external-metadata.json file:

[{
  "name":"All docs", (1)
  "description":"Add this metadata to all documents",
  "patternType":"REGEX_PATTERN",
  "pattern":".*",
  "metadata":{
    "author":"John Smith",
    "date":"2019"
  }
}, {
  "name":"Publications", (2)
  "description":"Add this metadata to urls beginning with http://example.com/publications",
  "patternType":"LEFT_MATCH",
  "pattern":"http://example.com/publications",
  "metadata":{
    "type":"publications",
    "department":"Example department"
  }
}, {
  "name":"Media pages", (3)
  "description":"Add this metadata to urls containing /media/",
  "patternType":"SUBSTRING",
  "pattern":"/media/",
  "metadata":{
    "type":"media"
  }
}]
1 The All docs rule adds two metadata values to every document: author: John Smith and date: 2019.
2 The Publications rule adds two metadata values to documents whose URL begins with http://example.com/publications: type: publications and department: Example department.
3 The Media pages rule adds one metadata value to documents whose URL contains /media/: type: media.
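
To see which of the example rules would apply to a given URL, the configuration can be replayed in a few lines of Python. This is a hedged sketch rather than the plugin's code: the helper matching_rules and the sample URLs are hypothetical, Python's re module stands in for Java regular expressions, and the sketch assumes that every matching rule applies (not just the first) and that REGEX_PATTERN must match the whole URL.

```python
import re

# The three rules from the example configuration (descriptions omitted).
RULES = [
    {"name": "All docs", "patternType": "REGEX_PATTERN", "pattern": ".*",
     "metadata": {"author": "John Smith", "date": "2019"}},
    {"name": "Publications", "patternType": "LEFT_MATCH",
     "pattern": "http://example.com/publications",
     "metadata": {"type": "publications", "department": "Example department"}},
    {"name": "Media pages", "patternType": "SUBSTRING", "pattern": "/media/",
     "metadata": {"type": "media"}},
]

# Assumed matching behaviour for each pattern type.
MATCHERS = {
    "REGEX_PATTERN": lambda p, u: re.fullmatch(p, u, re.IGNORECASE) is not None,
    "LEFT_MATCH": lambda p, u: u.startswith(p),
    "SUBSTRING": lambda p, u: p in u,
}


def matching_rules(url):
    """Return the names of all example rules whose pattern matches the URL."""
    return [rule["name"] for rule in RULES
            if MATCHERS[rule["patternType"]](rule["pattern"], url)]


print(matching_rules("http://example.com/publications/report.pdf"))
# ['All docs', 'Publications']
print(matching_rules("http://example.com/media/release.html"))
# ['All docs', 'Media pages']
```

Under these assumptions, the first sample URL would receive the author, date, type and department values, and the second would receive author, date and type: media.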

After an update the additional metadata will appear as (HTML or HTTP header) sources that can be mapped to metadata classes (this includes adding as an additional source to an existing mapping).

You might define the following data source metadata mappings configuration to make this metadata available to your search:

Metadata class | Metadata source | Comment
author | author | Adds the new author metadata to the existing author metadata class.
department | department | Adds the new department metadata to a new department metadata class.
contentType | type | Adds the new type metadata to a new contentType metadata class.

To ensure this metadata is returned within the listMetadata element of the result data, you need to check that -SF=[author,department,contentType] is included in the query_processor_options on any results page that includes this data source.

Change log

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.