Plugin: Add metadata to URL
Other versions of this plugin may exist. Please ensure you are viewing the documentation for the version that you are currently using. If you are not running the latest version of your plugin, we recommend upgrading. See: list of all available versions of this plugin.
Purpose
This plugin can be used to add extra metadata to documents during the filter step of a data source update.
The plugin extends the functionality provided by external metadata by allowing for more flexible and complex matching against document URLs.
Usage
Enabling the plugin
- Enable the add-metadata-to-url plugin on your data source from the Extensions screen in the administration dashboard, or add the following data source configuration to enable the plugin:
plugin.add-metadata-to-url.enabled=true
plugin.add-metadata-to-url.version=1.0.0
- The AddMetadataToUrlStringFilter filter must be added to the filter chain for the plugin to work correctly. Add the filter to the filter.classes setting in the data source configuration:
filter.classes=<OTHER-FILTERS>:com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter:<OTHER-FILTERS>
The filter should be placed at an appropriate position in the filter chain. In most circumstances, this should be located towards the end of the filter chain.
- Configure the plugin (see the plugin configuration settings section below).
- Add metadata mappings for the metadata added by the plugin using the configure metadata mappings screen in the data source configuration.
- Run a full update of the data source. Note: a full update is required because all of your documents must be re-gathered and filtered for any changes to take effect. If you are using this plugin with a push data source, you will need to resubmit every document to which the new filter should be applied.
Plugin configuration settings
The plugin configuration must be provided in a configuration file named external-metadata.json, saved within the conf/plugin-configuration/add-metadata-to-url directory of the data source. This file can currently only be edited using WebDAV.
The content of the file must be valid JSON, starting with a top-level array. The JSON fields are:
- name: Name to assign to the rule.
- description: Description to assign to the rule (optional).
- patternType: Defines the type of match that will be applied to the document’s URL. Acceptable values are:
  - REGEX_PATTERN: The URL must match (case-insensitively) the pattern, expressed as a Java regular expression.
  - LEFT_MATCH: The URL must start with the pattern.
  - SUBSTRING: The URL must include the pattern somewhere in the URL.
- pattern: The URL is compared with this value using the method defined by patternType.
- metadata: If the URL matches the pattern, the metadata listed as a set of key-value pairs is attached to the document.
[
  {
    "name": "<rule name>",
    "description": "<rule description>",
    "patternType": "<rule pattern type>",
    "pattern": "<URL pattern>",
    "metadata": {
      "<metadata name1>": "<metadata value1>",
      "<metadata name2>": "<metadata value2>"
    }
  }
]
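Because the file can only be edited over WebDAV, it may help to check it locally before uploading. The following is a minimal sketch of such a check, not part of the plugin: the function name validate_rules is hypothetical, and treating name, patternType, pattern and metadata as required is an assumption based on description being the only field marked optional above.

```python
import json

# Values taken from the field list above.
ALLOWED_PATTERN_TYPES = ("REGEX_PATTERN", "LEFT_MATCH", "SUBSTRING")
# Assumption: every field except "description" is required.
REQUIRED_FIELDS = ("name", "patternType", "pattern", "metadata")


def validate_rules(text):
    """Return a list of problems found in the configuration text (empty if none)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(rules, list):
        return ["top-level value must be an array"]

    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in rule:
                problems.append(f"rule {i}: missing '{field}'")
        ptype = rule.get("patternType")
        if ptype is not None and ptype not in ALLOWED_PATTERN_TYPES:
            problems.append(f"rule {i}: unknown patternType '{ptype}'")
        if "metadata" in rule and not isinstance(rule["metadata"], dict):
            problems.append(f"rule {i}: 'metadata' must be an object of key-value pairs")
    return problems
```

Calling validate_rules(open("external-metadata.json").read()) on a local copy of the file returns an empty list when none of these checks fail. The sketch does not verify that a REGEX_PATTERN value compiles as a Java regular expression.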
Example
Consider the following plugin configuration external-metadata.json file:
[{
  "name": "All docs",
  "description": "Add this metadata to all documents",
  "patternType": "REGEX_PATTERN",
  "pattern": ".*",
  "metadata": {
    "author": "John Smith",
    "date": "2019"
  }
}, {
  "name": "Publications",
  "description": "Add this metadata to urls beginning with http://example.com/publications",
  "patternType": "LEFT_MATCH",
  "pattern": "http://example.com/publications",
  "metadata": {
    "type": "publications",
    "department": "Example department"
  }
}, {
  "name": "Media pages",
  "description": "Add this metadata to urls containing /media/",
  "patternType": "SUBSTRING",
  "pattern": "/media/",
  "metadata": {
    "type": "media"
  }
}]
- The All docs rule adds two metadata values, author: John Smith and date: 2019, to every document.
- The Publications rule adds two metadata values, type: publications and department: Example department, to URLs beginning with http://example.com/publications.
- The Media pages rule adds one metadata value, type: media, to URLs containing /media/.
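The three rules above can be simulated to preview which metadata a given URL would receive. This is an illustrative sketch only, not the plugin's implementation. The helper names rule_matches and metadata_for are hypothetical, Python's re module stands in for Java regular expressions, and three behaviours are assumptions: REGEX_PATTERN must match the whole URL, LEFT_MATCH and SUBSTRING are case-sensitive, and every matching rule contributes its metadata.

```python
import re

# The three rules from the example configuration (descriptions omitted).
RULES = [
    {"name": "All docs", "patternType": "REGEX_PATTERN", "pattern": ".*",
     "metadata": {"author": "John Smith", "date": "2019"}},
    {"name": "Publications", "patternType": "LEFT_MATCH",
     "pattern": "http://example.com/publications",
     "metadata": {"type": "publications", "department": "Example department"}},
    {"name": "Media pages", "patternType": "SUBSTRING", "pattern": "/media/",
     "metadata": {"type": "media"}},
]


def rule_matches(rule, url):
    """Return True if the URL satisfies the rule's patternType and pattern."""
    kind, pattern = rule["patternType"], rule["pattern"]
    if kind == "REGEX_PATTERN":
        # Assumption: the whole URL must match, case-insensitively.
        return re.fullmatch(pattern, url, re.IGNORECASE) is not None
    if kind == "LEFT_MATCH":
        # Assumption: prefix comparison is case-sensitive.
        return url.startswith(pattern)
    if kind == "SUBSTRING":
        # Assumption: substring comparison is case-sensitive.
        return pattern in url
    raise ValueError(f"unknown patternType: {kind}")


def metadata_for(url, rules=RULES):
    """Collect (name, value) pairs from every rule that matches the URL."""
    collected = []
    for rule in rules:
        if rule_matches(rule, url):
            collected.extend(rule["metadata"].items())
    return collected
```

For example, metadata_for("http://example.com/about") yields only the author and date pairs, while a URL such as http://example.com/publications/media/report.html satisfies all three rules and so collects two type values. The sketch keeps both values in the list; how the plugin itself handles repeated metadata names is not covered here.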
Metadata added using this plugin are considered metadata 'sources'. You will need to configure these sources to be added to an appropriate metadata class depending on your use-case for the metadata. In the example above, you may want to add the 'type' metadata source to the 'type' metadata class.