Plugin: Add metadata to URL
Purpose
Use this plugin to tag your content based on the URL, adding metadata to enhance your search results or faceted navigation.
This plugin extends the built-in external metadata functionality by providing more flexible URL matching (regular expression, left match and substring) when associating additional metadata with documents.
When to use this plugin
Use this plugin:
- when you need to tag your content based on the URL, for functionality such as faceted navigation.
- when you need to attach additional metadata to URLs in the search index and the built-in external metadata functionality doesn't provide enough control over URL matching.
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click the Add metadata to URL tile.
- From the Location section, select the data source on which you want to enable this plugin from the Select a data source list.
The plugin takes effect after the setup steps and an advanced > full update of the data source have completed.
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The filter(s) supplied by the plugin (as indicated in the listing) should be moved to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Filter classes
This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter
Drag the com.funnelback.plugin.addmetadatatourl.AddMetadataToUrlStringFilter plugin filter to where you wish it to run in the filter chain sequence.
Configuration files
This plugin also uses the following configuration file to provide additional configuration.
external-metadata.json
Description: Use this configuration file to define rules that add metadata to a URL.
Configuration file format: json
external-metadata.json
[
{
"name": "<rule name>",
"description": "<rule description>",
"patternType": "<rule pattern type>",
"pattern": "<URL pattern>",
"metadata": {
"<metadata name1>": "<metadata value1>",
"<metadata name2>": "<metadata value2>"
}
}
]
The contents of this file must be valid JSON, with an array at the top level. Check your file with a JSON validator before uploading: if you upload a malformed JSON file, your data source update will fail.
The JSON fields are:
- name: Name to assign to the rule.
- description: Description to assign to the rule (optional).
- patternType: Defines the type of match that will be applied to the document's URL. Acceptable values are:
  - REGEX_PATTERN: The URL must match (case-insensitively) the pattern, expressed as a Java regular expression.
  - LEFT_MATCH: The URL must start with the pattern.
  - SUBSTRING: The URL must include the pattern somewhere in the URL.
- pattern: The URL is compared with this value using the method defined by patternType.
- metadata: If the URL matches the pattern, the metadata listed as a set of key-value pairs is attached to the document.
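To make the three match types concrete, here is a minimal Java sketch of how a URL could be tested against a single rule. The class `UrlRuleMatcher` and its `matches` method are hypothetical and are not part of the plugin's API. The sketch also assumes two details the description above does not specify: that REGEX_PATTERN requires the whole URL to match, and that LEFT_MATCH and SUBSTRING compare case-sensitively.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the three documented patternType checks.
 * This is not the plugin's source code: the class and method names are
 * illustrative, and the assumptions are noted in the comments below.
 */
final class UrlRuleMatcher {

    /** The patternType values accepted in external-metadata.json. */
    enum PatternType { REGEX_PATTERN, LEFT_MATCH, SUBSTRING }

    /** Returns true if the URL satisfies the pattern under the given match type. */
    static boolean matches(PatternType type, String pattern, String url) {
        switch (type) {
            case REGEX_PATTERN:
                // Case-insensitive Java regular expression, as documented.
                // Assumption: the whole URL must match (Matcher.matches());
                // the plugin may accept a partial match instead.
                return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE)
                        .matcher(url)
                        .matches();
            case LEFT_MATCH:
                // The URL must start with the pattern.
                // Assumption: this comparison is case-sensitive.
                return url.startsWith(pattern);
            case SUBSTRING:
                // The pattern may appear anywhere in the URL.
                // Assumption: this comparison is case-sensitive.
                return url.contains(pattern);
            default:
                throw new IllegalArgumentException("Unknown patternType: " + type);
        }
    }

    public static void main(String[] args) {
        String url = "http://example.com/publications/2019/report.html";
        System.out.println(matches(PatternType.REGEX_PATTERN, ".*/PUBLICATIONS/.*", url)); // true
        System.out.println(matches(PatternType.LEFT_MATCH, "http://example.com/publications", url)); // true
        System.out.println(matches(PatternType.SUBSTRING, "/media/", url)); // false
    }
}
```

Note that a REGEX_PATTERN rule such as `.*/PUBLICATIONS/.*` still matches a lowercase URL because the documented match is case-insensitive.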
If you upload this file via WebDAV, save it in the conf/plugin-configuration/add-metadata-to-url directory of the data source.
Define your metadata mappings
The configuration above results in additional metadata being attached to the gathered content, but this metadata is not automatically included in the search index.
You need to configure the metadata mappings for your data source so that the fields you have defined here are incorporated into the search index.
Examples
Consider the following external-metadata.json plugin configuration file:
[{
"name":"All docs", (1)
"description":"Add this metadata to all documents",
"patternType":"REGEX_PATTERN",
"pattern":".*",
"metadata":{
"author":"John Smith",
"date":"2019"
}
}, {
"name":"Publications", (2)
"description":"Add this metadata to urls beginning with http://example.com/publications",
"patternType":"LEFT_MATCH",
"pattern":"http://example.com/publications",
"metadata":{
"type":"publications",
"department":"Example department"
}
}, {
"name":"Media pages", (3)
"description":"Add this metadata to urls containing /media/",
"patternType":"SUBSTRING",
"pattern":"/media/",
"metadata":{
"type":"media"
}
}]
(1) The All docs rule adds two metadata values to every document: author: John Smith and date: 2019.
(2) The Publications rule adds two metadata values to URLs beginning with http://example.com/publications: type: publications and department: Example department.
(3) The Media pages rule adds one metadata value to URLs containing /media/: type: media.
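The following Java sketch walks through the example configuration above: it applies each rule in file order to a URL and collects the metadata that would be attached. `ExampleRulesDemo`, `Rule` and `metadataFor` are hypothetical names, not plugin APIs. The sketch assumes whole-URL regex matching, and that when two matching rules set the same key (rules 2 and 3 both set type) both values are kept; verify that behaviour against your own search results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical walk-through of the three example rules: collects the
 * metadata each matching rule would attach to a URL. Not the plugin's code.
 */
final class ExampleRulesDemo {

    /** One entry of external-metadata.json; field names follow the JSON keys. */
    static final class Rule {
        final String name;
        final String patternType;
        final String pattern;
        final Map<String, String> metadata;

        Rule(String name, String patternType, String pattern, Map<String, String> metadata) {
            this.name = name;
            this.patternType = patternType;
            this.pattern = pattern;
            this.metadata = metadata;
        }
    }

    // The three rules from the example configuration, in file order.
    static final List<Rule> RULES = List.of(
            new Rule("All docs", "REGEX_PATTERN", ".*",
                    Map.of("author", "John Smith", "date", "2019")),
            new Rule("Publications", "LEFT_MATCH", "http://example.com/publications",
                    Map.of("type", "publications", "department", "Example department")),
            new Rule("Media pages", "SUBSTRING", "/media/",
                    Map.of("type", "media")));

    static boolean matches(Rule rule, String url) {
        switch (rule.patternType) {
            case "REGEX_PATTERN":
                // Assumption: whole-URL, case-insensitive match.
                return Pattern.compile(rule.pattern, Pattern.CASE_INSENSITIVE).matcher(url).matches();
            case "LEFT_MATCH":
                return url.startsWith(rule.pattern);
            case "SUBSTRING":
                return url.contains(rule.pattern);
            default:
                throw new IllegalArgumentException("Unknown patternType: " + rule.patternType);
        }
    }

    /** Applies every matching rule in file order, keeping all values per key (assumption). */
    static Map<String, List<String>> metadataFor(String url) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Rule rule : RULES) {
            if (matches(rule, url)) {
                rule.metadata.forEach((key, value) ->
                        result.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = metadataFor("http://example.com/publications/media/annual-report.html");
        System.out.println(md.get("type"));       // [publications, media]
        System.out.println(md.get("department")); // [Example department]
        System.out.println(metadataFor("http://example.com/about.html").containsKey("type")); // false
    }
}
```

For http://example.com/publications/media/annual-report.html all three rules match, so the sketch reports type as [publications, media]. A URL such as http://example.com/about.html only picks up author and date from the All docs rule.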
After an update, the additional metadata appears as (HTML or HTTP header) metadata sources that can be mapped to metadata classes, either in a new mapping or as an additional source on an existing mapping.
You might define the following data source metadata mappings configuration to make this metadata available to your search:
Metadata class | Metadata source | Comment |
---|---|---|
author | author | Adds the new author metadata to the existing author metadata class. |
department | department | Adds the new department metadata to a new department metadata class. |
contentType | type | Adds the new type metadata to a new contentType metadata class. |
To return this metadata in the listMetadata element of the result data, make sure that -SF=[author,department,contentType] is included in the query_processor_options of every results page that includes this data source.