Plugin: Fetch external metadata from URL

Purpose

Use this plugin to download external metadata configuration from one or more publicly accessible URLs during an update.

When to use this plugin

  • To get title, description and other metadata for binary documents that are stored within your DXP Content Management service, or within Squiz Matrix.

  • If you are indexing content from an external system, and you need to download additional metadata to attach to URLs.

The downloaded external metadata must be a valid external-metadata.cfg file.

One URL of external metadata vs multiple URLs

Sometimes it is not practical to provide a single URL that contains all the external metadata, because the content is too large or takes too long to generate.

The External metadata file source URL type setting allows you to configure the external metadata fetcher to work with a single URL containing external metadata, or with a URL that contains a list of URLs that contain the external metadata.

If you are downloading a single URL, then the contents returned by the URL must be in Funnelback external metadata configuration format.

If you are downloading multiple URLs, then the contents returned by the URL must be a text file containing the list of URLs, one URL per line. Each of the listed URLs is then fetched and must contain Funnelback external metadata configuration.

Use the multiple URL mode to fetch large amounts of metadata from the Squiz DXP CMS, or Squiz Matrix.
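The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the plugin's implementation: the function names (`http_get`, `collect_external_metadata`), the 30 second timeout, the UTF-8 decoding and the skipping of blank lines are all assumptions made for the example.

```python
from typing import Callable, List
from urllib.request import urlopen

# The two allowed values of the "External metadata file source URL type" setting.
SINGLE_FILE = "External metadata file"
URL_LIST = "List of external metadata file URLs"


def http_get(url: str) -> str:
    """Download a URL and return its body as text (assumes UTF-8 content)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def collect_external_metadata(
    source_urls: List[str],
    source_type: str,
    fetch: Callable[[str], str] = http_get,
) -> List[str]:
    """Return the body of every external metadata file the sources point to."""
    documents = []
    for source_url in source_urls:
        body = fetch(source_url)
        if source_type == SINGLE_FILE:
            # The source URL itself returns external metadata configuration.
            documents.append(body)
        elif source_type == URL_LIST:
            # The source URL returns a list of URLs, one per line.
            # Skipping blank lines is an assumption of this sketch.
            for line in body.splitlines():
                listed_url = line.strip()
                if listed_url:
                    documents.append(fetch(listed_url))
        else:
            raise ValueError(f"Unknown source URL type: {source_type!r}")
    return documents
```

Passing a stub as `fetch`, such as a dictionary lookup, lets you try the logic without network access.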

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Fetch external metadata from URL tile.

  2. From the Location section, use the Select a data source select list to choose the data source on which you would like to enable this plugin.

The plugin takes effect once the setup steps are complete and an advanced > full update of the data source has finished.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

External metadata file source URL type

Configuration key

plugin.external-metadata-fetcher.config.url_source_type

Data type

string

Default value

External metadata file

Allowed values

External metadata file,List of external metadata file URLs

Required

This setting is required

Defines the source URL type for the external metadata configuration file.

Possible values:

  • External metadata file: a single external metadata file (default)

  • List of external metadata file URLs: a text file containing a list of URLs containing external metadata configuration to download, one URL per line.

External metadata file source URLs

Configuration key

plugin.external-metadata-fetcher.config.url_source

Data type

array

Required

This setting is required

Defines the source URLs for the external metadata configuration file, or for the file containing the list of external metadata URLs (based on the External metadata file source URL type setting).

Fail on error

Configuration key

plugin.external-metadata-fetcher.config.fail-on-error

Data type

boolean

Default value

true

Required

This setting is optional

Defines whether the update should fail with an error, or just log a warning, if an external metadata file is not successfully downloaded.

Possible values:

  • true: the update will fail with an error (default).

  • false: a warning will be logged, but the update will continue.
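The two outcomes of this setting can be pictured with the following Python sketch. It is an illustration only: `UpdateFailed`, `download_or_handle` and the logger name are hypothetical names invented for the example, and the plugin's real error handling may differ in detail.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("external-metadata-fetcher-sketch")


class UpdateFailed(Exception):
    """Hypothetical error that stands in for a failed update."""


def download_or_handle(
    url: str,
    fetch: Callable[[str], str],
    fail_on_error: bool = True,
) -> Optional[str]:
    """Fetch one external metadata URL, applying the fail-on-error choice."""
    try:
        return fetch(url)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        if fail_on_error:
            # true (default): stop the update with an error.
            raise UpdateFailed(f"Could not download {url}") from exc
        # false: log a warning and let the update continue without this file.
        log.warning("Could not download %s: %s", url, exc)
        return None
```

With `fail_on_error=False` the caller receives `None` for a failed download and can carry on with the remaining URLs.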

Examples

Fetch external metadata from a single URL

In this example https://example.com/extmet.txt contains external metadata definitions that you wish to include in your search.

To fetch the external metadata from https://example.com/extmet.txt, configure the plugin with:

Configuration key name | Value
External metadata file source URL type | External metadata file
External metadata file source URLs | https://example.com/extmet.txt

Fetch external metadata from multiple URLs

In this example you have external metadata that is contained within multiple URLs.

To fetch this external metadata you need to also have a URL that returns a list of URLs that contain the external metadata.

To fetch a list of external metadata URLs from https://example.com/list.txt, configure the plugin with:

Configuration key name | Value
External metadata file source URL type | List of external metadata file URLs
External metadata file source URLs | https://example.com/list.txt

The file at https://example.com/list.txt is a plain text file that lists the external metadata URLs, one URL per line.

For example:

https://example.com/extmet1.txt
https://example.com/extmet2.txt
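Before publishing a list file like the one above, it can help to check that every line really is a single absolute URL. The following Python sketch shows one way to do that. The function name `check_url_list` and the rules it applies (http or https only, no whitespace inside a line, blank lines ignored) are assumptions for this example; the plugin's own validation is not documented here.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def check_url_list(text: str) -> Tuple[List[str], List[int]]:
    """Return the usable URLs and the 1-based numbers of the problem lines."""
    urls: List[str] = []
    problem_lines: List[int] = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # blank lines are ignored in this sketch
        if len(entry.split()) != 1:
            # More than one token on the line, e.g. two URLs side by side.
            problem_lines.append(number)
            continue
        try:
            parts = urlsplit(entry)
        except ValueError:
            problem_lines.append(number)
            continue
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(entry)
        else:
            # Relative paths and other schemes are flagged.
            problem_lines.append(number)
    return urls, problem_lines
```

An empty list of problem lines means every non-blank line passed these checks.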

Change log

[1.2.1]

Added

  • Extended External metadata file source URLs to support multiple URLs as input.

[1.2.0]

Added

  • Extended the external metadata scraper to support multiple external metadata sources.

Changed

  • Refactored the HTTP client from OkHttp to the native HTTPClient.

  • Refactored the unit tests from Spark to WireMock.

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.