Plugin: Clone documents
Purpose
Use this plugin when you need to return an HTML document multiple times in the search results.
The main use case for this plugin is an events search where individual events may span multiple dates. Cloning the document once for each applicable date allows the event to be returned for every date it matches.
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Clone documents tile.
- From the Location section, decide whether to enable this plugin on a data source or a results page and select the corresponding radio button.
- Select the data source or results page on which you would like to enable this plugin from the drop-down menu.
If enabled on a data source, the plugin takes effect once the setup steps are completed and an advanced > full update of the data source has completed. If enabled on a results page, the plugin takes effect as soon as the setup steps are completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only used if you are configuring this plugin manually. The keys are set in the data source or results page configuration. When setting a key manually, type in (or copy and paste) the key name and value.
Include selector
Configuration key | |
Data type | string |
Required | This setting is required |
Any document containing at least one element that matches this Jsoup selector will be cloned by the plugin. For example, meta[content=events] selects meta tags with a content attribute of events.
Clone selector
Configuration key | |
Data type | string |
Required | This setting is required |
The number of elements in the document that match this Jsoup selector determines the number of times the document is cloned.
For example, if a date metadata field is repeated with 5 different date values, the document will be cloned 5 times.
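The counting rule above can be sketched in a few lines of Python. This is only an illustration and not the plugin's implementation: the plugin evaluates Jsoup selectors, whereas this sketch uses Python's standard html.parser and a hypothetical helper, expected_clone_count, that handles only the simple case of meta tags selected by their name attribute. The event-date field name and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class MetaCounter(HTMLParser):
    """Counts <meta> tags whose name attribute equals a given value."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        if tag == "meta" and dict(attrs).get("name") == self.meta_name:
            self.count += 1


def expected_clone_count(html, meta_name):
    """Hypothetical helper: how many elements would a selector such as
    meta[name=<meta_name>] match, and therefore how many clones to expect."""
    parser = MetaCounter(meta_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Assumed sample page with one date field repeated five times.
sample = """
<html><head>
<meta name="page-type" content="events">
<meta name="event-date" content="2023-01-01">
<meta name="event-date" content="2023-01-08">
<meta name="event-date" content="2023-01-15">
<meta name="event-date" content="2023-01-22">
<meta name="event-date" content="2023-01-29">
</head></html>
"""

print(expected_clone_count(sample, "event-date"))  # 5
```

A check like this can be handy for estimating how many extra records a clone selector is likely to add before running a full update.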
Cloned URL suffix
Configuration key | |
Data type | string |
Required | This setting is optional |
Specifies a suffix that will be attached to the end of the URL for the cloned records.
Add metadata field name
Configuration key | |
Data type | string |
Required | This setting is optional |
Specifies a metadata field to add to the cloned records. 'Parameter 1' specifies an ID that must match a corresponding 'Add metadata field content' setting.
Add metadata field content
Configuration key | |
Data type | string |
Required | This setting is optional |
Specifies a metadata value to insert into the cloned records. Parameter 1 must be a unique value within your data source configuration.
Remove selector
Configuration key | |
Data type | string |
Required | This setting is optional |
Elements that match this Jsoup selector will be removed from the cloned pages. 'Parameter 1' is a unique identifier that allows multiple remove selector settings to be defined.
Additional configuration settings
The originalUrl metadata class must be added to the summary fields (-SF) option of the query processor options for the plugin to function correctly.
This is done by editing your results page configuration and editing (or adding) the query_processor_options key. Edit (or add) the -SF value to include the originalUrl field.
Configuration key name | Value |
---|---|
query_processor_options | |
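As a rough illustration of the edit described above, the following Python sketch adds originalUrl to an existing query processor options string. It assumes the -SF value is written as a bracketed, comma-separated list (for example -SF=[title,author]); check that against your own configuration. The helper name add_summary_field and the sample option strings are hypothetical and are not part of the plugin or of Funnelback.

```python
import re


def add_summary_field(qpo, field="originalUrl"):
    """Return the options string with `field` present in the -SF list.

    Assumption: -SF is written as -SF=[a,b,c]. If no -SF option is
    present, one is appended to the end of the string.
    """
    match = re.search(r"-SF=\[([^\]]*)\]", qpo)
    if match is None:
        prefix = qpo.strip()
        return f"{prefix} -SF=[{field}]" if prefix else f"-SF=[{field}]"

    # Split the existing list, dropping empty entries and stray spaces.
    fields = [f.strip() for f in match.group(1).split(",") if f.strip()]
    if field not in fields:
        fields.append(field)
    updated = "-SF=[" + ",".join(fields) + "]"
    return qpo[:match.start()] + updated + qpo[match.end():]


print(add_summary_field("-stem=2 -SF=[title,author]"))
print(add_summary_field("-stem=2"))
```

The function leaves the string unchanged when originalUrl is already listed, so it can be applied repeatedly without duplicating the field.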
Tracking the original URL
The original URL of the page is added to each cloned document as two additional metadata fields, original-url and fb-original-url.
For example:
<meta name="original-url" content="ORIGINAL-URL">
<meta name="fb-original-url" content="ORIGINAL-URL">
The meta tag fb-original-url is added to the metadata class originalUrl for use within results pages. If you need the original URL for any additional filters, use the original-url metadata field.
Canonical URLs
To prevent Funnelback's default handling of canonical links, all canonical links in the cloned documents are removed during the index phase.
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Examples
Clone event pages
In this example we have HTML event pages that contain metadata outlining the days on which the event is running, with some events spanning multiple dates. We wish to create an events search that shows the events running on specific dates.
Consider the following HTML page with the URL:
http://www.example.com/events/new-event
<html>
<head>
<title>Example event page</title>
<meta name="page-type" content="events" >
<meta name="event-date" content="2023-01-01">
<meta name="event-date" content="2023-01-21">
<meta name="internal-use" content="true">
...
</head>
...
</html>
We wish to clone this event page for each occurrence of the event-date metadata field. This can be achieved with the following configuration:
Configuration key name | Parameter 1 | Value |
---|---|---|
Include selector | | |
Clone selector | | |
Add metadata field name | | |
Add metadata field content | | |
Remove selector | | |
Cloned URL suffix | | |
In addition, originalUrl is added to the -SF value of the query processor options in the results page configuration.
This results in the following two HTML documents being included in the index.
http://www.example.com/events/new-event/fb-recurring-event/1
<html>
<head>
<title>Example event page</title>
<meta name="page-type" content="events" >
<meta name="event-date" content="2023-01-01">
<meta name="collapsing" content="recurring-event">
<meta name="original-url" content="http://www.example.com/events/new-event">
<meta name="fb-original-url" content="http://www.example.com/events/new-event">
...
</head>
...
</html>
http://www.example.com/events/new-event/fb-recurring-event/2
<html>
<head>
<title>Example event page</title>
<meta name="page-type" content="events" >
<meta name="event-date" content="2023-01-21">
<meta name="collapsing" content="recurring-event">
<meta name="original-url" content="http://www.example.com/events/new-event">
<meta name="fb-original-url" content="http://www.example.com/events/new-event">
...
</head>
...
</html>
When running a search, the results page will return two records with a title of Example event page and a liveUrl and displayUrl set to http://www.example.com/events/new-event.
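The relationship between the original page and its clones in this example can be summarised with a small Python sketch. It is a simplified model and not the plugin's code. The function name clone_records is hypothetical, and both the fb-recurring-event suffix and the original URL + suffix + sequence number layout are inferred from the example URLs above rather than taken from a documented rule.

```python
def clone_records(original_url, event_dates, url_suffix, added_metadata=None):
    """Model the cloned records for one page: one clone per event date.

    Assumption: each clone is indexed at <original_url>/<url_suffix>/<n>,
    with n counting from 1, as the example URLs suggest.
    """
    records = []
    for index, event_date in enumerate(event_dates, start=1):
        record = {
            "indexed_url": f"{original_url}/{url_suffix}/{index}",
            "event-date": event_date,
            # Both tracking fields point back to the page that was cloned.
            "original-url": original_url,
            "fb-original-url": original_url,
        }
        # Extra metadata, e.g. the collapsing field in the example.
        record.update(added_metadata or {})
        records.append(record)
    return records


records = clone_records(
    "http://www.example.com/events/new-event",
    ["2023-01-01", "2023-01-21"],
    "fb-recurring-event",  # assumed suffix, taken from the example URLs
    {"collapsing": "recurring-event"},
)
for record in records:
    print(record["indexed_url"], record["event-date"])
```

Each record carries a different event-date but the same original URL, which is why both search results link to http://www.example.com/events/new-event.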