Plugin: Clone documents
Purpose
Use this plugin when you need to return an HTML document multiple times in the search results.
The main use case for this plugin is an events search where individual events may span multiple dates. Cloning the item once for each applicable date allows the event to be returned for every date that it matches.
Usage
Enable the plugin
-
Select Plugins from the side navigation pane and click on the Clone documents tile.
-
From the Location section, decide if you wish to enable this plugin on a data source or a results page and select the corresponding radio button.
-
Select the data source or results page on which you would like to enable this plugin from the drop-down menu.
If enabled on a data source, the plugin takes effect once the setup steps are completed and an advanced > full update of the data source has finished. If enabled on a results page, the plugin takes effect as soon as the setup steps are completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only needed if you are configuring this plugin manually. The keys are set in the data source or results page configuration. When setting the keys manually you need to type in (or copy and paste) the key name and value.
Choose how your document will be cloned
Configuration key |
|
Data type |
string |
Default value |
|
Allowed values |
Repeated fields,Repeated values in a field |
Required |
This setting is required |
This option controls whether the document is cloned based on a repeated field within your document, or repeated values within a specified field.
Include selector
Configuration key |
|
Data type |
string |
Required |
This setting is required |
Any document that contains elements matching this Jsoup selector will be cloned by the plugin. For example, meta[content=events] selects meta tags with a content attribute of events.
Multi-value element delimiter
Configuration key |
|
Data type |
string |
Default value |
`|` |
Required |
Specifies a delimiter to split the selected element content on if the element contains multiple values.
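The splitting behaviour can be pictured with a small Python sketch. This is an illustration only, not the plugin's implementation: the function name `split_values` is hypothetical, the `|` default is an assumption, and trimming of surrounding whitespace is assumed because the cloned output shown in the examples contains trimmed values.

```python
def split_values(content: str, delimiter: str = "|") -> list[str]:
    """Split element content on a delimiter.

    Assumption: surrounding whitespace is trimmed and empty parts are dropped.
    """
    return [part.strip() for part in content.split(delimiter) if part.strip()]


# A single field holding two dates, as in a legacy events page.
values = split_values("20230101 | 20230121")
print(values)
```

Each value returned would correspond to one clone of the document.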
Clone selector
Configuration key |
|
Data type |
string |
Required |
This setting is required |
The number of elements in the document that match this Jsoup selector determines the number of times that the document is cloned.
For example, if a date metadata field is repeated with 5 different date values, the document will be cloned 5 times.
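To make the counting rule concrete, the sketch below counts repeated meta tags using only the Python standard library. It is a rough stand-in for what a Jsoup selector such as meta[name=event-dates] would match; the class name `MetaCollector` is hypothetical and the plugin itself uses Jsoup, not this parser.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the content values of <meta> tags with a given name."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == self.name:
            self.values.append(attr_map.get("content") or "")


page = """
<html><head>
<meta name="page-type" content="events">
<meta name="event-dates" content="2023-01-01">
<meta name="event-dates" content="2023-01-21">
</head></html>
"""

collector = MetaCollector("event-dates")
collector.feed(page)
# One clone would be produced per matching element.
print(len(collector.values), collector.values)
```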
Cloned URL suffix
Configuration key |
|
Data type |
string |
Required |
This setting is optional |
Specifies a suffix that will be attached to the end of the URL for the cloned records.
Add metadata field name
Configuration key |
|
Data type |
string |
Required |
This setting is optional |
Specifies a metadata field to add to the cloned records. 'Parameter 1' specifies an ID that must match the ID of the corresponding 'Add metadata field content' setting.
Add metadata field content
Configuration key |
|
Data type |
string |
Required |
This setting is optional |
Specifies a metadata value to insert into the cloned records. Parameter 1 must be a unique value within your data source configuration.
Remove selector
Configuration key |
|
Data type |
string |
Required |
This setting is optional |
Elements that match this Jsoup selector will be removed from the cloned pages. 'Parameter 1' is a unique identifier used to enable multiple remove selector fields to be defined.
Additional configuration settings
The originalUrl metadata class must be added to the summary fields (-SF) option of the query processor options for the plugin to function correctly.
This is done by editing your results page configuration and editing (or adding) the query_processor_options key. Edit (or add) the -SF value to include the originalUrl field.
Configuration key name | Value |
---|---|
query_processor_options |
|
Tracking the original URL
The original URL of the page is recorded in each cloned document as two additional metadata fields: original-url and fb-original-url.
For example:
<meta name="original-url" content="ORIGINAL-URL">
<meta name="fb-original-url" content="ORIGINAL-URL">
The meta tag fb-original-url is added to the metadata class originalUrl for use within results pages. If you need the original URL for any additional filters, use the original-url metadata field.
Canonical URLs
To prevent Funnelback's default canonical link handling from being applied to the clones, all canonical links in the cloned document are removed during the index phase.
Facilitating grouping of split items
It is often desirable to search your index as if the items had not been split, to avoid showing duplicates in your search results. The main use case is an events search with two views: one that lists results by event date (where duplicate event items make sense), and one that returns each matching event once, showing all the dates on which the event occurs.
Ensure result collapsing signatures are generated for the original URL.
To achieve this, configure result collapsing using the originalUrl metadata class.
For example, on the data source where you have configured the plugin, add the configuration key:
indexing.collapse_fields=[$],[originalUrl]
This configures Funnelback to generate a result collapsing signature based on the originalUrl field value.
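Conceptually, collapsing on originalUrl groups all clones that share the same original URL. The Python sketch below is a simplified mental model only, not how Funnelback implements collapsing; the function name `collapse_results`, the result dictionaries, and the "Another event" record are all hypothetical.

```python
def collapse_results(results, signature_field="originalUrl"):
    """Keep the first result for each signature value and count the rest."""
    collapsed = {}
    for result in results:
        signature = result[signature_field]
        if signature not in collapsed:
            collapsed[signature] = {"result": result, "collapsed_count": 0}
        else:
            collapsed[signature]["collapsed_count"] += 1
    return list(collapsed.values())


results = [
    {"title": "Example event page", "date": "2023-01-01",
     "originalUrl": "http://www.example.com/events/new-event"},
    {"title": "Example event page", "date": "2023-01-21",
     "originalUrl": "http://www.example.com/events/new-event"},
    {"title": "Another event", "date": "2023-02-01",
     "originalUrl": "http://www.example.com/events/another-event"},
]

for entry in collapse_results(results):
    print(entry["result"]["title"], entry["collapsed_count"])
```

The two clones of the example event share a signature, so only one of them is kept and the other is counted as collapsed.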
Applying the result collapsing
To apply the result grouping, enable result collapsing using the [originalUrl] collapsing signature on your results page. Add the following two settings to any query that you wish to collapse; this removes the duplicated (cloned) results.
For example, as part of the configuration key query_processor_options:
query_processor_options= -collapsing=on -collapsing_sig=[originalUrl]
or as URL query parameters
https://example-search.funnelback.squiz.cloud/s/search.html?collection=example&query=example&collapsing=on&collapsing_sig=[originalUrl]
See: Result collapsing
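If the collapsing parameters are added programmatically, the square brackets in the signature are best URL-encoded. The following Python sketch builds the request URL shown above with the standard library; the host, collection, and query values are the placeholder values from the example, not real endpoints.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://example-search.funnelback.squiz.cloud/s/search.html"
params = {
    "collection": "example",
    "query": "example",
    "collapsing": "on",
    "collapsing_sig": "[originalUrl]",  # brackets are percent-encoded by urlencode
}

url = f"{base}?{urlencode(params)}"
print(url)

# Decoding the query string recovers the original signature value.
print(parse_qs(urlsplit(url).query)["collapsing_sig"])
```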
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Examples
Clone event pages
In this example we have HTML event pages that contain metadata outlining the days on which the event is running, with some events spanning multiple dates. We wish to create an events search that shows the events running on specific dates.
Page with events in separate fields
Consider the following HTML page with the URL:
http://www.example.com/events/new-event
<html>
<head>
<title>Example Event Page</title>
<meta name="page-type" content="events" >
<meta name="event-dates" content="2023-01-01">
<meta name="event-dates" content="2023-01-21">
<meta name="internal-use" content="true">
...
</head>
...
</html>
We wish to clone this event page for each occurrence of the event-dates metadata field, add a new metadata field called collapsing with the content recurring-event, and remove the metadata field internal-use.
This can be achieved with the following configuration:
Configuration key name | Parameter 1 | Value |
---|---|---|
Choose how your document will be cloned |
Repeated fields |
|
Include selector |
|
|
Clone selector |
|
|
Add metadata field name |
|
|
Add metadata field content |
|
|
Remove selector |
|
|
Cloned URL suffix |
|
Additional data source configuration
Setting the following in your data source configuration enables you to apply result collapsing, which collapses the cloned items into a single result if you need to change the result listing.
indexing.collapse_fields=[$],[originalUrl]
Ensure you also configure your metadata mappings. For an events search you will normally wish to map the field containing the event date to either the d (date) metadata class, or a numeric metadata class. If you are converting a legacy events search you will probably have the data mapped to the O metadata class.
Page with events in a single field
Consider the following HTML page, which stores all of its dates in a single field, as is common for event pages that were created for Funnelback’s legacy events mode. The page has the URL:
http://www.example.com/events/new-event
<html>
<head>
<title>Example Event Page</title>
<meta name="page-type" content="events" >
<meta name="event-dates" content="20230101 | 20230121">
<meta name="internal-use" content="true">
...
</head>
...
</html>
We wish to clone this event page for each value in the event-dates metadata field. This can be achieved with the following configuration:
Configuration key name | Parameter 1 | Value |
---|---|---|
Choose how your document will be cloned |
Repeated values in a field |
|
Include selector |
|
|
Clone selector |
|
|
Multi-value field delimiter |
|
|
Remove selector |
|
|
Cloned URL suffix |
|
Cloned page output
The above configuration results in the following two HTML documents being included in the index.
http://www.example.com/events/new-event/fb-recurring-event/1
<html>
<head>
<title>Example event page</title>
<meta name="page-type" content="events" >
<meta name="event-dates" content="2023-01-01"> (1)
<meta name="collapsing" content="recurring-event">
<meta name="original-url" content="http://www.example.com/events/new-event">
<meta name="fb-original-url" content="http://www.example.com/events/new-event">
...
</head>
...
</html>
1 | The value shown here is for the first configuration; the second configuration will set event-dates to 20230101 |
http://www.example.com/events/new-event/fb-recurring-event/2
<html>
<head>
<title>Example event page</title>
<meta name="page-type" content="events" >
<meta name="event-dates" content="2023-01-21"> (1)
<meta name="collapsing" content="recurring-event">
<meta name="original-url" content="http://www.example.com/events/new-event">
<meta name="fb-original-url" content="http://www.example.com/events/new-event">
...
</head>
...
</html>
1 | The value shown here is for the first configuration; the second configuration will set event-dates to 20230121 |
When running a search, the results page will return two records, each with a title of Example event page and with liveUrl and displayUrl set to http://www.example.com/events/new-event.
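The relationship between the source page and its clones can be summarised in a short Python sketch. It is a model of the output shown above rather than the plugin's code: the function name `clone_event` is hypothetical, and the suffix value fb-recurring-event and the /1, /2 numbering are assumptions read off the example URLs.

```python
def clone_event(original_url, dates, suffix="fb-recurring-event"):
    """Model one clone per date, mirroring the example output above.

    Assumption: clone URLs take the form <original>/<suffix>/<n>, with n
    starting at 1, as suggested by the example URLs.
    """
    clones = []
    for index, date in enumerate(dates, start=1):
        clones.append({
            "url": f"{original_url}/{suffix}/{index}",
            "metadata": {
                "event-dates": date,
                "collapsing": "recurring-event",
                "original-url": original_url,
                "fb-original-url": original_url,
            },
        })
    return clones


clones = clone_event(
    "http://www.example.com/events/new-event",
    ["2023-01-01", "2023-01-21"],
)
for clone in clones:
    print(clone["url"], clone["metadata"]["event-dates"])
```

Every clone carries the same original-url value, which is what allows the clones to be collapsed back into a single result.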
Configure your events listing
The events search will work similarly to any other search with the following differences:
-
Because you have cloned your events, your results will look like they contain duplicates. For an events listing you will normally want to sort by the date metadata field (descending) and have some logic in your template that inserts a heading containing the event date into your results listing. In Freemarker this would look something like:
<#assign curdate="0">
<@s.Results>
  <#if s.result.class.simpleName != "TierBar">
    <#-- print a date heading if the date has changed-->
    <#if s.result.listMetadata["O"]?first != curdate> (1)
      <#assign curdate = s.result.listMetadata["O"]?first>
      <h3>Events occurring on ${curdate?date("yyyyMMdd")}</h3>
    </#if>
    <li data-fb-result="${s.result.indexUrl}" class="result<#if !s.result.documentVisibleToUser>-undisclosed</#if>">
1 | In this example the event date is mapped to the O metadata class. |
-
If you have a search of your events, or you are including your events in a general mixed search, then you may wish to collapse your events so that only a single result is returned for a specific event. This is accomplished by enabling result collapsing on that search. This can be done via a separate results page, or by adding additional request parameters to toggle on and apply the result collapsing signature.
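The date-heading logic used in the Freemarker template can also be expressed in Python, which may help when reasoning about it or testing it outside a template. This is a sketch under stated assumptions: the function name `render_listing` and the result dictionaries are hypothetical, results are assumed to arrive already sorted by date, and the date is assumed to be stored in yyyyMMdd form under the O metadata class as in the template example.

```python
from datetime import datetime


def render_listing(results):
    """Emit a heading whenever the event date changes between results."""
    lines = []
    current_date = None
    for result in results:
        event_date = result["O"]
        if event_date != current_date:
            current_date = event_date
            label = datetime.strptime(event_date, "%Y%m%d").date().isoformat()
            lines.append(f"Events occurring on {label}")
        lines.append(f"  {result['title']}")
    return lines


# Hypothetical results, sorted by date descending.
listing = render_listing([
    {"title": "Example event page", "O": "20230121"},
    {"title": "Another event", "O": "20230121"},
    {"title": "Example event page", "O": "20230101"},
])
print("\n".join(listing))
```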
If you are converting a legacy events search to use this plugin you will be able to match most of the functionality that you previously had except for complex queries where you returned events on a specific date, combined with date ranges where you were not specifying a >= type search (e.g. when you specified a search like music % O=20160415 O>20160415 O<20160630 O=20160505 O=20160510 ). However, mixed queries like this were very uncommon.