Plugin: Clean title

Purpose

Use this plugin to remove sections of text from your search result titles.

Search result titles often include extra text, such as a common prefix or suffix that contains the site name. This clutters the search results display and can make it difficult for a user to quickly scan the search results.

This plugin allows you to clean up the title by removing parts of the text. You do this by defining a series of regular expressions that identify the parts of the title to remove.

This plugin supports two types of usage:

Cleaning the title in HTML source data
This method fixes the title before it is indexed, so the change applies to every feature within Funnelback that refers to the title. This method provides more consistent behavior, but requires the data source to be updated (reindexed). Also use this method if you need to sort your search results alphabetically.

When enabled on a data source, the plugin can be used to clean the contents of the HTML <title> element.

This only applies to HTML documents and will update the value of the <title> element. The advantage of modifying the source data is that the title included in the index contains the modification, so sorting will function correctly if the search result title is based on the HTML <title> value.

To clean the title within the source data, follow the steps for enabling the plugin on a data source, below.

Cleaning the title returned in the search results listing
This method fixes the title after it has been indexed, and only applies the change to the JSON data returned when making a query. Use this method if you need to quickly apply a change and don’t need to sort your results alphabetically. The changes will only affect the titles that are printed in the search results listing.

When enabled on a results page, the plugin can be used to modify the value of the result.title data model element. Use the plugin on a results page if you just need to update the search result titles (regardless of the underlying data source type).

If you modify the result.title using this method then sorting by title may be incorrect as the renaming occurs after the result set has been sorted. Sorting will be incorrect if you modify the start of any titles and the regex pattern does not match all search result titles.
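The sorting caveat above can be illustrated with a minimal Python sketch. The titles and the prefix pattern are hypothetical, and Python's re module stands in for the plugin's Java regular expressions (this simple pattern behaves the same way in both). It compares cleaning after sorting, as on a results page, with cleaning before sorting, as on a data source.

```python
import re

# Hypothetical removal pattern and titles, for illustration only.
PREFIX_PATTERN = r"^ExampleOrg -\s+"
titles = ["ExampleOrg - Cherry", "Banana", "ExampleOrg - Apple"]


def clean(title: str) -> str:
    """Remove every match of the removal pattern from the title."""
    return re.sub(PREFIX_PATTERN, "", title)


# Results page behaviour: results are sorted on the indexed title,
# and the titles are only renamed afterwards.
sorted_then_cleaned = [clean(t) for t in sorted(titles)]

# Data source behaviour: titles are cleaned before indexing,
# so sorting sees the cleaned values.
cleaned_then_sorted = sorted(clean(t) for t in titles)

print(sorted_then_cleaned)  # ['Banana', 'Apple', 'Cherry']
print(cleaned_then_sorted)  # ['Apple', 'Banana', 'Cherry']
```

The first list is out of alphabetical order because only some titles carry the prefix that was removed after sorting.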

To clean the title returned in the results listing, follow the steps for enabling the plugin on a results page, below.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Clean title tile.

  2. From the Location section, decide if you wish to enable this plugin on a data source or a results page and select the corresponding radio button.

  3. Select the data source or results page on which you would like to enable this plugin from the drop-down menu.

If enabled on a data source, the plugin will take effect as soon as the setup steps are completed, and an advanced > full update of the data source has completed. If enabled on a results page the plugin will take effect as soon as the setup steps are completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source or results page configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

Removal pattern (regex)

Configuration key: plugin.clean-title.config.regex.*

Data type: string

Required: This setting is required

This key defines a Java format regular expression pattern that is compared to the title. Any parts of the title that match this regular expression will be removed.
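As a rough illustration of what "removing any matching parts" means, here is a minimal Python sketch. It is not the plugin's implementation: the remove_matches helper and the sample title are hypothetical, and Python's re module is used in place of Java's regex engine (the simple pattern shown behaves the same in both).

```python
import re


def remove_matches(title: str, pattern: str) -> str:
    """Delete every part of the title that matches the pattern.

    Hypothetical helper: approximates the described behaviour by
    replacing each match with an empty string.
    """
    return re.sub(pattern, "", title)


# A title with a site-name prefix, and one without it.
print(remove_matches("ExampleOrg - Contact us", r"^ExampleOrg -\s+"))  # Contact us
print(remove_matches("Contact us", r"^ExampleOrg -\s+"))               # Contact us
```

Titles that do not match the pattern are left unchanged.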

Applying multiple clean patterns

The removal pattern (plugin.clean-title.config.regex) option affects the titles returned within the data model’s result.title element.

This option can be defined multiple times by assigning a different identifier in the Parameter 1 field when configuring the setting. If multiple regex keys are defined then they will be executed in sequence with the order determined by the Parameter 1 value when sorted alphabetically.

For example, if you have three removal patterns defined with Parameter 1 IDs of orange, apple and pear, then the patterns will be applied in the following order:

  1. apple

  2. orange

  3. pear
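The ordering above matters when one pattern only matches after another has run. The following Python sketch shows this under stated assumptions: the three patterns and the title are hypothetical, Python's re module stands in for Java regex, and Python's default string sort is assumed to match the plugin's alphabetical ordering for simple lowercase identifiers like these.

```python
import re

# Hypothetical patterns keyed by their Parameter 1 identifier.
patterns = {
    "orange": r"^News -\s+",       # only matches once the site prefix is gone
    "apple": r"^ExampleOrg -\s+",
    "pear": r"\s+\(archived\)$",
}


def apply_in_order(title: str, patterns: dict[str, str]) -> str:
    """Apply each removal pattern in alphabetical order of its identifier."""
    for identifier in sorted(patterns):
        title = re.sub(patterns[identifier], "", title)
    return title


print(sorted(patterns))  # ['apple', 'orange', 'pear']
print(apply_in_order("ExampleOrg - News - Budget (archived)", patterns))  # Budget
```

If orange ran before apple, the title would still start with ExampleOrg when the News pattern was tried, and the result would be "News - Budget" instead.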

Filter chain configuration

This plugin uses filters, which apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Jsoup filter classes

This plugin supplies a filter that needs to run in the HTML document (Jsoup) filter chain: `com.funnelback.plugin.cleantitle.CleanTitleFilter`

Drag the com.funnelback.plugin.cleantitle.CleanTitleFilter plugin filter to where you wish it to run in the filter chain sequence.

Examples

Clean a prefix and suffix from search result titles

This example applies for both data sources and results pages, as outlined above.

Suppose you have titles like:

ExampleOrg - Page title (www.example.com)

Here, many pages have titles that are prefixed with ExampleOrg - and end with the suffix (www.example.com).

You would like the Page title to be displayed as the hyperlinked title in your search results.

This could be achieved by setting the following configuration keys:

Configuration key name             Parameter 1       Value

plugin.clean-title.config.regex    generic-prefix    ^ExampleOrg -\s+

plugin.clean-title.config.regex    generic-suffix    \s+\(www\.example\.com\)$

This runs each of the regular expressions on the result title or <title> element, removing both the prefix and the suffix.

The generic-prefix and generic-suffix identifiers could have been anything, but remember that the identifiers used define the order in which the patterns are applied.

When viewing the raw configuration, these keys will appear as:

plugin.clean-title.config.regex.generic-prefix=^ExampleOrg -\s+
plugin.clean-title.config.regex.generic-suffix=\s+\(www\.example\.com\)$
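To check patterns like these before applying them, the raw keys can be read and run against a sample title. This is a sketch only: load_patterns and clean_title are hypothetical helpers, the parsing assumes simple key=value lines, and Python's re module is used in place of Java regex (these two patterns behave the same in both).

```python
import re

KEY_PREFIX = "plugin.clean-title.config.regex."

# The two raw configuration lines from the example above.
RAW_CONFIG = r"""
plugin.clean-title.config.regex.generic-prefix=^ExampleOrg -\s+
plugin.clean-title.config.regex.generic-suffix=\s+\(www\.example\.com\)$
"""


def load_patterns(raw: str) -> dict[str, str]:
    """Map each Parameter 1 identifier to its regex, skipping other lines."""
    patterns = {}
    for line in raw.splitlines():
        if not line.startswith(KEY_PREFIX):
            continue
        # Split on the first "=" only, so "=" inside a regex is preserved.
        key, _, value = line.partition("=")
        patterns[key[len(KEY_PREFIX):]] = value
    return patterns


def clean_title(title: str, patterns: dict[str, str]) -> str:
    """Remove matches of each pattern, in alphabetical order of identifier."""
    for identifier in sorted(patterns):
        title = re.sub(patterns[identifier], "", title)
    return title


patterns = load_patterns(RAW_CONFIG)
print(clean_title("ExampleOrg - Page title (www.example.com)", patterns))  # Page title
```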

Change log

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.