Diversify your search results

A number of ranking options are designed to increase the diversity of the result set. Use these result diversification options to reduce the likelihood of a result set being flooded by results from the same website, data source and so on.

There are some limitations that apply to the different diversity modes if you attempt to use them concurrently.

If either Same title suppression or Near-duplicate (very similar) title suppression is enabled and affects the ranking of a document, then diversification processing stops and the following diversification modes are not applied:

If Same title suppression and Near-duplicate (very similar) title suppression are both enabled and triggered, only the larger of the two penalties is applied.
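The "larger of the two penalties" rule can be pictured with a small Python sketch. It is illustrative only and is not Funnelback's implementation: it assumes that title_dup_factor and near_dup_factor (both described later on this page) act as score multipliers, so that 1.0 means no penalty and a smaller factor means a larger penalty. The function name and its arguments are hypothetical.

```python
def combined_title_factor(title_dup_factor: float, near_dup_factor: float,
                          same_title: bool, near_duplicate: bool) -> float:
    """Return the single multiplier to apply, or 1.0 when neither rule triggers."""
    triggered = []
    if same_title:
        triggered.append(title_dup_factor)
    if near_duplicate:
        triggered.append(near_dup_factor)
    # Assumption: a smaller factor is a larger penalty, so keep the minimum
    # rather than multiplying the two factors together.
    return min(triggered, default=1.0)
```

Under these assumptions, a document that triggers both rules with factors 0.5 and 0.63 would be downweighted by 0.5 only, not by both.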

Result diversification is automatically disabled if you enable result collapsing.

Same site suppression

Each website has a unique information profile and some sites naturally rank better than others. Search engine optimization (SEO) techniques assist with improving a website’s natural ranking.

Same site suppression can be used to downweight consecutive results from the same website resulting in a more diverse set of search results.

Same site suppression is configured by setting the following ranking options:

  • SSS: Controls the depth of comparison (in the URL) used to determine what a site is. This corresponds to the depth of the URL (or the number of sub-folders in a URL). Note: SSS=10 is a special case and suppresses by organization.

    • Range: 0-1000

    • SSS=0 - no suppression (default)

    • SSS=2 - site name + first level folder

    • SSS=10 - suppress by organization domain. This attempts to suppress based on the part of the domain that is likely to be controlled by a single entity such as a company or government agency, for example whitehouse.gov, defence.gov.au or acme.co.uk.

  • SameSiteSuppressionExponent: Controls the downweight penalty applied. Larger values result in greater downweight.

    • Range: 0.0 - unlimited (default = 0.5)

    • Recommended value: between 0.2 and 0.7

  • SameSiteSuppressionOffset: Controls how many documents are displayed beyond the first document from the same site before any downweight is applied.

    • Range: 0-1000 (default = 0)

  • sss_defeat_pattern: URLs matching the simple string pattern are excluded from same site suppression.

Example:

query_processor_options= -SSS=3 -SameSiteSuppressionExponent=0.6 -SameSiteSuppressionOffset=2 -sss_defeat_pattern=Media
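To make the idea of comparison depth more concrete, the following Python sketch derives a grouping key from a URL for a given SSS value. It is a rough illustration, not Funnelback's algorithm. The site_key function is hypothetical, and it assumes that SSS=1 means the site name only, extrapolating from the documented "SSS=2 - site name + first level folder". The SSS=10 organization mode is deliberately left out.

```python
from typing import Optional
from urllib.parse import urlsplit

ORGANIZATION_MODE = 10  # SSS=10 is the documented special case; not modelled here


def site_key(url: str, sss: int) -> Optional[str]:
    """Return a hypothetical 'site' key for a URL at the given SSS depth.

    sss=0 means no suppression, so there is nothing to compare (None).
    sss=N is assumed to mean the host name plus the first N-1 folders.
    """
    if sss == 0:
        return None
    if sss == ORGANIZATION_MODE:
        raise NotImplementedError("organization domain detection is not sketched")
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Treat the final path segment as the document unless the path ends in "/".
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]
    return "/".join([parts.netloc.lower(), *segments[: sss - 1]])
```

With this sketch, https://example.com/news/2024/a.html and https://example.com/news/2025/b.html share a key at SSS=2 (example.com/news) but not at SSS=3, which is the sense in which a larger SSS value gives a narrower definition of "site".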

Same meta suppression

Downweights subsequent results that contain the same value in a specified metadata field. Same meta suppression is controlled by the following ranking options:

  • same_meta_suppression: Controls the downweight penalty applied for consecutive documents that have the same metadata field value.

    • Range: 0.0-1.0 (default = 0.0)

  • meta_suppression_field: Controls the metadata field used for the comparison. Note: only a single metadata field can be specified.

Example:

query_processor_options= -same_meta_suppression=0.7 -meta_suppression_field=subject

Same collection suppression

Downweights subsequent results that come from the same data source. This provides similar functionality to the search package data source component weighting above and can be used in conjunction with it to increase its influence. Same collection suppression is controlled by the following ranking option:

  • same_collection_suppression: Controls the downweight penalty applied for consecutive documents that reside in the same data source.

    • Range: 0.0-1.0 (default = 0.0)

Example:

query_processor_options= -same_collection_suppression=0.45

Same title suppression

Downweights subsequent results that contain the same title. Same title suppression is controlled by the following ranking option:

  • title_dup_factor: Controls the downweight penalty applied for consecutive documents that have the same title value.

    • Range: 0.0-1.0 (default = 0.5)

      Setting this value to 1.0 has the effect of disabling same title suppression.

Example:

query_processor_options= -title_dup_factor=0.63

Near-duplicate (very similar) title suppression

Downweights subsequent results that contain a near-duplicate (very similar) title. Near-duplicate title suppression is controlled by the following ranking option:

  • near_dup_factor: Controls the downweight penalty applied for consecutive documents that have a near-duplicate title.

    • Range: 0.0-1.0 (default = 0.5)

      Setting this value to 1.0 has the effect of disabling near-duplicate title suppression.

Example:

query_processor_options= -near_dup_factor=0.63
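The ranges listed for the diversification options on this page can be checked before a configuration line is written. The Python sketch below is one possible way to do that. The helper name build_query_processor_options is hypothetical and is not part of Funnelback. The ranges are copied from the option descriptions above, and options without a documented numeric range (such as sss_defeat_pattern and meta_suppression_field) are passed through unchecked.

```python
# Documented ranges; None means no documented upper bound.
RANGES = {
    "SSS": (0, 1000),
    "SameSiteSuppressionExponent": (0.0, None),
    "SameSiteSuppressionOffset": (0, 1000),
    "same_meta_suppression": (0.0, 1.0),
    "same_collection_suppression": (0.0, 1.0),
    "title_dup_factor": (0.0, 1.0),
    "near_dup_factor": (0.0, 1.0),
}


def build_query_processor_options(options: dict) -> str:
    """Validate numeric options against RANGES and render one config line."""
    flags = []
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if value < low or (high is not None and value > high):
                raise ValueError(f"{name}={value} is outside the documented range")
        flags.append(f"-{name}={value}")
    return "query_processor_options= " + " ".join(flags)
```

For example, passing {"SSS": 3, "SameSiteSuppressionExponent": 0.6} yields a line in the same form as the examples above, while title_dup_factor=1.5 raises a ValueError.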

Result collapsing

Result diversification is automatically disabled if you enable result collapsing.

While not a ranking option, result collapsing can be used to effectively diversify the result set by grouping similar result items together into a single result.

Results are considered to be similar if:

  • They share near-identical content

  • They have identical values in one or a set of metadata fields.

Result collapsing requires a configuration that affects both the indexing and query time behaviour of Funnelback.
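As a conceptual illustration of collapsing on identical metadata values, the Python sketch below groups an already ranked list of results by a set of fields. It is not how Funnelback implements collapsing, which relies on index and query time configuration. The function name, the dictionary-based result format and the field names are all assumptions made for the example.

```python
def collapse_by_metadata(results, fields):
    """Group ranked results that have identical values in every listed field.

    Groups are returned in the rank order of their first member. Results that
    lack any of the fields are kept as single-item groups.
    """
    groups = {}
    for position, result in enumerate(results):
        values = tuple(result.get(field) for field in fields)
        # A result with a missing field gets a unique key so it never collapses.
        key = values if None not in values else ("__ungrouped__", position)
        groups.setdefault(key, []).append(result)
    return list(groups.values())
```

Each returned group would then be displayed as a single result item, with the first member acting as the representative.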