Result diversification

Introduction

Several ranking options are designed to increase the diversity of the result set. They reduce the likelihood of a result set being flooded by results from the same website, data source, or similar grouping.

Same site suppression

Each website has a unique information profile and some sites naturally rank better than others. Search engine optimisation (SEO) techniques assist with improving a website’s natural ranking.

Same site suppression can be used to downweight consecutive results from the same website, resulting in a more diverse set of search results.

Same site suppression is configured by setting the following ranking options:

  • SSS: Controls the depth of comparison (in the URL) used to determine what a site is. This corresponds to the depth of the URL (the number of sub-folders in the URL). Note: SSS=10 is a special case and suppresses by organisation.

    • Range: 0-1000

    • SSS=0 - no suppression (default)

    • SSS=2 - site name + first level folder

    • SSS=10 - suppress by organisation domain. This attempts to suppress based on the part of the domain that is likely to be controlled by a single entity, such as a company or government agency (e.g. whitehouse.gov, defence.gov.au, acme.co.uk).

  • SameSiteSuppressionExponent: Controls the downweight penalty applied. Larger values result in greater downweight.

    • Range: 0.0 - unlimited (default = 0.5)

    • Recommended value: between 0.2 and 0.7

  • SameSiteSuppressionOffset: Controls how many documents are displayed beyond the first document from the same site before any downweight is applied.

    • Range: 0-1000 (default = 0)

  • sss_defeat_pattern: URLs matching the simple string pattern are excluded from same site suppression.

Example:

query_processor_options= -SSS=3 -SameSiteSuppressionExponent=0.6 -SameSiteSuppressionOffset=2 -sss_defeat_pattern=Media
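
The depth setting described above can be pictured with a short Python sketch. This is only an illustration of the idea, not Funnelback's implementation: the function name site_key is hypothetical, the reading that a depth of 1 means the site name alone is an assumption extrapolated from the SSS=2 description, and the organisation special case (SSS=10) is not modelled.

```python
from urllib.parse import urlsplit


def site_key(url: str, depth: int) -> str:
    """Return an illustrative 'site' key for a URL at an SSS-style depth.

    Assumed mapping (hypothetical, for illustration only):
      depth 0 -> no key (no suppression)
      depth 1 -> site name only
      depth 2 -> site name + first level folder
      depth 3 -> site name + first two folder levels, and so on.
    """
    if depth <= 0:
        return ""  # no suppression, so there is nothing to group by
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Treat the final path segment as a document rather than a folder,
    # unless the path explicitly ends with a slash.
    if segments and not parts.path.endswith("/"):
        segments = segments[:-1]
    return "/".join([parts.netloc.lower()] + segments[: depth - 1])


url = "https://example.com/news/2024/item.html"
for depth in (1, 2, 3):
    print(depth, site_key(url, depth))
```

Under this sketch, two results are treated as coming from the same site when their keys match, so a larger depth splits one website into more, smaller groups.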

Same meta suppression

Downweights subsequent results that contain the same value in a specified metadata field. Same meta suppression is controlled by the following ranking options:

  • same_meta_suppression: Controls the downweight penalty applied for consecutive documents that have the same metadata field value.

    • Range: 0.0-1.0 (default = 0.0)

  • meta_suppression_field: Controls the metadata field used for the comparison. Note: only a single metadata field can be specified.

Example:

query_processor_options= -same_meta_suppression=0.7 -meta_suppression_field=subject
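
To make the effect of downweighting consecutive results concrete, the following Python sketch applies a toy model to a ranked list. The exact formula used by the query processor is not given here, so everything in the sketch is an assumption: the function name downweight_repeats is hypothetical, and the penalty is modelled as multiplying the score by (1 - penalty) whenever a result has the same field value as the result directly before it.

```python
def downweight_repeats(results, field, penalty):
    """Toy model of 'same value' suppression (an assumption, not the real formula).

    results: list of dicts sorted by descending 'score'.
    field:   metadata field compared between neighbouring results.
    penalty: 0.0 leaves scores untouched; values near 1.0 penalise heavily.
    """
    adjusted = []
    previous = None
    for result in results:
        score = result["score"]
        value = result.get(field)
        # Only penalise a result that repeats the value of its predecessor.
        if value is not None and value == previous:
            score *= 1.0 - penalty
        adjusted.append({**result, "score": score})
        previous = value
    # Re-rank using the adjusted scores.
    return sorted(adjusted, key=lambda r: r["score"], reverse=True)


ranked = [
    {"id": "a", "subject": "tax", "score": 1.0},
    {"id": "b", "subject": "tax", "score": 0.9},
    {"id": "c", "subject": "health", "score": 0.8},
]
print([r["id"] for r in downweight_repeats(ranked, "subject", 0.7)])
```

In this model the second "tax" result drops below the "health" result, which is the kind of interleaving the suppression options aim for.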

Same collection suppression

Downweights subsequent results that come from the same data source. This provides similar functionality to the search package data source component weighting above and could be used in conjunction with it to provide an increased influence. Same collection suppression is controlled by the following ranking options:

  • same_collection_suppression: Controls the downweight penalty applied for consecutive documents that reside in the same data source.

    • Range: 0.0-1.0 (default = 0.0)

Example:

query_processor_options= -same_collection_suppression=0.45

Same title suppression

Downweights subsequent results that contain the same title. Same title suppression is controlled by the following ranking options:

  • title_dup_factor: Controls the downweight penalty applied for consecutive documents that have the same title value.

    • Range: 0.0-1.0 (default = 0.5)

Example:

query_processor_options= -title_dup_factor=0.63

Result collapsing

While not a ranking option, result collapsing can be used to effectively diversify the result set by grouping similar result items together into a single result.

Results are considered to be similar if:

  • They share near-identical content

  • They have identical values in one or more metadata fields.

Result collapsing requires a configuration that affects both the indexing and query time behaviour of Funnelback.
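
The metadata-based form of collapsing can be sketched in Python as grouping results by the values of the chosen fields. This is a conceptual illustration only and does not show how Funnelback is configured or how it detects near-identical content: the function name collapse, the "collapsed" key and the sample field names are all hypothetical.

```python
def collapse(results, fields):
    """Group results sharing identical values in `fields` (illustrative only).

    The first result seen for each combination of values becomes the
    representative; later matches are attached under a 'collapsed' key.
    """
    groups = {}
    for result in results:
        key = tuple(result.get(f) for f in fields)
        if key not in groups:
            groups[key] = {**result, "collapsed": []}
        else:
            groups[key]["collapsed"].append(result)
    # Dicts preserve insertion order, so the original ranking is kept.
    return list(groups.values())


ranked = [
    {"id": "a", "author": "Smith", "year": "2020"},
    {"id": "b", "author": "Jones", "year": "2020"},
    {"id": "c", "author": "Smith", "year": "2020"},
]
for item in collapse(ranked, ["author", "year"]):
    print(item["id"], [r["id"] for r in item["collapsed"]])
```

Collapsing differs from the suppression options above in that the similar results are folded into one entry rather than being pushed further down the ranking.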