Manually adjust the ranking algorithm

Where possible, use automated tuning to set ranking influences. Manually altering influences can fix one specific problem at the expense of ranking quality for the rest of the content.

Ranking options

Funnelback’s ranking algorithm determines which results are retrieved from the index and the order in which they are ranked by relevance.

The ranking of results is a complex problem, influenced by a multitude of document attributes. It’s not just about how many times a word appears within a document’s content.

  • Funnelback uses ranking options, which are a subset of the query processor options, to change the weightings within the ranking algorithm.

  • Ranking options are applied at query time. This means that different services and profiles can apply different ranking settings to an identical index. Ranking options can also be changed via CGI parameters at the time the query is submitted.

Setting ranking indicators

Funnelback has an extensive set of ranking parameters that influence how the ranking algorithm operates.

This allows for customization of the influence provided by 73 different ranking indicators.

The main ranking indicators are:

  • Content: This is controlled by the cool.0 parameter and is used to indicate the influence provided by the document’s content score.

  • On-site links: This is controlled by the cool.1 parameter and is used to indicate the influence provided by the links within the site. This considers the number and text of incoming links to the document from other pages within the same site.

  • Off-site links: This is controlled by the cool.2 parameter and is used to indicate the influence provided by links from outside the site. This considers the number and text of incoming links to the document from external sites in the index.

  • Length of URL: This is controlled by the cool.3 parameter and is used to indicate the influence provided by the length of the document’s URL. Shorter URLs generally indicate a more important page.

  • External evidence: This is controlled by the cool.4 parameter and is used to indicate the influence provided via external evidence (see query independent evidence below).

  • Recency: This is controlled by the cool.5 parameter and is used to indicate the influence provided by the age of the document. Newer documents are generally more important than older documents.

Applying ranking options

Ranking options are applied in one of the following ways:

  • Set as a default for the results page by adding the ranking option to the query_processor_options parameter in the results page configuration.

  • Set at query time by adding the ranking option as a CGI parameter. This is a good method for testing but should be avoided in production unless the ranking factor needs to be dynamically set for each query, or set by a search form control such as a slider.
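The query-time method can be sketched in Python as follows. This is a minimal illustration only: the host name, the search endpoint path, the `collection` value and the helper name `build_search_url` are all assumptions made for the example, not values taken from this page. The only idea carried over from the text is that a ranking option such as cool.1 is appended to the request as an ordinary CGI parameter.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, query, ranking_options=None):
    """Build a search URL with ranking options appended as CGI parameters.

    base_url and collection are placeholders (assumptions); substitute the
    values used by your own search endpoint.
    """
    params = {"collection": collection, "query": query}
    # Ranking options are added as plain key=value CGI parameters.
    params.update(ranking_options or {})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and search package ID, for illustration only.
url = build_search_url(
    "https://search.example.com/s/search.html",
    "example-search",
    "enrolment dates",
    {"cool.1": 0.7, "cool.5": 0.3},
)
print(url)
```

Building the URL this way makes it easy to compare two ranking settings for the same query during testing, before committing a value to the results page configuration.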

Many ranking options can be set simultaneously, with the ranking algorithm automatically normalising all of the supplied ranking factors. For example:

query_processor_options=-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24
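To make the structure of that option string concrete, the following Python sketch splits it into individual options and then rescales the cool.* values so they sum to 1. The rescaling is only one plausible reading of "normalising" and is an assumption for illustration; this page does not describe how Funnelback normalises the factors internally. The function names are hypothetical.

```python
def parse_options(option_string):
    """Split a query processor option string into a {name: value} dict."""
    options = {}
    for token in option_string.split():
        # Each token looks like "-name=value"; drop the leading dash.
        name, _, value = token.lstrip("-").partition("=")
        options[name] = value
    return options


def normalised_cool_weights(options):
    """Rescale cool.* values to sum to 1 (illustrative assumption only)."""
    cool = {k: float(v) for k, v in options.items() if k.startswith("cool.")}
    total = sum(cool.values())
    return {k: v / total for k, v in cool.items()} if total else cool


opts = parse_options("-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24")
weights = normalised_cool_weights(opts)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Note that -stem=2 is parsed like any other option but is excluded from the rescaling, because it is not a ranking influence.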

Automated tuning is the recommended way of setting these ranking parameters as it uses an optimization process to determine the optimal set of factors. Manual tuning can produce a poorer overall result, because improving one particular search may negatively affect many other searches.

Search package data source weightings

When different data sources are combined using a search package it is often beneficial to weight the individual data sources differently. This can be for a number of reasons, the main ones being:

  • Some data sources are simply more important than others. E.g. a university’s main website is likely to be more important than a department’s website.

  • Some data source types naturally rank better than others. E.g. web data sources generally rank better than other data source types as there is a significant amount of additional ranking information that can be inferred from attributes such as the number of incoming links, the text used in these links and page titles. XML and database data sources generally have few attributes beyond the record content that can be used to assist with ranking.

Relative data source weighting is controlled using the cool.21 parameter.

Click data

By default, Funnelback tracks which results are clicked on by a user for any query that is run.

This information can be utilised by Funnelback to improve ranking over time by learning from this recorded user behaviour.

Metadata weighting

It is often desirable to upweight (or downweight) a search result when search keywords appear in specified metadata fields. Funnelback provides ranking options to set the individual metadata fields to consider and the relative weightings to apply.

Query independent evidence

Query independent evidence (QIE) allows certain pages or groups of pages within a website (based on a regular expression match to the document’s URL) to be upweighted or downweighted without any consideration of the query being run.
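The idea of URL-based weighting can be sketched conceptually in Python. This is not Funnelback's configuration format or implementation: the rule list, the weight values, the neutral default of 0.5 and the first-match-wins behaviour are all assumptions chosen to illustrate how a regular expression match on the URL can assign a weight without looking at the query.

```python
import re

# Hypothetical rules: (URL pattern, weight). Values are illustrative only.
QIE_RULES = [
    (re.compile(r"^https://www\.example\.com/news/archive/"), 0.2),
    (re.compile(r"^https://www\.example\.com/courses/"), 0.9),
]


def qie_weight(url, rules=QIE_RULES, default=0.5):
    """Return the weight of the first rule whose pattern matches the URL.

    The query is never consulted, which is what makes the evidence
    query independent. URLs matching no rule get the assumed default.
    """
    for pattern, weight in rules:
        if pattern.search(url):
            return weight
    return default


print(qie_weight("https://www.example.com/courses/law"))
print(qie_weight("https://www.example.com/news/archive/2009/item.html"))
print(qie_weight("https://www.example.com/contact"))
```

In this sketch, the resulting weight would feed the external evidence indicator (cool.4) described earlier, so QIE only affects ranking to the extent that indicator has influence.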