Configuring the search result order

Search result sorting

A number of sort options are available to control the order of search results.

By default, search results are sorted by relevance, which is determined by the ranking algorithm.

The default can be changed to sort on other fields, such as title, date, or a specified metadata field.
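
The difference between the sort modes can be illustrated with a minimal Python sketch. It is not Funnelback's implementation: the result records, the field names (title, date, score) and the sort_results helper are all hypothetical, and in practice sorting is applied by the query processor rather than in client code.

```python
from datetime import date

# Hypothetical result records; the field names are assumptions for illustration.
results = [
    {"title": "Campus map", "date": date(2024, 3, 1), "score": 0.42},
    {"title": "Admissions", "date": date(2021, 7, 15), "score": 0.67},
    {"title": "Billing FAQ", "date": date(2022, 1, 9), "score": 0.91},
]


def sort_results(results, mode="relevance"):
    """Return a new list ordered by the chosen sort mode."""
    if mode == "relevance":
        # Highest relevance score first (the default behaviour).
        return sorted(results, key=lambda r: r["score"], reverse=True)
    if mode == "date":
        # Newest document first.
        return sorted(results, key=lambda r: r["date"], reverse=True)
    if mode == "title":
        # Alphabetical, ignoring case.
        return sorted(results, key=lambda r: r["title"].lower())
    raise ValueError(f"unknown sort mode: {mode}")
```

Each mode produces a different order for the same three records, which is why the choice of sort option matters as much as the ranking itself.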

Improving your search ranking without altering the weightings in the ranking algorithm

Many factors that affect the quality of search ranking do not involve adjusting the weightings used by Funnelback’s ranking algorithm. This guide provides a general overview of what should be considered when attempting to improve search result rankings.

Ranking options

Funnelback’s ranking algorithm determines which results are retrieved from the index and how they are ordered by relevance.

The ranking of results is a complex problem, influenced by a multitude of document attributes. It’s not just about how many times a word appears within a document’s content.

  • Ranking options are a subset of the query processor options, which also control other aspects of query-time behaviour (such as display settings).

  • Ranking options are applied at query time. This means that different services and profiles can apply different ranking settings to an identical index. Ranking options can also be changed via CGI parameters at the time the query is submitted.

Automated tuning

Tuning is a process that can be used to determine which attributes of a document are indicative of relevance and adjust the ranking algorithm to match these attributes.

Tuning requires the specification of a set of queries and best answers that are used as a training data set to optimise the ranking algorithm.
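
To make the idea of a training set concrete, the following Python sketch scores a ranking against a set of queries and best answers. It is an illustration only: the metric used here (mean reciprocal rank) is an assumption and is not stated to be the measure Funnelback's tuning optimises, and the URLs, training_set and fake_search names are hypothetical.

```python
# Hypothetical training set: each query is paired with its best-answer URL.
training_set = {
    "enrol": "https://example.com/enrol",
    "library hours": "https://example.com/library/hours",
}


def reciprocal_rank(ranked_urls, best_answer):
    """Return 1/position of the best answer, or 0.0 if it is not returned."""
    for position, url in enumerate(ranked_urls, start=1):
        if url == best_answer:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(search, training_set):
    """Average the reciprocal rank over every training query.

    `search` is any callable that maps a query string to a ranked URL list.
    """
    scores = [reciprocal_rank(search(q), best) for q, best in training_set.items()]
    return sum(scores) / len(scores)


def fake_search(query):
    """Stand-in search function used only to exercise the metric."""
    canned = {
        "enrol": ["https://example.com/enrol", "https://example.com/fees"],
        "library hours": [
            "https://example.com/library",
            "https://example.com/library/hours",
        ],
    }
    return canned[query]
```

An optimiser could then compare candidate ranking settings by the score each achieves on the same training set, preferring the settings that place best answers nearest the top.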

Setting ranking indicators

Funnelback has an extensive set of ranking parameters that influence how the ranking algorithm operates.

This allows for customisation of the influence provided by 73 different ranking indicators.

Automated tuning should be used (where possible) to set ranking influences, because manually altering influences can fix one specific problem at the expense of ranking for the rest of the content.

The main ranking indicators are:

  • Content: This is controlled by the cool.0 parameter and is used to indicate the influence provided by the document’s content score.

  • On-site links: This is controlled by the cool.1 parameter and is used to indicate the influence provided by the links within the site. This considers the number and text of incoming links to the document from other pages within the same site.

  • Off-site links: This is controlled by the cool.2 parameter and is used to indicate the influence provided by the links outside the site. This considers the number and text of incoming links to the document from external sites in the index.

  • Length of URL: This is controlled by the cool.3 parameter and is used to indicate the influence provided by the length of the document’s URL. Shorter URLs generally indicate a more important page.

  • External evidence: This is controlled by the cool.4 parameter and is used to indicate the influence provided via external evidence (see query independent evidence below).

  • Recency: This is controlled by the cool.5 parameter and is used to indicate the influence provided by the age of the document. Newer documents are generally more important than older documents.
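
The mapping between the main indicators and their parameters can be captured in a small lookup table. The Python sketch below is a convenience illustration: the cool.0 to cool.5 names come from the list above, while the dictionary keys and the ranking_flags helper are hypothetical and not part of Funnelback.

```python
# The six main indicators and their parameters, as listed above.
# The dictionary keys are hypothetical labels chosen for readability.
MAIN_INDICATORS = {
    "content": "cool.0",
    "on_site_links": "cool.1",
    "off_site_links": "cool.2",
    "url_length": "cool.3",
    "external_evidence": "cool.4",
    "recency": "cool.5",
}


def ranking_flags(influences):
    """Render {indicator label: influence} as '-cool.N=value' flags."""
    flags = []
    for name, value in influences.items():
        if name not in MAIN_INDICATORS:
            raise KeyError(f"unknown indicator: {name}")
        flags.append(f"-{MAIN_INDICATORS[name]}={value}")
    return " ".join(flags)
```

For example, ranking_flags({"on_site_links": 0.7, "recency": 0.3}) yields the string "-cool.1=0.7 -cool.5=0.3".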

Applying ranking options

Ranking options are applied in one of the following ways:

  • Set as a default for the results page by adding the ranking option to the query_processor_options parameter in the results page configuration.

  • Set at query time by adding the ranking option as a CGI parameter. This is a good method for testing but should be avoided in production unless the ranking factor needs to be dynamically set for each query, or set by a search form control such as a slider.

Many ranking options can be set simultaneously; the ranking algorithm automatically normalises all of the supplied ranking factors. For example:

query_processor_options=-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24
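
When testing, the same options can be passed at query time. The Python sketch below converts an option string into CGI parameters and builds a query string. It rests on assumptions: that each CGI parameter is the option name without its leading dash, and that the search terms are passed in a parameter named query. The options_to_cgi helper is hypothetical.

```python
from urllib.parse import urlencode


def options_to_cgi(option_string):
    """Convert '-name=value' options into a dict of CGI parameters.

    Assumes each CGI parameter is the option name without the leading dash.
    """
    params = {}
    for token in option_string.split():
        name, _, value = token.lstrip("-").partition("=")
        params[name] = value
    return params


options = "-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24"

# 'query' as the search-terms parameter name is an assumption.
params = {"query": "enrolment", **options_to_cgi(options)}
query_string = urlencode(params)
```

The resulting query_string can be appended to the search URL to compare rankings for the same query under different settings, without changing the results page configuration.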

Automated tuning is the recommended way to set these ranking parameters, as it uses an optimisation process to determine the best set of factors. Manual tuning can produce a poorer overall result, because improving one particular search may negatively affect many other searches.

Search package data source weightings

When different data sources are combined using a search package it is often beneficial to weight the individual data sources differently. This can be for a number of reasons, the main ones being:

  • Some data sources are simply more important than others. E.g. a university’s main website is likely to be more important than a department’s website.

  • Some data source types naturally rank better than others. E.g. web data sources generally rank better than other data source types as there is a significant amount of additional ranking information that can be inferred from attributes such as the number of incoming links, the text used in these links and page titles. XML and database data sources generally have few attributes beyond the record content that can be used to assist with ranking.

Relative data source weighting is controlled using the cool.21 parameter.

Click data

By default, Funnelback tracks which results are clicked on by a user for any query that is run.

This information can be utilised by Funnelback to improve ranking over time by learning from this recorded user behaviour.

Result diversification and collapsing

There are a number of ranking options that are designed to increase the diversity of the result set. These options can be used to reduce the likelihood of result sets being flooded by results from the same website, data source etc.
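
The effect of diversification can be illustrated with a simple per-host cap. This Python sketch is only one possible approach and is not Funnelback's algorithm, which may, for instance, demote rather than drop results. The cap_per_host helper and its max_per_host parameter are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def cap_per_host(ranked_urls, max_per_host=2):
    """Keep ranking order but allow at most max_per_host results per host."""
    seen = Counter()
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc
        if seen[host] < max_per_host:
            kept.append(url)
            seen[host] += 1
    return kept
```

Applied to a list in which one site holds the top three positions, the cap keeps that site's first two results and lets the next site's best result move up.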

Result collapsing can also be used to group together consecutive similar results.
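
Collapsing consecutive similar results can be sketched in Python as follows. This assumes similarity is decided by a caller-supplied key function (for example, the result title); how Funnelback actually determines similarity is not covered here, and the collapse_consecutive helper is hypothetical.

```python
from itertools import groupby


def collapse_consecutive(results, key):
    """Group runs of adjacent results that share the same key value.

    Returns a list of (representative, collapsed_others) pairs, in order.
    """
    collapsed = []
    for _, run in groupby(results, key=key):
        first, *rest = run
        collapsed.append((first, rest))
    return collapsed
```

Because only adjacent results are grouped, a similar result that appears further down the list remains a separate entry.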

Metadata weighting

It is often desirable to upweight (or downweight) a search result when search keywords appear in specified metadata fields. Funnelback provides ranking options to set the individual metadata fields to consider and the relative weightings to apply.

Query independent evidence

Query independent evidence (QIE) allows certain pages or groups of pages within a website (based on a regular expression match to the document’s URL) to be upweighted or downweighted without any consideration of the query being run.
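
The following Python sketch shows the principle of URL-based, query-independent weighting. It does not reproduce Funnelback's QIE configuration format: the rule list, the weight scale (1.0 as neutral) and the qie_weight helper are assumptions made for illustration.

```python
import re

# Hypothetical rules: (URL pattern, weight). Values above 1.0 upweight and
# values below 1.0 downweight; this scale is an assumption for illustration.
QIE_RULES = [
    (re.compile(r"/archive/"), 0.5),
    (re.compile(r"^https://example\.com/courses/"), 1.5),
]


def qie_weight(url, rules=QIE_RULES, default=1.0):
    """Return the weight of the first rule whose pattern matches the URL."""
    for pattern, weight in rules:
        if pattern.search(url):
            return weight
    return default
```

Note that qie_weight takes only the URL as input. The query never enters the calculation, which is what makes the evidence query independent.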

Troubleshooting ranking

Funnelback’s SEO auditor tool can be used to investigate ranking for specific queries and URLs, and provides advice on how to improve the ranking of the document.