Query independent evidence (QIE)

Introduction

Query independent evidence (QIE) is used to assign rank weightings to documents, without consideration of the user’s query. For example, documents from a particular website may be up-weighted, while documents of a particular filetype may be down-weighted. QIE is configured through the qie.cfg file.

QIE can be used for a variety of purposes. For example:

  • Externally computed page importance scores, such as PageRank, can be used to promote popular pages.

  • Weights can be associated with particular document formats, e.g. 1.0 for HTML, 0.5 for PDF and 0 for PPT. This can be used to discourage (but not prevent) the return of certain types of document.

  • Weights associated with particular websites or parts of sites can be used to bias results towards an important main site.

  • Spam scores (0 for spam and 1 for non-spam) can be used to bias against (but not prevent) the display of spam results.

QIE configuration files

QIE is configured by creating a qie.cfg file on a data source.

The Configuration file manager can be used to create and edit this file. Lines in the configuration file have the following format:

qie.cfg
# comment line
qie_weight url_pattern

Where:

qie_weight

is a floating-point number in the range 0.0 to 1.0, specifying the QIE weight to be applied. A value above 0.5 indicates an up-weight, a value below 0.5 indicates a down-weight, and 0.5 is neutral.

The default QIE weight is configurable and is set with the qie.default_weight configuration key. This weight is applied to any URL that does not match a pattern in qie.cfg.

If qie.default_weight is not specified, it defaults to a neutral weighting of 0.5.

url_pattern

is a perl5 syntax regular expression to be matched against the document’s indexed URL.

comment-line

is an ignored line starting with a hash.

For example:

qie.cfg
# down-weight pages from all states except Western Australia
0.25  ^(https://)?[^/]*nsw.gov.au/
1.0   ^(https://)?[^/]*wa.gov.au/
0.25  ^(https://)?[^/]*sa.gov.au/
0.25  ^(https://)?[^/]*nt.gov.au/
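
The line format described above can be checked with a short script before the file is saved. The following is a minimal Python sketch and is not part of Funnelback: the function name parse_qie_cfg is hypothetical, it assumes the weight and the pattern are separated by whitespace and that blank lines are ignored, and it uses Python's re module as an approximation of Perl 5 regular expression syntax.

```python
import re


def parse_qie_cfg(text):
    """Parse qie.cfg-style text into a list of (weight, compiled_pattern) rules.

    Hypothetical helper for illustration only; not a Funnelback API.
    """
    rules = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines (an assumption) and comment lines starting with a hash.
        if not line or line.startswith("#"):
            continue
        # Split into the weight and the rest of the line (the URL pattern).
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'qie_weight url_pattern'")
        weight = float(parts[0])  # raises ValueError if not a number
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"line {line_no}: weight {weight} is outside 0.0-1.0")
        rules.append((weight, re.compile(parts[1])))
    return rules


EXAMPLE = """\
# down-weight pages from New South Wales, up-weight Western Australia
0.25  ^(https://)?[^/]*nsw.gov.au/
1.0   ^(https://)?[^/]*wa.gov.au/
"""

rules = parse_qie_cfg(EXAMPLE)
for weight, pattern in rules:
    print(weight, pattern.pattern)
```

A check like this only catches malformed lines and out-of-range weights; it does not confirm how PADRE itself interprets the file.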

Each indexed URL is tested against the URL patterns, stopping at the first match. If no pattern matches, the default weight is applied (the value of qie.default_weight, or 0.5 if that key is not set).

PADRE strips a leading "http://" from indexed URLs. Consequently, don’t include "http://" in your URL patterns.
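
The matching behaviour can be illustrated with a small Python sketch. It is an approximation under stated assumptions rather than PADRE's actual implementation: the function name qie_weight and the RULES list are hypothetical, patterns are assumed to be tried in the order they appear in the file, and Python's re module stands in for Perl 5 regular expressions.

```python
import re

# Neutral weight used when qie.default_weight is not set.
DEFAULT_WEIGHT = 0.5

# Hypothetical rule list, in the same order as the lines of a qie.cfg file.
RULES = [
    (0.25, re.compile(r"^(https://)?[^/]*nsw.gov.au/")),
    (1.0, re.compile(r"^(https://)?[^/]*wa.gov.au/")),
]


def qie_weight(url, rules=RULES, default_weight=DEFAULT_WEIGHT):
    """Return the weight of the first matching rule, or the default weight."""
    # Mirror the stripping of a leading "http://" before patterns are applied.
    if url.startswith("http://"):
        url = url[len("http://"):]
    for weight, pattern in rules:
        # Unanchored search, as with a Perl match; the patterns supply their own "^".
        if pattern.search(url):
            return weight  # stop at the first match
    return default_weight


for example_url in (
    "http://www.wa.gov.au/about",
    "https://www.nsw.gov.au/services",
    "https://example.com/",
):
    print(example_url, qie_weight(example_url))
```

Because evaluation stops at the first match, a broad pattern placed early in the file hides any more specific pattern that follows it.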

At indexing time, if a data source has a qie.cfg file, a supplemental index file is produced that records the weight to apply to each matching URL.

QIE configuration is not automatically applied to all generations in a push data source. It is applied only to newly committed generations and to merged generations. To re-apply the QIE configuration to an entire Push collection, trigger a vacuum via the API.

QIE default weight

The qie.default_weight setting can be used to set a default weighting for QIE, which is applied to any document that doesn’t match a URL pattern specified in the qie.cfg. This is set within the data source configuration.

If this value is not set, it defaults to 0.5, which is a neutral value. A value above 0.5 indicates an up-weight and a value below 0.5 indicates a down-weight.

Configuring Funnelback to use QIE

Funnelback must be configured to use QIE by setting the influence that QIE has on the ranking algorithm. This is done by assigning an influence to the cool.4 ranking option.