Query independent evidence (QIE)
Introduction
Query independent evidence (QIE) is used to assign rank weightings to documents, without consideration of the user’s query. For example, documents from a particular website may be up-weighted, while documents of a particular file type may be down-weighted. QIE is configured through the qie.cfg file.
QIE can be used for a variety of purposes. For example:
- Externally computed page importance scores, such as PageRank, can be used to promote popular pages.
- Weights can be associated with particular document formats, e.g. 1.0 for HTML, 0.5 for PDF and 0 for PPT. This can be used to discourage (but not prevent) the return of certain types of document.
- Weights associated with particular websites or parts of sites can be used to bias results towards an important main site.
- Spam scores (0 for spam and 1 for non-spam) can be used to bias against (but not prevent) the display of spam results.
QIE configuration files
QIE is configured by creating a qie.cfg file on a data source.
The Configuration file manager can be used to create and edit this file. Lines in the configuration file have the following format:
qie.cfg
# comment line
qie_weight url_pattern
Where:
- qie_weight is a floating point number (within a range of 0.0-1.0) specifying the QIE weight to be applied. A value between 0.5 and 1.0 indicates an up-weight; a value between 0.0 and 0.5 indicates a down-weight. The default QIE weight is set with the qie.default_weight configuration key. This weight is applied to any URL that does not match a pattern in qie.cfg. If qie.default_weight is not specified, it defaults to a neutral weighting of 0.5.
- url_pattern is a perl5 syntax regular expression to be matched against the document’s indexed URL.
- comment line is an ignored line starting with a hash (#).
For example:
qie.cfg
# down-weight pages from all states except Western Australia
0.25 ^(https://)?[^/]*nsw.gov.au/
1.0 ^(https://)?[^/]*wa.gov.au/
0.25 ^(https://)?[^/]*sa.gov.au/
0.25 ^(https://)?[^/]*nt.gov.au/
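Note that an unescaped "." in a regular expression matches any character, so the patterns above are slightly broader than the literal domain names. The following is a minimal sketch, not part of the product, of generating qie.cfg lines with the domain escaped. The function name qie_line and the weights mapping are hypothetical, and Python's re.escape is assumed to produce escapes that are also valid in perl5 syntax for ordinary domain names.

```python
import re


def qie_line(weight, domain):
    """Build one qie.cfg line that matches URLs on the given domain suffix."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"QIE weight must be within 0.0-1.0, got {weight}")
    # re.escape turns "." into "\." so the dot only matches a literal dot.
    return f"{weight} ^(https://)?[^/]*{re.escape(domain)}/"


# Hypothetical domain-to-weight mapping mirroring the example above.
weights = {"nsw.gov.au": 0.25, "wa.gov.au": 1.0}
lines = [qie_line(w, d) for d, w in weights.items()]
print("\n".join(lines))
```

The generated lines can then be pasted into qie.cfg after a comment line describing their purpose.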
Each indexed URL is matched against the URL patterns in turn, stopping at the first match. If none match, the default weight is applied (0.5 unless qie.default_weight is set).
PADRE strips "http://" from URLs which start with that. Consequently, don’t include "http://" in your URL patterns.
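The first-match behaviour described above can be sketched in Python as follows. This is an illustration only and not how PADRE is implemented: the function names parse_qie_cfg and qie_weight are hypothetical, Python's re module is used as an approximation of perl5 regular expressions, and the default_weight parameter stands in for the qie.default_weight setting.

```python
import re

DEFAULT_WEIGHT = 0.5  # neutral weight used when qie.default_weight is not set


def parse_qie_cfg(text):
    """Parse qie.cfg text into an ordered list of (weight, compiled_pattern)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        weight_str, pattern = line.split(None, 1)
        weight = float(weight_str)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"QIE weight out of range: {weight}")
        rules.append((weight, re.compile(pattern)))
    return rules


def qie_weight(url, rules, default_weight=DEFAULT_WEIGHT):
    """Return the weight of the first matching rule, or the default weight."""
    if url.startswith("http://"):
        url = url[len("http://"):]  # mimic the stripping of a leading http://
    for weight, pattern in rules:
        if pattern.search(url):
            return weight  # stop at the first match
    return default_weight


EXAMPLE_CFG = """\
# down-weight pages from all states except Western Australia
0.25 ^(https://)?[^/]*nsw.gov.au/
1.0 ^(https://)?[^/]*wa.gov.au/
"""

rules = parse_qie_cfg(EXAMPLE_CFG)
print(qie_weight("http://www.nsw.gov.au/about", rules))
print(qie_weight("https://www.wa.gov.au/", rules))
print(qie_weight("https://example.com/", rules))
```

Because matching stops at the first hit, a URL that matches several patterns takes the weight of whichever pattern is tried first in this sketch.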
At indexing time, if a data source has a qie.cfg file, a supplemental index file will be produced that contains information on URLs to up-weight and down-weight.
QIE configuration is not automatically applied to all generations in a push data source. QIE configurations are applied to newly committed generations as well as merged generations. To re-apply QIE configurations to an entire push data source you will need to trigger a vacuum via the API.
QIE default weight
The qie.default_weight setting can be used to set a default weighting for QIE, which is applied to any document that doesn’t match a URL pattern specified in the qie.cfg file. This is set within the data source configuration.
If this value is not set, it defaults to 0.5, which is a neutral value. A value >0.5 indicates an up-weight; a value <0.5 indicates a down-weight.