Query independent evidence (QIE)
Introduction
Query independent evidence (QIE) is used to assign rank weightings to documents, without consideration of the user's query. For example, documents from a particular website may be up-weighted, while documents of a particular filetype may be down-weighted. QIE is configured through the qie.cfg file.
QIE can be used for a variety of purposes. For example:
- Externally computed page popularities, such as PageRanks, can be used to promote popular pages.
- Weights can be associated with particular document formats, e.g. 1.0 for HTML, 0.5 for PDF and 0 for PPT. This can be used to discourage (but not prevent) the return of certain types of document.
- Weights associated with particular websites or parts of sites can be used to bias results towards an important main site.
- Spam scores (0 for spam and 1 for non-spam) can be used to bias against (but not prevent) the display of spam results.
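As an illustration of the first use case, externally computed popularity scores have to be brought into the 0-1 range and expressed as URL patterns before they can go into qie.cfg. The Python sketch below is hypothetical: `popularity_to_qie_lines` is an illustrative name, and it assumes non-negative scores, simple max-normalisation, exact-URL patterns built with Python's `re.escape` (whose escapes are also literal in Perl 5 regexes), and that the indexed URL equals the input URL apart from the stripped "http://" prefix.

```python
from __future__ import annotations

import re


def popularity_to_qie_lines(scores: dict[str, float]) -> list[str]:
    """Convert popularity scores (e.g. PageRank) into qie.cfg lines (hypothetical helper)."""
    if not scores:
        return []
    top = max(scores.values())
    lines = []
    # Most popular first; each pattern is pinned to one exact URL, so order is cosmetic.
    for url, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        # Max-normalise into 0-1 (assumes non-negative scores).
        weight = score / top if top > 0 else 0.0
        # PADRE strips a leading "http://" from indexed URLs, so leave it out of the pattern.
        indexed = url[len("http://"):] if url.startswith("http://") else url
        # re.escape makes dots and other metacharacters literal; ^...$ anchors the exact URL.
        lines.append(f"{weight:.3f} ^{re.escape(indexed)}$")
    return lines


if __name__ == "__main__":
    popularity = {"http://www.example.com/": 8.0, "https://www.example.com/docs": 2.0}
    print("\n".join(popularity_to_qie_lines(popularity)))
```

One design consequence to keep in mind: URLs that match no pattern receive the default score, so max-normalised weights below 0.5 act as down-weights relative to unlisted pages. A log scale, or mapping scores into 0.5-1, may suit better if the intent is only to promote popular pages.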
QIE configuration files
QIE configuration files are typically placed in:
$SEARCH_HOME/conf/<DATA-SOURCE-ID>/qie.cfg
The file-manager can be used to create and edit this file. Lines in the configuration file have the following format:
# comment line
qie_weight url_pattern
Where:
qie_weight is a floating-point number, assumed to be normalised to the range 0-1, that specifies the QIE score to apply. The default QIE weight is 0.5; a larger value up-weights and a smaller value down-weights.
url_pattern is a Perl 5 syntax regular expression matched against the document's indexed URL.
comment line is any line starting with a hash (#); it is ignored.
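The line format above can be modelled with a small parser. This Python sketch is not PADRE's parser: `parse_qie_cfg` is a hypothetical name, skipping blank lines and rejecting weights outside 0-1 are choices of the sketch rather than documented behaviour, and Python's `re` is assumed to be close enough to Perl 5 syntax for typical URL patterns.

```python
from __future__ import annotations

import re


def parse_qie_cfg(text: str) -> list[tuple[float, re.Pattern[str]]]:
    """Parse qie.cfg-style text into (weight, compiled pattern) rules, keeping file order."""
    rules: list[tuple[float, re.Pattern[str]]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Comment lines start with '#'; blank lines are also skipped (an assumption).
        if not line or line.startswith("#"):
            continue
        # First whitespace-separated field is the weight; the rest is the URL pattern.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'qie_weight url_pattern'")
        weight = float(parts[0])
        # Strictness choice of this sketch: reject weights outside the documented 0-1 range.
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"line {lineno}: weight {weight} is outside 0-1")
        rules.append((weight, re.compile(parts[1])))
    return rules


if __name__ == "__main__":
    example = """\
# down-weight pages from all states except Western Australia
0.25 ^(https://)?[^/]*nsw.gov.au/
1.0 ^(https://)?[^/]*wa.gov.au/
"""
    for weight, pattern in parse_qie_cfg(example):
        print(weight, pattern.pattern)
```

Keeping the rules as an ordered list, rather than a dictionary keyed by pattern, preserves the file order that first-match lookup depends on.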
For example:
# down-weight pages from all states except Western Australia
0.25 ^(https://)?[^/]*nsw.gov.au/
1.0 ^(https://)?[^/]*wa.gov.au/
0.25 ^(https://)?[^/]*sa.gov.au/
0.25 ^(https://)?[^/]*nt.gov.au/
Each indexed URL is matched against the URL patterns in file order, and the first matching pattern determines its weight; later patterns are not considered. If no pattern matches, the default score passed to padre-qi is used.
PADRE strips "http://" from the start of URLs, so do not include "http://" in your URL patterns.
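The first-match lookup and the "http://" stripping can be modelled together in Python. This is a hypothetical sketch, not PADRE's implementation: `strip_http` and `qie_weight` are illustrative names, patterns are assumed to be searched (anchored only where they use ^), Python's `re` stands in for Perl 5 regexes, and 0.5 stands in for whatever default score is passed to padre-qi.

```python
from __future__ import annotations

import re


def strip_http(url: str) -> str:
    """Drop a leading 'http://', mirroring the PADRE behaviour described above."""
    prefix = "http://"
    return url[len(prefix):] if url.startswith(prefix) else url


def qie_weight(url: str, rules: list[tuple[float, re.Pattern[str]]], default: float = 0.5) -> float:
    """Return the weight of the first rule whose pattern matches the indexed URL."""
    indexed = strip_http(url)
    # Rules are tried in file order; the first match wins and later rules are ignored.
    for weight, pattern in rules:
        if pattern.search(indexed):
            return weight
    # No pattern matched: fall back to the default (stand-in for the padre-qi default).
    return default


# Rules from the qie.cfg example above, in the same order.
RULES = [
    (0.25, re.compile(r"^(https://)?[^/]*nsw.gov.au/")),
    (1.0, re.compile(r"^(https://)?[^/]*wa.gov.au/")),
    (0.25, re.compile(r"^(https://)?[^/]*sa.gov.au/")),
    (0.25, re.compile(r"^(https://)?[^/]*nt.gov.au/")),
]

if __name__ == "__main__":
    print(qie_weight("http://www.nsw.gov.au/news", RULES))  # 0.25
    print(qie_weight("https://www.wa.gov.au/", RULES))  # 1.0
    print(qie_weight("http://www.vic.gov.au/", RULES))  # 0.5 (no match, default)
    # A pattern containing "http://" never matches, because the prefix is stripped first.
    print(qie_weight("http://www.nsw.gov.au/", [(0.1, re.compile(r"^http://www\.nsw"))]))  # 0.5
```

This also shows why the example patterns use an optional `(https://)?` prefix: HTTPS URLs keep their scheme in the indexed form, while HTTP URLs lose it.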
At indexing time, if a data source has a qie.cfg file, a supplemental index file is produced that contains information on which URLs to up-weight and down-weight.
QIE configuration is not automatically applied to all generations in a push data source; it is applied only to newly committed generations and to merged generations. To re-apply the QIE configuration to an entire push data source, trigger a Vacuum.