Query independent evidence (QIE)

Query independent evidence (QIE) is a ranking option that can be used to mark certain pages, sections or types within the search index as more or less important, regardless of what a user searches for.

For example, documents from a particular main website may contain higher value information, while documents of a particular file type may be less preferred (e.g. if you have both an HTML and a PDF version of the same content).

QIE lets you mark this content either by the structure of the document URLs, or by whether the page appears in the set of results returned by a specified search query.

When to use QIE

Use QIE:

  • to flag a page or group of pages within an index as always more or less important when returned in the search results, regardless of the query.

  • to incorporate externally computed page importance, such as PageRank scores, to promote important pages.

Don’t use QIE:

  • to solve general problems with your site’s ranking, unless one of the clear cases outlined above applies (use tools such as tuning for general ranking problems).

Using QIE

Applying QIE to your search takes a few steps, summarized below.

  1. On the relevant data source(s): Configure QIE by creating one or both of qie.cfg and query-qie.cfg from the data source file manager.

  2. Update the configuration files with the QIE rules.

  3. Update your data source(s) by running an advanced update to reindex the live view. This applies the QIE to your search index.

    If you are configuring QIE with a push data source, you will need to re-index the full data source by executing a vacuum API call on the push data source.
  4. (optional) Configure a default QIE weight.

  5. Turn on QIE inside the ranking algorithm. This is done by setting a value for the cool.4 parameter, which determines how much influence the QIE has within the overall ranking algorithm.

QIE configuration files

There are two different sources of QIE configuration.

Applying QIE based on the document’s URL

Use the qie.cfg configuration file to define a set of QIE weights that are based on a Perl 5 regular expression match against the document’s indexed URL. Use this if you can identify the documents you need to up-weight or down-weight by a common feature of their URLs (e.g. all items in a publications/ folder should be up-weighted).

See: qie.cfg for information on how to configure a qie.cfg file.
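As a rough illustration of URL-based weighting, the sketch below parses rules in the "weight pattern" layout used by the tutorial on this page and reports which rules match a URL. It is not Funnelback's implementation: Python's re module stands in for Perl 5 regular expressions (the two are similar but not identical), the function names are hypothetical, and because this page does not say how a URL matching several patterns is resolved, the sketch simply returns every matching weight.

```python
import re

def parse_qie_rules(text):
    """Parse 'weight pattern' lines; blank lines and # comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        weight, pattern = line.split(None, 1)
        rules.append((float(weight), re.compile(pattern)))
    return rules

def matching_weights(url, rules):
    """Return the weight of every rule whose pattern is found in the URL."""
    return [weight for weight, pattern in rules if pattern.search(url)]

# Rules modelled on the example patterns shown in this page (assumed layout).
rules = parse_qie_rules("""
# down-weight reviews and upweight episodes
0.25 www.example.com/reviews/
1.0  www.example.com/episodes/
""")

print(matching_weights("https://www.example.com/reviews/a.html", rules))
print(matching_weights("https://www.example.com/about.html", rules))
```

A URL that matches no rule yields an empty list here; in the real index such documents receive the default QIE weight.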

Applying QIE to the set of documents returned by a query

Use the query-qie.cfg configuration file to define a set of QIE weights that are applied to the set of results returned by a specified search query. Use this if you can identify the documents by running a query (e.g. all items that have type=publications set in their metadata).

See: query-qie.cfg for information on how to configure a query-qie.cfg file.

The query here is only used to identify the set of documents that have the QIE weight applied. The weightings, once applied, are completely independent of any user query that is being run.

Setting the QIE default weight

The qie.default_weight setting can be used to set a default weighting for QIE, which is applied to any document that doesn’t match a URL pattern specified in the qie.cfg, or is not returned by a query defined in query-qie.cfg. This is set within the data source configuration.

If this value is not set, it defaults to 0.5, which is a neutral value. A value greater than 0.5 indicates an upweight, and a value less than 0.5 indicates a downweight.
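The weight scale can be summarized in a few lines of code. This is only a sketch of the rule stated above, with a hypothetical helper name; it does not call any Funnelback API.

```python
NEUTRAL_WEIGHT = 0.5  # default value of qie.default_weight, per this page

def describe_weight(weight):
    """Classify a QIE weight relative to the neutral value of 0.5."""
    if weight > NEUTRAL_WEIGHT:
        return "upweight"
    if weight < NEUTRAL_WEIGHT:
        return "downweight"
    return "neutral"

for w in (0.25, 0.5, 1.0):
    print(w, describe_weight(w))
```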

Checking to see if QIE has been applied

QIE is applied as part of the indexing phase and messages for this are logged to the Step-QIEUpdate.log, found in your data source logs.

Step-QIEUpdate.log
    Running command: /opt/funnelback/bin/padre-sw (1)
    With arguments: /opt/funnelback/data/example~ds-example/one/idx_reindex/index -res=qiecfg -qieval=0.65
    Command will read from STDIN
    Environment: {SEARCH_HOME=/opt/funnelback, TEMP=/tmp/1691554466774-0, LD_LIBRARY_PATH=/opt/funnelback/bin, TMP=/tmp/1691554466774-0, TMPDIR=/tmp/1691554466774-0}
    Log (STDOUT) output: /opt/funnelback/data/example~ds-example/live/tmp/query.qie.q.0.65_20230809_04_14_26
    Log (STDERR) output: /opt/funnelback/data/example~ds-example/live/log/Step-QIEUpdate.log
####################################################################################################

Command finished with exit code: 0
    Running command: /opt/funnelback/bin/padre-qi (2)
    With arguments: /opt/funnelback/data/example~ds-example/one/idx_reindex/index /opt/funnelback/data/example~ds-example/live/tmp/query.qie.q._raw_20230809 0.5
    Command will not read from STDIN
    Environment: {TEMP=/tmp/1691554083518-0, EXECUTABLE_DIR=/opt/funnelback/bin, LD_LIBRARY_PATH=/opt/funnelback/bin, TMP=/tmp/1691554083518-0, TMPDIR=/tmp/1691554083518-0}
    Log output: /opt/funnelback/data/example~ds-example/live/log/Step-QIEUpdate.log
####################################################################################################

Pattern[0] (0.250000) www.example.com/reviews/ (3)
Pattern[1] (1.000000) www.example.com/episodes/
Pattern[2] (0.650000) https://www.example.com\/reviews\/example31\.html
Pattern[3] (0.650000) https://www.example.com\/reviews\/example553\.html
Pattern[4] (0.650000) https://www.example.com\/reviews\/example12\.html
Pattern[5] (0.650000) https://www.example.com\/reviews\/example75524\.html
---------------- Summary ------------ (4)
No. docs: 911
No. with dflt score set: 46
-------------------------------------------
Command finished with exit code: 0 (5)
1 A padre-sw command is run for each query defined in a query-qie.cfg file, which will result in a set of log messages similar to this. This shows Funnelback executing the query to get a list of matching URLs, with a successful exit status (exit code 0).
2 A single padre-qi command is run, which applies the compiled set of QIE patterns to the search index. The set of patterns is compiled by combining the rules from qie.cfg with all the URLs returned from the searches.
3 This shows the combined set of rules, read from qie.cfg with the sets of returned URLs appended as patterns. Each Pattern line should correspond to a rule within the qie.cfg, or one of the URLs returned by a query from query-qie.cfg. If the query returns 4 results there will be 4 patterns added to the pattern list, with the weighting that was specified in the query rule.
4 Shows that the index has 911 documents and of these, 46 were assigned the QIE default value. This means that they didn’t match any rule defined in the qie.cfg or query-qie.cfg, and the other 865 documents in the index were assigned QIE weights based on matches to patterns that were defined in the configuration.
5 Shows that the application of QIE finished successfully.
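If you check this log regularly, the summary figures can be extracted with a short script. The sketch below assumes the summary lines look exactly like the ones in the example log above; the function name is hypothetical and the log format may differ between Funnelback versions.

```python
import re

def parse_qie_summary(log_text):
    """Extract document counts from the summary block of Step-QIEUpdate.log."""
    total = int(re.search(r"No\. docs:\s*(\d+)", log_text).group(1))
    defaulted = int(
        re.search(r"No\. with dflt score set:\s*(\d+)", log_text).group(1)
    )
    # Documents that matched at least one qie.cfg or query-qie.cfg rule.
    return {"total": total, "defaulted": defaulted, "weighted": total - defaulted}

sample = """\
---------------- Summary ------------
No. docs: 911
No. with dflt score set: 46
"""

summary = parse_qie_summary(sample)
print(summary)
```

For the example log this gives 865 weighted documents, matching the figure in the explanation above. An unexpectedly high defaulted count can indicate that the patterns are not matching the URLs you intended.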

Testing the search results

Once the indexing process is complete, you can test the effect of QIE on your search results by running a search and then editing your URL to include an additional parameter.

e.g. If your search URL is https://search.example.com/s/search.html?collection=example&query=test

Add the following parameter cool.4=1.0 - this turns on QIE and applies the maximum influence of the QIE weights within the index. i.e. https://search.example.com/s/search.html?collection=example&query=test&cool.4=1.0. If the results contain items that have QIE weights set then the search results order should change, confirming that QIE is working.

Edit the URL again and set cool.4=0 - this turns off QIE. i.e. https://search.example.com/s/search.html?collection=example&query=test&cool.4=0. If the results contain items that have QIE weights set then the search results order should change.
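The two test URLs can also be generated rather than edited by hand. The following sketch uses only Python's standard urllib.parse module; the helper name with_param is hypothetical, and the base URL is the example one from this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_param(url, name, value):
    """Return url with the query parameter name set to value (replacing any existing one)."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    query.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))

base = "https://search.example.com/s/search.html?collection=example&query=test"
qie_on = with_param(base, "cool.4", 1.0)   # maximum QIE influence
qie_off = with_param(base, "cool.4", 0)    # QIE turned off

print(qie_on)
print(qie_off)
```

Opening both URLs side by side makes it easier to see which results move when QIE is applied.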

Configuring the search to use QIE

QIE has no effect on ranking until you configure how much influence it has within the ranking algorithm. This is set by assigning an influence to the cool.4 ranking option.

Using the method outlined above in Testing the search results, adjust the value of the cool.4 parameter to values between 0.0 and 1.0.

Do this for a sample of queries to observe the effect on the results order until you find a suitable cool.4 value.

Once you have determined an appropriate value, set the cool.4 parameter in the query_processor_options of your results page configuration, then save and publish the configuration.

e.g.

query_processor_options

-stem=2 -SF=[title,author] -cool.4=0.74

Tutorial

The tutorial below outlines how to set up QIE based on URL patterns.

Tutorial: Query independent evidence

  1. Log in to the search dashboard and switch to the simpsons - website data source.

  2. Click the manage data source configuration files item from the settings panel and create a qie.cfg file.

  3. Add the following URL weightings to the qie.cfg and then save the file: 0.25 provides a moderate down-weight, while 1.0 is the maximum up-weight that can be provided via QIE. Items default to having a weight of 0.5.

    # down-weight reviews and upweight episodes
    0.25 www.simpsoncrazy.com/reviews/
    1.0  www.simpsoncrazy.com/episodes/
  4. Re-index the data source (from the update panel: start advanced update, then re-index live view).

  5. Switch to the simpsons results page, then run a query for homer adding cool.4=0 and then cool.4=1.0 to the URL to observe the effect of QIE when it has no influence and when it has the maximum influence. Applying the maximum influence from QIE pushes episode pages to the top of the results (and this is despite the default same site suppression being applied).

  6. Like other ranking settings, you can set cool.4 in the results page configuration once you have found an appropriate influence to apply to your QIE.