Using the Funnelback index to generate gscope, kill and QIE configuration files

Background

Sometimes it is necessary to use a workflow command to generate a gscope, QIE or kill configuration file from the Funnelback search index.

Generating a gscopes.cfg

  1. Run a padre query with the following query processor options:

    • -res=gscopes (or -res=docnums): tells padre to return results as gscopes.cfg-compatible text. -res=gscopes returns the matching items as URLs; -res=docnums returns them as document numbers. -res=docnums is the safer choice because padre-gs has a 10,000-item limit when the gscopes.cfg file contains URLs.

    • -gscoperesult=: sets the gscope number assigned to each returned result (if omitted, results are returned with gscope 1).

      e.g.

      $ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=docnums -gscoperesult=4 > $SEARCH_HOME/conf/mycollection/mygscopes.cfg

      The output is written to mygscopes.cfg and is a valid gscopes.cfg file.

  2. Apply the gscopes

    To apply the gscopes to the index, run the following command:

    $ $SEARCH_HOME/bin/padre-gs $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/mygscopes.cfg -docnum

    Omit -docnum if you’ve used -res=gscopes to generate the configuration file.
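In a post-index workflow you may want to build several gscopes in one pass. The bash function below is a minimal sketch of that, not a Funnelback-supplied script: it runs one padre-sw query per gscope with -res=docnums and -gscoperesult=, appends the results to a single mygscopes.cfg and applies the file once with padre-gs -docnum. It assumes SEARCH_HOME is set, that the queries are already URL-encoded (padre-sw reads them from QUERY_STRING), and that padre-sw output for different gscopes can simply be concatenated; the function name and the example query/gscope pairs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a post-index workflow step: build one gscopes.cfg from several
# padre-sw queries, then apply it with padre-gs.
# Assumptions: SEARCH_HOME is set; the function name and the query/gscope
# pairs are hypothetical; padre-sw output for different gscopes can be
# concatenated into a single file.
set -euo pipefail

# Usage: build_gscopes COLLECTION 'urlencoded+query|GSCOPE' ...
build_gscopes() {
  local collection="$1"; shift
  local index="$SEARCH_HOME/data/$collection/offline/idx/index"
  local cfg="$SEARCH_HOME/conf/$collection/mygscopes.cfg"
  local tmp pair query gscope

  # Collect output in a temp file beside the target so a failed query
  # never leaves a half-written mygscopes.cfg behind
  tmp="$(mktemp "$cfg.XXXXXX")"

  for pair in "$@"; do
    query="${pair%|*}"    # text before the last "|"
    gscope="${pair##*|}"  # text after the last "|"
    # -res=docnums sidesteps the padre-gs limit on URL-based gscopes.cfg files
    if ! QUERY_STRING="collection=$collection&query=$query&view=offline" \
        "$SEARCH_HOME/bin/padre-sw" -res=docnums -gscoperesult="$gscope" >> "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
  done

  # Swap the finished file into place, then apply it using document numbers
  mv "$tmp" "$cfg"
  "$SEARCH_HOME/bin/padre-gs" "$index" "$cfg" -docnum
}
```

For example, to put two queries into gscopes 4 and 5:

$ build_gscopes mycollection 'annual+report|4' 'media+release|5'

Writing to a temporary file first means that if any padre-sw query fails, the existing mygscopes.cfg is left untouched.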

Generating a kill_exact.cfg

  1. Run a padre query with the following query processor options:

    • -res=flcfg: tells padre to return results as kill_exact.cfg-compatible text.

      e.g.

      $ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=flcfg > $SEARCH_HOME/conf/mycollection/kill_list.cfg

      The output is written to kill_list.cfg and is a valid kill_exact.cfg file.

  2. Apply the kill

    To apply the kill list to the index, run the following command:

    $ $SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/kill_list.cfg -exactmatch -kill
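Because a kill removes documents from the index, it can be worth checking the size of the generated list before applying it. The sketch below assumes kill_exact.cfg holds one URL per line; the function names and the default limit of 100 are hypothetical safeguards, not Funnelback features.

```shell
#!/usr/bin/env bash
# Sketch: refuse to apply a kill list that is missing or unexpectedly large
# (for example when a broad query matched far more documents than intended).
# The function names and the default limit of 100 are hypothetical.
set -euo pipefail

# Print the number of non-blank lines (assumed one URL per line) in a kill list
count_kill_entries() {
  grep -c -v '^[[:space:]]*$' "$1" || true   # grep -c still prints 0 on no match
}

# Usage: apply_kill_list INDEX_STEM KILL_CFG [MAX_ENTRIES]
apply_kill_list() {
  local index="$1" cfg="$2" max="${3:-100}"
  local n
  if [ ! -r "$cfg" ]; then
    echo "Kill list not found: $cfg" >&2
    return 1
  fi
  n="$(count_kill_entries "$cfg")"
  if [ "$n" -gt "$max" ]; then
    echo "Refusing to kill $n documents (limit $max) from $cfg" >&2
    return 1
  fi
  echo "Killing $n documents listed in $cfg" >&2
  "$SEARCH_HOME/bin/padre-fl" "$index" "$cfg" -exactmatch -kill
}
```

For example, allowing up to 500 documents to be killed:

$ apply_kill_list $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/kill_list.cfg 500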

Generating a qie.cfg

  1. Run a padre query with the following query processor options:

    • -res=qiecfg: tells padre to return results as qie.cfg-compatible text.

    • -qieval=: sets the weight returned with each URL.

      e.g.

      $ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=qiecfg > $SEARCH_HOME/conf/mycollection/myqie.cfg

      The output is written to myqie.cfg and is a valid qie.cfg file.

  2. Apply the QIE

    To apply the query independent evidence to the index, run the following command:

    $ $SEARCH_HOME/bin/padre-qi $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/myqie.cfg
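The example command does not set -qieval=. The sketch below passes the weight explicitly and then checks the generated file before padre-qi is run. It assumes each qie.cfg line has the form "<weight> <URL>" (the same shape the custom-template method later in this article produces); the function names and the weight are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate a qie.cfg with an explicit -qieval weight, then check that
# every line looks like "<weight> <URL>" before applying it with padre-qi.
# Assumptions: SEARCH_HOME is set, the query is URL-encoded, the line format
# is "<weight> <URL>", and the function names are hypothetical.
set -euo pipefail

# Usage: generate_qie COLLECTION URLENCODED_QUERY WEIGHT OUTPUT_FILE
generate_qie() {
  local collection="$1" query="$2" weight="$3" out="$4"
  QUERY_STRING="collection=$collection&query=$query&view=offline" \
    "$SEARCH_HOME/bin/padre-sw" -res=qiecfg -qieval="$weight" > "$out"
}

# Exit non-zero if a non-blank line is not "<number> <URL>"; this also
# catches HTML comments left over from older padre-sw versions
validate_qie() {
  awk 'NF == 0 { next }
       NF < 2 || $1 !~ /^[0-9]*\.?[0-9]+$/ { print "invalid line " NR ": " $0; bad = 1 }
       END { exit bad }' "$1"
}
```

For example:

$ generate_qie mycollection myquery 0.8 $SEARCH_HOME/conf/mycollection/myqie.cfg
$ validate_qie $SEARCH_HOME/conf/mycollection/myqie.cfg && $SEARCH_HOME/bin/padre-qi $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/myqie.cfg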

Note: Funnelback 14.2 and earlier

In Funnelback 14.2 and earlier, padre-sw contains a bug that HTML-encodes the URLs it prints (for example, an & in a URL is printed as &amp;).

When you use this method to generate configuration files, you may need to sanitise the output (for example, with sed) to fix the URLs.

A command similar to the following can be used to clean the output:

$ sed -e 's/&amp;/\&/g' -e 's/<!--.*-->//g' -e '/^$/d' padreswgenerated.cfg > cleanedconfig.cfg

This command decodes HTML-encoded ampersands (&amp; becomes &), strips HTML comments and deletes any empty lines, including lines that contained only a comment.
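To check the clean-up before running it on real output, the following sketch wraps it in a function and applies it to a small made-up sample. It uses s/&amp;/\&/g, which replaces only encoded ampersands (in a sed replacement, \& is a literal ampersand); the function name and sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# Demonstrates the clean-up on a small, made-up sample of HTML-encoded
# padre-sw output (the sample lines and function name are hypothetical).
set -euo pipefail

clean_padre_output() {
  # &amp; -> &, drop HTML comments, then delete lines left empty
  sed -e 's/&amp;/\&/g' -e 's/<!--.*-->//g' -e '/^$/d' "$1"
}

sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<!-- generated by padre-sw -->
http://www.example.com/page?a=1&amp;b=2

http://www.example.com/other
EOF

clean_padre_output "$sample"
```

Running it prints:

http://www.example.com/page?a=1&b=2
http://www.example.com/other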

Alternate method for generating gscope/kill/qie configuration files

This method involves running a query against the Funnelback index and using a custom template to return the configuration file.

This method is required for older versions of Funnelback that don't support the additional padre-sw result modes. It also works more reliably under Windows (note: the curl command requires Cygwin).

  1. Create a custom template (qie.ftl) containing the following code:

    <#ftl encoding="utf-8" />
    <#import "/web/templates/modernui/funnelback_classic.ftl" as s/>
    <#import "/web/templates/modernui/funnelback.ftl" as fb/>
    <@s.Results>
    <#if s.result.class.simpleName != "TierBar">
    <#compress><#if question.inputParameterMap["wt"]?exists>${question.inputParameterMap["wt"]?html} </#if>${s.result.liveUrl}</#compress>
    </#if>
    </@s.Results>

    The template checks for an optional custom CGI parameter (wt) and, if it is present, prints its value before each URL. Set it to the QIE weight to assign or to the gscope ID to return.

  2. Create a post-index workflow command that runs curl and saves the configuration file. Set appropriate values for num_ranks, query, collection and wt (wt is not required when generating a kill_exact.cfg).

    $ curl --fail --connect-timeout 60 --retry 3 --retry-delay 20 "http://127.0.0.1/s/search.html?query=QUERY&num_ranks=LARGE_VALUE&wt=WT&view=offline&collection=COLLECTION&form=qie&profile=_default_preview" -o $SEARCH_HOME/conf/$COLLECTION_NAME/CONFIG.cfg || exit 1

    Repeat this command for each different gscope number/QIE weight you require.
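Instead of repeating the curl command by hand, a workflow script can loop over the values. The following is a sketch under stated assumptions: the host, the num_ranks default of 10000, the function names and the example queries are hypothetical. It uses curl's -G with --data-urlencode so that queries containing spaces are encoded correctly, and --fail so that an HTTP error page is not saved as configuration. Setting CURL=echo prints the requests instead of sending them, which is useful for a dry run.

```shell
#!/usr/bin/env bash
# Sketch of a post-index workflow step for the template method: one request
# per gscope number or QIE weight, appended into a single config file.
# Assumptions: host, num_ranks default, function names and example queries
# are hypothetical; set CURL=echo to print the requests instead.
set -euo pipefail

CURL="${CURL:-curl}"

# Usage: fetch_config_lines COLLECTION QUERY [WT] [NUM_RANKS]
fetch_config_lines() {
  local collection="$1" query="$2" wt="${3:-}" num_ranks="${4:-10000}"
  local args=(--silent --show-error --fail --connect-timeout 60 --retry 3 --retry-delay 20
              -G "http://127.0.0.1/s/search.html"
              --data-urlencode "query=$query"
              --data "num_ranks=$num_ranks"
              --data "view=offline"
              --data-urlencode "collection=$collection"
              --data "form=qie"
              --data "profile=_default_preview")
  # Leave wt out entirely for kill_exact.cfg so the template prints bare URLs
  if [ -n "$wt" ]; then
    args+=(--data-urlencode "wt=$wt")
  fi
  "$CURL" "${args[@]}"
}

# Usage: build_config COLLECTION OUTPUT_FILE 'query|wt' ...
build_config() {
  local collection="$1" out="$2"; shift 2
  local tmp pair
  # Write beside the target and move into place only if every request succeeds
  tmp="$(mktemp "$out.XXXXXX")"
  for pair in "$@"; do
    if ! fetch_config_lines "$collection" "${pair%|*}" "${pair##*|}" >> "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
  done
  mv "$tmp" "$out"
}
```

For example, to build a qie.cfg with two weights:

$ build_config mycollection $SEARCH_HOME/conf/mycollection/qie.cfg 'annual report|0.8' 'media release|0.6'

For a kill_exact.cfg, leave the value after | empty (for example 'old content|') so that no wt parameter is sent.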