Using the Funnelback index to generate gscope and kill configuration files
Background
Sometimes it is necessary to use a workflow command to generate a gscopes, QIE or kill configuration file from the Funnelback search index.
Generating a gscopes.cfg
Run a padre query with the following query processor options:
- -res=gscopes (or -res=docnums): tells padre to return results as gscopes.cfg compatible text. -res=gscopes returns matching items as URLs. -res=docnums returns matching items as document numbers. Using -res=docnums is safer because padre-gs has a 10000 item limit when the gscopes.cfg file contains URLs.
- -gscoperesult=: tells padre the gscope number to use in the returned results (by default, documents are returned with a gscope of 1).
e.g.
$ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=docnums -gscoperesult=4 > $SEARCH_HOME/conf/mycollection/mygscopes.cfg
The output is written to mygscopes.cfg and is a valid gscopes.cfg.
Apply the gscopes
To apply the gscopes to the index run the following command.
$ $SEARCH_HOME/bin/padre-gs $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/mygscopes.cfg -docnum
Omit -docnum if you’ve used -res=gscopes to generate the configuration file.
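The generate and apply steps above can be combined in a single workflow script. The following is a minimal sketch rather than an official Funnelback script: the function name generate_and_apply_gscope and the empty-file check are assumptions added for illustration, and the query argument is assumed to be URL-encoded already.

```shell
#!/bin/sh
# Sketch: generate a gscopes file from a query, then apply it to the offline index.
# Hypothetical helper; arguments: collection name, URL-encoded query, gscope number.
generate_and_apply_gscope() {
    collection=$1
    query=$2
    gscope=$3
    cfg="$SEARCH_HOME/conf/$collection/mygscopes.cfg"
    index="$SEARCH_HOME/data/$collection/offline/idx/index"

    # Step 1: write the matching documents as document numbers for the chosen gscope.
    QUERY_STRING="collection=$collection&query=$query&view=offline" \
        "$SEARCH_HOME/bin/padre-sw" -res=docnums "-gscoperesult=$gscope" > "$cfg" || return 1

    # Added safety check (assumption): skip padre-gs when the query matched nothing.
    if [ ! -s "$cfg" ]; then
        echo "No documents matched '$query'; gscopes not applied." >&2
        return 1
    fi

    # Step 2: apply the gscopes; -docnum is needed because the file holds document numbers.
    "$SEARCH_HOME/bin/padre-gs" "$index" "$cfg" -docnum
}

# Example call (placeholder values):
# generate_and_apply_gscope mycollection myquery 4
```

Returning a non-zero status when nothing matched lets the calling workflow decide whether an empty result should stop the update.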
Generating a kill_exact.cfg
Run a padre query with the following query processor option:
- -res=flcfg: tells padre to return results as kill_exact.cfg compatible text.
e.g.
$ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=flcfg > $SEARCH_HOME/conf/mycollection/kill_list.cfg
The output is written to kill_list.cfg and is a valid kill_exact.cfg.
Apply the kill
To apply the kill to the index run the following command.
$ $SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/kill_list.cfg -exactmatch -kill
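Because a kill removes documents from the index, it can be worth checking the generated file before running padre-fl. The sketch below is an illustrative guard and not part of Funnelback: the function name check_kill_list, the entry limit, and the assumption that the file holds one entry per line are all hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the kill list is non-empty and no longer
# than a chosen limit. Assumes one entry per line in the generated file.
check_kill_list() {
    file=$1
    max_entries=$2
    [ -s "$file" ] || { echo "kill list is empty: $file" >&2; return 1; }
    # Count non-blank lines only.
    count=$(grep -c . "$file")
    if [ "$count" -gt "$max_entries" ]; then
        echo "kill list has $count entries (limit $max_entries); not applying" >&2
        return 1
    fi
    echo "$count"
}

# Example use in a workflow command (placeholder limit of 500):
# check_kill_list "$SEARCH_HOME/conf/mycollection/kill_list.cfg" 500 &&
#     "$SEARCH_HOME/bin/padre-fl" "$SEARCH_HOME/data/mycollection/offline/idx/index" \
#         "$SEARCH_HOME/conf/mycollection/kill_list.cfg" -exactmatch -kill
```

A limit like this protects against a query that unexpectedly matches most of the collection.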
Generating a qie.cfg
Run a padre query with the following query processor options:
- -res=qiecfg: tells padre to return results as qie.cfg compatible text.
- -qieval=: tells padre the weight to return with each URL.
e.g.
$ QUERY_STRING='collection=mycollection&query=myquery&view=offline' $SEARCH_HOME/bin/padre-sw -res=qiecfg > $SEARCH_HOME/conf/mycollection/myqie.cfg
The output is written to myqie.cfg and is a valid qie.cfg.
Apply the QIE
To apply the query independent evidence to the index run the following command.
$ $SEARCH_HOME/bin/padre-qi $SEARCH_HOME/data/mycollection/offline/idx/index $SEARCH_HOME/conf/mycollection/myqie.cfg
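Where several queries need different weights, the -qieval option can be set per query and the results collected into one file. This is only a sketch: the helper name build_qie_cfg, the weight:query argument format, and the example weights are hypothetical, the queries are assumed to be URL-encoded, and it assumes that the output of separate padre-sw runs can be concatenated into a valid qie.cfg.

```shell
#!/bin/sh
# Sketch: build one QIE file from several queries, each with its own weight.
# Hypothetical helper; assumes the outputs of separate runs can be concatenated.
build_qie_cfg() {
    collection=$1
    out=$2
    shift 2
    : > "$out"    # start with an empty file
    # Remaining arguments are "weight:query" pairs, e.g. 0.8:news
    for pair in "$@"; do
        weight=${pair%%:*}   # text before the first colon
        query=${pair#*:}     # text after the first colon
        QUERY_STRING="collection=$collection&query=$query&view=offline" \
            "$SEARCH_HOME/bin/padre-sw" -res=qiecfg "-qieval=$weight" >> "$out" || return 1
    done
}

# Example (placeholder queries and weights), followed by the apply step:
# build_qie_cfg mycollection "$SEARCH_HOME/conf/mycollection/myqie.cfg" 0.8:news 0.2:archive &&
#     "$SEARCH_HOME/bin/padre-qi" "$SEARCH_HOME/data/mycollection/offline/idx/index" \
#         "$SEARCH_HOME/conf/mycollection/myqie.cfg"
```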
Note: Funnelback 14.2 and earlier
In 14.2 and earlier, padre-sw contains a bug that causes HTML encoding to occur in the URLs that are printed.
When using this method to generate configuration it may be necessary to sanitise the output (with something like sed) to fix the URLs.
A command similar to the following can be used to clean the output:
$ cat padreswgenerated.cfg | sed -e 's/amp;//g' -e 's/<!--.*\?-->//g' -e '/^$/d' > cleanedconfig.cfg
This command strips the amp; fragment left by HTML-encoded ampersands (so &amp; becomes &), removes HTML comments, and deletes the empty lines that remain.
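To see the effect of the cleaning command, it can be run against a small test file. The sample lines below are invented for illustration and are not real padre-sw output; GNU sed is assumed.

```shell
#!/bin/sh
# Create a small sample file that mimics the problems described above (invented data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://example.com/page?a=1&amp;b=2
<!-- a comment line -->

http://example.com/other
EOF

# Same sed expressions as the cleaning command, applied to the sample file.
cleaned=$(sed -e 's/amp;//g' -e 's/<!--.*\?-->//g' -e '/^$/d' "$sample")
printf '%s\n' "$cleaned"
rm -f "$sample"
```

With GNU sed this prints only the two URLs, the first as http://example.com/page?a=1&b=2.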
Alternate method for generating gscope/kill/qie configuration files
This method involves running a query against the Funnelback index and using a custom template to return the configuration file.
This method is required for older versions of Funnelback that don’t support the additional padre-sw result modes. This method also works more reliably under Windows (note: Cygwin is required for the curl command).
- Create a custom template (qie.ftl) containing the following code:
<#ftl encoding="utf-8" />
<#import "/web/templates/modernui/funnelback_classic.ftl" as s/>
<#import "/web/templates/modernui/funnelback.ftl" as fb/>
<@s.Results>
<#if s.result.class.simpleName != "TierBar">
<#compress><#if question.inputParameterMap["wt"]?exists>${question.inputParameterMap["wt"]?html} </#if>${s.result.liveUrl}</#compress>
</#if>
</@s.Results>
The template checks for an optional custom CGI parameter (wt) that contains the weighting to apply to each line. This can contain the QIE weight to assign, or the gscope ID to return.
- Create a post-index workflow command that runs curl to save the configuration file. Set appropriate values for num_ranks, query, collection and wt. wt is not required if you are generating a kill_exact.cfg.
$ curl --connect-timeout 60 --retry 3 --retry-delay 20 "http://127.0.0.1/s/search.html?query=QUERY&num_ranks=LARGE_VALUE&wt=WT&view=offline&collection=COLLECTION&form=qie&profile=_default_preview" -o $SEARCH_HOME/conf/$COLLECTION_NAME/CONFIG.cfg || exit 1
Repeat this command for each different gscope number/QIE weight you require.
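The repetition can be scripted. The sketch below is illustrative only: the helper names build_config_url and fetch_config_parts, the wt:query argument format, and the NUM_RANKS default of 10000 are hypothetical. It assumes the queries are URL-encoded and that the lines returned for each request can be concatenated into one valid configuration file.

```shell
#!/bin/sh
# Hypothetical helper: build the search URL used to fetch configuration lines.
# Arguments: URL-encoded query, collection, num_ranks, optional wt value.
build_config_url() {
    url="http://127.0.0.1/s/search.html?query=$1&num_ranks=$3&view=offline&collection=$2&form=qie&profile=_default_preview"
    if [ -n "${4:-}" ]; then
        url="$url&wt=$4"    # wt is left out entirely for a kill_exact.cfg
    fi
    printf '%s\n' "$url"
}

# Hypothetical helper: fetch one part per "wt:query" pair, then join the parts.
# Arguments: collection, output file, then the pairs (e.g. 0.8:news).
fetch_config_parts() {
    collection=$1
    out=$2
    shift 2
    : > "$out"
    for pair in "$@"; do
        part=$(mktemp) || return 1
        curl --connect-timeout 60 --retry 3 --retry-delay 20 \
            "$(build_config_url "${pair#*:}" "$collection" "${NUM_RANKS:-10000}" "${pair%%:*}")" \
            -o "$part" || { rm -f "$part"; return 1; }
        cat "$part" >> "$out"
        rm -f "$part"
    done
}

# Example (placeholder collection, weights and queries):
# fetch_config_parts mycollection "$SEARCH_HOME/conf/mycollection/myqie.cfg" 0.8:news 0.2:archive || exit 1
```

Downloading each part to a temporary file first means a failed request leaves the combined file without a partial response appended.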