Query processor options (collection.cfg)

Description

This option specifies additional configuration options to supply to the query processor when it runs queries. The query processor, PADRE (PArallel Document Retrieval Engine), can be finely controlled through a large set of options. Many of these options can be set either in this collection configuration parameter or as a CGI parameter passed with the search request URL. The available options are listed below, along with their CGI counterparts.
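
As a rough illustration of the CGI route, the sketch below builds a search request URL that carries two options as CGI parameters. The host name, the endpoint path, and the `query` and `collection` parameter names are assumptions made for this example and are not taken from this page; `stem` and `sort` are CGI counterparts listed in the tables below.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the search URL of your own installation.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(query, collection, **cgi_options):
    """Return a search URL with query processor options as CGI parameters."""
    # "query" and "collection" are assumed parameter names for this sketch.
    params = {"query": query, "collection": collection}
    params.update(cgi_options)
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Intended to mirror query_processor_options=-stem2 -sort date
    print(build_search_url("annual report", "my-collection", stem=2, sort="date"))
```

Setting the same options in collection.cfg applies them to every query against the collection, whereas CGI parameters apply to a single request.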

Caveats

  • Query processing will not occur if the query processor is given an invalid option.
  • Query processor options can affect Funnelback's speed and result quality, so change them with caution.
  • Numerical_Metadata search is currently only accessible using CGI parameters and not as query processor options.

A. Getting information about PADRE and its operation.

Command Line Values CGI Parameter Values Description
-V - - - Print version number and exit.
-ixform - - - Print index format version expected by this query processor and exit.
-help - - - Print full list of usage options.
-deb<n> Doesn't matter debug debug=<n> or debug Set debugging level to n
-showtimes - - - Print elapsed times at each stage of query processing.

B. Controlling how queries are interpreted.

Command Line Values CGI Parameter Values Description
-stem0 - stem 0 Disable query stemming
-stem1 - stem 1 Assume that query words are stems.
-stem2 - stem 2 Stem most query words.
-stem3 - stem 3 Stem all query words.
-STOP<f> <filename> - - Use the stoplist specified in file <f> (one word per line). Note that this replaces the default stop word list.
-ras - ras ras (just by itself with no value) Remove all stopwords from the query. (Rather than only eliminating stopwords if there are at least two non-stop words)
-loose<w> <number> (min = 0, default = 0) - - Phrase looseness in words
-prox<w> <number> (min = 0, default = 15) prox <number> Proximity limit in words
-case - case on Query processing should be case sensitive (No effect unless indexes built with -case.)
-maxt<t> <number> - - Process no more than t query terms. Query optimisation. Least common terms are processed first.
-cjkt - - - Activates support for searching over CJKT (Chinese, Japanese, Korean, Thai) characters in bigram mode. Implies DAAT mode and strips many operators from the queries.
- - enc <string> (default = "utf-8") The encoding parameter specifies the character encoding used in the query string, so that the query can be correctly converted to UTF-8 internally.
-nulqok - - - When an empty query is submitted via CGI, process it as a null query, returning documents in an order determined by their query independent scores.
-qsup= <settings> (default = no supplementary queries) qsup <settings> Determines which sources of supplementary queries are tried and the weights attached to them. E.g. -qsup=SPEL/0.9+USUK/0.4+RELQ/0.3+SYNS/0
Available sources are :
  • SPEL (spelling suggestions)
  • USUK (table of spelling differences between US English and UK English)
  • RELQ (related queries)
  • SYNS (synonyms as defined by the blend.cfg file)
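
To make the structure of a -qsup setting concrete, here is a minimal sketch that splits a settings string into its sources and weights. It assumes the format is a list of SOURCE/weight pairs joined by "+", as inferred from the example above; `parse_qsup` is a hypothetical helper and not part of PADRE.

```python
def parse_qsup(settings):
    """Parse a -qsup settings string into a {source: weight} dictionary."""
    # Sources named in the list above.
    known_sources = {"SPEL", "USUK", "RELQ", "SYNS"}
    weights = {}
    for part in settings.split("+"):
        # Each part is assumed to look like SOURCE/weight.
        source, _, weight = part.partition("/")
        if source not in known_sources:
            raise ValueError(f"unknown supplementary query source: {source!r}")
        weights[source] = float(weight)
    return weights


if __name__ == "__main__":
    print(parse_qsup("SPEL/0.9+USUK/0.4+RELQ/0.3+SYNS/0"))
```

A check like this can catch a mistyped source name before the setting is placed in the configuration.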

    C. Controlling how documents are ranked.

    Command Line Values CGI Parameter Values Description
    -sco<d>[<classes>] 0 - score using doc text only, 2 - score using anchortext and URLs as well. upweight titles. sco <number> 1-7 Set doc scoring mode to d.
    -specf <metadata classes> (default=kK) specf <metadata classes> Fields listed in string s will be scored specially.
    -daat<n> <result count> (default=5000) daat <result count> Use Document-At-A-Time rather than Term-At-A-Time processing, which is more efficient but will limit processing to n fully matching results.
    -k1 <f> <number> (default= 2.0) k1 <number> Set text matching (Okapi) constants "k1" value.
    -b <f> <number> (default= 0.75) b <number> Set text matching (Okapi) constant "b" value.
    -wmeta <x> <weight> <metadata class (letter or digit)> <float> (default k=0.5, K=0.5, everything else=1) wmeta_<x> where <x> is a metadata class <float> Sets the weight for the metadata class <x> to the specified weight.
    -cool <content> <onsite> <offsite> <length_of_url> <number> <number> <number> <number> cool, cool0, cool1, cool2, cool3 (separate parameters) 1, <integer>, <integer>, <integer>, <integer> (respectively) Used to adjust scoring parameters used for ranking documents. It is useful for tuning results.
    -cooler <content> <onsite> <offsite> <length_of_url> <external_qie> <recency> <number> <number> <number> <number> <number> <number> cool, cool0, cool1, cool2, cool3, cool4, cool5 (separate parameters) <0 or 1>off or on, <integer>, <integer>, <integer>, <integer>, <integer>, <integer> (respectively) Used to adjust scoring parameters used for ranking documents. It is useful for tuning results.
    -cool<n> <f> <integer> <float> cool<n> <percentage> Set the value of the n-th cool parameter to f (or percentage). cool0 to cool5 are defined above; cool6 is URL attractiveness (homepages favoured; copyright pages and URLs with lots of punctuation deprecated); cool10 upweights non-binary documents; cool12 upweights implicit phrases (DAAT mode only); cool20 biases in favour of documents from principal servers (e.g. abc.net.au, www.abc.net.au); cool21 applies primary collection weights specified in the index.sdinfo file.
    -kmod 0 = normal, 1 = AF1 (Anchor Formula 1) (default=1) kmod <integer> Select special scoring function i for special fields.
    -nocool - cool 0 Turn off topic distillation scoring (cool and cooler).
    -neardup <float> (default=0.5) neardup <float> (default=0.5) Score multiplier that is given to documents that are detected to be near duplicates. (Setting neardup1 (CGI version neardup=1) effectively switches off near duplicate detection)
    -unknown_daysold<days> <number> (default=366) unknown_daysold <number> (default=366) A doc with unknown date is assumed to be d days old (for recency calculations).
    -reference_date<date> <yyyymmdd> (default=date of most recent document) reference_date <yyyymmdd> (default=date of most recent document) If specified, recency is based on this date rather than that of most recent doc.
    -recency_decay_vals <float (day)> <float (week)> <float (month)> <float (year)> <float (decade)> <float (century)> <float (millennia)> (default=1.0, 0.75, 0.5, 0.25, 0.025, 0.0025) Define how recency scores decay with time. Values should be in the range 1.0 to 0.0 and define the score given to documents with ages of 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century, 1 millennium respectively. Recency scores between these dates are linearly interpolated, and beyond 1 millennium scores are calculated as (1/age in days).
    -SSS<n> <integer> (default=2) SSS <integer> Same site suppression depth.
    -SameSiteSuppressionExponent <n> <float> (default=0.5) SameSiteSuppressionExponent <float> The degree to which a same site penalty affects a document's score. Successive documents from the same site will have their score divided by their rank raised to this exponent. E.g. using an exponent of 0.5, the second document from a site will have its score divided by 2^0.5 (inverse square root of rank).
    -SameSiteSuppressionOffset <n> <integer> (default=0) SameSiteSuppressionOffset <integer> Number of additional documents from a site beyond the first that are allowed their full score before applying a same site suppression penalty (Default 0).
    -diversity_rank_limit<n> <integer> (default=200) - - Diversification won't alter ranking beyond rank n. (Allows administrator to limit response time.)
    -vsimple - vsimple on Very simple ranking. Equivalent to -sco0 -nocool -SSS0 -kmod0.
    -sort <type_of_sort> (date , adate , title , dtitle , size , dsize , url , durl , coll , dcoll , metaX , dmetaX , shuffle) sort (date , adate , title , dtitle , size , dsize , url , durl , coll , dcoll , metaX , dmetaX , shuffle) Sort top results by date, ascending date, title, descending title, file size, descending file size, url, descending url, collection name, descending collection name, metadata class X, descending metadata class X, or shuffle results randomly respectively
    -sortall - sortall sortall (by itself with no value) Include partial matches in the resorting performed by -sort.
    -gscope1 <e> <gscope expression> gscope1 <gscope expression> Present only results whose gscope bits match the given gscope expression.
    -annieonly - - - Process queries using annotation indexes only. (Normal indexes are not used. Operators and metafields stripped from queries.)
    -anniefallback - - - Process queries using annotation indexes, falling back to normal indexes if insufficient results. (Most query operators are stripped.)
    -anniecombo - - - Process queries using both annotation and normal indexes. (Most operators stripped from queries.)
    -lscope <pattern> <pattern> lscope <pattern> Present only results whose URL matches a left-anchored pattern.
    -xscope <url> <url> xscope <url> Present only results whose URL exactly matches the provided URL (after canonicalisation).
    -scope <list> <url patterns> scope <url patterns> Present only results whose URL satisfies the include/exclude scopes included in list (comma separated). e.g. -scope anu.edu.au,-anu.edu.au/archives
    -nolsc - lscorrect off Don't correct link scores across components in a meta collection.
    - - profile <string> Indicates that the named profile directory should be used instead of the collection's main directory for finding synonym files, best bets files and additional padre options files. The named directory is assumed to be a subdirectory of the main collection configuration directory.
    - - clive <collection_ID> Relevant only to meta collections. Specifies the sub-collections to be searched for the current query. Use separate parameters for multiple collections e.g. clive=collection_one&clive=collection_two
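
The same site suppression penalty described in the table above can be sketched as follows. This is only an illustration of the stated rule (score divided by rank raised to the exponent); how PADRE combines the offset with the rank is not spelled out on this page, so the handling of `offset` here is an assumption, and `same_site_penalty` is a hypothetical helper.

```python
def same_site_penalty(score, site_rank, exponent=0.5, offset=0):
    """Return a score after applying a same site suppression penalty.

    site_rank is 1 for the first document from a site, 2 for the second,
    and so on. The defaults mirror the documented defaults.
    """
    if site_rank < 1:
        raise ValueError("site_rank must be at least 1")
    # Assumption: the offset simply delays the point where the penalty starts.
    effective_rank = max(1, site_rank - offset)
    return score / (effective_rank ** exponent)


if __name__ == "__main__":
    for rank in (1, 2, 3, 4):
        print(rank, same_site_penalty(10.0, rank))
```

With the default exponent of 0.5, the fourth document from a site keeps half of its score, because 4^0.5 is 2.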

    D. Controlling how search results are presented.

    Command Line Values CGI Parameter Values Description
    -res mail|web|html|xml|urls|ctest) res mail|web|html|xml|urls|ctest) Set result format type.
    -explain - explain off or on In certain modes explain the results by showing score components.
    -SM snip|debug|meta|qb|auto|both) SM snip|debug|meta|qb|auto|both) Summary mode
    -SHLM<val> (0|1|2|5) (default=2) SHLM (0|1|2|5) (default=2) Select highlighting method within snippets in XML.
  • 0 - No highlighting.
  • 1 - Highlight with HTML strong tags.
  • 2 - No highlighting, but provide a highlighting regular expression for an external system to apply.
  • 5 - Highlight with HTML strong tags, but remove accents from summary before highlighting, provided query was not accented.
    -nomdsfhl - mdsfhl off Don't highlight query terms in MDSF metadata summaries
    -SF<fields> <metadata classes> SF <metadata classes> Metadata fields to include in summaries (if applicable).
    -MBL<n> <integer> (default=250) MBL <integer> Set metadata buffer length to n.
    -SBL<n> <integer> (default=250) SBL <integer> Set summary buffer length to n.
    -SQE<n> <number> (default=3) SQE <number> Set max no. of query biased summary excerpts to n.
    -EORDER<val> (0|1) (default=0) SQE (0|1) Specify presentation order of query biased summary excerpts.
  • 0: Order in which excerpts occur in the document.
  • 1: Best excerpts first.
    -num_tiers<n> <integer> (1 <= n <= 50, default=10) num_tiers <integer> Limit number of result list tiers to n
    -num_ranks<n> <integer> (1 < n <= 1,000,000, default =0) num_ranks <integer> Limit number of results to n
    -start_rank<n> <integer> (1 < n <= 1,000,000, default=1) start_rank <integer> Present results starting from n
    -rank_limit<n> <integer> (1 < n <= 1,000,000,000, default=1,000,000,000) - - Limit highest rank which can be requested to n
    -fmo - fmo on Present full matches only.
    -notierbars - tierbars off Suppress tierbars in result list output (XML and HTML)
    -nocdata - - - Don't encapsulate metadata summaries or titles in CDATA sections
    - - oneshot oneshot (just by itself with no value) Returns a page that redirects to the highest ranked document, rather than a results page.
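
As an example of how start_rank and num_ranks work together, the sketch below computes the two CGI parameters for a given results page. The page size of 10 is an assumption chosen for the example, and `page_parameters` is a hypothetical helper rather than anything provided by the query processor.

```python
def page_parameters(page, results_per_page=10):
    """Return the start_rank and num_ranks CGI parameters for a 1-based page."""
    if page < 1 or results_per_page < 1:
        raise ValueError("page and results_per_page must be at least 1")
    return {
        # Page 1 starts at rank 1, page 2 at rank results_per_page + 1, ...
        "start_rank": (page - 1) * results_per_page + 1,
        "num_ranks": results_per_page,
    }


if __name__ == "__main__":
    print(page_parameters(3, results_per_page=20))
```

The returned dictionary can be merged into the other CGI parameters of a search request.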

    E. Miscellaneous options.

    Command Line Values CGI Parameter Values Description
    -bb_enabled {boolean} bb_enabled {boolean} Enable / disable presentation of 'best bet' results. Default on
    -events Enable Event Search mode
    -nolog - - - Do not write query log entries.
    -qlog_file=<f> <file> - - Write the query log to the specified file rather than SEARCH_HOME/data/COLLECTION_NAME/live/log/queries.log
    -canq - - - Write reordered queries to log. Default off (used to be on)
    -flock - - - Use flock when locking the query logfile. (Default on Linux)
    -unbuf - - - Don't buffer the standard output stream. May slow response but is useful for debugging.
    -stdoutbuf - - - Buffer the standard output stream. In some specific cases can improve performance.
    -zom - zom on Include docs in results even if noindex or killed.
    -noexpired - - - Exclude expired docs from results. (Nullified by -zom)
    -ndt - - - Don't do tests on docs, e.g. phantom, zombie, *scope, binary, expired.
    -binonly - binary - Only include binary docs in results.
    -nonbin - - - Only include non-binary docs in results.
    -mat<n> <integer> (default=12) - - Set matchset size to n million. Only need to increase on very large collections.
    -countgbits <e> <comma-separated list of gscope bitnumbers> countgbits <comma-separated list of gscope bitnumbers> e is a comma-separated list of gscope bitnumbers for which we need to produce counts.
    -rmcf <list of metadata field characters> - - Fields listed in s will have their words counted in results sets. (Fields representing facets.)
    -rmc_sensitive - - - Treat RMC items (facet topics) case sensitively. (Default is insensitive.)
    -rmc_maxperfield<n> <integer> - - Set maximum number of RMC items to display per field at n. (Default is 100.)
    -count_urls <n> - - - Display counts of results grouped by the URL path (Up to depth n). Specify 0 to use the default (5).
    -QL=<n> <integer> QL <integer> Activate Quick Links facility for default pages down to the specified level. 0 - off; 1 - server root pages; 2 - next level down. (dflt 0)
    -QL_rank=<n> <integer> QL_rank <integer> If QuickLinks capability is active, show quick links for search results down to the specified rank. (range 1 - unlimited) (dflt 1)
    -QL_rank_is_relative {boolean} QL_rank {boolean} If true, the value of QL_rank will be interpreted relative to the start_rank. E.g. if QL_rank=2, the first two results on each page may show QuickLinks. (dflt OFF)
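
One way to read the interaction between QL_rank, start_rank and -QL_rank_is_relative is sketched below. It is an interpretation of the descriptions above, not PADRE's actual logic, and `may_show_quick_links` is a hypothetical helper.

```python
def may_show_quick_links(rank, ql_rank=1, start_rank=1, relative=False):
    """Return True if the result at this rank may show Quick Links."""
    if relative:
        # Count positions from the first result on the current page.
        position = rank - start_rank + 1
    else:
        # Count positions from the top of the whole result list.
        position = rank
    return 1 <= position <= ql_rank


if __name__ == "__main__":
    # Second page of ten results, QL_rank=2.
    print(may_show_quick_links(11, ql_rank=2, start_rank=11))
    print(may_show_quick_links(11, ql_rank=2, start_rank=11, relative=True))
```

Under this reading, a relative QL_rank of 2 lets the first two results on every page show Quick Links, while an absolute one limits them to the top two results overall.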

    F. Contextual Navigation and Spelling Suggestions.

    Command Line Values CGI Parameter Values Description
    -contextual_navigation_enabled - contextual_navigation_enabled true or false Control whether contextual navigation suggestions are enabled or not. Default: true
    -contextual_navigation_fields=s <string> contextual_navigation_fields <string> String "s" lists the metadata fields to be scanned for contextual navigation suggestions. Default: 'ct'. (Note that scanning of document text can be suppressed by including a minus.)
    -cnto=s <floating point> cnto <floating point> Set contextual navigation time-out to "s" seconds (s floating point). (Processing may be omitted entirely if elapsed time for a query already exceeds s seconds). Value of form n.[m], for example: 9.1, 9, 0.9. The default is 5.0 seconds.
    -spelling_enabled - spelling_enabled true or false Specify whether spelling suggestions are enabled. Default: true
    -spelling_alpha=f <float> spelling_alpha <float> Change the weighting between "edit distance" and "popularity" of a candidate suggestion (0-1). The larger this value the more weight is given to edit distance. A setting of 0.5 would give equal weighting to edit distance and popularity. Default value: 0.7
    -spelling_difflen_thresh=i <integer> spelling_difflen_thresh <integer> Don't make suggestions more than "i" characters longer or shorter than query. Default value: 2
    -spelling_edist_thresh=f <float> spelling_edist_thresh <float> Don't make suggestions whose edit distance from the query exceeds "f". Default value: 4.0
    -spelling_fullmatch_trigger_const=f <float> spelling_fullmatch_trigger_const <float> Don't look for suggestions if there are at least f * log10(num docs) full matches. Default value: 30.0. Note - You may also need to adjust spelling.suggestion_threshold.
    -spelling_min_querylen=i <integer> spelling_min_querylen <integer> Suggestions not made for queries shorter than "i". Default value: 2
    -spelling_wt_thresh=f <float> spelling_wt_thresh <float> Don't make suggestions whose weight is less than "f". Default value: 0.01
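
The full match trigger for spelling suggestions can be illustrated with a short calculation. The sketch below applies the rule stated above (no suggestions when there are at least f * log10(num docs) full matches); `should_look_for_suggestions` is a hypothetical name, and the other spelling thresholds are not modelled.

```python
import math


def should_look_for_suggestions(full_matches, num_docs, trigger_const=30.0):
    """Return True if spelling suggestions would still be looked for."""
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")
    # Suggestions are skipped once full matches reach this threshold.
    threshold = trigger_const * math.log10(num_docs)
    return full_matches < threshold


if __name__ == "__main__":
    # With 1000 documents and the default constant the threshold is about 90.
    print(should_look_for_suggestions(50, 1000))
    print(should_look_for_suggestions(100, 1000))
```

Raising the constant makes suggestions appear for queries that already have many full matches; lowering it suppresses them sooner.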

    G. Geospatial options.

    Command Line Values CGI Parameter Values Description
    -origin <lat> <long> <floating point degree specification> <floating point degree specification> origin <floating point degree specification>,<floating point degree specification> For geographical searches, this parameter specifies the origin (or centre) of the search. It is used in conjunction with the maxdist option
    -maxdist <km> <floating point number> maxdist <floating point number> For geographical searches, this parameter specifies that all search results should be located within n kilometers of the specified origin, where n is the parameter given. It is used in conjunction with the origin option.
    -sort <sort_type> (prox or dprox) sort (prox or dprox) Sort top results by proximity to origin, or descending proximity to origin respectively.
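
To show what an origin and maxdist restriction means in practice, the sketch below checks whether a point lies within a given number of kilometres of an origin. It uses the haversine great-circle formula and a mean Earth radius of 6371 km; both are assumptions for illustration, since this page does not say how PADRE computes distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_maxdist(origin, point, maxdist_km):
    """Return True if point (lat, long) lies within maxdist_km of origin."""
    return distance_km(origin[0], origin[1], point[0], point[1]) <= maxdist_km


if __name__ == "__main__":
    sydney = (-33.8688, 151.2093)
    melbourne = (-37.8136, 144.9631)
    print(round(distance_km(*sydney, *melbourne)))
    print(within_maxdist(sydney, melbourne, 500))
```

In a search request the same restriction would be expressed with the origin and maxdist parameters, for example origin=-33.8688,151.2093 together with maxdist=500.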

    H. Specifying where queries come from

    Command Line Values CGI Parameter Values Description
    -ctest <testfile> The name of a test file in c-test format. - - Activates c-test mode and reads a batch of queries from a testfile (in C_TEST format).

    I. Security related options

    Command Line Values CGI Parameter Values Description
    -userkeys=<string> <keys granted to the user> - - Conduct this search with security keys provided granting the user rights to individual results.
    -secPlugin=<string> <plugin name> - - Use the specified plugin to match document locks with user keys
    -ldlibrarypath=<string> <library path> - - Specifies the full path at which the required security plugin can be found.

    Default value

     query_processor_options=
    

    That is, no additional options.

    Examples

    Query processor set to perform Same Site Suppression, along with using synonyms:

     query_processor_options=-SSS2 -THS/home/search/conf/myCollectionName/synonyms.cfg
    

    Query processor set to sort the top results by filesize:

     query_processor_options=-sort size
    

    Query processor set to upweight title text to twice the original weight, sort all results by date and use word stemming when searching for results:

     query_processor_options=-wmeta t 2.0 -sort date -sortall -stem2
    
