Query processor options (collection.cfg)

Description

This option specifies additional configuration options to supply to the query processor when performing queries. The PArallel Document Retrieval Engine (PADRE) query processor can be finely controlled through a large set of options. Most of these options can be specified either in this collection configuration parameter or as a CGI parameter passed with the search request URL; options marked [Not CGI] cannot be passed as CGI parameters. The available options are listed below.

Notes

  • The CGI parameter for a query processor option has the same name as the option. For example, for the collapsing query processor option you would specify collapsing=on in your CGI request.
  • If an option is of type Boolean, its valid values are "on" and "off".
  • Query processing will not occur if the query processor is given an invalid option.
  • Query processor options can affect Funnelback's speed and result quality, so change them with caution.
  • Numerical_Metadata search is currently accessible only through CGI parameters, not as query processor options.
  • These options had a major revamp in versions 12 and 12.2. Please see query processor options changes for more details.
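
As a minimal sketch of the first two notes, the following Python builds a search request URL in which a query processor option is passed as a CGI parameter under its own name, with Boolean values written as on or off. The host name, the search.html path, and the collection and query parameter names are placeholders assumed for illustration; only collapsing=on comes from the notes above.

```python
from urllib.parse import urlencode


def to_cgi_value(value):
    """Render a Python value as a CGI value; Booleans become "on" or "off"."""
    if isinstance(value, bool):
        return "on" if value else "off"
    return str(value)


def build_search_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    query = urlencode({name: to_cgi_value(v) for name, v in params.items()})
    return f"{base_url}?{query}"


# Hypothetical endpoint and request parameters, for illustration only.
url = build_search_url(
    "https://search.example.com/s/search.html",
    {"collection": "example", "query": "funnelback", "collapsing": True},
)
print(url)
```

Options marked [Not CGI] in the tables below would have to be set in query_processor_options instead.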

A. Contextual navigation options

Option Name Type Range Description
categorise_clusters Boolean - Whether contextual navigation suggestions are grouped by type.
cnto Float 0.000000 - unlimited Set contextual navigation time-out to s seconds (s floating point). Processing may be omitted entirely if elapsed time for a query already exceeds s seconds. (dflt 5.0).
contextual_navigation Boolean - Whether or not to activate the contextual navigation system.
contextual_navigation_fields String - String s lists the metadata fields to scan for contextual navigation suggestions. (dflt 'ct'). Note that scanning of document text can be suppressed by including a minus.
max_phrase_length Integer 3 - 7 Maximum length (in words) of contextual navigation suggestions.
max_phrases Integer 0 - unlimited After this number of candidate phrases have been checked, contextual navigation processing will stop.
max_results_to_examine Integer 0 - 200 Maximum number of search results to scan for contextual navigation suggestions.
site_max_clusters Integer 0 - unlimited Maximum number of site clusters to present in contextual navigation.
summary_fields String - Which metadata fields the contextual navigation system will consider.
topic_max_clusters Integer 0 - unlimited Maximum number of topic clusters to present in contextual navigation.
type_max_clusters Integer 0 - unlimited Maximum number of type clusters to present in contextual navigation.

B. Geospatial options

Option Name Type Range Description
geospatial_ranges Boolean - Calculate geospatial distance from origin and bounding box ranges when geospatial data is configured and available.
maxdist Float 0.000000 - unlimited Exclude results not within <f> km of origin.
origin String - <lat,long> Set origin to lat, long (floating point degrees).

C. Informational options

Option Name Type Range Description
canq Boolean - Write reordered queries to log. (dflt off)
count_dates String - Report facet counts for dates such as 'today', 'last week', 'this year'. Note that date categories may overlap. Only value currently supported is 'd'.
count_urls Integer - Display counts of results grouped by the URL path (up to depth i). If i is not present or 0, then the default value is used. Dflt 5. [Not CGI]
rmcf String - Metadata fields to have their words counted in result sets (fields representing facets).
rmrf String - Numerical fields listed will have their ranges calculated in result sets.
showtimes Boolean - Print elapsed times for each stage of query processing.

D. Logging options

Option Name Type Range Description
ip_to_log String - Accepted values: ip, ip_hash or remote_user.
log Boolean - Write query log entries (dflt on). [Not CGI]
qlog_file String - If writing query log entries, write them to <FILE>. [Not CGI]
username String - A string identifying the current user to be used in padre's query log.

E. Miscellaneous options

Option Name Type Range Description
countgbits String - s is either "all" or a comma-separated list of gscope bitnumbers for which counts are needed. (Bits numbered from zero.)
exit_on_bad_component Boolean - Fail when a component has an incompatible index relative to the first (rather than skip).
flock Boolean - Use flock when locking the query logfile. If set to no, lockf is used instead. Default on Solaris is 'no', all other systems 'yes'.
mat Integer 0 - 2147 Set matchset size to n million (dflt 24). Only need to increase on very large collections. [Not CGI]
ndt Boolean - Don't do tests on docs, e.g. phantom, zombie, *scope, binary, expired. [Not CGI]
unbuf Boolean - Don't buffer the standard output stream. In some specific cases, setting this to 'no' can improve performance.

F. Presentation options

Option Name Type Range Description
EORDER Integer 0 - 1 Specify presentation order of query biased summary excerpts. 0: natural order in doc. 1: sorted by score. (dflt 0)
MBL Integer 1 - unlimited Set buffer length per displayed metadata field to n bytes (dflt 250 bytes). Warning: setting very large values will increase query processor memory demands and may cause problems.
SBL Integer 1 - unlimited Set summary buffer length to n bytes. (dflt 250 bytes)
SF String - Metadata fields to include in summaries. (if applicable).
SHLM Integer 0 - 7 Select highlighting method within snippets in XML. 0 - No highlighting ; 1 - HTML strong tags ; 2 - Show highlighting regexp. and unhighlighted summary [dflt]; 5 - Use HTML strong tags but remove accents from summary before highlighting, provided query was not accented.
SM String - Summary mode (off;snip;debug;meta;qb;def;auto;both) - both means qb and meta.
SQE Integer 1 - 10000 Set max no. of query biased summary excerpts to n (dflt 3).
bb Boolean - If set, the query processor may insert "best bets" (formerly known as "featured pages") suggestions from best_bets.cfg.
ctest_mode Integer 0 - 3 Controls behaviour of padre-sw when -ctest is used. 0: no internal evaluation; 1 - internal evaluation only. Output is brief plain text report of measures; 2 - internal evaluation only. Output in plain text with QBQ output followed by measures; 3 - internal evaluation plus normal CTOUT output in XML (with measures presented as comments)
explain Boolean - Explain rankings by showing score components. (Note that -explain=on turns off result set diversification).
explore Integer 7 - 50 Show 'explore' links against results. The value specifies how many terms to include in the expanded query.
gscoperesult String - Specifies the bit number that results will be set to in -res gscope or -res docnums modes (dflt 1). [Not CGI]
mdsfhl Boolean - Whether query terms are only highlighted in MDSF metadata summaries.
num_ranks Integer 0 - unlimited Limit number of results to n (min = 0, dflt = 10).
num_tiers Integer 0 - 50 Limit number of result list tiers to n (min = 0 (no limit), max = 50, dflt no limit).
oneshot Boolean - Directs the browser to the first result rather than displaying a page of results. This feature is not available with the Modern UI.
qieval Float 0.000000 - 1.000000 Set the value presented for query independent evidence when using the qiecfg result format. (dflt 0.5).
qwhl String - Determines which parts of a search result are highlighted. S - snippet, M - metadata, U - URL, T - title. E.g. -qwhl=MUT
res String - Set result format. Possible values are: trec, mail, web, html, xml, urls, qiez, qieo, gscope, docnums, ctest, qiecfg or flcfg.
results_in_facet_categories Integer 0 - 100 Include the specified number of pre-computed search results under the rmc count element for metadata facet categories.
rmc_maxperfield Integer 0 - unlimited Set maximum number of RMC items to display per field at n (dflt 100).
rmc_sensitive Boolean - Treat facet categories (RMC items) case sensitively (default no). [Not CGI]
show_qsyntax_tree Boolean - Include an SVG representation of the query-as-processed in output.
start_rank Integer 1 - unlimited Present results starting from n (dflt 1).
tierbars Boolean - Display tierbars in result list output (XML and HTML)

G. Query interpretation options

Option Name Type Range Description
STOP String - Use the stoplist specified in <file> (one word per line) [Not CGI]
binary Integer 0 - 3 Determines whether or not binary documents are returned in the results. 0 - show all documents; 1 - show only binary documents; 2 - show only non-binary documents.
case Boolean - Whether or not query processing should be case sensitive (No effect unless indexes built with -case. dflt no).
clive String - Dynamic metacollections. Specifies the number (from 0) of one component within the .sdinfo file(s) to make active.
daat_termination_type Integer 0 - 2 Selects how DAAT early exit is determined. 0 - try for d results with every metafield and every component; 1 - try for d results over every component but not necessarily every metafield; 2 - stop as soon as d results are obtained. (d is the parameter to -daat.)
daat_timeout Float 0.000000 - 3600.000000 Impose a soft timeout (in seconds) on the time taken by the DAAT machinery for one query. [Not CGI]
dont_estimate_full_matches Boolean - In DAAT mode don't guess the number of full matches when the DAAT depth did not let us process an entire postings list.
enc String - Specify the encoding that input queries are submitted in (default UTF-8). This option is not supported when using the Modern UI
events Boolean - Must be set if event search is to be used
fmo Boolean - Present full matches only.
lang String - If a 2-character language code is specified by this means, then stemmers etc specific to that language will be used, IF AVAILABLE. It is also permissible to use a 5-character code like en_GB, but padre behaviour will be the same as for en. Specifying lang also makes title and metadata sorting of results locale-specific, however support for this on Windows platforms is limited and problematic.
loose Integer 0 - unlimited Phrase looseness in words (min = 0, dflt = 0).
max_qbatch Integer 1 - unlimited Terminate batch query processing after the specified number of queries have been processed.
max_terms Integer 1 - unlimited Truncate queries after the specified number of terms. If the query is reordered, truncation will occur after reordering.
min_truncated_len Integer 0 - 20 The text part of a query term with a right truncation operator must have at least this length. E.g. if min_truncated_len were 4 funnel* would be accepted but fun* would be processed as fun. [Not CGI]
noexpired Boolean - Exclude expired docs from results. (Nullified by -zom) [Not CGI]
nulqok Boolean - An empty query submitted via CGI will be processed as a null query. (dflt is to ignore the request). [Not CGI]
phrase_prox_word_limit Integer 1 - unlimited Phrase or proximity terms with more than this number of words will be shortened by deleting words from the right. E.g. If this limit were 4 then `to be or not to be` would be processed as `to be or not` [Not CGI]
prox Integer 0 - unlimited Proximity limit in words (min = 0, dflt = 15).
qsup String - When blending queries, determines sources of supplementary queries to be tried, with corresponding weights assigned to each source (ranging from 0 to 1). No spaces. 'off' may be specified to disable supplementary queries. E.g. -qsup=SPEL/0.9+USUK/0.4+SYNS/0.1+LANG/0.1. Available sources are: SPEL (spelling suggestions); USUK (table of spelling differences between US and UK English); SYNS (synonyms as defined by the blending.cfg file); LANG (experimental German decompounding).
query_reorder Boolean - Reorder terms in query so that the most discriminating (least common) appear first. Often coupled with -max_terms=
ras Integer 0 - 2 Remove any stopwords from the query. Possible values: 0 - remove none; 1 - remove dynamically depending on the query; 2 - remove all stopwords (dflt 1).
service_volume String - Either 'high' or 'low'. A convenience setting to increase or reduce allowable query complexity and timeouts according to service volumes -- large or small indexes, high or low query volumes. [Not CGI]
stem Integer 0 - 3 Controls stemming of queries. 0 - do not stem (dflt); 1 - do not stem (replaces obsolete option); 2 - stem all query words (light - English/French plural/singular only); 3 - stem all query words (heavier).
stem_lconly Boolean - When stemming, stem only lowercase query words (to avoid stemming proper names and acronyms).
strip_invalid_utf8 Boolean - Normally, invalid UTF-8 characters are removed during indexing. If this hasn't happened, this option allows them to be removed from result packets.
synonyms Boolean - If set, the query processor will expand queries using thesaurus in synonyms.cfg.
truncation_allowed Integer 0 - 3 Enables the use of the * operator. Binary valued. It is only valid when used with an option that disables DAAT mode, such as -service_volume='lo' or -daat=0. When enabled, all contexts are available, such as *:funnelback, funnel*, *back and *:*elba*. [Not CGI]
wildcard_thresh Integer 0 - unlimited If the postings list for a term is longer than the specified value (in MB) it will be treated as a wildcard.
zom Boolean - Include docs in results even if noindex or killed.
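
The worked examples given for min_truncated_len and phrase_prox_word_limit can be reproduced with a short Python sketch. This only illustrates the behaviour described in the table and is not PADRE's implementation; the function names are hypothetical, and only simple right truncation (a trailing *) is handled.

```python
def apply_min_truncated_len(term, min_truncated_len):
    """Drop a trailing * when the text before it is shorter than the minimum."""
    if term.endswith("*") and len(term) - 1 < min_truncated_len:
        return term[:-1]
    return term


def apply_phrase_word_limit(phrase, limit):
    """Shorten a phrase by deleting words from the right beyond `limit` words."""
    return " ".join(phrase.split()[:limit])


# With min_truncated_len = 4, funnel* is kept and fun* is processed as fun.
print(apply_min_truncated_len("funnel*", 4))
print(apply_min_truncated_len("fun*", 4))

# With phrase_prox_word_limit = 4, the phrase loses its last two words.
print(apply_phrase_word_limit("to be or not to be", 4))
```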

H. Query source options

Option Name Type Range Description
ctest String - Read a batch of queries from testfile (in C_TEST format). Sets output format to RM_CTEST, but that may be overridden. (See es.csiro.au/C-TEST/ for information about C-TEST.) [Not CGI]
s String - System-generated query inserted behind the scenes by a form or front-end.

I. Quicklinks options

Option Name Type Range Description
QL Integer 0 - 5 Activate QuickLinks facility for default pages down to the specified level. 0 - off; 1 - server root pages; 2 - next level down.
QL_rank Integer 1 - unlimited If QuickLinks capability is active, show quick links for search results down to the specified rank.
QL_rank_is_relative Boolean - If true, the value of QL_rank will be interpreted relative to the start_rank. E.g. if QL_rank=2, the first two results on each page may show QuickLinks.

J. Ranking options

Option Name Type Range Description
SSS Integer 0 - 10 Same site suppression depth: 0 - no suppression (dflt for non-web collections); 2 - hosts and their top level dir's (dflt for web and meta collections); 10 - special meaning for big Web applications.
SameSiteSuppressionExponent Float 0.000000 - unlimited Same site suppression penalty exponent (dflt 0.5, recommended range 0.2 - 0.7).
SameSiteSuppressionOffset Integer 0 - 1000 Number of additional documents from a site beyond the first that are allowed their full score before applying a same site suppression penalty (dflt 0)
absscores Boolean - Report content scores as % of max possible Okapi score (Intended for use with -vsimple=on).
anniemode Integer 0 - 3 Control the use of annotation indexes. 0 - do not use annotation indexes ; 1 - Process queries using annotation indexes only; 2 - Process queries using annotation indexes, falling back to normal indexes if insufficient results. (Most query op.s stripped.) 3 - Process queries using both annotation and normal indexes (Most operators stripped from queries.). Default 0.
b Float 0.000000 - unlimited Set Okapi b to f (dflt 0.75)
cgscope1 String - Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.68. Those not matching can be upweighted with -cool.70.
cgscope2 String - Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.69. Those not matching can be upweighted with -cool.71.
cool.<Key> Key/Value pair - cool.N=V Set a value for the Nth tune parameter. See cooler ranking options for possible values of N.
cool Boolean - Whether to use topic distillation scoring (cool and cooler). Dflt on.
daat Integer 0 - 10000000 Specifies the maximum number of full matches for Document-At-A-Time processing. If set to 0, Term-At-A-Time is used instead (dflt 5000).
diversity_rank_limit Integer 10 - unlimited Diversification won't alter ranking beyond rank n (default 200, min 10).
gscope1 String - Present only results whose gscope bits match reverse Polish expression e (Bits numbered from zero). If set to 'off', disable any previous expression.
k1 Float 0.000000 - unlimited Set Okapi K1 to <f>. (dflt 2.0)
kmod Integer 0 - 1 Select special scoring function i for special fields. 0 = normal, 1 = AF1 (dflt 1).
lscope String - Present only results whose URL matches a sort-of left-anchored pattern.
lscorrect Boolean - Whether to correct link scores across meta collection components (default yes).
main_homepage_factor Float 0.000000 - 1.000000 Penalise score of the homepage of a single-entity-controlled domain to prevent over representation in results sets. E.g. www.anu.edu.au/ in an index of ANU.
meta_suppression_field Single character - If same_meta_suppression is activated, the specified metadata field will be the field to which it applies. Only one metadata field can be treated in this way.
near_dup_factor Float 0.000000 - 1.000000 The query processor will penalise a result which is a near-duplicate of a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
neardup Float 0.000000 - 1.000000 Near duplicates in ranking are multiplied by f. f=1 turns off near-dup detection.
promote_urls String - Insert the specified URLs at or near the top of the results list for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
quanta Integer 10 - 100000 Set the number of possible score quantisation levels for each cool variable. In general, a high number should give more accurate ranking but may slow query processing.
rank_limit Integer 10 - unlimited Limit highest rank requestable to n (dflt 1,000,000,000).
ranking_profile Integer 0 - 100 Choose a profile of settings for the ranking function. 0 - current default; 1 - Standard BM25; 2 - Traditional (pre-12.0) Funnelback. Setting a profile does not override explicit settings. [Not CGI]
recency_decay_vals String - <z,w,m,y,d,c,m> - Define how recency scores decay with time. z, w, m, y, d, c, m are floats in the range 0 - 1, which specify the recency scores assigned to documents that are 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century and 1 millennium old. (dflt 1.0,0.75,0.5,0.25,0.025,0.0025) Recency scores between key values are linearly interpolated. Past the millennium, recency scores are 1/daysold.
reference_date String - If specified, recency is based on this date rather than that of most recent doc. Format is <yyyymmdd>, or 'today'.
remove_urls String - Prevent the specified URLs from appearing in the results for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
repetitiousness_factor Float 0.000000 - 1.000000 Penalise a repetitious result by multiplying by the factor specified. (Repetitiousness may involve same-site, same component or repeated metadata.) The penalty stiffens with more repetition.
same_collection_suppression Float 0.000000 - 1.000000 While searching a meta-collection, penalise the second result from the same primary collection as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
same_meta_suppression Float 0.000000 - 1.000000 Penalise the second result with the same value for a specified metafield as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
sco String - <n>[<classes>] Set doc scoring mode to n, using the classes specified. Most common values: 0 - score using doc text only ; 1 - no scoring. Produce an unordered set of results ; 2 - score using anchortext and URLs as well, upweight titles (or whatever fields are configured with -specf).
scope String - Present only results whose URL satisfies the include/exclude scopes included in list (comma separated). e.g. -scope=anu.edu.au,-anu.edu.au/archives
sort String - Sort top results by <string>. Possible values: 'date', 'adate' (ascending date), 'title', 'dtitle' (descending title), 'size' (file size), 'dsize' (descending file size), 'url', 'durl' (descending url), 'coll' (collection name, then score), 'dcoll' (descending collection name, then score), 'meta<f>' (by metadata field f, then score), 'dmeta<f>' (descending metadata field f, then score), 'shuffle' (random to avoid bias), 'prox' (for geo search: sort top results by proximity to origin), 'dprox' (for geo search: sort top results by descending proximity to origin). (dflt is case-insensitive for title and meta)
sort_sensitive Boolean - Use case-sensitive sorting when sorting results by title or metadata strings.
sortall Boolean - Include partial matches in the resorting performed by -sort.
specf String - Fields listed in string s will be scored specially and added to query when using the -sco=2 mode (dflt 'kK').
sss_defeat_pattern String - URLs matching the specified pattern (currently a simple string match) will not be subject to same site suppression.
static_cool_exponent Float 0.000000 - 1.000000 Control the extent to which static scores are attenuated with length of query. 0 => no attenuation; 1 => max attenuation. Attenuation by len ** -f.
title_dup_factor Float 0.000000 - 1.000000 The query processor will penalise a result which has exactly the same title as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
unknown_daysold Integer 0 - unlimited A doc with unknown date is assumed to be d days old (for recency calcs) (dflt 366).
use_Paik Boolean - Use the tf.idf scheme proposed by Jiaul Paik at SIGIR 2013 rather than the more conventional BM25 variant.
use_secds Boolean - When working with domain-importance features in ranking, use SECDs if value is on, and raw domain names otherwise.
vsimple String - Very simple ranking. If set to 'on', equivalent to -sco=0 -cool=off -SSS=0 -kmod=0.
weight_only_fields String - Documents will not be retrieved in DAAT mode if they only match unfielded query terms in one or more of the implicit fields listed here. For example, specifying 'Kk' will stop the query 'Monica Lewinski' matching a document solely because of click data or referring anchortext.
wmeta.<Key> Key/Value pair - wmeta.C=F Set upweighting factors for metadata class scoring. C - metadata class; F - weight to set. (dflt 0.5 for 'k' and 'K', 1 for everything else).
xscope String - Present only results whose URL exactly matches the provided URL (after canonicalisation).
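
The interpolation described for recency_decay_vals can be sketched in Python as follows. This is an illustration under stated assumptions, not PADRE's implementation: the day counts chosen for each key age (for example 30 days for a month), the choice to interpolate linearly in days, and the seven sample values are all assumptions made for this example. The sample values are not the defaults.

```python
from bisect import bisect_right

# Assumed day counts for the seven key ages:
# 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century, 1 millennium.
KEY_AGES_DAYS = (0, 7, 30, 365, 3650, 36500, 365000)


def recency_score(days_old, decay_vals):
    """Interpolate a recency score from the key values z,w,m,y,d,c,m."""
    if len(decay_vals) != len(KEY_AGES_DAYS):
        raise ValueError("expected one value per key age")
    if days_old > KEY_AGES_DAYS[-1]:
        return 1.0 / days_old  # past the millennium
    if days_old <= 0:
        return decay_vals[0]
    i = bisect_right(KEY_AGES_DAYS, days_old)  # first key age > days_old
    if i == len(KEY_AGES_DAYS):
        return decay_vals[-1]  # exactly one millennium old
    lo_age, hi_age = KEY_AGES_DAYS[i - 1], KEY_AGES_DAYS[i]
    lo_val, hi_val = decay_vals[i - 1], decay_vals[i]
    frac = (days_old - lo_age) / (hi_age - lo_age)
    return lo_val + frac * (hi_val - lo_val)


# Illustrative values only (not the documented defaults).
sample_vals = [1.0, 0.75, 0.5, 0.25, 0.1, 0.05, 0.01]
print(recency_score(3.5, sample_vals))  # halfway between the 0-day and 1-week values
```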

K. Result collapsing options

Option Name Type Range Description
collapsing Boolean - Activate collapsing. Collapsing will be based on document content ('$') unless a collapsing_sig value is specified. Note that use of this option will disable result set diversification.
collapsing_SF String - Metadata fields to include in display for collapsed documents (assuming collapsing_num_ranks is non-zero). (dflt no fields)
collapsing_label String - Label to indicate why items have been collapsed. (dflt "which are very similar")
collapsing_num_ranks Integer 0 - 1000 Specify how many collapsed results are to be shown under the uncollapsed ones. (dflt 0)
collapsing_sig String - The collapsing_control segment to use when collapsing. E.g. "ap", collapse on author+publisher. The value must correspond to one segment of the indexing.collapse_fields string. (Segments are separated by commas.) (dflt '$' (Collapsing on document content.))

L. Security options

Option Name Type Range Description
dls_internal_test Integer 0 - unlimited This allows testing of the padre side of the custom document level security mechanism. There is no call out to an external function. The value is interpreted as a combination of bits: 1 bit - dls_internal_test is active/not active; 2 bit - selects whether MINRESULTS mode is used or not. During internal testing, every odd numbered document in the original ranking is arbitrarily treated as inaccessible.
ipreject String - <queryLimit>,<windowSeconds>,<upperQueryLimit> - Use an IP rejector to limit requests from a single machine. Allow <queryLimit> queries per <windowSeconds>, don't record more than <upperQueryLimit> queries. [Not CGI]
ldLibraryPath String - Full path to security plugin library [Not CGI]
locking_model String - Name of locking model, either "trim" or "sharepoint". [Not CGI]
no_security Boolean - Disable DLS, available as a command line option. [Not CGI]
secPlugin String - Name of security plugin library [Not CGI]
secPluginScript String - Name of security plugin script [Not CGI]
userkeys String - Conduct this search with security keys specified by s. [Not CGI]
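
The bit combination described for dls_internal_test can be decoded as in the sketch below. It assumes that "1 bit" and "2 bit" refer to the bits with numeric values 1 and 2; the function and key names are hypothetical.

```python
ACTIVE_BIT = 1       # assumed: the "1 bit" switches the internal test on
MINRESULTS_BIT = 2   # assumed: the "2 bit" selects MINRESULTS mode


def decode_dls_internal_test(value):
    """Split a dls_internal_test value into its two documented flags."""
    return {
        "active": bool(value & ACTIVE_BIT),
        "minresults": bool(value & MINRESULTS_BIT),
    }


for value in range(4):
    print(value, decode_dls_internal_test(value))
```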

M. Spelling options

Option Name Type Range Description
spelling Boolean - Activate spelling suggestion mechanism.
spelling_alpha Float 0.000000 - 1.000000 Set the weighting between 'closeness to the query' and support in the collection for a candidate suggestion. Big alpha, high weight on closeness to the query.
spelling_blend_thresh Float 0.000000 - 1.000000 Confidence threshold for automatically blending results for a query suggestion with those from the user's original query.
spelling_difflen_thresh Integer 0 - 1000 Don't make suggestions more than i characters longer or shorter than query.
spelling_dym_thresh Float 0.000000 - 1.000000 Confidence threshold for making a 'Did you mean' suggestion.
spelling_edist_constant Float 0.000000 - 1000.000000 Don't make suggestions whose edit distance from the query exceeds f + query_length * spelling_edist_proportion
spelling_edist_proportion Float 0.000000 - 1.000000 Don't make suggestions whose edit distance from the query exceeds spelling_edist_constant + query_length * f (0<=f<=1)
spelling_fullmatch_trigger_const Float 0.000000 - unlimited Don't look for suggestions if there are at least f * log10(num docs) full matches.
spelling_include_context Boolean - Include the non-corrected part of the query in the suggestion link.
spelling_min_querylen Integer 1 - 1000 Suggestions not made for queries shorter than this.
spelling_wt_thresh Float 0.000000 - 100.000000 Don't make suggestions whose weight is less than this. Weight is complex to explain, sorry.
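
The two formulas quoted in the table above (the edit distance limit and the full-match trigger) can be written out as a small Python sketch. The function names and the sample numbers are illustrative only, since no defaults are listed for these options, and measuring query_length in characters is an assumption.

```python
import math


def max_edit_distance(query_length, edist_constant, edist_proportion):
    """Largest edit distance a suggestion may have from the query."""
    return edist_constant + query_length * edist_proportion


def skip_suggestions(full_matches, num_docs, trigger_const):
    """True when there are already enough full matches to skip suggestions."""
    return full_matches >= trigger_const * math.log10(num_docs)


# Illustrative numbers: a 10-character query with constant 1.0 and proportion 0.2.
print(max_edit_distance(10, 1.0, 0.2))

# Illustrative numbers: 10 full matches in a 1000-document index, constant 2.0.
print(skip_suggestions(10, 1000, 2.0))
```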

N. TREC specific options

Option Name Type Range Description
trec_runid String - For TREC participation: Each result in TREC format will include this runid.
trec_topic Integer 0 - unlimited For TREC participation: The first query in a batch will get this topic number. Each new query will increase the number by one.
trecids Boolean - For TREC participation: Each result in TREC format will use the TREC docno rather than a URL

Default value

 query_processor_options=

That is, no additional options.

Examples

Query processor set to perform Same Site Suppression, along with using synonyms:

 query_processor_options=-SSS=2 -THS=/home/search/conf/myCollectionName/synonyms.cfg

Query processor set to sort the top results by filesize:

 query_processor_options=-sort=size

Query processor set to upweight title text to twice the original weight, sort all results by date and use word stemming when searching for results:

 query_processor_options=-wmeta.t=2.0 -sort=date -sortall -stem=2
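
If the option line is generated by a script rather than typed by hand, it can be assembled as in the following Python sketch. The helper name is hypothetical, and the sketch does no quoting or validation, so it only suits simple values like those in the examples above.

```python
def format_query_processor_options(options):
    """Join options into a query_processor_options line for collection.cfg.

    A value of None produces a bare flag such as -sortall.
    """
    parts = []
    for name, value in options.items():
        if value is None:
            parts.append(f"-{name}")
        else:
            parts.append(f"-{name}={value}")
    return "query_processor_options=" + " ".join(parts)


line = format_query_processor_options(
    {"wmeta.t": 2.0, "sort": "date", "sortall": None, "stem": 2}
)
print(line)
```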
