Query processor options (collection.cfg)


This option specifies additional configuration options to be supplied to the query processor when performing queries. The PArallel Document Retrieval Engine (PADRE) query processor can be finely controlled through a large set of options. Many of these options can be specified either in this collection configuration parameter or as a CGI parameter passed with the search request URL; options marked [Not CGI] cannot be passed as CGI parameters. The available options are listed below, grouped by category.


A. Contextual navigation options

Option Name Type Range Description
categorise_clusters Boolean - Whether contextual navigation suggestions are grouped by type.
cnto Float 0.000000 - unlimited Set the contextual navigation time-out to s seconds (s is a floating-point value). Contextual navigation processing may be omitted entirely if the elapsed time for a query already exceeds s seconds. (dflt 5.0)
contextual_navigation Boolean - Whether or not to activate the contextual navigation system.
contextual_navigation_fields String - A comma-separated list of metadata fields, enclosed in square brackets, to scan for contextual navigation suggestions (dflt '[c,t]'). Scanning of document text can be suppressed by including a minus, for example '[-,c,t]'.
max_phrase_length Integer 3 - 7 Maximum length (in words) of contextual navigation suggestions.
max_phrases Integer 0 - unlimited After this number of candidate phrases have been checked, contextual navigation processing will stop.
max_results_to_examine Integer 0 - 200 Maximum number of search results to scan for contextual navigation suggestions.
site_max_clusters Integer 0 - unlimited Maximum number of site clusters to present in contextual navigation.
topic_max_clusters Integer 0 - unlimited Maximum number of topic clusters to present in contextual navigation.
type_max_clusters Integer 0 - unlimited Maximum number of type clusters to present in contextual navigation.
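The bracketed field-list syntax used by contextual_navigation_fields (and by other options in this list, such as rmcf and SF) can be illustrated with a small Python sketch. The function name parse_cn_fields is hypothetical. The sketch only shows how a value such as '[-,c,t]' breaks down into a "scan document text" flag and a list of metadata fields. It does not describe how PADRE parses the value internally.

```python
def parse_cn_fields(value: str) -> tuple[bool, list[str]]:
    """Split a value like '[-,c,t]' into (scan_document_text, metadata_fields).

    Hypothetical helper for illustration; not part of PADRE.
    """
    value = value.strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"expected a bracketed list, got {value!r}")
    items = [item.strip() for item in value[1:-1].split(",") if item.strip()]
    # A bare '-' entry suppresses scanning of the document text itself.
    scan_text = "-" not in items
    fields = [item for item in items if item != "-"]
    return scan_text, fields
```

With the documented default, parse_cn_fields('[c,t]') gives (True, ['c', 't']). Adding the minus, as in '[-,c,t]', turns the first element to False and leaves the field list unchanged.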

B. Geospatial options

Option Name Type Range Description
geospatial_ranges Boolean - Calculate geospatial distance from origin and bounding box ranges when geospatial data is configured and available.
maxdist Float 0.000000 - unlimited Exclude results not within <f> km of origin.
origin String - <lat,long> Set origin to lat, long (floating point degrees).
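To make the interaction of origin and maxdist concrete, here is a Python sketch of a distance filter. The documentation does not say how PADRE measures distance. The sketch therefore assumes great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km. All function names are hypothetical.

```python
import math

# Mean Earth radius in km: an assumption, not a documented PADRE constant.
EARTH_RADIUS_KM = 6371.0


def parse_origin(value: str) -> tuple[float, float]:
    """Parse an origin value of the form 'lat,long' (floating point degrees)."""
    lat_text, long_text = value.split(",")
    return float(lat_text), float(long_text)


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, long) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_maxdist(origin: str, point: tuple[float, float], maxdist_km: float) -> bool:
    """True if `point` is within maxdist_km of the origin, i.e. it is not excluded."""
    return haversine_km(parse_origin(origin), point) <= maxdist_km
```

For example, with origin "-35.28,149.13" (Canberra), a result located in Sydney (about 247 km away under these assumptions) passes a maxdist of 300 but is excluded by a maxdist of 200.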

C. Informational options

Option Name Type Range Description
canq Boolean - Write reordered queries to log. (dflt off)
count_dates String - Report facet counts for dates such as 'today', 'last week', 'this year'. Note that date categories may overlap. Only value currently supported is 'd'.
count_urls Integer - Display counts of results grouped by URL path, up to depth i. If i is not present or is 0, the default value is used (dflt 5). [Not CGI]
rmcf String - Metadata fields to have their words counted in result sets (fields representing facets). To count fields 'a' and 'c', set this to '[a,c]'.
rmrf String - Numerical and geospatial fields listed will have their ranges calculated in result sets. To see the ranges of field 'height' and the bounding box geospatial field 'X' set this to '[height,X]'.
showtimes Boolean - Print elapsed times for each stage of query processing.
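The grouping reported by count_urls can be pictured with the following Python sketch. Two points are assumptions: how PADRE measures depth, and whether the file name counts as a level. Here, depth counts directory segments below the host, and the final path segment is treated as a file name and not counted. The function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

DEFAULT_DEPTH = 5  # documented default for count_urls


def count_url_paths(urls: list[str], depth: int = 0) -> Counter:
    """Count results per host and per directory prefix, down to `depth` levels.

    A depth of 0 falls back to the default, mirroring the option description.
    """
    if not depth:
        depth = DEFAULT_DEPTH
    counts: Counter = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Directory segments only: the final segment (the file name) is dropped.
        directories = [seg for seg in parts.path.split("/")[1:-1] if seg]
        prefix = parts.netloc
        counts[prefix] += 1
        for segment in directories[:depth]:
            prefix = f"{prefix}/{segment}"
            counts[prefix] += 1
    return counts
```

Take three results: www.example.com/a/b/x.html, www.example.com/a/y.html and www.example.com/z.html. The host is counted 3 times, /a twice and /a/b once. With a depth of 1, the /a/b level is not reported.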

D. Logging options

Option Name Type Range Description
ip_to_log String - Selects what is recorded in the query log to identify the requester: one of ip, ip_hash or remote_user.
log Boolean - Write query log entries (dflt on). [Not CGI]
qlog_file String - If writing query log entries, write them to <FILE>. [Not CGI]
username String - A string identifying the current user, to be used in PADRE's query log.

E. Miscellaneous options

Option Name Type Range Description
countgbits String - s is either "all" or a comma-separated list of gscope bitnumbers for which counts are needed. (Bits numbered from zero.)
exit_on_bad_component Boolean - Fail when a component has an incompatible index relative to the first (rather than skip).
flock Boolean - Use flock when locking the query logfile. If set to no, lockf is used instead. Default on Solaris is 'no', all other systems 'yes'.
mat Integer 0 - 2147 Set the matchset size to n million (dflt 24). This only needs to be increased for very large collections. [Not CGI]
ndt Boolean - Don't do tests on docs, e.g. phantom, zombie, *scope, binary, expired. [Not CGI]
unbuf Boolean - Don't buffer the standard output stream. In some specific cases, setting this to 'no' can improve performance.
view String - The collection view to perform the query against when in CGI mode. Normally 'live' (default), 'offline' or 'snapshot###'.
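The following Python sketch illustrates what countgbits reports. It assumes each document's gscope bits are held as an integer bitmask, with bit 0 as the least significant bit. It also interprets "all" as every bit up to the highest one set in any document. These are assumptions for illustration only, and the function name is hypothetical.

```python
def count_gscope_bits(doc_bits: list[int], spec: str) -> dict[int, int]:
    """Count documents with each requested gscope bit set (bits numbered from zero).

    `spec` is "all" or a comma-separated list of bit numbers such as "0,3,7".
    """
    if spec == "all":
        # Assumption: "all" means every bit up to the highest one set anywhere.
        highest = max((bits.bit_length() for bits in doc_bits), default=0)
        wanted = list(range(highest))
    else:
        wanted = [int(bit) for bit in spec.split(",")]
    return {bit: sum(1 for bits in doc_bits if (bits >> bit) & 1) for bit in wanted}
```

For three documents with bitmasks 0b101, 0b001 and 0b110, requesting "0,2" yields {0: 2, 2: 2}.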

F. Presentation options

Option Name Type Range Description
EORDER Integer 0 - 1 Specify presentation order of query biased summary excerpts. 0: natural order in doc. 1: sorted by score. (dflt 0)
MBL Integer 1 - unlimited Set buffer length per displayed metadata field to n bytes (dflt 250 bytes). Warning: setting very large values will increase query processor memory demands and may cause problems.
SBL Integer 1 - unlimited Set summary buffer length to n bytes. (dflt 250 bytes)
SF String - Metadata fields to include in summaries, where applicable. To include fields 'a' and 'd' set this to '[a,d]'.
SHLM Integer 0 - 7 Select the highlighting method within snippets in XML. 0 - no highlighting; 1 - HTML strong tags; 2 - show highlighting regexp and unhighlighted summary [dflt]; 5 - use HTML strong tags but remove accents from the summary before highlighting, provided the query was not accented.
SM String - Summary mode (off;snip;debug;meta;qb;def;auto;both) - both means qb and meta.
SQE Integer 1 - 10000 Set max no. of query biased summary excerpts to n (dflt 3).
bb Boolean - If set, the query processor may insert "best bets" (formerly known as "featured pages") suggestions from best_bets.cfg.
ctest_mode Integer 0 - 3 Controls the behaviour of padre-sw when -ctest is used. 0 - no internal evaluation; 1 - internal evaluation only, output is a brief plain-text report of measures; 2 - internal evaluation only, output is plain text with QBQ output followed by measures; 3 - internal evaluation plus normal CTOUT output in XML (with measures presented as comments).
explain Boolean - Explain rankings by showing score components. (Note that -explain=on turns off result set diversification).
explore Integer 7 - 50 Show 'explore' links against results. The value specifies how many terms to include in the expanded query.
gscoperesult String - Specifies the bit number that results will be set to in -res gscope or -res docnums modes (dflt 1).
mdsfhl Boolean - Whether query terms are highlighted only in MDSF metadata summaries.
num_ranks Integer 0 - unlimited Limit number of results to n (min = 0, dflt = 10).
num_tiers Integer 0 - 50 Limit the number of result list tiers to n (min = 0 (no limit), max = 50, dflt no limit).
qieval Float 0.000000 - 1.000000 Set the value presented for query independent evidence when using the qiecfg result format. (dflt 0.5).
qwhl String - Determines which parts of a search result are highlighted. S - snippet, M - metadata, U - URL, T - title. E.g. -qwhl=MUT
res String - Set result format. Possible values are: trec, mail, web, html, xml, urls, qiez, qieo, gscope, docnums, ctest, qiecfg or flcfg.
results_in_facet_categories Integer 0 - 100 Include the specified number of pre-computed search results under the rmc count element for metadata facet categories.
rmc_maxperfield Integer 0 - unlimited Set the maximum number of RMC items to display per field to n (dflt 100).
rmc_sensitive Boolean - Treat facet categories (RMC items) case sensitively (default no). [Not CGI]
show_qsyntax_tree Boolean - Include an SVG representation of the query-as-processed in output.
start_rank Integer 1 - unlimited Present results starting from n (dflt 1).
tierbars Boolean - Display tierbars in result list output (XML and HTML). When turned on (for all -res modes) and -sort is used, results will be first sorted by tier then by the sorting mode, otherwise if -sortall is used then all results will be sorted regardless of tier.
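Paging through results uses start_rank together with num_ranks. The Python sketch below models this as a plain slice of an already-ranked list, with start_rank counted from 1. It does not model tiers, rank_limit or any special meaning of num_ranks=0. The function names are hypothetical.

```python
def result_page(ranked: list[str], start_rank: int = 1, num_ranks: int = 10) -> list[str]:
    """Return the num_ranks results starting at 1-based rank start_rank."""
    if start_rank < 1:
        raise ValueError("start_rank must be at least 1")
    first = start_rank - 1  # convert the 1-based rank to a 0-based index
    return ranked[first:first + num_ranks]


def next_start_rank(start_rank: int, num_ranks: int) -> int:
    """start_rank value that requests the following page."""
    return start_rank + num_ranks
```

With 25 ranked results and the default num_ranks of 10, start_rank=11 returns ranks 11 to 20. The next page (start_rank=21) returns the remaining five.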

G. Query interpretation options

Option Name Type Range Description
STOP String - Use the stoplist specified in <file> (one word per line) [Not CGI]
binary Integer 0 - 3 Determines whether or not binary documents are returned in the results. 0 - show all documents; 1 - show only binary documents; 2 - show only non-binary documents.
clive String - Dynamic metacollections. Specifies the number (from 0) of one component within the .sdinfo file(s) to make active.
daat_termination_type Integer 0 - 2 Selects how DAAT early exit is determined. 0 - try for d results with every metafield and every component; 1 - try for d results over every component but not necessarily every metafield; 2 - stop as soon as d results are obtained. (d is the parameter to -daat.)
daat_timeout Float 0.000000 - 3600.000000 Impose a soft timeout (in seconds) on the time taken by the DAAT machinery for one query. [Not CGI]
dont_estimate_full_matches Boolean - In DAAT mode, don't estimate the number of full matches when the DAAT depth did not allow an entire postings list to be processed.
events Boolean - Must be set if event search is to be used.
fmo Boolean - Present full matches only.
lang String - If a 2-character language code is specified by this means, then stemmers etc specific to that language will be used, IF AVAILABLE. It is also permissible to use a 5-character code like en_GB, but padre behaviour will be the same as for en. Specifying lang also makes title and metadata sorting of results locale-specific, however support for this on Windows platforms is limited and problematic.
loose Integer 0 - unlimited Phrase looseness in words (min = 0, dflt = 0).
max_qbatch Integer 1 - unlimited Terminate batch query processing after the specified number of queries have been processed.
max_terms Integer 1 - unlimited Truncate queries after the specified number of terms. If the query is reordered, truncation will occur after reordering.
min_truncated_len Integer 0 - 20 The text part of a query term with a right truncation operator must have at least this length. E.g. if min_truncated_len were 4 funnel* would be accepted but fun* would be processed as fun. [Not CGI]
noexpired Boolean - Exclude expired docs from results. (Nullified by -zom) [Not CGI]
nulqok Boolean - An empty query submitted via CGI will be processed as a null query. The system query must be empty as well. (dflt is to ignore the request). [Not CGI]
phrase_prox_word_limit Integer 1 - unlimited Phrase or proximity terms with more than this number of words will be shortened by deleting words from the right. E.g. If this limit were 4 then `to be or not to be` would be processed as `to be or not` [Not CGI]
prox Integer 0 - unlimited Proximity limit in words (min = 0, dflt = 15).
qsup String - When blending queries, determines the sources of supplementary queries to be tried, with a weight (from 0 to 1) assigned to each source. No spaces. 'off' may be specified to disable supplementary queries. E.g. -qsup=SPEL/0.9+USUK/0.4+SYNS/0.1+LANG/0.1. Available sources are: SPEL (spelling suggestions); USUK (table of spelling differences between US and UK English); SYNS (synonyms as defined by the blending.cfg file); LANG (experimental German decompounding).
query_reorder Boolean - Reorder terms in the query so that the most discriminating (least common) terms appear first. Often combined with -max_terms.
ras Integer 0 - 2 Controls removal of stopwords from the query. Possible values: 0 - remove none; 1 - remove dynamically depending on the query; 2 - remove all stopwords (dflt 1).
service_volume String - Either 'high' or 'low'. A convenience setting to increase or reduce allowable query complexity and timeouts according to service volumes -- large or small indexes, high or low query volumes. [Not CGI]
stem Integer 0 - 3 Controls stemming of queries. 0 - do not stem (dflt); 1 - do not stem (replaces obsolete option); 2 - stem all query words (light: English/French plural/singular only); 3 - stem all query words (heavier).
stem_lconly Boolean - When stemming, stem only lowercase query words (to avoid stemming proper names and acronyms).
strip_invalid_utf8 Boolean - Invalid UTF-8 characters are normally removed during indexing. If this has not happened, this option allows them to be removed from result packets.
synonyms Boolean - If set, the query processor will expand queries using thesaurus in synonyms.cfg.
truncation_allowed Integer 0 - 3 Enables the use of the * operator (binary valued). It is only valid in conjunction with an option that disables DAAT mode, such as -service_volume='lo' or -daat=0. When enabled, * can be used in all contexts, for example: *:funnelback, funnel*, *back and *:*elba*. [Not CGI]
wildcard_thresh Integer 0 - unlimited If the postings list for a term is longer than the specified value (in MB) it will be treated as a wildcard.
zom Boolean - Include docs in results even if noindex or killed.
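The effect of query_reorder combined with max_terms can be sketched in Python. Term rarity is approximated here by a caller-supplied document-frequency table. That table, and the treatment of unknown terms as frequency 0, are assumptions of this sketch and not documented PADRE behaviour. As the max_terms description states, truncation happens after reordering.

```python
from typing import Optional


def reorder_and_truncate(terms: list[str], doc_freq: dict[str, int],
                         query_reorder: bool = True,
                         max_terms: Optional[int] = None) -> list[str]:
    """Apply a query_reorder-style reordering, then max_terms truncation."""
    if query_reorder:
        # Least common (most discriminating) terms first. sorted() is stable, so
        # terms with equal frequency keep their original relative order. Terms
        # missing from doc_freq are treated as frequency 0 in this sketch.
        terms = sorted(terms, key=lambda term: doc_freq.get(term, 0))
    if max_terms is not None:
        terms = terms[:max_terms]  # truncation happens after reordering
    return terms
```

Consider the query "the funnelback search engine" with max_terms=2. With reordering, the two rarest terms are kept (for example "funnelback" and "engine"). Without reordering, the query is simply cut to "the funnelback".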

H. Query source options

Option Name Type Range Description
ctest String - Read a batch of queries from testfile (in C_TEST format). Sets output format to RM_CTEST, but that may be overridden. (See es.csiro.au/C-TEST/ for information about C-TEST.) [Not CGI]
s String - System-generated query inserted behind the scenes by a form or front-end.

I. Quicklinks options

Option Name Type Range Description
QL Integer 0 - 5 Activate QuickLinks facility for default pages down to the specified level. 0 - off; 1 - server root pages; 2 - next level down.
QL_rank Integer 1 - unlimited If QuickLinks capability is active, show quick links for search results down to the specified rank.
QL_rank_is_relative Boolean - If true, the value of QL_rank will be interpreted relative to the start_rank. E.g. if QL_rank=2, the first two results on each page may show QuickLinks.
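The difference between an absolute and a relative QL_rank can be expressed as a small Python predicate. The function name is hypothetical, and the sketch ignores the QL level setting, which controls which default pages are eligible rather than which ranks.

```python
def shows_quicklinks(rank: int, ql_rank: int, start_rank: int = 1,
                     ql_rank_is_relative: bool = False) -> bool:
    """Is the result at absolute 1-based `rank` eligible for QuickLinks?"""
    if ql_rank_is_relative:
        # QL_rank counts from the first result on the current page.
        return start_rank <= rank < start_rank + ql_rank
    # Otherwise QL_rank is an absolute rank limit.
    return rank <= ql_rank
```

On the second page (start_rank=11) with QL_rank=2, results 11 and 12 are eligible when QL_rank_is_relative is true. When it is false, no result on that page is eligible.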

J. Ranking options

Option Name Type Range Description
SSS Integer 0 - 10 Same site suppression depth: 0 - no suppression (dflt for non-web collections); 2 - hosts and their top-level directories (dflt for web and meta collections); 10 - special meaning for big Web applications.
SameSiteSuppressionExponent Float 0.000000 - unlimited Same site suppression penalty exponent (dflt 0.5, recommended range 0.2 - 0.7).
SameSiteSuppressionOffset Integer 0 - 1000 Number of additional documents from a site beyond the first that are allowed their full score before applying a same site suppression penalty (dflt 0)
absscores Boolean - Report content scores as % of max possible Okapi score (Intended for use with -vsimple=on).
anniemode Integer 0 - 3 Control the use of annotation indexes. 0 - do not use annotation indexes; 1 - process queries using annotation indexes only; 2 - process queries using annotation indexes, falling back to normal indexes if there are insufficient results (most query operators are stripped); 3 - process queries using both annotation and normal indexes (most query operators are stripped). Default 0.
b Float 0.000000 - unlimited Set Okapi b to f (dflt 0.75)
cgscope1 String - Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.68. Those not matching can be upweighted with -cool.70.
cgscope2 String - Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.69. Those not matching can be upweighted with -cool.71.
cool Boolean - Whether to use topic distillation scoring (cool and cooler). Dflt on.
cool.<Key> Key/Value pair - cool.N=V Set a value for the Nth tune parameter. See cooler ranking options for possible values of N.
daat Integer 0 - 10000000 Specifies the maximum number of full matches for Document-At-A-Time processing. If set to 0, Term-At-A-Time is used instead (dflt 5000).
diversity_rank_limit Integer 10 - unlimited Diversification won't alter ranking beyond rank n (default 200, min 10).
gscope1 String - Present only results whose gscope bits match reverse Polish expression e (Bits numbered from zero). If set to 'off', disable any previous expression.
k1 Float 0.000000 - unlimited Set Okapi K1 to <f>. (dflt 2.0)
kmod Integer 0 - 1 Select special scoring function i for special fields. 0 = normal, 1 = AF1 (dflt 1).
lscope String - Present only results whose URL matches a sort-of left-anchored pattern.
lscorrect Boolean - Whether to correct link scores across meta collection components (default yes).
main_homepage_factor Float 0.000000 - 1.000000 Penalise score of the homepage of a single-entity-controlled domain to prevent over representation in results sets. E.g. www.anu.edu.au/ in an index of ANU.
meta_suppression_field String - If same_meta_suppression is activated, the specified metadata field will be the field to which it applies. Only one metadata field can be treated in this way.
near_dup_factor Float 0.000000 - 1.000000 The query processor will penalise a result which is a near-duplicate of a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
neardup Float 0.000000 - 1.000000 Near duplicates in ranking are multiplied by f. f=1 turns off near-duplicate detection.
promote_urls String - Insert the specified URLs at or near the top of the results list for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
quanta Integer 10 - 100000 Set the number of possible score quantisation levels for each cool variable. In general, a high number should give more accurate ranking but may slow query processing.
rank_limit Integer 10 - unlimited Limit highest rank requestable to n (dflt 1,000,000,000).
ranking_profile Integer 0 - 100 Choose a profile of settings for the ranking function. 0 - current default; 1 - Standard BM25; 2 - Traditional (pre-12.0) Funnelback. Setting a profile does not override explicit settings. [Not CGI]
recency_decay_vals String - <z,w,m,y,d,c,m> - Define how recency scores decay with time. z, w, m, y, d, c and m are floats in the range 0 - 1 specifying the recency score assigned to documents that are 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century and 1 millennium old respectively. (dflt 1.0,0.75,0.5,0.25,0.025,0.0025) Recency scores between key values are linearly interpolated. Beyond one millennium, the recency score is 1/daysold.
reference_date String - If specified, recency is based on this date rather than that of most recent doc. Format is <yyyymmdd>, or 'today'.
remove_urls String - Prevent the specified URLs from appearing in the results for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
repetitiousness_factor Float 0.000000 - 1.000000 Penalise a repetitious result by multiplying by the factor specified. (Repetitiousness may involve same-site, same component or repeated metadata.) The penalty stiffens with more repetition.
same_collection_suppression Float 0.000000 - 1.000000 While searching a meta-collection, penalise the second result from the same primary collection as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
same_meta_suppression Float 0.000000 - 1.000000 Penalise the second result with the same value for a specified metafield as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
sco String - <n>[<classes>] Set the document scoring mode to n, using the classes specified. Most common values: 0 - score using document text only; 1 - no scoring, produce an unordered set of results; 2 - score using anchortext and URLs as well, upweighting titles (or whatever fields are configured with -specf). For example, to automatically look in fields 'u' and 'v' for the query terms, set -sco=2[u,v].
scope String - Present only results whose URL satisfies the include/exclude scopes included in list (comma separated). e.g. -scope=anu.edu.au,-anu.edu.au/archives
sort String - Sort top results by <string>. Possible values: 'date', 'adate' (ascending date), 'title', 'dtitle' (descending title), 'size' (file size), 'dsize' (descending file size), 'url', 'durl' (descending URL), 'coll' (collection name, then score), 'dcoll' (descending collection name, then score), 'meta<f>' (by metadata field f, then score), 'dmeta<f>' (descending metadata field f, then score), 'shuffle' (random order, to avoid bias), 'collapse_count' (by the number of collapsed documents, with the largest collapsed set first), 'acollapse_count' (with the largest collapsed set last), 'prox' (for geo search: sort top results by proximity to origin), 'dprox' (for geo search: sort top results by descending proximity to origin). (dflt is case-insensitive for title and meta)
sort_sensitive Boolean - Use case-sensitive sorting when sorting results by title or metadata strings.
sortall Boolean - Include partial matches in the resorting performed by -sort.
specf String - Fields listed in string s, as a list of comma separated fields surrounded by square brackets, will be scored specially and added to query when using the -sco=2 mode (dflt '[k,K]').
sss_defeat_pattern String - URLs matching the specified pattern (currently a simple string match) will not be subject to samesite suppression.
static_cool_exponent Float 0.000000 - 1.000000 Control the extent to which static scores are attenuated with length of query. 0 => no attenuation; 1 => max attenuation. Attenuation by len ** -f.
title_dup_factor Float 0.000000 - 1.000000 The query processor will penalise a result which has exactly the same title as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.
unknown_daysold Integer 0 - unlimited A doc with unknown date is assumed to be d days old (for recency calcs) (dflt 366).
use_Paik Boolean - Use the tf.idf scheme proposed by Jiaul Paik at SIGIR 2013 rather than the more conventional BM25 variant.
use_secds Boolean - When working with domain-importance features in ranking, use SECDs if value is on, and raw domain names otherwise.
vsimple String - Very simple ranking. If set to 'on', equivalent to -sco=0 -cool=off -SSS=0 -kmod=0.
weight_only_fields String - Documents will not be retrieved in DAAT mode if they only match unfielded query terms in one or more of the implicit fields listed here. For example, specifying '[K,k]' will stop the query 'Monica Lewinski' matching a document solely because of click data or referring anchortext.
wmeta.<Key> Key/Value pair - wmeta.C=F Set upweighting factors for metadata class scoring. C - metadata class; F - weight to set. (dflt 0.5 for 'k' and 'K', 1 for everything else).
xscope String - Present only results whose URL exactly matches the provided URL (after canonicalisation).
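The piecewise-linear decay described for recency_decay_vals can be sketched in Python. The day counts used for "1 month", "1 decade" and so on are not documented, so the KEY_AGES values below are assumptions. The decay values in the example are hypothetical, one per key age, and are not the documented default. Key ages are 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century and 1 millennium.

```python
from bisect import bisect_right

# Assumed day counts for 0 days, 1 week, 1 month, 1 year, 1 decade,
# 1 century and 1 millennium. PADRE's exact values are not documented here.
KEY_AGES = [0, 7, 30, 365, 3650, 36500, 365000]


def recency_score(days_old: float, decay_vals: list[float]) -> float:
    """Piecewise-linear recency score in the style of -recency_decay_vals."""
    if len(decay_vals) != len(KEY_AGES):
        raise ValueError("expected one decay value per key age")
    days_old = max(days_old, 0)
    if days_old >= KEY_AGES[-1]:
        return 1.0 / days_old  # past the millennium, the score is 1/daysold
    # Find the segment [KEY_AGES[i], KEY_AGES[i + 1]) containing days_old.
    i = bisect_right(KEY_AGES, days_old) - 1
    lo_age, hi_age = KEY_AGES[i], KEY_AGES[i + 1]
    lo_val, hi_val = decay_vals[i], decay_vals[i + 1]
    # Linear interpolation between the two neighbouring key values.
    return lo_val + (hi_val - lo_val) * (days_old - lo_age) / (hi_age - lo_age)


# Hypothetical decay values, one per key age (not the documented default).
EXAMPLE_VALS = [1.0, 0.8, 0.6, 0.3, 0.05, 0.005, 0.0005]
```

A document half a week old scores halfway between the first two values (0.9 here). A document with an unknown date is assumed to be unknown_daysold (dflt 366) days old, so with these example values it scores just under 0.3.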

K. Result collapsing options

Option Name Type Range Description
collapsing Boolean - Activate collapsing. Collapsing will be based on document content ('$') unless a collapsing_sig value is specified. Note that use of this option will disable result set diversification.
collapsing_SF String - Metadata fields to include in display for collapsed documents (assuming collapsing_num_ranks is non-zero). (dflt no fields). To view metadata fields 'id' and 'a' set this to '[id,a]'.
collapsing_label String - Label to indicate why items have been collapsed. (dflt "which are very similar")
collapsing_num_ranks Integer 0 - 1000 Specify how many collapsed results are to be shown under the uncollapsed ones. (dflt 0)
collapsing_scoped Boolean - Scope to only documents which have been collapsed on. Default is off.
collapsing_sig String - The collapsing_control segment to use when collapsing. E.g. "[a,p]", collapse on author+publisher. The value must correspond to one segment of the indexing.collapse_fields string. (A segment is a comma separated list of fields surrounded by square brackets) (dflt '[$]' (Collapsing on document content.))
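As a rough model of collapsing on a signature such as "[a,p]" (author plus publisher), the Python sketch below groups ranked results whose values for the signature fields are identical. The first (highest-ranked) result in each group is kept and later ones are collapsed under it. Results are represented as plain dictionaries. The function name, the dictionary representation and the handling of missing fields are all assumptions. Content-based collapsing on '[$]' would need a content signature, which this sketch does not compute.

```python
def collapse(results: list[dict], sig: str = "[a,p]") -> list[tuple[dict, list[dict]]]:
    """Return (kept_result, collapsed_results) pairs in rank order."""
    fields = [field.strip() for field in sig.strip("[]").split(",")]
    groups: dict[tuple, tuple[dict, list[dict]]] = {}
    for result in results:
        # Missing fields compare as None, so such results collapse together here.
        key = tuple(result.get(field) for field in fields)
        if key in groups:
            groups[key][1].append(result)  # collapse under the earlier result
        else:
            groups[key] = (result, [])     # first occurrence stays visible
    # dicts preserve insertion order, so groups come out in original rank order.
    return list(groups.values())
```

With collapsing_num_ranks greater than zero, a front end could then show up to that many entries from each collapsed list beneath the kept result.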

L. Security options

Option Name Type Range Description
dls_internal_test Integer 0 - unlimited This allows testing of the padre side of the custom document level security mechanism. There is no call out to an external function. The value is interpreted as a combination of bits: 1 bit - dls_internal_test is active/not active; 2 bit - selects whether MINRESULTS mode is used or not. During internal testing, every odd numbered document in the original ranking is arbitrarily treated as inaccessible.
ipreject String - <queryLimit>,<windowSeconds>,<upperQueryLimit> - Use an IP rejector to limit requests from a single machine. Allow <queryLimit> queries per <windowSeconds> seconds; don't record more than <upperQueryLimit> queries. [Not CGI]
ldLibraryPath String - Full path to security plugin library [Not CGI]
locking_model String - Name of locking model, either "trim" or "sharepoint". [Not CGI]
no_security Boolean - Disable DLS, available as a command line option. [Not CGI]
secPlugin String - Name of security plugin library [Not CGI]
secPluginScript String - Name of security plugin script [Not CGI]
userkeys String - Conduct this search with security keys specified by s. [Not CGI]
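The ipreject triple can be read as a per-IP sliding-window rate limit. The Python class below is a sketch of that reading. It assumes that rejected requests are still recorded, which is why an upper cap on stored requests matters, and that the window is measured in seconds from a monotonic clock. The class name and the injectable clock are hypothetical and are not part of PADRE.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class IpRejector:
    """Sliding-window limiter modelled on -ipreject=<queryLimit>,<windowSeconds>,<upperQueryLimit>."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic):
        limit_text, window_text, upper_text = spec.split(",")
        self.query_limit = int(limit_text)
        self.window_seconds = float(window_text)
        self.upper_query_limit = int(upper_text)
        self.clock = clock
        self.history: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record a request from `ip` and report whether it should be served."""
        now = self.clock()
        seen = self.history[ip]
        # Forget requests that have fallen outside the time window.
        while seen and now - seen[0] >= self.window_seconds:
            seen.popleft()
        allowed = len(seen) < self.query_limit
        # Record this request too, but keep at most upperQueryLimit timestamps.
        seen.append(now)
        while len(seen) > self.upper_query_limit:
            seen.popleft()
        return allowed
```

With a setting of "2,60,5", a third query from the same address within 60 seconds is rejected, while other addresses are unaffected. Under this sketch's assumption that rejected requests are recorded, a client that keeps retrying stays blocked until its recorded requests age out of the window.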

M. Spelling options

Option Name Type Range Description
spelling Boolean - Activate spelling suggestion mechanism.
spelling_alpha Float 0.000000 - 1.000000 Set the weighting between 'closeness to the query' and support in the collection for a candidate suggestion. Big alpha, high weight on closeness to the query.
spelling_blend_thresh Float 0.000000 - 1.000000 Confidence threshold for automatically blending results for a query suggestion with those from the user's original query.
spelling_difflen_thresh Integer 0 - 1000 Don't make suggestions more than i characters longer or shorter than query.
spelling_dym_thresh Float 0.000000 - 1.000000 Confidence threshold for making a 'Did you mean' suggestion.
spelling_edist_constant Float 0.000000 - 1000.000000 Don't make suggestions whose edit distance from the query exceeds f + query_length * spelling_edist_proportion
spelling_edist_proportion Float 0.000000 - 1.000000 Don't make suggestions whose edit distance from the query exceeds spelling_edist_constant + query_length * f (0<=f<=1)
spelling_fullmatch_trigger_const Float 0.000000 - unlimited Don't look for suggestions if there are at least f * log10(num docs) full matches.
spelling_include_context Boolean - Include the non-corrected part of the query in the suggestion link.
spelling_min_querylen Integer 1 - 1000 Suggestions not made for queries shorter than this.
spelling_wt_thresh Float 0.000000 - 100.000000 Don't make suggestions whose weight is less than this. Weight is complex to explain, sorry.
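Several of the spelling thresholds above combine into simple arithmetic checks. The Python sketch below applies spelling_min_querylen, spelling_difflen_thresh and the edit-distance limit spelling_edist_constant + query_length * spelling_edist_proportion to a single candidate. It also applies the spelling_fullmatch_trigger_const test. It assumes classic Levenshtein distance and lengths measured in characters; the documentation does not specify either. It does not model spelling_alpha, spelling_wt_thresh or the blending thresholds. The function names are hypothetical.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggestion_allowed(query: str, candidate: str, *, edist_constant: float,
                       edist_proportion: float, difflen_thresh: int,
                       min_querylen: int) -> bool:
    """Apply the length and edit-distance limits to one candidate suggestion."""
    if len(query) < min_querylen:
        return False
    if abs(len(candidate) - len(query)) > difflen_thresh:
        return False
    limit = edist_constant + len(query) * edist_proportion
    return edit_distance(query, candidate) <= limit


def should_look_for_suggestions(full_matches: int, num_docs: int,
                                fullmatch_trigger_const: float) -> bool:
    """Skip suggestions once there are at least f * log10(num docs) full matches."""
    return full_matches < fullmatch_trigger_const * math.log10(num_docs)
```

For the nine-character query "funelback", a constant of 1.0 and a proportion of 0.1 give a limit of 1.9, so "funnelback" (distance 1) is allowed. In a collection of one million documents with a trigger constant of 2.0, suggestions are only sought when there are fewer than 12 full matches.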

N. TREC specific options

Option Name Type Range Description
trec_runid String - For TREC participation: Each result in TREC format will include this runid.
trec_topic Integer 0 - unlimited For TREC participation: The first query in a batch will get this topic number. Each new query will increase the number by one.
trecids Boolean - For TREC participation: Each result in TREC format will use the TREC docno rather than a URL
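The TREC options control fields of the conventional six-column TREC run format: topic, the literal Q0, document identifier, rank, score and run ID. The Python sketch below shows how trec_topic and trec_runid would populate those columns for a batch of queries. The four-decimal score formatting and the function name are assumptions, not a description of PADRE's exact output.

```python
def trec_run_lines(batches: list[list[tuple[str, float]]], first_topic: int,
                   runid: str) -> list[str]:
    """Format ranked (docno, score) lists, one per query, as TREC run lines."""
    lines = []
    for offset, ranked in enumerate(batches):
        topic = first_topic + offset  # each new query increases the topic number by one
        for rank, (docno, score) in enumerate(ranked, start=1):
            lines.append(f"{topic} Q0 {docno} {rank} {score:.4f} {runid}")
    return lines
```

With trec_topic=401 and trec_runid=myrun, the first result of the first query is written as "401 Q0 D1 1 2.5000 myrun" when its identifier is D1 and its score is 2.5. With trecids on, the docno column holds the TREC docno rather than a URL.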

Default value


The default is empty; that is, no additional options are supplied.


Examples

Query processor set to perform Same Site Suppression and to use synonyms:

 query_processor_options=-SSS=2 -THS=/home/search/conf/myCollectionName/synonyms.cfg

Query processor set to sort the top results by file size:

 query_processor_options=-sort=size


Query processor set to upweight title text to twice the original weight, sort all results by date and use word stemming when searching for results:

 query_processor_options=-wmeta.t=2.0 -sort=date -sortall -stem=2
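To check how a query_processor_options line breaks down into individual options, the following Python sketch splits it into option/value pairs. It uses shell-style quoting via shlex, so a value such as -service_volume='lo' loses its quotes. It treats a bare flag such as -sortall as 'on', which is an assumption made for illustration. The function name is hypothetical, and the sketch does not validate option names or ranges.

```python
import shlex


def parse_qp_options(line: str) -> dict[str, str]:
    """Split a 'query_processor_options=...' line into {option: value} pairs."""
    _, _, value = line.strip().partition("=")  # drop the 'query_processor_options' key
    options: dict[str, str] = {}
    for token in shlex.split(value):
        if not token.startswith("-"):
            raise ValueError(f"unexpected token {token!r}")
        name, sep, opt_value = token[1:].partition("=")
        # Assumption: a bare flag such as -sortall means 'on'.
        options[name] = opt_value if sep else "on"
    return options
```

Applied to the last example above, it yields {'wmeta.t': '2.0', 'sort': 'date', 'sortall': 'on', 'stem': '2'}.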
