Padre query processor options
Background
This page lists the options that can be supplied to the query processor via the query_processor_options configuration option. The PArallel Document Retrieval Engine (PADRE) query processor can be finely controlled through a large set of options. These options can usually be specified either in this collection configuration parameter or as a CGI parameter passed with the search request URL. The list of available options is given here.
Notes
-
The CGI parameter for a query processor option has the same name as the option, e.g. for the collapsing query processor option you would specify collapsing=on in your CGI request.
-
If an option is of type boolean then the valid values are on or off.
-
Query processing will not occur if the query processor is given an invalid option.
-
Query processor options can affect Funnelback’s speed and result quality, so change them with caution.
-
Numerical metadata search is currently only accessible using CGI parameters and not as query processor options.
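The notes above can be illustrated with a short Python sketch that expresses the same options once as CGI parameters and once as a string of '-name=value' pairs. The host name, the search path, the 'query' parameter name and both helper names are assumptions made for illustration; they are not taken from this page.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, options):
    """Append query processor options to a search URL as CGI parameters.

    Each CGI parameter uses the same name as the query processor option.
    """
    params = {"query": query}  # the 'query' parameter name is an assumption
    params.update(options)
    return base_url + "?" + urlencode(params)


def build_qpo_string(options):
    """Format options as space-separated '-name=value' pairs, the form in
    which option names are written throughout this page."""
    return " ".join(f"-{name}={value}" for name, value in options.items())


options = {"collapsing": "on", "num_ranks": 20}
# Hypothetical endpoint, used only to show the shape of the request.
print(build_search_url("https://search.example.com/s/search.html", "funnelback", options))
print(build_qpo_string(options))
```

Options marked [Not CGI] below would only be usable in the second form.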
A. Contextual navigation options
-categorise_clusters=<boolean>
-
Whether contextual navigation suggestions are grouped by type.
-cnto=<float>
Range: 0.000000 - unlimited-
Set the contextual navigation time-out to s seconds (s is a floating point value). Processing may be omitted entirely if the elapsed time for a query already exceeds s seconds. (dflt 1.0).
-contextual_navigation=<boolean>
-
Whether or not to activate the contextual navigation system.
-contextual_navigation_fields=<string>
-
String s lists the metadata fields, separated by commas surrounded by square brackets, to scan for contextual navigation suggestions. (dflt '[c,t]'). Note that scanning of document text can be suppressed by including a minus, for example '[-,c,t]'.
-max_phrase_length=<integer>
Range: 3 - 7-
Maximum length (in words) of contextual navigation suggestions.
-max_phrases=<integer>
Range: 0 - unlimited-
After this number of candidate phrases have been checked, contextual navigation processing will stop.
-max_results_to_examine=<integer>
Range: 0 - 200-
Maximum number of search results to scan for contextual navigation suggestions.
-site_max_clusters=<integer>
Range: 0 - unlimited-
Maximum number of site clusters to present in contextual navigation.
-topic_max_clusters=<integer>
Range: 0 - unlimited-
Maximum number of topic clusters to present in contextual navigation.
-type_max_clusters=<integer>
Range: 0 - unlimited-
Maximum number of type clusters to present in contextual navigation.
B. Geospatial options
-geospatial_ranges=<boolean>
-
Calculate geospatial distance from origin and bounding box ranges when geospatial data is configured and available.
-maxdist=<float>
Range: 0.000000 - unlimited-
Exclude results not within <f> km of origin.
-origin=<string>
-
<lat,long> Set origin to lat, long (floating point degrees).
C. Informational options
-canq=<boolean>
-
Write reordered queries to log. (dflt off)
-countIndexedTerms=<string>
[Not CGI]-
Metadata fields to have their indexed terms counted in the result set (DAAT only). Unlike rmcf, multiple term occurrences in a single document are counted, e.g. if metadata 'author' has 'Bob Ada|Bob|Bob' in two documents the resulting counts would be 'Ada': 2, 'Bob': 6. As this counts indexed terms, long terms may be truncated depending on the indexer options used. To count fields 'a' and 'c', set this to '[a,c]'.
-countUniqueByGroup=<string>
[Not CGI]-
Counts the number of unique metadata values grouped by another metadata. Syntax: -countUniqueByGroup=[classToCount]:[groupBy],[classToCount]:[groupBy]. Example: -countUniqueByGroup=[author]:[project] would show us the number of authors contributing to each project. classToCount is a regex and will be expanded to all matching metadata classes e.g. [author.*]:[project] might expand to -countUniqueByGroup=[author]:[project],[authors]:[project].
-count_dates=<string>
-
Report facet counts for dates such as 'today', 'last week', 'this year'. Note that date categories may overlap. The only value currently supported is 'd'.
-count_urls=<integer>
[Not CGI]-
Display counts of results grouped by URL path (up to depth i). If i is 0, the default value is used (dflt 5). If i is not present, URL counting is turned off.
-docsPerColl=<boolean>
-
Show the number of documents each collection contributed to the result set.
-rmcf=<string>
-
Metadata fields to have their words counted in result sets (fields representing facets). If metadata 'author' has 'Bob Ada|Bob|Bob' in two documents the counts would be 'Bob Ada': 2 'Bob': 2. To count fields 'a' and 'c', set this to '[a,c]'.
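The difference between rmcf and countIndexedTerms can be sketched in Python. This only reproduces the arithmetic of the 'Bob Ada|Bob|Bob' example. It assumes that '|' separates metadata values, that rmcf counts each distinct value once per document, and that countIndexedTerms counts every whitespace-separated term occurrence. The function names are hypothetical and this is not a description of how PADRE is implemented.

```python
from collections import Counter


def rmcf_counts(field_values):
    """Count each distinct metadata value once per document (assumed rmcf behaviour)."""
    counts = Counter()
    for value in field_values:
        for item in set(value.split("|")):
            counts[item] += 1
    return counts


def indexed_term_counts(field_values):
    """Count every term occurrence, including repeats within one document
    (assumed countIndexedTerms behaviour)."""
    counts = Counter()
    for value in field_values:
        for item in value.split("|"):
            counts.update(item.split())
    return counts


# The 'author' field of two documents, as in the example above.
authors = ["Bob Ada|Bob|Bob", "Bob Ada|Bob|Bob"]
print(dict(rmcf_counts(authors)))
print(dict(indexed_term_counts(authors)))
```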
-rmrf=<string>
-
Numerical and geospatial fields listed will have their ranges calculated in result sets. To see the ranges of field 'height' and the bounding box geospatial field 'X' set this to '[height,X]'.
-showtimes=<boolean>
-
Print elapsed times for each stage of query processing.
-sum=<string>
[Not CGI]-
The sum of a numeric metadata field in the result set. Syntax: -sum=[sumOn],[sumOn]. Example: -sum=[size] would sum all values of numeric metadata 'size' in the result set. Note that sumOn may be a regex, which expands sumOn to all matching metadata classes, e.g. -sum=[size.*] might expand to -sum=[sizeInKb],[sizeLoc].
-sumByGroup=<string>
[Not CGI]-
The sum of a numeric metadata field by group. Syntax: -sumByGroup=[sumOn]:[groupBy],[sumOn]:[groupBy]. Example: -sumByGroup=[size]:[project] would sum all values of numeric metadata 'size' grouped by 'project', giving output such as project 'Foo' has size '128', project 'Bar' has size '12'. Note that sumOn may be a regex, which expands sumOn to all matching metadata classes, e.g. -sumByGroup=[size.*]:[project] might expand to -sumByGroup=[sizeInKb]:[project],[sizeLoc]:[project].
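A minimal Python sketch of the grouping arithmetic behind sumByGroup, using the 'Foo' and 'Bar' figures from the example. The record layout and function names are hypothetical, and treating the regex as a match against the whole class name is an assumption.

```python
import re
from collections import defaultdict


def expand_classes(pattern, known_classes):
    """Expand a sumOn regex to every matching metadata class.
    Whole-name matching is an assumption made for this sketch."""
    return [name for name in known_classes if re.fullmatch(pattern, name)]


def sum_by_group(records, sum_on, group_by):
    """Sum the numeric field sum_on for each distinct value of group_by."""
    totals = defaultdict(float)
    for record in records:
        if sum_on in record and group_by in record:
            totals[record[group_by]] += float(record[sum_on])
    return dict(totals)


# Hypothetical result set: each record holds one document's metadata.
records = [
    {"project": "Foo", "size": 100},
    {"project": "Foo", "size": 28},
    {"project": "Bar", "size": 12},
]
print(sum_by_group(records, "size", "project"))
print(expand_classes("size.*", ["sizeInKb", "sizeLoc", "author"]))
```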
D. Logging options
-ip_to_log=<string>
-
What form of ip to include in log files: (nothing|ip|ip_hash|remote_user).
-log=<boolean>
[Not CGI]-
Write query log entries (dflt on).
-qlog_file=<string>
[Not CGI]-
If writing query log entries, write them to <FILE>.
-username=<string>
-
A string identifying the current user to be used in padre’s query log.
E. Miscellaneous options
-countgbits=<string>
-
s is either "all" or a comma-separated list of gscope bitnumbers for which counts are needed. (Bits numbered from zero.)
-exit_on_bad_component=<boolean>
-
Fail when a component has an incompatible index relative to the first (rather than skip).
-flock=<boolean>
-
Use flock when locking the query logfile. If set to no, lockf is used instead. Default on Solaris is 'no', all other systems 'yes'.
-mat=<integer>
Range: 0 - 2147 [Not CGI]-
Set matchset size to n million (dflt 24). Only need to increase on very large collections.
-ndt=<boolean>
[Not CGI]-
Don’t do tests on docs, e.g. phantom, zombie, *scope, binary, expired.
-unbuf=<boolean>
-
Don’t buffer the standard output stream. In some specific cases, setting this to 'no' can improve performance.
-view=<string>
-
The collection view to perform the query against when in CGI mode. Normally 'live' (default), 'offline' or 'snapshot###'.
F. Presentation options
-EORDER=<integer>
Range: 0 - 1-
Specify presentation order of query biased summary excerpts. 0: natural order in doc. 1: sorted by score. (dflt 0)
-MBL=<integer>
Range: 1 - unlimited-
Set buffer length per displayed metadata field to n bytes (dflt 250 bytes). Warning: setting very large values will increase query processor memory demands and may cause problems.
-SBL=<integer>
Range: 1 - unlimited-
Set summary buffer length to n bytes. (dflt 250 bytes)
-SF=<string>
-
Metadata fields to include in summaries (if applicable). To include fields author and d, set this to [author,d]. This option also supports regex: to include all metadata classes, set this to [.*]; to include fields prefixed with Fun and metadata class author, set [Fun.*,author].
-SHLM=<integer>
Range: 0 - 7-
Select highlighting method within snippets in XML. 0 - No highlighting ; 1 - HTML strong tags ; 2 - Show highlighting regexp. and unhighlighted summary [dflt]; 5 - Use HTML strong tags but remove accents from summary before highlighting, provided query was not accented.
-SM=<string>
-
Summary mode. Possible values are 'both' (or 'def') - display description or query-biased summary and metadata fields listed in the 'SF' option; 'snip' - display a generated snippet; 'meta' - display metadata fields listed in 'SF'; 'qb' - display a query-biased summary; 'auto' - print metadata codes if specified in user query; 'off' - turn off all summaries.
-SQE=<integer>
Range: 1 - 10000-
Set max no. of query biased summary excerpts to n (dflt 3).
-all_summary_text=<boolean>
-
Whether the text used for generating summaries is required in the result.
-countUniqueByGroupSensitive=<boolean>
[Not CGI]-
Treat group names and metadata items case sensitively (default no).
-ctest_mode=<integer>
Range: 0 - 3-
Controls behaviour of padre-sw when -ctest is used. 0: no internal evaluation; 1 - internal evaluation only. Output is brief plain text report of measures; 2 - internal evaluation only. Output in plain text with QBQ output followed by measures; 3 - internal evaluation plus normal CTOUT output in XML (with measures presented as comments)
-explain=<boolean>
-
Explain rankings by showing score components. (Note that -explain=on turns off result set diversification).
-explore=<integer>
Range: 7 - 50-
Show 'explore' links against results. The value specifies how many terms to include in the expanded query.
-gscoperesult=<string>
-
Specifies the bit number that results will be set to in -res gscope or -res docnums modes (dflt 1).
-mdsfhl=<boolean>
-
Whether query terms are highlighted only in MDSF metadata summaries.
-num_ranks=<integer>
Range: 0 - unlimited-
Limit number of results to n (min = 0, dflt = 10).
-num_tiers=<integer>
Range: 0 - 50-
Limit number of result list tiers to n (min = 0 (no limit), max = 50, dflt no limit).
-qieval=<float>
Range: 0.000000 - 1.000000-
Set the value presented for query independent evidence when using the qiecfg result format. (dflt 0.5).
-qwhl=<string>
-
Determines which parts of a search result are highlighted. S - snippet, M - metadata, U - URL, T - title. E.g. -qwhl=MUT
-res=<string>
-
Set result format. Possible values are: trec, web, xml, urls, qiez, qieo, gscope, docnums, ctest, qiecfg or flcfg. Note: setting res to docnums, flcfg, gscope, qiecfg, qieo or qiez will override any num_ranks setting so that all results are returned.
-results_in_facet_categories=<integer>
Range: 0 - 100-
Include the specified number of pre-computed search results under the rmc count element for metadata facet categories.
-rmc_maxperfield=<integer>
Range: 0 - unlimited-
Set maximum number of RMC items to display per field at n (dflt 100).
-rmc_sensitive=<boolean>
[Not CGI]-
Treat facet categories (RMC items) case sensitively (default no).
-show_qsyntax_tree=<boolean>
-
Include an SVG representation of the query-as-processed in output.
-start_rank=<integer>
Range: 1 - unlimited-
Present results starting from n (dflt 1).
-sumByGroupSensitive=<boolean>
[Not CGI]-
Treat group names case sensitively (default no).
-tierbars=<boolean>
-
Display tierbars in result list output (XML and HTML). When turned on (for all -res modes) and -sort is used, results will be first sorted by tier then by the sorting mode, otherwise if -sortall is used then all results will be sorted regardless of tier.
-translucent_DLS_fields=<string>
[Not CGI]-
Metadata fields which are translucent. Translucent fields are visible on documents which the user cannot see. To include fields 'a' and 'd' set this to '[a,d]'. If collapsing is enabled and the collapsing signature contains only fields defined here, then collapsing will be permitted on documents the user cannot see.
G. Query interpretation options
-STOP=<string>
[Not CGI]-
Use the stoplist specified in <file> (one word per line)
-binary=<integer>
Range: 0 - 3-
Determines whether or not binary documents are returned in the results. 0 - show all documents; 1 - show only binary documents; 2 - show only non-binary documents.
-clive=<string>
-
Dynamic metacollections. Specifies a component name within a .sdinfo file(s) to make active. Can be set multiple times to enable multiple collections.
-daat_termination_type=<integer>
Range: 0 - 2-
Selects how DAAT early exit is determined. 0 - try for d results with every metafield and every component; 1 - try for d results over every component but not necessarily every metafield; 2 - stop as soon as d results are obtained. (d is the parameter to -daat.)
-daat_timeout=<float>
Range: 0.000000 - 3600.000000 [Not CGI]-
Impose a soft timeout (in seconds) on the time taken by the DAAT machinery for one query.
-dont_estimate_full_matches=<boolean>
-
In DAAT mode, don’t guess the number of full matches when the DAAT depth did not let us process an entire postings list.
-events=<boolean>
-
Must be set if event search is to be used
-fmo=<boolean>
-
Present full matches only.
-lang=<string>
-
If a 2-character language code is specified by this means, then stemmers etc specific to that language will be used, IF AVAILABLE. It is also permissible to use a 5-character code like en_GB, but padre behaviour will be the same as for en. Specifying lang also makes title and metadata sorting of results locale-specific, however support for this on Windows platforms is limited and problematic.
-loose=<integer>
Range: 0 - unlimited-
Phrase looseness in words (min = 0, dflt = 0).
-max_qbatch=<integer>
Range: 1 - unlimited-
Terminate batch query processing after the specified number of queries have been processed.
-max_terms=<integer>
Range: 1 - unlimited-
Truncate queries after the specified number of terms. If the query is reordered, truncation will occur after reordering.
-min_truncated_len=<integer>
Range: 0 - 20 [Not CGI]-
The text part of a query term with a right truncation operator must have at least this length. E.g. if min_truncated_len were 4 funnel* would be accepted but fun* would be processed as fun.
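The rule in the example can be sketched in Python as follows. The function name is hypothetical, and the sketch only covers the right truncation operator described above.

```python
def apply_min_truncated_len(term, min_truncated_len):
    """Drop a trailing '*' when the text before it is shorter than the minimum.

    Terms without a right truncation operator are returned unchanged.
    """
    if term.endswith("*"):
        text_part = term[:-1]
        if len(text_part) < min_truncated_len:
            return text_part  # processed as a plain term, e.g. fun* -> fun
    return term


for term in ["funnel*", "fun*", "funnelback"]:
    print(term, "->", apply_min_truncated_len(term, 4))
```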
-noexpired=<boolean>
[Not CGI]-
Exclude expired docs from results. (Nullified by -zom)
-nulqok=<boolean>
[Not CGI]-
An empty query submitted via CGI will be processed as a null query. The system query must be empty as well. (dflt is to ignore the request).
-phrase_prox_word_limit=<integer>
Range: 1 - unlimited [Not CGI]-
Phrase or proximity terms with more than this number of words will be shortened by deleting words from the right. E.g. if this limit were 4 then "to be or not to be" would be processed as "to be or not".
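As a sketch of the shortening described above, assuming words are separated by whitespace (the function name is hypothetical):

```python
def shorten_phrase(phrase, word_limit):
    """Keep at most word_limit words, deleting any extra words from the right."""
    words = phrase.split()
    return " ".join(words[:word_limit])


print(shorten_phrase("to be or not to be", 4))
```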
-prox=<integer>
Range: 0 - unlimited-
Proximity limit in words (min = 0, dflt = 15).
-qsup=<string>
-
When blending queries, determines sources of supplementary queries to be tried, with corresponding weights assigned to each source (ranging from 0 to 1). No spaces. 'off' may be specified to disable supplementary queries. E.g. -qsup=SPEL/0.9+USUK/0.4+SYNS/0.1+LANG/0.1. Available sources are: SPEL (spelling suggestions); USUK (table of spelling differences between US and UK English); SYNS (synonyms as defined by the blending.cfg file); LANG (experimental German decompounding).
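The structure of a qsup value can be checked with a small parser before it is placed in a configuration. This is a sketch built only from the description above: the function name is hypothetical, and the validation rules (no spaces, known sources, weights from 0 to 1) are a reading of that description rather than PADRE's own parser.

```python
KNOWN_SOURCES = {"SPEL", "USUK", "SYNS", "LANG"}


def parse_qsup(value):
    """Parse 'SRC/weight+SRC/weight' into a dict; 'off' gives an empty dict."""
    if value == "off":
        return {}
    if " " in value:
        raise ValueError("qsup must not contain spaces")
    weights = {}
    for part in value.split("+"):
        source, _, weight_text = part.partition("/")
        if source not in KNOWN_SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        weight = float(weight_text)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {source} must be between 0 and 1")
        weights[source] = weight
    return weights


print(parse_qsup("SPEL/0.9+USUK/0.4+SYNS/0.1+LANG/0.1"))
```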
-query_reorder=<boolean>
-
Reorder terms in query so that the most discriminating (least common) appear first. Often coupled with -max_terms=
-ras=<integer>
Range: 0 - 2-
Remove any stopwords from the query. Possible values: 0 - remove none; 1 - remove dynamically depending on the query; 2 - remove all stopwords (dflt 1).
-service_volume=<string>
[Not CGI]-
Either 'high' or 'low'. A convenience setting to increase or reduce allowable query complexity and timeouts according to service volumes — large or small indexes, high or low query volumes.
-stem=<integer>
Range: 0 - 3-
Controls stemming of queries. 0 - do not stem (dflt); 1 - do not stem (replaces obsolete option); 2 - stem all query words (light - English/French plural/singular only); 3 - stem all query words (heavier).
-stem_lconly=<boolean>
-
When stemming, stem only lowercase query words (to avoid stemming proper names and acronyms).
-strip_invalid_utf8=<boolean>
-
Normally, invalid UTF-8 characters are removed during indexing. If this hasn’t happened, this option allows them to be removed from result packets.
-synonyms=<boolean>
-
If set, the query processor will expand queries using the thesaurus in synonyms.cfg.
-truncation_allowed=<integer>
Range: 0 - 3 [Not CGI]-
Enables the use of the * operator. Binary valued. It is only valid when used with an option that disables DAAT mode, such as -service_volume='lo' or -daat=0. When applied, all contexts are available, such as: :funnelback, funnel, back, and *:*elba.
-wildcard_thresh=<integer>
Range: 0 - unlimited-
If the postings list for a term is longer than the specified value (in MB) it will be treated as a wildcard.
-zom=<boolean>
-
Include docs in results even if noindex or killed.
H. Query source options
-ctest=<string>
[Not CGI]-
Read a batch of queries from testfile (in C_TEST format). Sets output format to RM_CTEST, but that may be overridden. (See es.csiro.au/C-TEST/ for information about C-TEST.)
-s=<string>
-
System-generated query inserted behind the scenes by a form or front-end.
I. Quicklinks options
-QL=<integer>
Range: 0 - 5-
Activate QuickLinks facility for default pages down to the specified level. 0 - off; 1 - server root pages; 2 - next level down.
-QL_rank=<integer>
Range: 1 - unlimited-
If QuickLinks capability is active, show quick links for search results down to the specified rank.
-QL_rank_is_relative=<boolean>
-
If true, the value of QL_rank will be interpreted relative to the start_rank. E.g. if QL_rank=2, the first two results on each page may show QuickLinks.
J. Ranking options
-SameSiteSuppressionExponent=<float>
Range: 0.000000 - unlimited-
Same site suppression penalty exponent (dflt 0.5, recommended range 0.2 - 0.7).
-SameSiteSuppressionOffset=<integer>
Range: 0 - 1000-
Number of additional documents from a site beyond the first that are allowed their full score before applying a same site suppression penalty (dflt 0)
-absscores=<boolean>
-
Report content scores as % of max possible Okapi score (Intended for use with -vsimple=on).
-anniemode=<integer>
Range: 0 - 3-
Control the use of annotation indexes. 0 - do not use annotation indexes ; 1 - Process queries using annotation indexes only; 2 - Process queries using annotation indexes, falling back to normal indexes if insufficient results. (Most query op.s stripped.) 3 - Process queries using both annotation and normal indexes (Most operators stripped from queries.). Default 0.
-b=<float>
Range: 0.000000 - unlimited-
Set Okapi b to f (dflt 0.75)
-cgscope1=<string>
-
Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.68. Those not matching can be upweighted with -cool.70.
-cgscope2=<string>
-
Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.69. Those not matching can be upweighted with -cool.71.
-cool=<boolean>
-
Whether to use topic distillation scoring (cool and cooler). Dflt on.
-cool.<Key>=<key/value pair>
-
cool.N=V Set a value for the Nth tune parameter. See cooler ranking options page for possible values of N.
-daat=<integer>
Range: 0 - 10000000-
Specifies the maximum number of full matches for Document-At-A-Time processing. If set to 0, Term-At-A-Time is used instead (dflt 5000).
-diversity_rank_limit=<integer>
Range: 10 - unlimited-
Diversification won’t alter ranking beyond rank n (default 200, min 10).
-facet_url_prefix=<string>
[Not CGI]-
Present only results whose URL is prefixed by the given URL. Note that the scheme and hostname part are case insensitive, for URI with scheme smb:// the entire prefix is case insensitive. The behaviour of this option may change in the future to suit facets, this should not be used outside of faceted navigation.
-gscope1=<string>
-
Present only results whose gscope bits match reverse Polish expression e (bits numbered from zero). If set to off, disable any previous expression.
-k1=<float>
Range: 0.000000 - unlimited-
Set Okapi K1 to <f>. (dflt 2.0)
-kmod=<integer>
Range: 0 - 1-
Select special scoring function i for special fields. 0 = normal, 1 = AF1 (dflt 1).
-lscope=<string>
-
Present only results whose URL matches a sort-of left-anchored pattern.
-lscorrect=<boolean>
-
Whether to correct link scores across meta collection components (default yes).
-main_homepage_factor=<float>
Range: 0.000000 - 1.000000-
Penalise the score of the homepage of a single-entity-controlled domain to prevent over-representation in result sets. E.g. www.anu.edu.au/ in an index of ANU. (dflt 0.90)
-meta_suppression_field=<string>
-
If same_meta_suppression is activated, the specified metadata field will be the field to which it applies. Only one metadata field can be treated in this way.
-near_dup_factor=<float>
Range: 0.000000 - 1.000000-
The query processor will penalise a result which is a near-duplicate of a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. (dflt 0.5)
-promote_urls=<string>
-
Insert the specified URLs at or near the top of the results list for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
-quanta=<integer>
Range: 10 - 100000-
Set the number of possible score quantisation levels for each cool variable. In general, a high number should give more accurate ranking but may slow query processing.
-rank_limit=<integer>
Range: 10 - unlimited-
Limit highest rank requestable to n (dflt 1,000,000,000).
-ranking_profile=<integer>
Range: 0 - 100 [Not CGI]-
Choose a profile of settings for the ranking function. 0 - current default; 1 - Standard BM25; 2 - Traditional (pre-12.0) Funnelback. Setting a profile does not override explicit settings.
-recency_decay_vals=<string>
-
<z,w,m,y,d,c,m> - Define how recency scores decay with time. z, w, m, y, d, c, m are floats in the range 0 - 1, which specify the recency score assigned to documents 0 days, 1 wk, 1 mth, 1 yr, 1 dec, 1 cen, 1 mill. old. (dflt 1.0,0.75,0.5,0.25,0.025,0.0025) Recency scores between key values are linearly interpolated. Past the millennium, recency scores are 1/daysold.
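The interpolation described above can be sketched in Python. The number of days assigned to each key age is an approximation chosen for this sketch, not a documented PADRE value. The default listed above has six values, so the seventh value in the example is a hypothetical placeholder. The function name is also hypothetical.

```python
# Assumed ages in days for: 0 days, 1 week, 1 month, 1 year, 1 decade,
# 1 century, 1 millennium. These day counts are approximations.
KEY_AGES_DAYS = [0, 7, 30, 365, 3650, 36500, 365000]


def recency_score(days_old, decay_vals):
    """Linearly interpolate a recency score between the key ages."""
    if len(decay_vals) != len(KEY_AGES_DAYS):
        raise ValueError("expected one decay value per key age")
    days_old = max(days_old, 0)
    if days_old > KEY_AGES_DAYS[-1]:
        # Past the millennium, the score is 1/daysold.
        return 1.0 / days_old
    for i in range(1, len(KEY_AGES_DAYS)):
        older_age = KEY_AGES_DAYS[i]
        if days_old <= older_age:
            newer_age = KEY_AGES_DAYS[i - 1]
            fraction = (days_old - newer_age) / (older_age - newer_age)
            return decay_vals[i - 1] + fraction * (decay_vals[i] - decay_vals[i - 1])


# Six documented defaults plus a hypothetical seventh value for 1 millennium.
vals = [1.0, 0.75, 0.5, 0.25, 0.025, 0.0025, 0.00025]
for age in (0, 3.5, 7, 200, 400000):
    print(age, recency_score(age, vals))
```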
-reference_date=<string>
-
If specified, recency is based on this date rather than that of most recent doc. Format is <yyyymmdd>, or 'today'.
-remove_urls=<string>
-
Prevent the specified URLs from appearing in the results for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)
-sco=<string>
-
<n>[<classes>] Set doc scoring mode to n, using the classes specified. Most common values: 0 - score using doc text only ; 1 - no scoring. Produce an unordered set of results ; 2 - score using anchortext and URLs as well, upweight titles (or whatever fields are configured with -specf). For example to automatically look in fields 'u' and 'v' for the query terms set -sco=2[u,v]
-scope=<string>
-
Present only results whose URL satisfies the include/exclude scopes included in list (comma separated). e.g. -scope=anu.edu.au,-anu.edu.au/archives
-sort=<string>
-
Sort top results by <string>. Possible values: 'date', 'adate' (ascending date), 'title', 'dtitle' (descending title), 'size' (file size), 'dsize' (descending file size), 'url', 'durl' (descending url), 'coll' (collection name, then score), 'dcoll' (descending collection name, then score), 'meta<f>' (by metadata field f, then score), 'dmeta<f>' (descending metadata field f, then score), 'shuffle' (random to avoid bias), 'collapse_count' (to order by the number of collapsed documents, with the largest collapsed set first), 'acollapse_count' (with the largest collapsed set last), 'prox' (for geo search: sort top results by proximity to origin), 'dprox' (for geo search: sort top results by descending proximity to origin), 'score_ignoring_tiers' (descending score, ignoring any tiers. Only useful with sortall.) (dflt is case-insensitive for title and meta). '-sort=' turns off sorting.
-sort_sensitive=<boolean>
-
Use case-sensitive sorting when sorting results by title or metadata strings.
-sortall=<boolean>
-
Include partial matches in the resorting performed by -sort.
-specf=<string>
-
Fields listed in string s, as a list of comma separated fields surrounded by square brackets, will be scored specially and added to query when using the -sco=2 mode (dflt '[k,K]').
-sss_defeat_pattern=<string>
-
URLs matching the specified pattern (currently a simple string match) will not be subject to samesite suppression.
-static_cool_exponent=<float>
Range: 0.000000 - 1.000000-
Control the extent to which static scores are attenuated with length of query. 0 => no attenuation; 1 => max attenuation. Attenuation by len ** -f.
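The attenuation formula len ** -f can be evaluated directly. In this sketch, query length is assumed to be measured in query terms, and the function name is hypothetical.

```python
def static_score_attenuation(query_length, exponent):
    """Return the multiplier len ** -f applied to static scores."""
    if query_length < 1:
        raise ValueError("query_length must be at least 1")
    return float(query_length) ** -exponent


# With f = 0 there is no attenuation; with f = 1 attenuation is at its maximum.
for f in (0.0, 0.5, 1.0):
    print(f, static_score_attenuation(4, f))
```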
-unknown_daysold=<integer>
Range: 0 - unlimited-
A doc with unknown date is assumed to be d days old (for recency calcs) (dflt 366).
-use_Paik=<boolean>
-
Use the tf.idf scheme proposed by Jiaul Paik at SIGIR 2013 rather than the more conventional BM25 variant.
-use_secds=<boolean>
-
When working with domain-importance features in ranking, use SECDs if value is on, and raw domain names otherwise.
-vsimple=<string>
-
Very simple ranking. If set to 'on', equivalent to -sco=0 -cool=off -SSS=0 -kmod=0.
-weight_only_fields=<string>
-
Documents will not be retrieved in DAAT mode if they only match unfielded query terms in one or more of the implicit fields listed here. For example, specifying '[K,k]' will stop the query 'Monica Lewinski' matching a document solely because of click data or referring anchortext.
-wmeta.<Key>=<key/value pair>
-
wmeta.C=F Set upweighting factors for metadata class scoring. C - metadata class; F - weight to set. (dflt 0.5 for 'k' and 'K', 1 for everything else).
-xscope=<string>
-
Present only results whose URL exactly matches the provided URL (after canonicalization).
K. Ranking - Result diversification options
-SSS=<integer>
Range: 0 - 10-
Same site suppression depth: 0 - no suppression (dflt); 2 - hosts and their top level dir’s; 10 - org domain (includes sub-domains) e.g. defence.gov.au.
-neardup=<float>
Range: 0.000000 - 1.000000-
Near duplicates in ranking are multiplied by f. Setting f to 1 turns off near-dup detection.
-repetitiousness_factor=<float>
Range: 0.000000 - 1.000000-
Penalise a repetitious result by multiplying by the factor specified. (Repetitiousness may involve same-site, same component or repeated metadata.) The penalty stiffens with more repetition. Setting to 1 turns this off. (dflt 1.0)
-same_collection_suppression=<float>
Range: 0.000000 - 1.000000-
While searching a meta-collection, penalise the second result from the same primary collection as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 0 turns this off. (dflt 0)
-same_meta_suppression=<float>
Range: 0.000000 - 1.000000-
Penalise the second result with the same value for a specified metafield as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 0 turns this off. (dflt 0)
-title_dup_factor=<float>
Range: 0.000000 - 1.000000-
The query processor will penalise a result which has exactly the same title as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 1 turns this off. (dflt 0.5)
L. Result collapsing options
-collapsed_docs_sort=<string>
-
Sort collapsed results by <string>. Possible values: 'date', 'adate' (ascending date), 'title', 'dtitle' (descending title), 'size' (file size), 'dsize' (descending file size), 'url', 'durl' (descending url), 'coll' (collection name, then score), 'dcoll' (descending collection name, then score), 'meta<f>' (by metadata field f, then score), 'dmeta<f>' (descending metadata field f, then score), 'shuffle' (random to avoid bias), 'prox' (for geo search: sort collapsed results by proximity to origin), 'dprox' (for geo search: sort collapsed results by descending proximity to origin), 'score_ignoring_tiers' (descending score, ignoring any tiers. Only useful with sortall.)
-collapsing=<boolean>
-
Activate collapsing. Collapsing will be based on document content ('$') unless a collapsing_sig value is specified. Note that use of this option will disable result set diversification.
-collapsing_SF=<string>
-
Metadata fields to include in display for collapsed documents (assuming collapsing_num_ranks is non-zero). (dflt no fields). To view metadata fields 'id' and 'a' set this to '[id,a]'.
-collapsing_label=<string>
-
Label to indicate why items have been collapsed. (dflt "which are very similar")
-collapsing_num_ranks=<integer>
Range: 0 - 1000-
Specify how many collapsed results are to be shown under the uncollapsed ones. (dflt 0)
-collapsing_scoped=<boolean>
-
Scope to only documents which have been collapsed on. Default is off.
-collapsing_sig=<string>
-
The collapsing_control segment to use when collapsing. E.g. "[a,p]", collapse on author+publisher. The value must correspond to one segment of the indexing.collapse_fields string. (A segment is a comma separated list of fields surrounded by square brackets) (dflt '[$]' (Collapsing on document content.))
M. Security options
-dls_internal_test=<integer>
Range: 0 - unlimited-
This allows testing of the padre side of the custom document level security mechanism. There is no call out to an external function. The value is interpreted as a combination of bits: 1 bit - dls_internal_test is active/not active; 2 bit - selects whether MINRESULTS mode is used or not. During internal testing, every odd numbered document in the original ranking is arbitrarily treated as inaccessible.
-ipreject=<string>
[Not CGI]-
QUERY_LIMIT,WINDOW_SECONDS,UPPER_QUERY_LIMIT - Use an IP rejector to limit requests from a single machine. Allow QUERY_LIMIT queries per WINDOW_SECONDS, don’t record more than UPPER_QUERY_LIMIT queries.
-ldLibraryPath=<string>
[Not CGI]-
Full path to security plugin library
-locking_model=<string>
[Not CGI]-
Name of locking model, either "trim" or "sharepoint".
-no_security=<boolean>
[Not CGI]-
Disable DLS, available as a command line option.
-secPlugin=<string>
[Not CGI]-
Name of security plugin library
-translucent_DLS=<boolean>
[Not CGI]-
Enables translucent DLS (DAAT only).
-userkeys=<string>
[Not CGI]-
Conduct this search with security keys specified by s. The format is '<collectionName>;key<delim>' where delim is either ',' or new line, spaces are removed for example 'c1;k1 c2;k1,c2;k2'
N. Spelling options
-spelling=<boolean>
-
Activate spelling suggestion mechanism.
-spelling_alpha=<float>
Range: 0.000000 - 1.000000-
Set the weighting between 'closeness to the query' and support in the collection for a candidate suggestion. Big alpha, high weight on closeness to the query. (dflt 0.7)
-spelling_blend_thresh=<float>
Range: 0.000000 - 1.000000-
Confidence threshold for automatically blending results for a query suggestion with those from the user’s original query. (dflt 0.67)
-spelling_difflen_thresh=<integer>
Range: 0 - 1000-
Don’t make suggestions more than i characters longer or shorter than query. (dflt 2)
-spelling_dym_thresh=<float>
Range: 0.000000 - 1.000000-
Confidence threshold for making a 'Did you mean' suggestion. (dflt 0.5)
-spelling_edist_constant=<float>
Range: 0.000000 - 1000.000000-
Don’t make suggestions whose edit distance from the query exceeds f + query_length * spelling_edist_proportion. (dflt 1)
-spelling_edist_proportion=<float>
Range: 0.000000 - 1.000000-
Don’t make suggestions whose edit distance from the query exceeds spelling_edist_constant + query_length * f (0 <= f <= 1). (dflt 0.25)
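The combined effect of spelling_edist_constant and spelling_edist_proportion can be sketched in Python. The sketch assumes that query length is measured in characters and that edit distance means Levenshtein distance; neither is stated on this page. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed distance measure)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def max_edit_distance(query, constant=1.0, proportion=0.25):
    """Largest permitted distance: constant + query_length * proportion."""
    return constant + len(query) * proportion


def suggestion_allowed(query, suggestion, constant=1.0, proportion=0.25):
    """True when the suggestion is within the permitted edit distance."""
    return edit_distance(query, suggestion) <= max_edit_distance(query, constant, proportion)


print(max_edit_distance("funnelbak"))
print(suggestion_allowed("funnelbak", "funnelback"))
print(suggestion_allowed("cat", "funnelback"))
```

With the defaults, longer queries tolerate more edits, while very short queries are limited mainly by the constant.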
-spelling_fullmatch_trigger_const=<float>
Range: 0.000000 - unlimited-
Don’t look for suggestions if there are at least f * log10(num docs) full matches. (dflt 30.0)
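The full-match trigger is a simple threshold, sketched here in Python with a hypothetical function name. It shows how the threshold grows slowly with collection size.

```python
import math


def should_look_for_suggestions(full_matches, num_docs, trigger_const=30.0):
    """Look for spelling suggestions only when the number of full matches
    is below trigger_const * log10(num_docs)."""
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")
    threshold = trigger_const * math.log10(num_docs)
    return full_matches < threshold


# With the default of 30.0 and one million documents the threshold is about 180.
print(should_look_for_suggestions(100, 1_000_000))
print(should_look_for_suggestions(500, 1_000_000))
```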
-spelling_include_context=<boolean>
-
Include the non-corrected part of the query in the suggestion link. (dflt 1)
-spelling_min_querylen=<integer>
Range: 1 - 1000-
Suggestions not made for queries shorter than this. (dflt 2)
-spelling_wt_thresh=<float>
Range: 0.000000 - 100.000000-
Sets a threshold that determines if a spelling suggestion is returned. If the generated spelling suggestion weight is less than this, the suggestion is not returned. (dflt 0.01)
O. TREC specific options
-trec_runid=<string>
-
For TREC participation: Each result in TREC format will include this runid.
-trec_topic=<integer>
Range: 0 - unlimited-
For TREC participation: The first query in a batch will get this topic number. Each new query will increase the number by one.
-trecids=<boolean>
-
For TREC participation: Each result in TREC format will use the TREC docno rather than a URL