Padre query processor options

Background

This page lists the options that can be supplied to the query processor via the query_processor_options configuration option. The PArallel Document Retrieval Engine (PADRE) query processor can be finely controlled through a large set of options. Most of these options can be specified either in this collection configuration parameter or as a CGI parameter passed with the search request URL. The full list of available options is given here.

Notes

  • The CGI parameter for a query processor option has the same name as the option. For example, for the collapsing query processor option you would specify collapsing=on in your CGI request.

  • If an option is of type boolean then valid values for this are on or off.

  • Query processing will not occur if the query processor is given an invalid option.

  • Query processor options can affect Funnelback’s speed and result quality, so change them with caution.

  • Numerical metadata search is currently only accessible using CGI parameters and not as query processor options.

  • These options were substantially revised in versions 12 and 12.2. See the release notes for more details.
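
The notes above can be sketched as a small helper that turns a set of query processor options into CGI parameters. This is only an illustration: the endpoint URL and the "query" parameter name are assumptions, not taken from this page. The only behaviour drawn from the notes is that the CGI parameter shares the option's name and that boolean options take the values on or off.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, options):
    """Append query processor options to a search URL as CGI parameters."""
    params = {"query": query}  # "query" as the parameter name is an assumption
    for name, value in options.items():
        # Boolean options take the literal values "on" and "off".
        if isinstance(value, bool):
            value = "on" if value else "off"
        params[name] = str(value)
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "https://search.example.com/s/search.html",  # hypothetical endpoint
    "annual report",
    {"collapsing": True, "num_ranks": 20},
)
print(url)
```

Options marked [Not CGI] in the listing would not be passed this way.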

A. Contextual navigation options

-categorise_clusters=<boolean>

Whether contextual navigation suggestions are grouped by type.

-cnto=<float> Range: 0.000000 - unlimited

Set the contextual navigation time-out to s seconds (s is a floating point number). Contextual navigation processing may be omitted entirely if the elapsed time for a query already exceeds s seconds. (dflt 1.0).

-contextual_navigation=<boolean>

Whether or not to activate the contextual navigation system.

-contextual_navigation_fields=<string>

String s lists the metadata fields, separated by commas surrounded by square brackets, to scan for contextual navigation suggestions. (dflt '[c,t]'). Note that scanning of document text can be suppressed by including a minus, for example '[-,c,t]'.

-max_phrase_length=<integer> Range: 3 - 7

Maximum length (in words) of contextual navigation suggestions.

-max_phrases=<integer> Range: 0 - unlimited

After this number of candidate phrases have been checked, contextual navigation processing will stop.

-max_results_to_examine=<integer> Range: 0 - 200

Maximum number of search results to scan for contextual navigation suggestions.

-site_max_clusters=<integer> Range: 0 - unlimited

Maximum number of site clusters to present in contextual navigation.

-topic_max_clusters=<integer> Range: 0 - unlimited

Maximum number of topic clusters to present in contextual navigation.

-type_max_clusters=<integer> Range: 0 - unlimited

Maximum number of type clusters to present in contextual navigation.

B. Geospatial options

-geospatial_ranges=<boolean>

Calculate geospatial distance from origin and bounding box ranges when geospatial data is configured and available.

-maxdist=<float> Range: 0.000000 - unlimited

Exclude results not within <f> km of origin.

-origin=<string>

<lat,long> Set origin to lat, long (floating point degrees).
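
To illustrate how origin and maxdist work together, the sketch below filters a list of results by great-circle distance from an origin. It uses the haversine formula with a mean Earth radius, which is an assumption: this page does not say how padre computes distance. The result structure and field name are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def parse_origin(origin):
    """Parse an origin string of the form 'lat,long' (floating point degrees)."""
    lat, lon = (float(part) for part in origin.split(","))
    return lat, lon


def distance_km(origin, point):
    """Haversine distance in km between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, point)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_maxdist(origin, results, maxdist):
    """Keep only results within maxdist km of the origin."""
    return [r for r in results if distance_km(origin, r["latlong"]) <= maxdist]


origin = parse_origin("-35.28,149.13")
results = [
    {"title": "Near", "latlong": (-35.30, 149.12)},   # hypothetical result
    {"title": "Far", "latlong": (-33.87, 151.21)},    # hypothetical result
]
print([r["title"] for r in within_maxdist(origin, results, 100.0)])
```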

C. Informational options

-canq=<boolean>

Write reordered queries to log. (dflt off)

-countIndexedTerms=<string> [Not CGI]

Metadata fields to have their indexed terms counted in the result set (DAAT only). Unlike rmcf, multiple term occurrences in a single document are counted, e.g. if metadata 'author' has 'Bob Ada|Bob|Bob' in two documents the resulting counts would be 'Ada': 2, 'Bob': 6. As this counts indexed terms, long terms may be truncated depending on the indexer options used. To count fields 'a' and 'c', set this to '[a,c]'.

-countUniqueByGroup=<string> [Not CGI]

Counts the number of unique metadata values grouped by another metadata class. Syntax: -countUniqueByGroup=[classToCount]:[groupBy],[classToCount]:[groupBy]. Example: -countUniqueByGroup=[author]:[project] would show the number of authors contributing to each project. classToCount is a regex and will be expanded to all matching metadata classes, e.g. [autho.*]:[project] might expand to -countUniqueByGroup=[author]:[project],[authors]:[project].
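
The regex expansion described above can be pictured with the following sketch. The list of known metadata classes is hypothetical, and treating the pattern as a whole-name match is an assumption; the page only says that classToCount expands to all matching classes.

```python
import re


def expand_count_unique_by_group(spec, known_classes):
    """Expand each [classToCount]:[groupBy] pair against known metadata classes."""
    expanded = []
    for pattern, group_by in re.findall(r"\[([^\]]+)\]:\[([^\]]+)\]", spec):
        for cls in known_classes:
            # Assumption: the regex must match the whole class name.
            if re.fullmatch(pattern, cls):
                expanded.append(f"[{cls}]:[{group_by}]")
    return ",".join(expanded)


known = ["author", "authors", "project", "title"]  # hypothetical classes
print(expand_count_unique_by_group("[autho.*]:[project]", known))
```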

-count_dates=<string>

Report facet counts for dates such as 'today', 'last week', 'this year'. Note that date categories may overlap. Only value currently supported is 'd'.

-count_urls=<integer> [Not CGI]

Display counts of results grouped by URL path (up to depth i). If i is 0, the default value is used (dflt 5). If i is not present, count_urls is turned off.

-docsPerColl=<boolean>

Show the number of documents each collection contributed to the result set.

-rmcf=<string>

Metadata fields to have their words counted in result sets (fields representing facets). If metadata 'author' has 'Bob Ada|Bob|Bob' in two documents the counts would be 'Bob Ada': 2 'Bob': 2. To count fields 'a' and 'c', set this to '[a,c]'.
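
The difference between rmcf and countIndexedTerms can be reproduced with a short worked example using the 'Bob Ada|Bob|Bob' value from both descriptions. This is a sketch of the counting rules as described on this page, not of padre's implementation; in particular it assumes '|' separates values and whitespace separates terms.

```python
from collections import Counter

# The 'author' metadata of two documents, as in the examples above.
author_values = ["Bob Ada|Bob|Bob", "Bob Ada|Bob|Bob"]


def rmcf_style_counts(values):
    """Count each distinct metadata value once per document."""
    counts = Counter()
    for value in values:
        counts.update(set(value.split("|")))
    return dict(counts)


def indexed_term_counts(values):
    """Count every occurrence of every term, including repeats in one document."""
    counts = Counter()
    for value in values:
        for item in value.split("|"):
            counts.update(item.split())
    return dict(counts)


print(rmcf_style_counts(author_values))
print(indexed_term_counts(author_values))
```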

-rmrf=<string>

Numerical and geospatial fields listed will have their ranges calculated in result sets. To see the ranges of field 'height' and the bounding box geospatial field 'X' set this to '[height,X]'.

-showtimes=<boolean>

Print elapsed times for each stage of query processing.

-sum=<string> [Not CGI]

The sum of a numeric metadata class in the result set. Syntax: -sum=[sumOn],[sumOn]. Example: -sum=[size] would sum all values of numeric metadata 'size' in the result set. Note that sumOn may be a regex, which is expanded to all matching metadata classes, e.g. -sum=[size.*] might expand to -sum=[sizeInKb],[sizeLoc].

-sumByGroup=<string> [Not CGI]

The sum of a numeric metadata class by group. Syntax: -sumByGroup=[sumOn]:[groupBy],[sumOn]:[groupBy]. Example: -sumByGroup=[size]:[project] would sum all values of numeric metadata 'size' grouped by 'project', giving output such as: project 'Foo' has size '128', project 'Bar' has size '12'. Note that sumOn may be a regex, which is expanded to all matching metadata classes, e.g. -sumByGroup=[size.*]:[project] might expand to -sumByGroup=[sizeInKb]:[project],[sizeLoc]:[project].
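
The grouping in the sumByGroup example can be sketched as follows. The document structure is hypothetical, and skipping documents that lack either field is an assumption of this sketch.

```python
from collections import defaultdict


def sum_by_group(docs, sum_on, group_by):
    """Sum the numeric field sum_on for each distinct value of group_by."""
    totals = defaultdict(float)
    for doc in docs:
        # Assumption: documents missing either field are ignored.
        if sum_on in doc and group_by in doc:
            totals[doc[group_by]] += doc[sum_on]
    return dict(totals)


docs = [
    {"project": "Foo", "size": 100},
    {"project": "Foo", "size": 28},
    {"project": "Bar", "size": 12},
    {"title": "no project or size"},
]
print(sum_by_group(docs, "size", "project"))
```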

D. Logging options

-ip_to_log=<string>

What form of ip to include in log files: (nothing|ip|ip_hash|remote_user).

-log=<boolean> [Not CGI]

Write query log entries (dflt on).

-qlog_file=<string> [Not CGI]

If writing query log entries, write them to <FILE>.

-username=<string>

A string identifying the current user to be used in padre’s query log.

E. Miscellaneous options

-countgbits=<string>

s is either "all" or a comma-separated list of gscope bitnumbers for which counts are needed. (Bits numbered from zero.)

-exit_on_bad_component=<boolean>

Fail when a component has an incompatible index relative to the first (rather than skip).

-flock=<boolean>

Use flock when locking the query logfile. If set to no, lockf is used instead. Default on Solaris is 'no', all other systems 'yes'.

-mat=<integer> Range: 0 - 2147 [Not CGI]

Set matchset size to n million (dflt 24). Only need to increase on very large collections.

-ndt=<boolean> [Not CGI]

Don’t do tests on docs, e.g. phantom, zombie, *scope, binary, expired.

-unbuf=<boolean>

Don’t buffer the standard output stream. In some specific cases, setting this to 'no' can improve performance.

-view=<string>

The collection view to perform the query against when in CGI mode. Normally 'live' (default), 'offline' or 'snapshot###'.

F. Presentation options

-EORDER=<integer> Range: 0 - 1

Specify presentation order of query biased summary excerpts. 0: natural order in doc. 1: sorted by score. (dflt 0)

-MBL=<integer> Range: 1 - unlimited

Set buffer length per displayed metadata field to n bytes (dflt 250 bytes). Warning: setting very large values will increase query processor memory demands and may cause problems.

-SBL=<integer> Range: 1 - unlimited

Set summary buffer length to n bytes. (dflt 250 bytes)

-SF=<string>

Metadata fields to include in summaries (if applicable). To include fields 'a' and 'd', set this to '[a,d]'. This option also supports regular expressions: to include all metadata classes, set this to '[.]'; to include fields prefixed with 'Fun' and metadata class 'a', set it to '[Fun.,a]'.

-SHLM=<integer> Range: 0 - 7

Select highlighting method within snippets in XML. 0 - No highlighting ; 1 - HTML strong tags ; 2 - Show highlighting regexp. and unhighlighted summary [dflt]; 5 - Use HTML strong tags but remove accents from summary before highlighting, provided query was not accented.

-SM=<string>

Summary mode (off;snip;debug;meta;qb;def;auto;both) - both means qb and meta.

-SQE=<integer> Range: 1 - 10000

Set max no. of query biased summary excerpts to n (dflt 3).

-all_summary_text=<boolean>

Whether the text used for generating summaries is required in the result.

-countUniqueByGroupSensitive=<boolean> [Not CGI]

Treat group names and metadata items case sensitively (default no).

-ctest_mode=<integer> Range: 0 - 3

Controls behaviour of padre-sw when -ctest is used. 0: no internal evaluation; 1 - internal evaluation only. Output is brief plain text report of measures; 2 - internal evaluation only. Output in plain text with QBQ output followed by measures; 3 - internal evaluation plus normal CTOUT output in XML (with measures presented as comments)

-explain=<boolean>

Explain rankings by showing score components. (Note that -explain=on turns off result set diversification).

-explore=<integer> Range: 7 - 50

Show 'explore' links against results. The value specifies how many terms to include in the expanded query.

-gscoperesult=<string>

Specifies the bit number that results will be set to in -res gscope or -res docnums modes (dflt 1).

-mdsfhl=<boolean>

Are query terms only highlighted in MDSF metadata summaries

-num_ranks=<integer> Range: 0 - unlimited

Limit number of results to n (min = 0, dflt = 10).

-num_tiers=<integer> Range: 0 - 50

Limit number of result list tiers to n (min = 0 (no limit), max = 50, dflt no limit).

-qieval=<float> Range: 0.000000 - 1.000000

Set the value presented for query independent evidence when using the qiecfg result format. (dflt 0.5).

-qwhl=<string>

Determines which parts of a search result are highlighted. S - snippet, M - metadata, U - URL, T - title. E.g. -qwhl=MUT

-res=<string>

Set result format. Possible values are: trec, web, xml, urls, qiez, qieo, gscope, docnums, ctest, qiecfg or flcfg.

-results_in_facet_categories=<integer> Range: 0 - 100

Include the specified number of pre-computed search results under the rmc count element for metadata facet categories.

-rmc_maxperfield=<integer> Range: 0 - unlimited

Set maximum number of RMC items to display per field at n (dflt 100).

-rmc_sensitive=<boolean> [Not CGI]

Treat facet categories (RMC items) case sensitively (default no).

-show_qsyntax_tree=<boolean>

Include an SVG representation of the query-as-processed in output.

-start_rank=<integer> Range: 1 - unlimited

Present results starting from n (dflt 1).

-sumByGroupSensitive=<boolean> [Not CGI]

Treat group names case sensitively (default no).

-tierbars=<boolean>

Display tierbars in result list output (XML and HTML). When turned on (for all -res modes) and -sort is used, results will be first sorted by tier then by the sorting mode, otherwise if -sortall is used then all results will be sorted regardless of tier.

-translucent_DLS_fields=<string> [Not CGI]

Metadata fields which are translucent. Translucent fields are visible on documents which the user cannot see. To include fields 'a' and 'd', set this to '[a,d]'. If collapsing is enabled and the collapsing signature contains only fields defined here, then collapsing will be permitted on documents the user cannot see.

G. Query interpretation options

-STOP=<string> [Not CGI]

Use the stoplist specified in <file> (one word per line)

-binary=<integer> Range: 0 - 3

Determines whether or not binary documents are returned in the results. 0 - show all documents; 1 - show only binary documents; 2 - show only non-binary documents.

-clive=<string>

Dynamic metacollections. Specifies a component name within a .sdinfo file(s) to make active. Can be set multiple times to enable multiple collections.

-daat_termination_type=<integer> Range: 0 - 2

Selects how DAAT early exit is determined. 0 - try for d results with every metafield and every component; 1 - try for d results over every component but not necessarily every metafield; 2 - stop as soon as d results are obtained. (d is the parameter to -daat.)

-daat_timeout=<float> Range: 0.000000 - 3600.000000 [Not CGI]

Impose a soft timeout (in seconds) on the time taken by the DAAT machinery for one query.

-dont_estimate_full_matches=<boolean>

In DAAT mode, don’t guess the number of full matches when the DAAT depth did not let us process an entire postings list.

-events=<boolean>

Must be set if event search is to be used

-fmo=<boolean>

Present full matches only.

-lang=<string>

If a 2-character language code is specified by this means, then stemmers etc specific to that language will be used, IF AVAILABLE. It is also permissible to use a 5-character code like en_GB, but padre behaviour will be the same as for en. Specifying lang also makes title and metadata sorting of results locale-specific, however support for this on Windows platforms is limited and problematic.

-loose=<integer> Range: 0 - unlimited

Phrase looseness in words (min = 0, dflt = 0).

-max_qbatch=<integer> Range: 1 - unlimited

Terminate batch query processing after the specified number of queries have been processed.

-max_terms=<integer> Range: 1 - unlimited

Truncate queries after the specified number of terms. If the query is reordered, truncation will occur after reordering.

-min_truncated_len=<integer> Range: 0 - 20 [Not CGI]

The text part of a query term with a right truncation operator must have at least this length. E.g. if min_truncated_len were 4 funnel* would be accepted but fun* would be processed as fun.
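
The rule above can be sketched in a few lines, assuming (as the example suggests) that a term whose text part is too short simply loses its truncation operator.

```python
def apply_min_truncated_len(term, min_truncated_len):
    """Drop a right-truncation operator if the text before it is too short."""
    if not term.endswith("*"):
        return term
    text = term[:-1]
    if len(text) >= min_truncated_len:
        return term  # long enough: the truncated term is accepted
    return text      # too short: processed as a plain term


print(apply_min_truncated_len("funnel*", 4))
print(apply_min_truncated_len("fun*", 4))
```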

-noexpired=<boolean> [Not CGI]

Exclude expired docs from results. (Nullified by -zom)

-nulqok=<boolean> [Not CGI]

An empty query submitted via CGI will be processed as a null query. The system query must be empty as well. (dflt is to ignore the request).

-phrase_prox_word_limit=<integer> Range: 1 - unlimited [Not CGI]

Phrase or proximity terms with more than this number of words will be shortened by deleting words from the right. E.g. if this limit were 4, then 'to be or not to be' would be processed as 'to be or not'.
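
A minimal sketch of the shortening rule, assuming words are separated by whitespace:

```python
def shorten_phrase(phrase, word_limit):
    """Keep at most word_limit words, deleting the excess from the right."""
    words = phrase.split()
    return " ".join(words[:word_limit])


print(shorten_phrase("to be or not to be", 4))
```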

-prox=<integer> Range: 0 - unlimited

Proximity limit in words (min = 0, dflt = 15).

-qsup=<string>

When blending queries, determines sources of supplementary queries to be tried, with corresponding weights assigned to each source (ranging from 0 to 1). No spaces. 'off' may be specified to disable supplementary queries. E.g. -qsup=SPEL/0.9+USUK/0.4+SYNS/0.1+LANG/0.1. Available sources are: SPEL (spelling suggestions); USUK (table of spelling differences between US and UK English); SYNS (synonyms as defined by the blending.cfg file); LANG (experimental German decompounding).
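
A qsup value in the format shown above could be assembled and validated with a helper like the one below. The helper name and its error handling are illustrative only; the source names, the 0 to 1 weight range, the '+' and '/' separators and the 'off' value come from the description.

```python
VALID_SOURCES = ("SPEL", "USUK", "SYNS", "LANG")


def build_qsup(weights):
    """Build a -qsup value such as 'SPEL/0.9+USUK/0.4' from {source: weight}."""
    if not weights:
        return "off"  # disables supplementary queries
    parts = []
    for source, weight in weights.items():
        if source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {source}")
        if not 0 <= weight <= 1:
            raise ValueError(f"weight out of range for {source}: {weight}")
        parts.append(f"{source}/{weight:g}")
    return "+".join(parts)  # no spaces are allowed in the value


print(build_qsup({"SPEL": 0.9, "USUK": 0.4, "SYNS": 0.1, "LANG": 0.1}))
```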

-query_reorder=<boolean>

Reorder terms in query so that the most discriminating (least common) appear first. Often coupled with -max_terms=

-ras=<integer> Range: 0 - 2

Remove any stopwords from the query. Possible values: 0 - remove none; 1 - remove dynamically depending on the query; 2 - remove all stopwords (dflt 1).

-service_volume=<string> [Not CGI]

Either 'high' or 'low'. A convenience setting to increase or reduce allowable query complexity and timeouts according to service volumes — large or small indexes, high or low query volumes.

-stem=<integer> Range: 0 - 3

Controls stemming of queries. 0 - do not stem (dflt); 1 - do not stem (replaces obsolete option); 2 - Stem all query words (light - English/French plural/singular only); 3 - Stem all query words(heavier).

-stem_lconly=<boolean>

When stemming, stem only lowercase query words (to avoid stemming proper names and acronyms).

-strip_invalid_utf8=<boolean>

Normally, invalid UTF-8 characters are removed during indexing. If this hasn’t happened, this option allows them to be removed from result packets.

-synonyms=<boolean>

If set, the query processor will expand queries using thesaurus in synonyms.cfg.

-truncation_allowed=<integer> Range: 0 - 3 [Not CGI]

Enables use of the * operator. The value is treated as binary. This option is only valid together with an option that disables DAAT mode, such as -service_volume='lo' or -daat=0. When applied, all contexts are available, such as: :funnelback, funnel, back, and *:*elba.

-wildcard_thresh=<integer> Range: 0 - unlimited

If the postings list for a term is longer than the specified value (in MB) it will be treated as a wildcard.

-zom=<boolean>

Include docs in results even if noindex or killed.

H. Query source options

-ctest=<string> [Not CGI]

Read a batch of queries from testfile (in C_TEST format). Sets output format to RM_CTEST, but that may be overridden. (See es.csiro.au/C-TEST/ for information about C-TEST.)

-s=<string>

System-generated query inserted behind the scenes by a form or front-end.

I. QuickLinks options

-QL=<integer> Range: 0 - 5

Activate QuickLinks facility for default pages down to the specified level. 0 - off; 1 - server root pages; 2 - next level down.

-QL_rank=<integer> Range: 1 - unlimited

If QuickLinks capability is active, show quick links for search results down to the specified rank.

-QL_rank_is_relative=<boolean>

If true, the value of QL_rank will be interpreted relative to the start_rank. E.g. if QL_rank=2, the first two results on each page may show QuickLinks.
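
The interaction between QL_rank, QL_rank_is_relative and start_rank can be sketched as below. The function returns the ranks on the current page that would be eligible to show QuickLinks. It is an interpretation of the description above, not padre's code.

```python
def ranks_with_quicklinks(start_rank, num_ranks, ql_rank, relative):
    """Return the ranks on the current page that may show QuickLinks."""
    page = range(start_rank, start_rank + num_ranks)
    if relative:
        # QL_rank counts from the first result on the page.
        limit = start_rank + ql_rank - 1
    else:
        # QL_rank is an absolute rank in the full result list.
        limit = ql_rank
    return [rank for rank in page if rank <= limit]


# Second page of ten results with QL_rank=2.
print(ranks_with_quicklinks(11, 10, 2, relative=True))
print(ranks_with_quicklinks(11, 10, 2, relative=False))
```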

J. Ranking options

-SameSiteSuppressionExponent=<float> Range: 0.000000 - unlimited

Same site suppression penalty exponent (dflt 0.5, recommended range 0.2 - 0.7).

-SameSiteSuppressionOffset=<integer> Range: 0 - 1000

Number of additional documents from a site beyond the first that are allowed their full score before applying a same site suppression penalty (dflt 0)

-absscores=<boolean>

Report content scores as % of max possible Okapi score (Intended for use with -vsimple=on).

-anniemode=<integer> Range: 0 - 3

Control the use of annotation indexes. 0 - do not use annotation indexes ; 1 - Process queries using annotation indexes only; 2 - Process queries using annotation indexes, falling back to normal indexes if insufficient results. (Most query op.s stripped.) 3 - Process queries using both annotation and normal indexes (Most operators stripped from queries.). Default 0.

-b=<float> Range: 0.000000 - unlimited

Set Okapi b to f (dflt 0.75)

-cgscope1=<string>

Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.68. Those not matching can be upweighted with -cool.70.

-cgscope2=<string>

Documents matching this gscope expression (reverse Polish) can be upweighted with -cool.69. Those not matching can be upweighted with -cool.71.

-cool=<boolean>

Whether to use topic distillation scoring (cool and cooler). Dflt on.

-cool.<Key>=<key/value pair>

cool.N=V Set a value for the Nth tune parameter. See cooler ranking options page for possible values of N.

-daat=<integer> Range: 0 - 10000000

Specifies the maximum number of full matches for Document-At-A-Time processing. If set to 0, Term-At-A-Time is used instead (dflt 5000).

-diversity_rank_limit=<integer> Range: 10 - unlimited

Diversification won’t alter ranking beyond rank n (default 200, min 10).

-facet_url_prefix=<string> [Not CGI]

Present only results whose URL is prefixed by the given URL. Note that the scheme and hostname parts are case insensitive; for URIs with the scheme smb:// the entire prefix is case insensitive. The behaviour of this option may change in the future to suit facets, so it should not be used outside of faceted navigation.

-gscope1=<string>

Present only results whose gscope bits match reverse Polish expression e (Bits numbered from zero). If set to 'off', disable any previous expression.

-k1=<float> Range: 0.000000 - unlimited

Set Okapi K1 to <f>. (dflt 2.0)

-kmod=<integer> Range: 0 - 1

Select special scoring function i for special fields. 0 = normal, 1 = AF1 (dflt 1).

-lscope=<string>

Present only results whose URL matches a sort-of left-anchored pattern.

-lscorrect=<boolean>

Whether to correct link scores across meta collection components (default yes).

-main_homepage_factor=<float> Range: 0.000000 - 1.000000

Penalise score of the homepage of a single-entity-controlled domain to prevent over representation in results sets. E.g. www.anu.edu.au/ in an index of ANU.

-meta_suppression_field=<string>

If same_meta_suppression is activated, the specified metadata field will be the field to which it applies. Only one metadata field can be treated in this way.

-near_dup_factor=<float> Range: 0.000000 - 1.000000

The query processor will penalise a result which is a near-duplicate of a previous result by multiplying by the factor specified. The penalty stiffens with more repetition.

-promote_urls=<string>

Insert the specified URLs at or near the top of the results list for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)

-quanta=<integer> Range: 10 - 100000

Set the number of possible score quantisation levels for each cool variable. In general, a high number should give more accurate ranking but may slow query processing.

-rank_limit=<integer> Range: 10 - unlimited

Limit highest rank requestable to n (dflt 1,000,000,000).

-ranking_profile=<integer> Range: 0 - 100 [Not CGI]

Choose a profile of settings for the ranking function. 0 - current default; 1 - Standard BM25; 2 - Traditional (pre-12.0) Funnelback. Setting a profile does not override explicit settings.

-recency_decay_vals=<string>

<z,w,m,y,d,c,m> - Define how recency scores decay with time. z, w, m, y, d, c, m are floats in the range 0 - 1 which specify the recency score assigned to documents that are 0 days, 1 week, 1 month, 1 year, 1 decade, 1 century and 1 millennium old. (dflt 1.0,0.75,0.5,0.25,0.025,0.0025) Recency scores between key values are linearly interpolated. Past the millennium, recency scores are 1/daysold.
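
The decay rule can be sketched as a piecewise-linear lookup. Several things here are assumptions and not taken from this page: the number of days used for each key age, the seventh score in the example list (the documented default shows only six values), and the choice to interpolate on age in days.

```python
from bisect import bisect_right

# Assumed ages in days for: 0 days, 1 week, 1 month, 1 year,
# 1 decade, 1 century, 1 millennium.
KEY_AGES_DAYS = [0, 7, 30, 365, 3650, 36500, 365000]


def recency_score(days_old, decay_vals):
    """Linearly interpolate a recency score between the key ages."""
    if len(decay_vals) != len(KEY_AGES_DAYS):
        raise ValueError("expected one score per key age")
    days_old = max(days_old, 0)
    if days_old > KEY_AGES_DAYS[-1]:
        return 1.0 / days_old  # past the millennium
    i = bisect_right(KEY_AGES_DAYS, days_old) - 1
    if i == len(KEY_AGES_DAYS) - 1:
        return decay_vals[-1]
    span = KEY_AGES_DAYS[i + 1] - KEY_AGES_DAYS[i]
    fraction = (days_old - KEY_AGES_DAYS[i]) / span
    return decay_vals[i] + (decay_vals[i + 1] - decay_vals[i]) * fraction


# The last value is hypothetical, added so that all seven key ages have a score.
example_vals = [1.0, 0.75, 0.5, 0.25, 0.025, 0.0025, 0.00025]
print(recency_score(3.5, example_vals))  # halfway between 0 days and 1 week
```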

-reference_date=<string>

If specified, recency is based on this date rather than that of most recent doc. Format is <yyyymmdd>, or 'today'.

-remove_urls=<string>

Prevent the specified URLs from appearing in the results for a query. Value is a space separated list of URLs. URLs must correspond to those recorded by padre-iw. (dflt Inactive)

-sco=<string>

<n>[<classes>] Set doc scoring mode to n, using the classes specified. Most common values: 0 - score using doc text only; 1 - no scoring, produce an unordered set of results; 2 - score using anchortext and URLs as well, upweight titles (or whatever fields are configured with -specf). For example, to automatically look in fields 'u' and 'v' for the query terms, set -sco=2[u,v].

-scope=<string>

Present only results whose URL satisfies the include/exclude scopes included in list (comma separated). e.g. -scope=anu.edu.au,-anu.edu.au/archives

-sort=<string>

Sort top results by <string>. Possible values: 'date', 'adate' (ascending date), 'title', 'dtitle' (descending title), 'size' (file size), 'dsize' (descending file size), 'url', 'durl' (descending url), 'coll' (collection name, then score), 'dcoll' (descending collection name, then score), 'meta<f>' (by metadata field f, then score), 'dmeta<f>' (descending metadata field f, then score), 'shuffle' (random to avoid bias), 'collapse_count' (to order by the number of collapsed documents, with the largest collapsed set first), 'acollapse_count' (with the largest collapsed set last), 'prox' (for geo search: sort top results by proximity to origin), 'dprox' (for geo search: sort top results by descending proximity to origin), 'score_ignoring_tiers' (descending score, ignoring any tiers. Only useful with sortall.) (dflt is case-insensitive for title and meta). '-sort=' turns off sorting.

-sort_sensitive=<boolean>

Use case-sensitive sorting when sorting results by title or metadata strings.

-sortall=<boolean>

Include partial matches in the resorting performed by -sort.

-specf=<string>

Fields listed in string s, as a list of comma separated fields surrounded by square brackets, will be scored specially and added to query when using the -sco=2 mode (dflt '[k,K]').

-sss_defeat_pattern=<string>

URLs matching the specified pattern (currently a simple string match) will not be subject to samesite suppression.

-static_cool_exponent=<float> Range: 0.000000 - 1.000000

Control the extent to which static scores are attenuated with length of query. 0 => no attenuation; 1 => max attenuation. Attenuation by len ** -f.
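
The attenuation formula len ** -f can be evaluated directly. In this sketch "len" is taken to be the number of terms in the query, which is an assumption; the page does not define it.

```python
def static_score_attenuation(query_length, exponent):
    """Factor applied to static scores: query_length ** -exponent."""
    if query_length < 1:
        raise ValueError("query_length must be at least 1")
    return float(query_length) ** -exponent


for f in (0.0, 0.5, 1.0):
    print(f, static_score_attenuation(4, f))
```

With an exponent of 0 the factor is always 1 (no attenuation), and with an exponent of 1 a four-term query has its static scores multiplied by 0.25.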

-unknown_daysold=<integer> Range: 0 - unlimited

A doc with unknown date is assumed to be d days old (for recency calcs) (dflt 366).

-use_Paik=<boolean>

Use the tf.idf scheme proposed by Jiaul Paik at SIGIR 2013 rather than the more conventional BM25 variant.

-use_secds=<boolean>

When working with domain-importance features in ranking, use SECDs if value is on, and raw domain names otherwise.

-vsimple=<string>

Very simple ranking. If set to 'on', equivalent to -sco=0 -cool=off -SSS=0 -kmod=0.

-weight_only_fields=<string>

Documents will not be retrieved in DAAT mode if they only match unfielded query terms in one or more of the implicit fields listed here. For example, specifying '[K,k]' will stop the query 'Monica Lewinski' matching a document solely because of click data or referring anchortext.

-wmeta.<Key>=<key/value pair>

wmeta.C=F Set upweighting factors for metadata class scoring. C - metadata class; F - weight to set. (dflt 0.5 for 'k' and 'K', 1 for everything else).

-xscope=<string>

Present only results whose URL exactly matches the provided URL (after canonicalisation).

K. Ranking - Result diversification options

-SSS=<integer> Range: 0 - 10

Same site suppression depth: 0 - no suppression (dflt for non-web collections); 2 - hosts and their top level dir’s (dflt for web and meta collections); 10 - special meaning for big Web applications.

-neardup=<float> Range: 0.000000 - 1.000000

Near duplicates in ranking are multiplied by f. Setting f to 1 turns off near-dup detection.

-repetitiousness_factor=<float> Range: 0.000000 - 1.000000

Penalise a repetitious result by multiplying by the factor specified. (Repetitiousness may involve same-site, same component or repeated metadata.) The penalty stiffens with more repetition. Setting to 1 turns this off.

-same_collection_suppression=<float> Range: 0.000000 - 1.000000

While searching a meta-collection, penalise the second result from the same primary collection as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 0 turns this off.

-same_meta_suppression=<float> Range: 0.000000 - 1.000000

Penalise the second result with the same value for a specified metafield as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 0 turns this off

-title_dup_factor=<float> Range: 0.000000 - 1.000000

The query processor will penalise a result which has exactly the same title as a previous result by multiplying by the factor specified. The penalty stiffens with more repetition. Setting to 1 turns this off.

L. Result collapsing options

-collapsing=<boolean>

Activate collapsing. Collapsing will be based on document content ('$') unless a collapsing_sig value is specified. Note that use of this option will disable result set diversification.

-collapsing_SF=<string>

Metadata fields to include in display for collapsed documents (assuming collapsing_num_ranks is non-zero). (dflt no fields). To view metadata fields 'id' and 'a' set this to '[id,a]'.

-collapsing_label=<string>

Label to indicate why items have been collapsed. (dflt "which are very similar")

-collapsing_num_ranks=<integer> Range: 0 - 1000

Specify how many collapsed results are to be shown under the uncollapsed ones. (dflt 0)

-collapsing_scoped=<boolean>

Scope to only documents which have been collapsed on. Default is off.

-collapsing_sig=<string>

The collapsing_control segment to use when collapsing. E.g. "[a,p]", collapse on author+publisher. The value must correspond to one segment of the indexing.collapse_fields string. (A segment is a comma separated list of fields surrounded by square brackets) (dflt '[$]' (Collapsing on document content.))
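
The requirement that collapsing_sig match one segment of the indexing.collapse_fields string can be checked with a sketch like this. The example collapse_fields value is hypothetical, and the exact-string comparison (including field order) is an assumption.

```python
import re


def collapse_segments(collapse_fields):
    """Split a collapse_fields string into its square-bracketed segments."""
    return re.findall(r"\[[^\]]*\]", collapse_fields)


def is_valid_collapsing_sig(sig, collapse_fields):
    """A signature is valid if it equals one configured segment."""
    return sig in collapse_segments(collapse_fields)


configured = "[$],[a,p]"  # hypothetical indexing.collapse_fields value
print(collapse_segments(configured))
print(is_valid_collapsing_sig("[a,p]", configured))
print(is_valid_collapsing_sig("[a]", configured))
```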

M. Security options

-dls_internal_test=<integer> Range: 0 - unlimited

This allows testing of the padre side of the custom document level security mechanism. There is no call out to an external function. The value is interpreted as a combination of bits: 1 bit - dls_internal_test is active/not active; 2 bit - selects whether MINRESULTS mode is used or not. During internal testing, every odd numbered document in the original ranking is arbitrarily treated as inaccessible.
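
The bit combination can be decoded as in the sketch below. It reads "1 bit" and "2 bit" as the bit values 1 and 2, and numbers documents from 1 when deciding which are odd; both readings are assumptions.

```python
def decode_dls_internal_test(value):
    """Split the option value into its two flags."""
    return {
        "active": bool(value & 1),      # internal test on or off
        "minresults": bool(value & 2),  # MINRESULTS mode on or off
    }


def simulate_internal_test(ranking):
    """Drop every odd-numbered document, as the internal test does."""
    return [doc for rank, doc in enumerate(ranking, start=1) if rank % 2 == 0]


print(decode_dls_internal_test(3))
print(simulate_internal_test(["a", "b", "c", "d"]))
```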

-ipreject=<string> [Not CGI]

<queryLimit>,<windowSeconds>,<upperQueryLimit> - Use an IP rejector to limit requests from a single machine. Allow <queryLimit> queries per <windowSeconds>; don’t record more than <upperQueryLimit> queries.

-ldLibraryPath=<string> [Not CGI]

Full path to security plugin library

-locking_model=<string> [Not CGI]

Name of locking model, either "trim" or "sharepoint".

-no_security=<boolean> [Not CGI]

Disable DLS, available as a command line option.

-secPlugin=<string> [Not CGI]

Name of security plugin library

-translucent_DLS=<boolean> [Not CGI]

Enables translucent DLS (DAAT only).

-userkeys=<string> [Not CGI]

Conduct this search with the security keys specified by s. The format is '<collectionName>;key<delim>', where delim is either ',' or a new line. Spaces are removed. For example: 'c1;k1 c2;k1,c2;k2'.

N. Spelling options

-spelling=<boolean>

Activate spelling suggestion mechanism.

-spelling_alpha=<float> Range: 0.000000 - 1.000000

Set the weighting between 'closeness to the query' and support in the collection for a candidate suggestion. Big alpha, high weight on closeness to the query.

-spelling_blend_thresh=<float> Range: 0.000000 - 1.000000

Confidence threshold for automatically blending results for a query suggestion with those from the user’s original query.

-spelling_difflen_thresh=<integer> Range: 0 - 1000

Don’t make suggestions more than i characters longer or shorter than query.

-spelling_dym_thresh=<float> Range: 0.000000 - 1.000000

Confidence threshold for making a 'Did you mean' suggestion.

-spelling_edist_constant=<float> Range: 0.000000 - 1000.000000

Don’t make suggestions whose edit distance from the query exceeds f + query_length * spelling_edist_proportion

-spelling_edist_proportion=<float> Range: 0.000000 - 1.000000

Don’t make suggestions whose edit distance from the query exceeds spelling_edist_constant + query_length * f (0 <= f <= 1)
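
The two edit distance options combine into a single limit, spelling_edist_constant + query_length * spelling_edist_proportion. The sketch below evaluates that limit and checks a candidate against it. It assumes a plain Levenshtein distance and measures query length in characters; the page does not specify either.

```python
def max_edit_distance(query, constant, proportion):
    """Limit from spelling_edist_constant and spelling_edist_proportion."""
    return constant + len(query) * proportion


def edit_distance(a, b):
    """Levenshtein distance (an assumed stand-in for padre's measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def within_edit_limit(query, suggestion, constant, proportion):
    """True if the suggestion is close enough to the query to be offered."""
    limit = max_edit_distance(query, constant, proportion)
    return edit_distance(query, suggestion) <= limit


print(within_edit_limit("funelback", "funnelback", 1.0, 0.2))
print(within_edit_limit("kitten", "sitting", 1.0, 0.2))
```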

-spelling_fullmatch_trigger_const=<float> Range: 0.000000 - unlimited

Don’t look for suggestions if there are at least f * log10(num docs) full matches.

-spelling_include_context=<boolean>

Include the non-corrected part of the query in the suggestion link.

-spelling_min_querylen=<integer> Range: 1 - 1000

Suggestions not made for queries shorter than this.

-spelling_wt_thresh=<float> Range: 0.000000 - 100.000000

Don’t make suggestions whose weight is less than this. Weight is complex to explain, sorry.

O. TREC specific options

-trec_runid=<string>

For TREC participation: Each result in TREC format will include this runid.

-trec_topic=<integer> Range: 0 - unlimited

For TREC participation: The first query in a batch will get this topic number. Each new query will increase the number by one.

-trecids=<boolean>

For TREC participation: Each result in TREC format will use the TREC docno rather than a URL

© 2015- Squiz Pty Ltd