Query processing optimisation

Background

When a query is issued to Funnelback, the query processor binary (padre-sw) runs it against the index using all of the query processor options in effect. padre-sw generates an XML packet and returns it to the modern UI, which transforms the packet into the data model that underpins the UI.

The following techniques can be used to optimise query processing within Funnelback. Optimising the query reduces processing time and memory usage, and is encouraged for every implementation.

Note: which options are worth optimising depends on the features that the search requires.

Techniques

1. Reduce the size of the XML returned by padre-sw

This is controlled by the query processor options that are applied to the query. Query processor options are sourced from the following:

  • collection.cfg: query_processor_options configuration value

  • padre_opts.cfg: configuration value

  • faceted_navigation.cfg: qpopts value

  • CGI parameters passed in with the query
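Because options can arrive from several places, it can help to lay them side by side before tuning. The following is a minimal Python sketch, not part of Funnelback: the function names and the sample option strings are hypothetical, and it deliberately makes no assumption about which source takes precedence. It only reports options that are set to different values in different sources.

```python
from collections import defaultdict


def parse_qp_options(option_string):
    """Split a string such as '-SM=meta -SBL=1' into {'SM': 'meta', 'SBL': '1'}."""
    parsed = {}
    for token in option_string.split():
        # Leading dashes are dropped so '-SM=off' and 'SM=off' compare equal.
        name, _, value = token.lstrip("-").partition("=")
        parsed[name] = value
    return parsed


def options_by_source(sources):
    """Map each option name to the value it takes in every source that sets it."""
    seen = defaultdict(dict)
    for source_name, option_string in sources.items():
        for name, value in parse_qp_options(option_string).items():
            seen[name][source_name] = value
    return dict(seen)


def conflicts(sources):
    """Return only the options set to different values in different sources."""
    return {
        name: values
        for name, values in options_by_source(sources).items()
        if len(set(values.values())) > 1
    }


# Hypothetical option strings copied by hand from each source.
sources = {
    "collection.cfg": "-SM=meta -SBL=1",
    "CGI": "SM=off",
}
print(conflicts(sources))
```

Any option reported here is worth checking by hand, since the effective value depends on how Funnelback combines the sources.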

Try to address the following:

  • Reduce ranking complexity by setting -vsimple=on. This does not disable ranking, but it causes a simplified ranking to be applied to the results. It should be set before other options for maximum effect, as it changes a number of defaults.

  • If results ranking is not required at all then use -sco=1[UnusedMetaClass]

  • Minimise the value of num_ranks. Most of the time this is limited to 10 results. However, if you are only interested in result counts (e.g. for faceted navigation extra searches) then set num_ranks=1

  • Always use document at a time (DAAT) mode where possible, and set DAAT to the lowest appropriate value (but don’t set -daat=0 as this switches Funnelback to term at a time mode)

  • Set the DAAT timeout to 0 (-daat_timeout=0)

  • If system generated summaries are not required then disable them. This is done by setting the SM value to either -SM=meta (use metadata summaries only) or -SM=off (turns off summaries altogether)

  • If using -SM=meta ensure that the returned summary fields (-SF option) list only those metadata classes that are being used in the templates.

  • Minimise the size of the metadata and summary buffers (-MBL, -SBL values respectively). If not required set these to a small value (-SBL=1 -MBL=1)

  • If best bets are not required, disable them. In v14.2 and earlier use -bb=false. In v15 best bets are disabled by turning off curator (see next item).

  • Turn off curator (curator=off). Note that this can only be set via a CGI parameter; there is no curator=off query processor option.

  • Limit faceted navigation to count only the required metadata fields. In Funnelback v15.6 and earlier this may involve switching to profile-based facets. If faceted navigation is not required disable the counts in a hook script (e.g. remove countgbits, rmcf, count_urls, count_dates)

  • Limit gscope counts to just those scopes required (e.g. -countgbits=0,1,2 vs -countgbits=all). After v15.8 gscope counting can be disabled with -countgbits=

  • Limit the depth of -count_urls (used by URL facets)

  • Turn off spelling suggestions (-spelling=off)

  • If contextual navigation is unused ensure it is disabled (contextual_navigation.cfg)

  • Set a low value for the contextual navigation timeout (-cnto=0.001)

  • If quicklinks are unused ensure they are disabled (quicklinks.cfg)

  • Disable query blending if unused (-qsup=off)

  • If result collapsing is unused ensure it is disabled. If used ensure collapsing is only run on the required collapsing signature, minimise the collapsing_num_ranks and only apply collapsing_SF values to those metadata values that are being used in the template.

  • Consider scoping the result set where appropriate

  • If using geospatial search limit the result set using the maxdist parameter.

  • Disable geospatial range calculations if the distance from origin is not used (-geospatial_ranges=false)

  • Turn off the query syntax tree (-show_qsyntax_tree=off)

  • For enterprise search ensure that user key caching is configured

  • Disable extra searches that are not required

  • Ensure hook scripts only run on the appropriate searches (e.g. disable them for extra searches if not required)

  • Disable range calculation (-rmrf=[UnusedMetaClass])

  • Disable explain mode (-explain=false)
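Several of the settings above can be combined into a single query processor options string. The following Python sketch is illustrative only: the helper name and the choice of options are assumptions, and the right combination depends on the features the search needs. It uses only option names that appear in the list above, and it places -vsimple first because that option changes a number of defaults.

```python
def build_qp_options(options):
    """Render options as '-name=value' tokens, placing -vsimple first if present."""
    # sorted() is stable, so every option other than vsimple keeps its given order.
    ordered = sorted(options.items(), key=lambda item: item[0] != "vsimple")
    return " ".join(f"-{name}={value}" for name, value in ordered)


# Hypothetical profile for an extra search that only needs result counts.
# curator=off is left out on purpose: it is a CGI parameter, not a
# query processor option.
count_only = {
    "num_ranks": "1",
    "SM": "off",
    "SBL": "1",
    "MBL": "1",
    "spelling": "off",
    "show_qsyntax_tree": "off",
    "explain": "false",
    "vsimple": "on",
}

print(build_qp_options(count_only))
```

The resulting string could then be used as the query_processor_options value, after checking each option against the requirements of the search in question.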

2. Optimise the modern UI

Consider doing the following:

  • Minimise the number of extra searches that run as part of any query

  • If extra searches are required always use extra results (per query) in preference to extra search (per result) as the overheads are much smaller.

  • Always use extra results (per query) extra searches in preference to other methods of running extra searches (e.g. via AJAX)

  • If lookups are required as part of results formatting (e.g. postcode lookup) then consider loading this data into the custom data element of the data model instead of performing a query.

  • Minimise the amount of processing required within hook scripts (e.g. title rewrites can be done via a filter script at crawl time rather than dynamically at query time)

  • If integrating with search.json or search.xml consider removing data model elements that you are not using via hook scripts.
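To decide which data model elements are worth removing, it can help to measure how much each one contributes to the response. The following Python sketch works on an already parsed search.json response and is not a hook script. The function name and the sample payload are hypothetical, and the sample does not reflect the real data model structure.

```python
import json


def size_by_key(payload):
    """Return (key, serialised size in bytes) pairs for a dict, largest first."""
    sizes = {
        key: len(json.dumps(value).encode("utf-8"))
        for key, value in payload.items()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for one level of a parsed search.json response.
sample = {
    "elementInUse": {"title": "Example", "url": "https://example.com/"},
    "elementNotInUse": {"detail": "x" * 500},
}

for key, size in size_by_key(sample):
    print(f"{key}: {size} bytes")
```

Running the same function on nested elements narrows down where the bulk of the payload sits, which indicates what to remove in the hook script.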

3. Use an XSL transform to display XML records to the end user instead of a custom Freemarker template

Using an XSL transform avoids the overheads from running a padre query as the XSL transform is accessed directly by the cache controller.