Facet counts are wrong / inconsistent

Background

This article discusses how to control the accuracy of faceted navigation counts.

Faceted navigation (and other) counts in Funnelback are estimates based on the number of results found when a search query is executed.

The accuracy of the estimates can vary depending on how large the index is and the number of results that are returned.

The counts can also change when you select a category because the counts are re-estimated when the new query containing the applied facet runs.

Improving count accuracy

Funnelback v12+ uses a default query processing mode known as document at a time (DAAT). In this mode Funnelback checks each document in the index sequentially and stops once the number of matches set by the DAAT limit has been found. By default, processing stops after the first 5000 matching items. This mode retrieves relevant results much more efficiently, but the tradeoff is that the result set is not fully scanned, so all of the resulting counts are estimates.

The counts may also change when a facet is applied, because they are re-estimated from the results matching the query that runs with the facet applied.

Count accuracy can be improved by raising the DAAT limit so that more of the index is processed before any estimates are made. The tradeoff is that queries take longer to run as the DAAT limit increases.
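The effect of the DAAT limit on count accuracy can be illustrated with a toy model. The Python sketch below is not Funnelback's estimation algorithm: it assumes a simple linear extrapolation from the scanned portion of the index, and the function name estimate_facet_counts, the synthetic index and the category names are all hypothetical. It only shows why stopping early produces estimates rather than exact counts.

```python
import random
from collections import Counter


def estimate_facet_counts(index, daat_limit):
    """Toy model: scan documents in order, stop once daat_limit matches
    are found, then scale the partial counts up to the full index size."""
    counts = Counter()
    matched = 0
    scanned = 0
    for is_match, category in index:
        scanned += 1
        if is_match:
            counts[category] += 1
            matched += 1
            if matched >= daat_limit:
                break  # early termination: the rest of the index is never read
    if scanned == 0:
        return {}
    # ASSUMPTION: linear extrapolation from the scanned fraction of the index.
    scale = len(index) / scanned
    return {category: round(n * scale) for category, n in counts.items()}


# Synthetic index (hypothetical data): 20,000 documents, roughly half matching.
rng = random.Random(42)
index = [(rng.random() < 0.5, rng.choice(["news", "events", "courses"]))
         for _ in range(20_000)]

# Exact counts, obtained by scanning every matching document.
exact = Counter(category for is_match, category in index if is_match)

for limit in (500, 5000, len(index)):
    print(limit, estimate_facet_counts(index, limit), dict(exact))
```

Running the script prints the estimated counts for each limit next to the exact counts. In this model the estimates become exact once the limit is larger than the number of matching documents, because the whole index is then scanned.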

To increase the DAAT limit on a collection, set the query processor option -daat=N, where N is the maximum number of matching items to process (the default is -daat=5000).
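When several collections need the same change, the -daat option can be updated programmatically. The sketch below assumes the query processor options are held as a single space-separated string (for example the value of a query_processor_options setting in the collection configuration, which is an assumption about your setup). The helper name set_daat_limit and the sample option string are hypothetical.

```python
import re


def set_daat_limit(options, limit):
    """Return the options string with -daat set to `limit`,
    replacing any existing -daat value."""
    if limit < 1:
        # -daat=0 switches to term at a time mode, so it is rejected here.
        raise ValueError("DAAT limit must be a positive integer")
    # Drop any existing -daat=N token, keep every other option unchanged.
    tokens = [t for t in options.split() if not re.fullmatch(r"-daat=\d+", t)]
    tokens.append(f"-daat={limit}")
    return " ".join(tokens)


# Illustrative option string; replace with the collection's real options.
print(set_daat_limit("-stem=2 -daat=5000", 20000))
```

The helper only rewrites the string; writing the value back to the collection configuration is left to whatever tooling is already in use.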

Using service_volume=low or term at a time mode

The use of service_volume=low or term at a time mode is not recommended, as it switches Funnelback into a legacy query processing mode that does not support a number of newer features.

Some older implementations of Funnelback increase the accuracy of the faceted navigation (and other) counts by switching the query processing mode to term at a time.

This is done by setting the query processor option -service_volume=low or -daat=0, both of which switch Funnelback from document at a time mode to term at a time mode.

Term at a time mode does not support many newer features, including:

  • Curator

  • Query blending

  • Content auditor

  • Accessibility auditor
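Because these features are unavailable in term at a time mode, it can be worth checking whether an older collection is still configured this way. The sketch below assumes the query processor options are available as a space-separated string; the function name uses_term_at_a_time is hypothetical, and it only looks for the two options named in this article.

```python
def uses_term_at_a_time(options):
    """Return True if the options string contains one of the two settings
    that switch query processing to term at a time mode."""
    tokens = options.split()
    return "-service_volume=low" in tokens or "-daat=0" in tokens


# Illustrative option strings; replace with the collection's real options.
for opts in ("-daat=20000", "-service_volume=low", "-daat=0"):
    print(opts, uses_term_at_a_time(opts))
```

A collection flagged by this check can usually be moved back to document at a time mode by removing the option and setting a larger -daat limit instead.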