Creating Tuning Data

Introduction

Funnelback provides the ability to automatically test a range of ranking configurations to determine the best settings for a particular collection. To perform this testing, Funnelback requires a list of queries, and the URLs which should be returned for each query. For reliable tuning results, at least 100 distinct queries which are representative of the normal query load of the service should be used.

Creating Tuning Data

From Funnelback's administration home page, select a collection and then use the 'Tune' tab to open the 'Edit Tuning Data' page. This page provides a spreadsheet-like interface for creating search tuning data.

[Screenshot: Tuning_data.png]

As shown in the screenshot above, search queries are listed in the first column of the grid, while the correct URLs which should be returned by the search service are listed in the second column. The remaining columns provide additional information about the search service, and are discussed further below.

Queries in the query column may be any valid Funnelback query. In most cases users do not enter complex search operators, so these would not commonly be used here either. One exception is where query operators are automatically added through a synonym configuration or similar. Since queries listed within the tuning data are not pre-processed by Funnelback, any synonym expansions, query transforms, facet selections etc. must be applied manually and entered as part of the query here.

Note that:

  • a query can have multiple correct URL answers (single entry in tuning data),
  • multiple queries can point to the same URL (one entry in tuning data for each query),
  • multi-term queries are supported e.g. ‘student accommodation’, and,
  • queries should be distinct: to tune for variations of a query, it is usually better to select the most common variation as the query. Existing search analytics reports will often group queries in ‘Top Queries’ reports based on commonly-clicked URLs, ordered by the frequency of the query, which can help to identify that variation. Mis-spelled or un-stemmed versions of queries placed in the test data will not resemble the final query as processed by Funnelback (assuming stemming and query blending / spelling suggestions are enabled).
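As a rough illustration of choosing the most common variation, the following Python sketch groups query log rows by clicked URL and keeps the most frequent query for each URL. The input layout (query, clicked URL, count) and the function name are assumptions made for this example; they do not correspond to a Funnelback report format or API.

```python
from collections import defaultdict


def most_common_query_per_url(log_rows):
    """Group (query, clicked_url, count) rows by URL and keep the most frequent query.

    The row layout is hypothetical; adapt it to whatever your analytics export provides.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for query, url, count in log_rows:
        # Assumption: queries differing only in case or surrounding spaces are the same.
        totals[url][query.strip().lower()] += count
    return {url: max(queries, key=queries.get) for url, queries in totals.items()}


# Hypothetical click log rows, for illustration only.
rows = [
    ("student accommodation", "http://example.com/accommodation.html", 120),
    ("student accomodation", "http://example.com/accommodation.html", 15),
    ("accommodation", "http://example.com/accommodation.html", 40),
    ("fees", "http://example.com/fees.html", 60),
]
picked = most_common_query_per_url(rows)
for url, query in picked.items():
    print(query, "->", url)
```

Each selected query would then become one row in the tuning data, with the clicked URL as a candidate correct answer to be confirmed by a subject matter expert.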

Correct Answers

Correct Answers provided by subject matter experts are expected to match the URLs used internally by Funnelback. They should be entered on individual lines, with the most important result listed first, followed by the others in descending order of importance. Note also that the protocol (e.g. http://) must be included at the beginning of each URL.

An example list of 'correct' answers for a given search query as URLs:

 http://example.com/best-answer.html
 http://example.com/second-best-answer.html

The following example is incorrect, as one of the URLs does not specify a protocol, and the URLs are entered in the incorrect order:

 example.com/second-best-answer.html
 http://example.com/best-answer.html
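When a list of answers is prepared outside the administration interface, a quick check for missing protocols can catch the first of these mistakes. The Python sketch below is one possible approach; the function name and the restriction to http and https are assumptions for this example, and the ordering of answers still has to be reviewed by a person.

```python
from urllib.parse import urlsplit


def find_url_problems(urls):
    """Return (line_number, url, problem) tuples for URLs that lack a protocol."""
    problems = []
    for number, raw in enumerate(urls, start=1):
        url = raw.strip()
        parts = urlsplit(url)
        # Assumption: only http and https URLs are expected in the tuning data.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((number, url, "missing or unsupported protocol"))
    return problems


answers = [
    "example.com/second-best-answer.html",
    "http://example.com/best-answer.html",
]
for number, url, problem in find_url_problems(answers):
    print(f"line {number}: {url}: {problem}")
```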

Scoring Data

The third, 'Score', column displays a score for the current search configuration on each query, with 100% being perfect (the specified URLs, displayed in the correct order at the top of the result set) and 0% being complete failure (none of the specified URLs displayed within the top twenty results). The results in this column can help to identify which queries from the set are performing poorly with a given configuration.

Erroneous URLs

The final, 'Info', column displays an icon if additional information is available for a given row. In the screenshot above, the two provided URLs do not match the include pattern for the collection, so a message describing the problem is displayed on the right. In this example, the specified URLs could never be returned, because the include rule does not permit them in the collection.

Success Rate and Search Quality Score

Summary scores above the tuning data table provide an overview of the scores for the current service.

Success Rate
Indicates an average ranking score for all entries in the table.
Search Quality Score
Indicates the proportion of search queries in the table for which a Correct Answer was returned in the top 20 results.
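The two kinds of summary described here, an average of per-query scores and a proportion of queries with a correct answer in the top 20 results, can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-query score is taken as an input because its formula is not given here, and the function names and data layout are hypothetical rather than part of Funnelback.

```python
def found_in_top(results, correct_urls, cutoff=20):
    """True if any correct URL appears within the first `cutoff` results."""
    return any(url in correct_urls for url in results[:cutoff])


def summarise(entries, cutoff=20):
    """Summarise tuning entries.

    Each entry is a dict with 'results' (ranked URLs), 'correct' (expected URLs)
    and 'score' (a per-query percentage supplied by the caller).
    """
    if not entries:
        return {"proportion_found": 0.0, "average_score": 0.0}
    found = sum(found_in_top(e["results"], e["correct"], cutoff) for e in entries)
    average = sum(e["score"] for e in entries) / len(entries)
    return {"proportion_found": found / len(entries), "average_score": average}


# Hypothetical entries, for illustration only.
entries = [
    {
        "results": ["http://example.com/a.html", "http://example.com/b.html"],
        "correct": ["http://example.com/a.html"],
        "score": 100.0,
    },
    {
        "results": ["http://example.com/c.html"],
        "correct": ["http://example.com/d.html"],
        "score": 0.0,
    },
]
summary = summarise(entries)
print(summary)
```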

Additional Shortcuts

The buttons above and below the table provide several options for working with the tuning data. The 'suggest query' button automatically adds a new row to the grid, randomly sampled from the search logs of the collection to ensure it is representative of the collection's query load (analytics for the collection must have been updated before queries can be automatically suggested).

Shortcuts are also available for each row in the grid by right-clicking to display a contextual menu. They allow the row to be deleted, or a new window to be opened showing the current Funnelback results for the selected query.

The buttons below the table provide the option to export the queries and correct answers to CSV (spreadsheet) format, and to import from the same format, so that tuning data can be created outside the Funnelback administration interface if required.
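As a minimal sketch of preparing tuning data outside the administration interface, the Python example below writes and re-reads a two-column CSV. The column layout (query in the first column, correct URLs in the second, one per line within the cell) is an assumption made for this example and is not confirmed by this page; export an existing file first and match its actual layout before importing.

```python
import csv
import io


def write_tuning_csv(tuning_data, stream):
    """Write {query: [urls in priority order]} as two-column CSV rows (assumed layout)."""
    writer = csv.writer(stream)
    for query, urls in tuning_data.items():
        # Assumption: multiple URLs share one cell, separated by line breaks.
        writer.writerow([query, "\n".join(urls)])


def read_tuning_csv(stream):
    """Read the same assumed layout back into a {query: [urls]} dictionary."""
    data = {}
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        data[row[0]] = [url.strip() for url in row[1].splitlines() if url.strip()]
    return data


tuning_data = {
    "student accommodation": [
        "http://example.com/best-answer.html",
        "http://example.com/second-best-answer.html",
    ],
}
buffer = io.StringIO(newline="")
write_tuning_csv(tuning_data, buffer)
buffer.seek(0)
round_tripped = read_tuning_csv(buffer)
print(round_tripped)
```

An in-memory buffer is used here to keep the example self-contained; a file opened with `newline=""` could be passed to the same functions.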

After creating a sufficiently large set of tuning data, see the Tuning Search Quality page for instructions on performing search quality tuning.