Synonyms

Introduction

The synonyms mechanism allows you to specify that certain queries should include additional query terms automatically. This feature can also be used more generally to expand queries into a different form.

Note that Funnelback's Synonyms feature was previously known as query expansion.

Uses for Synonyms

Some possible uses for Funnelback's synonyms feature:

  • Correcting misspellings without requiring the user to make another click. For example, a search for 'imigration' could instead perform a search for 'immigration' (or, more likely, for both spellings of the word).
  • Expanding acronyms or abbreviations. It may be useful to search on the full name or word rather than only the shortened form the user entered as a query.
  • Thesaurus use. If a word appearing in a query has some very similar words in the collection being searched, it may be useful to direct the user to documents containing those words as well.

Caution: Use with care. This mechanism is silent: the user may receive little or no notification that their query has been modified, which could be very confusing if synonyms are used inappropriately.

Note: The use of synonyms can be switched off by using the thesaurus=off CGI parameter when making a request to search.cgi.
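As a rough illustration of the note above, the following Python sketch builds a request URL with and without thesaurus=off. Only search.cgi and the thesaurus=off parameter come from this page; the host name, the path prefix, and the `collection` and `query` parameter names are assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical host and path prefix; only "search.cgi" and "thesaurus=off"
# are taken from the documentation above.
BASE_URL = "https://search.example.com/s/search.cgi"


def build_search_url(query, collection, use_synonyms=True):
    """Build a search URL, optionally switching synonyms off."""
    # "collection" and "query" are assumed parameter names.
    params = {"collection": collection, "query": query}
    if not use_synonyms:
        params["thesaurus"] = "off"
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("lawyer", "example-collection", use_synonyms=False))
# https://search.example.com/s/search.cgi?collection=example-collection&query=lawyer&thesaurus=off
```

Comparing the results of the two URLs for the same query is one simple way to see what effect a synonym is having.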

Editing Synonyms

To edit synonyms for a collection:

  1. Select the collection you wish to modify in the "Manage Collection" section of the Administration home page.
  2. In the "Customise" tab click on the "Customise Synonyms" link.
  3. This will take you to the "Edit Synonyms" form (shown below).

[Screenshot: the "Edit Synonyms" form (Edit-synonyms.png)]

The meaning of each form element is as follows:

Query
the original query which triggers the synonym
Expands to
the query after expansion
Type
the type of query matching which should be used (see details on expansion type below)

Testing Synonyms

Once you have created or edited synonyms you should test that they are being displayed correctly when the appropriate trigger query is run on the live search service.

Synonym Types

Funnelback supports three different query matching types for synonyms:

  1. Term by term (default)
  2. Exact query
  3. Regular expression

These are described in more detail below:

Term by term

This is the default query matching type, which acts by expanding any matching term in the original query.

For example, suppose that any query containing the word 'lawyer' should return results that contain any of the words 'lawyer', 'barrister' or 'solicitor'. To achieve this, create a synonym that expands lawyer to [lawyer barrister solicitor]. The query find a lawyer would then expand to find a [lawyer barrister solicitor].

Note: Unwanted synonyms can be circumvented by using a query operator to modify the query. For instance, to avoid the query 'lawyer' being expanded in the above example, you could instead use '+lawyer'.
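The term by term behaviour described above can be approximated with a short Python sketch. This is not Funnelback's implementation: the `SYNONYMS` table and the `expand_term_by_term` function are hypothetical, and the sketch assumes terms are separated by whitespace and compared exactly.

```python
# Hypothetical synonym table: trigger term -> expansion text.
SYNONYMS = {"lawyer": "[lawyer barrister solicitor]"}


def expand_term_by_term(query, synonyms=SYNONYMS):
    """Replace each whitespace-separated term that has a synonym entry."""
    return " ".join(synonyms.get(term, term) for term in query.split())


print(expand_term_by_term("find a lawyer"))
# find a [lawyer barrister solicitor]
print(expand_term_by_term("find a +lawyer"))
# find a +lawyer
```

In this sketch the term '+lawyer' is left alone simply because it is not identical to the trigger 'lawyer', which mirrors the effect described in the note above.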

Exact query

This type corresponds to an exact query match, i.e. the whole of the input query must match the trigger before the expansion is applied.

For example, consider setting imigration to expand to [immigration imigration]. If this were an exact query match type synonym then the query imigration would expand to [immigration imigration], whereas the query imigration australia would remain as imigration australia.
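A minimal Python sketch of the exact query behaviour, under the assumption that the whole query string is compared with the trigger after trimming surrounding whitespace. The `EXACT_SYNONYMS` table and `expand_exact` function are hypothetical names used only for illustration.

```python
# Hypothetical table of exact-match synonyms: whole query -> expansion.
EXACT_SYNONYMS = {"imigration": "[immigration imigration]"}


def expand_exact(query, synonyms=EXACT_SYNONYMS):
    """Expand only when the whole query equals a trigger; otherwise return it unchanged."""
    return synonyms.get(query.strip(), query)


print(expand_exact("imigration"))
# [immigration imigration]
print(expand_exact("imigration australia"))
# imigration australia
```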

Regular expression

This type acts by expanding any part of the given query that matches a particular regular expression.

For example, the following patterns provide useful expansions:

Query pattern: \bq.*\b
Expands to: vision
Meaning: Replace all words beginning with the letter 'q' with the query 'vision'.
Notes: The \b signifies a word boundary (start or end of a word).

Query pattern: \buranium-\d+\b
Expands to: nuclear
Meaning: Replaces queries like 'uranium-235' and 'uranium-238' with queries for 'nuclear'.
Notes: The \b signifies a word boundary as above, the \d signifies a numeric character, and the + signifies one or more of the preceding element (in this case, a numeric character).

Query pattern: ^query$
Expands to: search
Meaning: Turn the exact query 'query' into 'search'.
Notes: The ^ signifies the start of the query and $ signifies the end of the query, so this trigger will match only the exact query "query".

Query pattern: (?i)^query$
Expands to: search
Meaning: Turn the exact query 'query' (case insensitive) into 'search'.
Notes: As above, but the match is now case insensitive, so query, Query, QUERY and QuErY would all be converted.

Query pattern: [a-z]:
Expands to: (empty)
Meaning: Match any query that contains a single term metadata query, and remove the metadata part of the query. For example, this turns the title query 't:"romeo and juliet"' into the query '"romeo and juliet"'.
Notes: The square brackets signify one character out of a range of defined characters (in this case a to z, the metadata classes).
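To make some of these patterns concrete, the sketch below applies three of them with Python's `re` module. Python's regular expression engine is used here only as a stand-in for PCRE (the two agree on these particular patterns), and the `expand_regex` helper is a hypothetical name, not part of Funnelback.

```python
import re


def expand_regex(query, pattern, replacement):
    """Replace every part of the query that matches the pattern."""
    return re.sub(pattern, replacement, query)


# Patterns and expansions taken from the examples above.
print(expand_regex("uranium-235 enrichment", r"\buranium-\d+\b", "nuclear"))
# nuclear enrichment
print(expand_regex("QuErY", r"(?i)^query$", "search"))
# search
print(expand_regex('t:"romeo and juliet"', r"[a-z]:", ""))
# "romeo and juliet"
```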

A few things to keep in mind when writing regular expressions:

  • Unless you specifically use the ^ (start of string) and $ (end of string) characters, the regular expression could partially match on any part of the query. For example, the \bq.*\b example trigger in the table above will match on any query containing a term that starts with a 'q' (i.e. the query doesn't have to consist entirely of a single term starting with a 'q' to match the trigger).
  • Note that the definition of \b (word boundary) does not extend to non-word character query operators such as +, - and ". This means that the trigger \btest\b will not match the queries -test, test# and "test system".
  • The regular expression match is case sensitive by default, but can be made case insensitive by prefixing (?i), which sets the case insensitive flag for the expression.
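The points about anchoring and case sensitivity can be checked quickly outside Funnelback. The sketch below again uses Python's `re` module as an approximation of PCRE; the `triggers` helper is a hypothetical name that only reports whether a pattern matches somewhere in a query.

```python
import re


def triggers(pattern, query):
    """Return True if the pattern matches anywhere in the query."""
    return re.search(pattern, query) is not None


print(triggers(r"\bq.*\b", "find quick answers"))  # True: unanchored, matches part of the query
print(triggers(r"^query$", "query syntax"))        # False: anchors require the whole query to match
print(triggers(r"^query$", "Query"))               # False: matching is case sensitive by default
print(triggers(r"(?i)^query$", "Query"))           # True: (?i) makes the match case insensitive
```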

The regular expressions are compiled using the PCRE library, so all triggers must be Perl compatible. Any trigger that fails to compile will produce a warning message in the Web server error log. Check this log after adding or changing a regular expression synonym to confirm that every entry has compiled correctly.
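As a rough pre-check before saving triggers, patterns can be test-compiled locally. The sketch below uses Python's `re` module, which is not PCRE and differs from it in some syntax, so it can only catch obvious mistakes such as unbalanced brackets; the Web server error log remains the authoritative check. The `find_invalid_triggers` function is a hypothetical helper.

```python
import re


def find_invalid_triggers(patterns):
    """Return (pattern, error message) pairs for patterns that fail to compile."""
    failures = []
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            failures.append((pattern, str(exc)))
    return failures


# The last pattern is deliberately broken (missing closing bracket).
candidates = [r"\buranium-\d+\b", r"(?i)^query$", r"[a-z:"]
for pattern, message in find_invalid_triggers(candidates):
    print(f"{pattern!r} failed to compile: {message}")
```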
