Best bets

Introduction

The best bets mechanism allows you to specify URLs that should be displayed on the results page whenever a set of trigger words is present in the query.

For example, you might wish to specify that whenever a query containing the words gene and sequencing is submitted, attention should be drawn to the www.dna.com site.

Notes

  • Funnelback's Best Bets functionality was previously known as 'Featured Pages'.
  • A system called Curator is also available, which offers greater flexibility in configuring common scenarios. Because of this flexibility, however, Curator is currently substantially more involved to configure than Best Bets.

Editing Best Bets

To edit best bets for a collection:

  1. Select the collection you wish to modify in the "Manage Collection" section of the Administration home page.
  2. In the "Customise" tab click on the "Customise Best Bets" link.
  3. This will take you to the "Edit Best Bets" form (shown below).

[Screenshot: the "Edit Best Bets" form]

The meaning of each form element is as follows:

Trigger Query
the words to trigger the display of the best bet information
Type
type of match (see details on match type below)
Result Title
the title of the best bet (used for hyperlink)
Description
some text to be displayed underneath the URL.
Target URL
the URL to be used for the best bet

In the screenshot above you can see that a best bet relating to the trigger query "admissions" has been created. If the query "admissions" is processed then the given best bet will be displayed, directing users to the "Enrolments" website.

Notes:

  • The target URL does not need to be within the domain of the collection being searched i.e. it can be any valid URL.

Testing Best Bets

Once you have created or edited best bets you should test that they are being displayed correctly when the appropriate trigger query is run on the live search service.

Best Bet Match Type

Funnelback supports four different match types for best bets:

  1. Query term by term match (default)
  2. Substring match
  3. Exact query match
  4. Regular expression match

These are described in more detail below:

Query Term by Term Match

This is the default match type, and is equivalent to a boolean AND query. If the trigger query is "search engine" and the input query contains "search" AND "engine" then the best bet will be displayed.

Substring Match

This type looks for a case-insensitive substring match of every term in the trigger against the entire query. If every term in the trigger is found in its entirety somewhere within the query, as a whole word or as part of a word, then the best bet is displayed.

For example, the trigger sports medicine will match on the following queries:

  • Sports Medicine
  • passports medicinea
  • "sports medicine"
  • medicine sports
  • medicine -sports

Note that trigger words may currently contain only alphanumeric characters. A trigger word containing a hyphen (e.g. unit-level) will not trigger correctly against a query for 'unit-level'. In this case, two separate trigger words (i.e. unit level) may be used to achieve the desired behaviour.
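The substring rule can be illustrated with a short Python sketch. This is not Funnelback's implementation: the function name substring_match is hypothetical, and the sketch does not model the alphanumeric-only restriction on trigger words described above. It only shows why each of the example queries matches.

```python
def substring_match(trigger: str, query: str) -> bool:
    """Return True if every trigger term occurs somewhere in the query.

    Illustrative only: terms are compared case-insensitively as plain
    substrings, so a term may match inside a longer word.
    """
    haystack = query.lower()
    terms = trigger.lower().split()
    return bool(terms) and all(term in haystack for term in terms)


trigger = "sports medicine"
queries = [
    "Sports Medicine",
    "passports medicinea",
    '"sports medicine"',
    "medicine sports",
    "medicine -sports",
    "sports injuries",  # lacks "medicine", so it should not match
]
for query in queries:
    print(f"{query!r}: {substring_match(trigger, query)}")
```

Because the comparison ignores word boundaries and query operators, a query such as medicine -sports still matches even though the user has negated the term.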

Exact Query Match

The exact matching system is a simple way of specifying an exact, case-insensitive match for the trigger. In this scheme, the query must match the trigger exactly, including the order of the query terms.

For example, the trigger sports medicine will match the following queries if "exact match" is used:

  • sports medicine
  • Sports medicine

The following queries will not match:

  • medicine sports
  • sports
  • sports medicine research
  • "sports medicine"

The exact matching system should generally be used only as an efficient way of matching simple triggers: single terms or common phrases.
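As a rough sketch of the behaviour described above, exact matching can be thought of as a case-insensitive comparison of the whole query against the whole trigger. The function name exact_match is hypothetical, and how Funnelback treats leading or trailing whitespace is not stated here, so the sketch compares the strings as given.

```python
def exact_match(trigger: str, query: str) -> bool:
    """Return True if the query equals the trigger, ignoring case.

    Illustrative only: term order and any extra characters (such as
    quotes or additional terms) cause the match to fail.
    """
    return query.lower() == trigger.lower()


trigger = "sports medicine"
queries = [
    "sports medicine",
    "Sports medicine",
    "medicine sports",
    "sports",
    "sports medicine research",
    '"sports medicine"',
]
for query in queries:
    print(f"{query!r}: {exact_match(trigger, query)}")
```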

Regular Expression Match

The regular expression based best bets matching mechanism provides access to the power of regular expressions to enable administrators to specify complex triggers.

Trigger scenarios such as "contains a numeric", "doesn't start with hyphen" (query negation operator) or "not an x: metadata query term" can all be accommodated using this system.

For example, the following patterns provide useful triggers:

Pattern: \b\d
Meaning: Match a query containing a term that starts with a numeric.
Explanation: The \b signifies a word boundary and \d signifies a numeric character.

Pattern: bus\b
Meaning: Match a query containing a term that ends in bus.
Explanation: As above, \b signifies a word boundary.

Pattern: \btest\b
Meaning: Match any query containing the exact word test (i.e. not testing or tested).
Explanation: As above, with a \b word boundary on each side of test.

Pattern: ^search$
Meaning: Exact match on the query search.
Explanation: The ^ signifies the start of the query and $ signifies the end of the query, so this trigger will match only the query search. Matching is case sensitive, so Search and SEARCH will not match.

Pattern: (?i)^search$
Meaning: Exact (case insensitive) match on the query search.
Explanation: As above, except that the prefix (?i) causes matching to be case insensitive, so this trigger would match the query search, SEARCH, Search or SeArCh.

Pattern: [a-z]:\S+
Meaning: Match any query that contains a single term metadata query.
Explanation: The square brackets signify one character out of a range of defined characters (in this case a to z, the metadata classes), the \S signifies a non-whitespace character and the + signifies one or more of the preceding element (in this case, a non-whitespace character).

A few things to keep in mind when writing regular expressions:

  • Unless you specifically use the ^ (start of string) and $ (end of string) characters, the regular expression could partially match on any part of the query. For example, the \b\d example trigger in the table above will match on any query containing a term that starts with a numeric (i.e. the query doesn't have to consist entirely of a single term starting with a numeric to match the trigger).
  • Note that the definition of \b (word boundary) doesn't extend to non-word character query operators such as +, - and ". This means that the trigger \btest\b will match the queries -test, test# and "test system".
  • The regular expression match is case sensitive by default. If the pattern is prefixed with (?i), matching becomes case insensitive.

The regular expressions are compiled using the PCRE library, so all triggers must be Perl compatible. Any trigger that fails to compile will produce a warning message in the Web server error log. After adding or changing a regular expression best bet, check this log to confirm that the trigger compiled correctly.
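The patterns in the table above can be tried out before they are entered as triggers. The sketch below uses Python's re module as a stand-in for PCRE, which is an assumption: the two engines agree on the syntax used in these examples, but they are not identical in general, so the Web server error log remains the authoritative check. The helper names regex_match and compiles are hypothetical.

```python
import re


def regex_match(pattern: str, query: str) -> bool:
    """Return True if the pattern matches anywhere in the query.

    re.search is used so that, as described above, an unanchored
    pattern may match any part of the query.
    """
    return re.search(pattern, query) is not None


def compiles(pattern: str) -> bool:
    """Return True if the pattern is syntactically valid for Python's re."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


examples = [
    (r"\b\d", "route 66"),
    (r"bus\b", "school bus"),
    (r"bus\b", "business"),
    (r"\btest\b", "-test"),
    (r"\btest\b", "testing"),
    (r"^search$", "Search"),
    (r"(?i)^search$", "SeArCh"),
    (r"[a-z]:\S+", "t:funnelback"),
]
for pattern, query in examples:
    print(f"{pattern!r} against {query!r}: {regex_match(pattern, query)}")
```

A pattern with unbalanced parentheses, for example, is rejected by compiles, which mirrors the kind of error that would otherwise only surface in the log.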

Configuration file

The best bets mechanism for a collection is controlled by the configuration file: $SEARCH_HOME/conf/collection/best_bets.cfg.

The best_bets.cfg file has lines of the following format:

  1. trigger words==URL
  2. trigger words==title==URL
  3. trigger words==title==description==URL

Example

prospero==The tempest==http://mit.edu/Shakespeare/tempest/
sea storm==http://www-tech.mit.edu/Shakespeare/tempest/
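To make the three line formats concrete, the following Python sketch splits a configuration line on the == separator and assigns the fields according to how many are present. It is an illustration rather than Funnelback's own parser: the names BestBet and parse_best_bet_line are hypothetical, trimming whitespace around fields is an assumption, and the trigger is returned unchanged, including any leading match type symbol.

```python
from typing import NamedTuple, Optional


class BestBet(NamedTuple):
    """One best bet entry; title and description are optional."""
    trigger: str
    title: Optional[str]
    description: Optional[str]
    url: str


def parse_best_bet_line(line: str) -> BestBet:
    """Parse one line in any of the three documented formats."""
    fields = [field.strip() for field in line.split("==")]
    if len(fields) == 2:
        trigger, url = fields
        return BestBet(trigger, None, None, url)
    if len(fields) == 3:
        trigger, title, url = fields
        return BestBet(trigger, title, None, url)
    if len(fields) == 4:
        trigger, title, description, url = fields
        return BestBet(trigger, title, description, url)
    raise ValueError(f"expected 2 to 4 '=='-separated fields, got {len(fields)}")


for line in [
    "prospero==The tempest==http://mit.edu/Shakespeare/tempest/",
    "sea storm==http://www-tech.mit.edu/Shakespeare/tempest/",
]:
    print(parse_best_bet_line(line))
```

The URL is always the last field, which is why the sketch decides the meaning of the middle fields from the field count alone.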

The following symbols are used at the start of the trigger query to indicate the different match types:

  • % = Substring match
  • + = Exact match
  • ~ = Regular expression match

If none of these symbols is present at the start of the trigger query then the default match type, query term by term match, is used.
