Generalized scopes (gscopes)

The generalized scopes mechanism in Funnelback allows an administrator to group documents whose URLs match a set of patterns (e.g. */publications/*), or that are returned by a specified query (e.g. author:shakespeare).

Once defined, these groupings can be used for:

  • Scoped searches (providing a search that only looks within a particular set of documents).

  • Creating additional services (providing a search service with separate templates, analytics and configuration that is limited to a particular set of documents).

  • Faceted navigation categories (counting the number of documents in the result set that match this grouping).

The patterns used to match against the URLs are Perl regular expressions, which allow very complex matching rules to be defined. If you don’t know what a regular expression is, don’t worry: simple substring matching will also work.

The specified query can be anything that is definable using the Funnelback query language.

Generalized scopes are a good way of adding some structure to an index that lacks any metadata, by either making use of the URL structure, or by creating groupings based on pre-defined queries.

Metadata should always be used in preference to generalized scopes where possible as gscopes carry a much higher maintenance overhead.

URLs can be grouped into multiple sets by having additional patterns defined within the configuration file.

Configuring gscopes

Generalized scopes are configured by creating a gscopes configuration file from your data source management screen.

To do this:

  1. Navigate to the data source’s management screen, then select manage data source configuration files from the settings panel.

  2. Create a new gscopes.cfg or query-gscopes.cfg configuration file. gscopes.cfg groups documents based on their URLs; query-gscopes.cfg groups documents by running a query and collecting the set of documents returned.

  3. Add the gscope definitions and save the configuration.

  4. Apply the gscopes to your search index.

Gscopes configuration file format

Each line in a gscopes configuration file has the following format:

<gscope ID> <pattern or query that must match for the gscope to be set>

where:

gscope ID

is an alphanumeric ASCII string no longer than 64 characters. White space and all other punctuation are not permitted. Additionally, gscopes prefixed with Fun in any upper or lower case form are reserved for internal use only.

pattern or query

For gscopes.cfg: a regex pattern which is matched against the document’s URL. For query-gscopes.cfg: a valid search query specified using the Funnelback query language.
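The format rules above can be checked before a file is saved. The following is a minimal sketch in Python, not part of Funnelback: parse_gscope_line and ID_PATTERN are hypothetical names, and it assumes the gscope ID is separated from the pattern or query by whitespace.

```python
import re

# Assumption drawn from the rules above: 1 to 64 ASCII letters or digits.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,64}")


def parse_gscope_line(line):
    """Split one configuration line into (gscope ID, pattern or query).

    Raises ValueError if the line breaks one of the documented rules.
    """
    parts = line.strip().split(None, 1)  # split on the first run of whitespace
    if len(parts) != 2:
        raise ValueError(f"expected '<gscope ID> <pattern or query>': {line!r}")
    gscope_id, rest = parts
    if not ID_PATTERN.fullmatch(gscope_id):
        raise ValueError(f"gscope ID must be 1-64 alphanumeric ASCII characters: {gscope_id!r}")
    if gscope_id.lower().startswith("fun"):
        raise ValueError(f"gscope IDs starting with 'Fun' (any case) are reserved: {gscope_id!r}")
    return gscope_id, rest
```

For example, parse_gscope_line("authors author:shakespeare") returns the ID authors and the query author:shakespeare, while an ID such as my-scope or FUNnel is rejected.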

Setting a pattern that matches everything else

This setting only applies to URL-based gscopes defined in gscopes.cfg.

The data source configuration option gscopes.other_gscope, if set, defines a gscope ID to apply to all other items in the search index that do not match another gscope ID in your gscopes.cfg.

Example

If you set the following in your data source configuration:

gscopes.other_gscope=notspecified

and have the following in your gscopes.cfg:

pdf \.pdf$
word \.docx$
excel \.xlsx$

After you apply these gscopes to your index:

  • All URLs ending in .pdf are assigned the pdf gscope.

  • All URLs ending in .docx are assigned the word gscope.

  • All URLs ending in .xlsx are assigned the excel gscope.

  • Everything else is assigned the notspecified gscope.
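The behaviour in this example can be approximated in Python. This is a sketch only: assign_gscopes and RULES are hypothetical names, the example.com URLs are placeholders, Python's re module stands in for the Perl regular expressions that Funnelback uses, and it is assumed that a URL matching several patterns receives every matching gscope.

```python
import re

# The three rules from the example gscopes.cfg, as (gscope ID, pattern) pairs.
RULES = [
    ("pdf", r"\.pdf$"),
    ("word", r"\.docx$"),
    ("excel", r"\.xlsx$"),
]


def assign_gscopes(url, rules, other_gscope=None):
    """Return the set of gscope IDs whose pattern matches the URL.

    If nothing matches and other_gscope is given, that ID is used instead,
    mirroring the gscopes.other_gscope option described above.
    """
    matched = {gscope_id for gscope_id, pattern in rules if re.search(pattern, url)}
    if not matched and other_gscope:
        matched.add(other_gscope)
    return matched


for url in ["https://example.com/report.pdf", "https://example.com/index.html"]:
    print(url, sorted(assign_gscopes(url, RULES, other_gscope="notspecified")))
```

The patterns are anchored with $, so a URL such as https://example.com/report.pdf?download=1 would not match the pdf rule and would fall through to notspecified.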

Setting advanced options

This setting only applies to URL-based gscopes defined in gscopes.cfg.

The data source configuration option gscopes.options can be used to apply advanced gscope options.

Available options:

-regex

Sets the pattern type within gscopes.cfg to be a Perl 5 regular expression (default)

-url

Sets the pattern type within gscopes.cfg to be an exact match to a URL (http:// can be omitted for http URLs).

-docnum

Sets the pattern type within gscopes.cfg to be a document number.

-separate

Apply changes to a shadow copy of the index. Can be useful when working with very large indexes. Note: when using this option you may need to set -GSB to ensure enough gscope bits are assigned.

-verbose

Print additional output information when applying the gscopes.

-quiet

Don’t print the before and after summaries.

Example

If you set the following in your data source configuration:

gscopes.options=-separate

Gscope changes will be applied to a shadow copy of the index before being applied to the index itself.

Applying gscopes to the index

Gscopes are automatically applied during the indexing process of an update for all data source types.

However, gscopes can be applied to an existing index without running a full update of the data source. The process to apply gscopes depends on the type of data source.

Non-push data sources

To manually apply gscopes to an existing index:

  1. Locate the data source where the gscopes have been altered.

  2. Run an advanced update and select the option to reapply gscopes to the live view.

Push data sources

Gscopes are not immediately applied to existing indexes in a push data source.

To ensure changed gscopes are applied to existing push indexes:

  1. Log in to the search dashboard and make a note of the ID of the push data source you wish to update.

  2. Select the API UI from the system menu.

  3. Select the push API and run the /v1/collections/{collection}/vacuum call from the push API collection section, specifying the RE_APPLY_INDEX_EXTRAS option.

Searching with gscopes

To narrow down a search to a particular gscope, an appropriate query processor option must be set. This can be done either via the results page configuration (which affects every search) or with a CGI parameter at search time (which affects only that search).

To specify the query processor options in the results page configuration use:

-gscope1=<gscope expression>

where <gscope expression> is either:

  • a single gscope e.g. staff

  • a reverse Polish gscope expression (see below) e.g. staff,student|

To use the CGI parameter add the following to your request URL:

&gscope1=<gscope expression>

where <gscope expression> is defined in the same way as above.
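When the request URL is built by a script, the gscope expression should be URL-encoded, because a literal + in a query string is conventionally decoded as a space and would drop the AND operator. The sketch below shows one way to do this in Python. The function name scoped_search_url, the host search.example.com, the path and the query parameter name are assumptions for illustration only; gscope1 is the only parameter taken from this page.

```python
from urllib.parse import urlencode


def scoped_search_url(base_url, query, gscope_expression):
    """Build a search URL that restricts results with a gscope expression.

    urlencode percent-encodes the operators (+ becomes %2B, | becomes %7C,
    ! becomes %21) and the comma separators, so they survive the request.
    """
    params = {"query": query, "gscope1": gscope_expression}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute the search URL of your own installation.
print(scoped_search_url("https://search.example.com/s/search.html", "library", "staff,student+"))
```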

Advanced gscope expressions

Gscope expressions are written in reverse Polish notation: all operands of a logical operation (such as AND, OR, NOT) precede the operator itself. This avoids ambiguity and removes the need for brackets around complex logical expressions, although it can look odd to those unfamiliar with it. In Funnelback, '+' represents the AND operation, '|' represents the OR operation and '!' represents the NOT operation. The best way to understand reverse Polish expressions is with some examples:

Expression Description

staff

Matches documents which have gscope staff set.

staff,student+

Matches documents that have BOTH gscopes staff and student set.

56,4|

Matches documents that have gscope 56 OR 4 set.

3!

Matches documents that do not have gscope 3 set

1,2,3,4|||

Matches documents that have ANY of the gscopes 1,2,3,4

1,2,3,4+++

Matches documents that have ALL of the gscopes 1,2,3,4

For more complex expressions, it is important to understand that the expression is evaluated using a stack. Reading from left to right, operands (gscope IDs) are pushed onto the stack, while operators (!, + and |) take one or two values off the stack (one for !, two for + or |), operate on them and push the result back. To help explain this, here are some further examples:

Expression Description

3,4!+

Matches documents that have gscope 3, but not 4

1!,2!+

Matches documents that do not have gscope 1, and do not have gscope 2

1,2,3,4|++

Matches documents that have gscope 1, 2 and one or both of 3 and 4.

12,23+4|7!+

Matches documents that have gscope '4', OR have both gscopes '12' AND '23', as long as they do NOT have gscope '7'.
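The stack behaviour described above can be sketched as a small evaluator that decides whether a document with a given set of gscopes matches an expression. This is an illustration of the notation, not Funnelback's implementation. The names TOKEN and matches are hypothetical, and the tokenizer assumes that operands are alphanumeric and that commas only separate adjacent operands.

```python
import re

# An operand is a run of letters/digits; each operator is a single character.
TOKEN = re.compile(r"[A-Za-z0-9]+|[+|!]")


def matches(expression, doc_gscopes):
    """Evaluate a reverse Polish gscope expression against a set of gscope IDs."""
    stack = []
    for token in TOKEN.findall(expression):
        if token == "!":
            # NOT takes one value off the stack.
            stack.append(not stack.pop())
        elif token in ("+", "|"):
            # AND and OR take two values off the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append((left and right) if token == "+" else (left or right))
        else:
            # Operand: push whether the document has this gscope set.
            stack.append(token in doc_gscopes)
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {expression!r}")
    return stack[0]


print(matches("12,23+4|7!+", {"4"}))              # has 4 and not 7
print(matches("12,23+4|7!+", {"12", "23", "7"}))  # excluded because 7 is set
```

A malformed expression that leaves too few values for an operator raises IndexError from the empty stack, and one that leaves extra values raises ValueError.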

Examples

To illustrate how the expression is constructed:

Example expression: 1!,2!+

  1. Add 1 to the stack

    1
  2. NOT (!) the top value in the stack

    NOT 1
  3. Add 2 to the stack

    2
    NOT 1
  4. NOT (!) the top value of the stack

    NOT 2
    NOT 1
  5. AND (+) the top two values of the stack

    (NOT 1) AND (NOT 2)

Example expression: 12,23+4|7!+

  1. Add 12 to the stack

    12
  2. Add 23 to the stack

    23
    12
  3. AND (+) the top two values of the stack

    12 AND 23
  4. Add 4 to the stack

    4
    12 AND 23
  5. OR (|) the top two values of the stack

    (12 AND 23) OR 4
  6. Add 7 to the stack

    7
    (12 AND 23) OR 4
  7. NOT (!) the top value of the stack

    NOT 7
    (12 AND 23) OR 4
  8. AND (+) the top two values of the stack

    ((12 AND 23) OR 4) AND (NOT 7)
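The walkthroughs above can be reproduced mechanically by keeping text on the stack instead of truth values. The sketch below is a hypothetical helper (to_infix is not a Funnelback function) that converts a reverse Polish gscope expression into a fully bracketed form, which can help when checking a complex expression by eye. It assumes alphanumeric operands, as in the examples.

```python
import re

# An operand is a run of letters/digits; each operator is a single character.
TOKEN = re.compile(r"[A-Za-z0-9]+|[+|!]")


def to_infix(expression):
    """Rewrite a reverse Polish gscope expression as a bracketed infix string."""
    stack = []
    for token in TOKEN.findall(expression):
        if token == "!":
            # NOT wraps the top value of the stack.
            stack.append(f"(NOT {stack.pop()})")
        elif token in ("+", "|"):
            # AND / OR combine the top two values of the stack.
            right = stack.pop()
            left = stack.pop()
            word = "AND" if token == "+" else "OR"
            stack.append(f"({left} {word} {right})")
        else:
            stack.append(token)
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {expression!r}")
    return stack[0]


print(to_infix("1!,2!+"))       # ((NOT 1) AND (NOT 2))
print(to_infix("12,23+4|7!+"))  # (((12 AND 23) OR 4) AND (NOT 7))
```

The output adds one outer pair of brackets compared with the final step of each walkthrough, because every operator application is bracketed.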