In some applications, it is useful to narrow a search to particular sub-parts of a collection rather than searching the entire collection. In Funnelback, this is done by marking documents as matching or not matching a gscope. A search can then be restricted to documents that match a gscope or a gscope expression.
For example, imagine a company website that has two major sections, a company news section and a careers section. By setting gscope 'news' on all the documents in the news section, and gscope 'careers' on all the documents in the careers section, you could enable (along with suitable UI customisation) search over only the news section, or only the careers section. This simple use case can be handled directly with faceted navigation; gscopes should only be used directly if faceted navigation is not suitable.
The gscopes system is designed to be flexible in order to support a variety of use cases. Documents can be given multiple gscopes. For example, one document could be given the gscopes 'people', 'staff', 'professor' and 'ultimateFrisbeeMember'. Additionally, a search can be restricted to arbitrary boolean combinations of gscopes. For example, you can instruct the search engine to restrict results to those documents that have gscope 'people', OR have both gscopes 'people' AND 'ultimateFrisbeeMember', as long as they do NOT have gscope 'staff'.
To use the gscopes system, you must set up a gscopes definition file. Each line in the file follows the format:
<gscope name> <pattern or query that must match for the gscope to be set>
The gscope name is an alphanumeric ASCII string no longer than 64 characters. Whitespace and punctuation are not permitted. Additionally, gscope names prefixed with Fun, in any combination of upper and lower case, are reserved for internal use only.
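The naming rules and line format above can be checked before a definition file is deployed. The following is a minimal sketch rather than Funnelback's own validation: the function names are hypothetical, and it assumes the name and the pattern are separated by the first run of whitespace on the line.

```python
import re
from typing import Tuple

# Rules taken from the text: ASCII letters and digits only, at most 64
# characters, and no "Fun" prefix in any mix of upper and lower case.
_NAME_RE = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_gscope_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules described above."""
    if _NAME_RE.fullmatch(name) is None:
        return False
    return not name.lower().startswith("fun")


def parse_definition_line(line: str) -> Tuple[str, str]:
    """Split '<gscope name> <pattern or query>' into its two parts.

    Assumption: the separator is the first run of whitespace on the line.
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected '<gscope name> <pattern>': {line!r}")
    name, pattern = parts
    if not is_valid_gscope_name(name):
        raise ValueError(f"invalid gscope name: {name!r}")
    return name, pattern
```

For instance, `parse_definition_line("news example.com/news/")` yields the pair `("news", "example.com/news/")`, while a name such as `funStuff` is rejected as reserved. The pattern `example.com/news/` is illustrative only.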
The regex gscope definition file can be created from the administration interface: select the collection you wish to use gscopes on, select the 'Administer' tab, click 'Browse Collection Configuration Files', and then use the drop-down box on the configuration files screen to create a file called gscopes.cfg.
Gscopes are automatically applied during the indexing process. You may also specify gscopes options by setting gscopes.options in the collection configuration.
Changes to gscopes configuration can be applied to a collection without a reindex by running an advanced update to reapply gscopes to the live view.
Command line usage
URL pattern gscopes can be applied manually by running one of the following commands (Linux and Windows examples respectively):
/opt/funnelback/bin/padre-gs /opt/funnelback/data/web/offline/idx/index /opt/funnelback/conf/web/gscopes.cfg
c:\funnelback\bin\padre-gs.exe c:\funnelback\data\db\offline\idx\index c:\funnelback\conf\db\gscopes.cfg
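When the same command has to be run for several collections, the argument list can be assembled programmatically. The sketch below mirrors only the Linux command shown above; the function name is hypothetical, and the directory layout is assumed to follow that example exactly. It builds the argument list without running it.

```python
from pathlib import PurePosixPath
from typing import List


def padre_gs_command(install_root: str, collection: str) -> List[str]:
    """Build the padre-gs argument list for one collection (Linux layout).

    Assumption: the index and gscopes.cfg live at the paths used in the
    example command above, under the offline view of the collection.
    """
    root = PurePosixPath(install_root)
    return [
        str(root / "bin" / "padre-gs"),
        str(root / "data" / collection / "offline" / "idx" / "index"),
        str(root / "conf" / collection / "gscopes.cfg"),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on the Funnelback server.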
Changed gscopes are not automatically applied to all generations in a push collection. They are applied to newly committed generations and to merged generations. To re-apply gscopes to all generations, you will need to trigger a vacuum via the push API.
Searching with gscopes
To narrow a search to a particular gscope, the appropriate query processor option must be set. This can be done either via the collection configuration (which affects every search) or with a CGI parameter at search time (which affects only that search).
To specify the query processor options in the collection configuration, supply a <gscope expression> that is either:
- a single gscope, e.g. staff
- a reverse Polish gscope expression (see below), e.g. staff,student+
To use the CGI parameter, add it to your request URL with a <gscope expression> as its value. The <gscope expression> is defined in the same way as above.
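Gscope expressions contain characters such as + and |, and a + in a query string is commonly decoded as a space, so it is prudent to percent-encode the expression when building a request URL. The sketch below is one way to do that. The function name is hypothetical, and because the CGI parameter name is not given in this section, it is passed in by the caller ("PARAM" in the usage note is a placeholder, not a real parameter name).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_gscope_param(url: str, param_name: str, expression: str) -> str:
    """Append `param_name=<expression>` to `url`, percent-encoding the value.

    `param_name` is supplied by the caller; this sketch does not assume
    what the search engine's parameter is called.
    """
    parts = urlsplit(url)
    # Keep any parameters already on the URL, then add the gscope one.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append((param_name, expression))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `add_gscope_param("http://search.example.com/s/search.html?query=jobs", "PARAM", "staff,student+")` encodes the expression as `staff%2Cstudent%2B`, which decodes back to `staff,student+` on the server side.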
The gscope expressions used are reverse Polish expressions. This means that all operands of a logical operation (such as AND, OR, NOT) precede the operator itself. This notation avoids ambiguity and removes the need for brackets around complex logical expressions. However, it can look quite odd to those unfamiliar with it. In Funnelback, '+' represents the AND operation, '|' represents the OR operation and '!' represents the NOT operation. The best way to understand reverse Polish expressions is with some examples:
| Expression | Meaning |
| staff | Matches documents that have gscope staff set. |
| staff,student+ | Matches documents that have BOTH gscopes staff and student set. |
| 56,4| | Matches documents that have gscope 56 OR 4 set. |
| 3! | Matches documents that do not have gscope 3 set. |
| 1,2,3,4||| | Matches documents that have ANY of the gscopes 1, 2, 3, 4. |
| 1,2,3,4+++ | Matches documents that have ALL of the gscopes 1, 2, 3, 4. |
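The ANY and ALL examples follow a regular pattern: the gscopes are listed, comma separated, followed by one fewer operator than there are gscopes. A small helper can generate such expressions. This is an illustrative sketch and the function names are hypothetical.

```python
from typing import Iterable


def _combine(gscopes: Iterable[object], operator: str) -> str:
    """Join gscopes with commas and append one operator per extra operand."""
    names = [str(g) for g in gscopes]
    if not names:
        raise ValueError("at least one gscope is required")
    return ",".join(names) + operator * (len(names) - 1)


def any_of(gscopes: Iterable[object]) -> str:
    """Expression matching documents that have ANY of the given gscopes."""
    return _combine(gscopes, "|")


def all_of(gscopes: Iterable[object]) -> str:
    """Expression matching documents that have ALL of the given gscopes."""
    return _combine(gscopes, "+")
```

With the gscopes 1, 2, 3 and 4, `any_of` produces `1,2,3,4|||` and `all_of` produces `1,2,3,4+++`. A single gscope is returned unchanged.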
For more complex expressions than this, it is important to understand that the expression is evaluated using a stack. Reading from left to right, operands (gscopes) are pushed onto the stack, while operators (!, +, |) take one or two operands off the stack (one for !, two for + or |), operate on them, and push the result back. To help explain this, here are some further examples:
| Expression | Meaning |
| 3,4!+ | Matches documents that have gscope 3, but not 4. |
| 1,2,3,4|++ | Matches documents that have gscopes 1 and 2, and one or both of 3 and 4. |
| 12,23+4|7!+ | Matches documents that have gscope '4', OR have both gscopes '12' AND '23', as long as they do NOT have gscope '7'. |
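The stack behaviour can be made concrete with a small evaluator that decides whether a document's set of gscopes satisfies an expression. This is a sketch of the semantics described in this section, not the search engine's implementation. The function names are hypothetical, and gscopes are treated as strings.

```python
from typing import Iterable, List

OPERATORS = {"+", "|", "!"}


def tokenize(expression: str) -> List[str]:
    """Split an expression into gscope names and operator characters."""
    tokens: List[str] = []
    name = ""
    for ch in expression:
        if ch == "," or ch in OPERATORS:
            if name:  # a comma or an operator ends the current gscope name
                tokens.append(name)
                name = ""
            if ch in OPERATORS:
                tokens.append(ch)
        elif ch.isalnum():
            name += ch
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if name:
        tokens.append(name)
    return tokens


def matches(expression: str, document_gscopes: Iterable[str]) -> bool:
    """Evaluate a reverse Polish gscope expression for one document."""
    have = set(document_gscopes)
    stack: List[bool] = []
    for token in tokenize(expression):
        if token in OPERATORS:
            needed = 1 if token == "!" else 2
            if len(stack) < needed:
                raise ValueError(f"operator {token!r} is missing operands")
            right = stack.pop()
            if token == "!":
                stack.append(not right)
            else:
                left = stack.pop()
                stack.append((left and right) if token == "+" else (left or right))
        else:
            stack.append(token in have)  # operand: is this gscope set?
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single result")
    return stack[0]
```

For the last example in the table, `matches("12,23+4|7!+", {"4"})` is true, while `matches("12,23+4|7!+", {"4", "7"})` is false because gscope 7 is set.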