gscopes.cfg (configuration file)

gscopes.cfg defines a set of generalized scopes that are applied based on URL patterns.

Background

Generalised scopes (gscopes) can be used in numerous ways to narrow searches to particular parts of a collection. The gscopes.cfg file is the standard place to store mappings from gscopes to the URL patterns they should be applied to.

Format

A text file, with one gscope definition per line.

GSCOPE-ID REGULAR-EXPRESSION

GSCOPE-ID

An alphanumeric ASCII string no longer than 64 characters. White space and punctuation are not permitted. GSCOPE-ID values starting with Fun, in any combination of upper and lower case, are reserved for internal use.

REGULAR-EXPRESSION

A Perl 5 compatible regular expression that is matched against the URL.

  • GSCOPE-ID values can be used more than once with different regular expressions. The resulting gscope within the index will include the documents that match any of the supplied regular expressions.

  • REGULAR-EXPRESSION values can be used more than once with different GSCOPE-ID values. Any document within the index will be tagged with all matching GSCOPE-ID values.

  • GSCOPE-ID values specified in query-gscopes.cfg are combined with any URL pattern based entries from gscopes.cfg. The resulting gscope within the index will include all documents that either have a URL matching a regular expression defined in gscopes.cfg or match a query defined in query-gscopes.cfg.

  • Invalid GSCOPE-ID values will be skipped when an update runs and the matching rule will be excluded from the index. This will only raise a warning in the Step-SetGscopes.log.
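The rules above can be sketched in Python as a rough model of how lines might be validated and applied. This is not the indexer's actual implementation: the function names are hypothetical, Python's re module stands in for Perl 5 regular expressions (the two differ in some details), and it assumes a pattern may match anywhere in the URL rather than the whole URL.

```python
import re

# Assumed reading of the GSCOPE-ID rules: ASCII letters and digits only,
# between 1 and 64 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_gscope_id(gscope_id):
    """Return True if the ID looks valid under the rules described above."""
    if ID_PATTERN.fullmatch(gscope_id) is None:
        return False
    # IDs starting with "Fun" in any case form are reserved for internal use.
    return not gscope_id.lower().startswith("fun")


def parse_gscopes(text):
    """Parse 'GSCOPE-ID REGULAR-EXPRESSION' lines into (id, regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2 or not is_valid_gscope_id(parts[0]):
            continue  # skipped, mirroring the documented handling of invalid IDs
        rules.append((parts[0], re.compile(parts[1])))
    return rules


def gscopes_for(url, rules):
    """Return every gscope ID with at least one pattern found in the URL."""
    return {gscope_id for gscope_id, pattern in rules if pattern.search(url)}
```

Because the result is a set built over all rules, a GSCOPE-ID listed with several patterns collects documents matching any of them, and a URL matching several rules receives every matching GSCOPE-ID.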

Examples

Maps government websites to different gscopes based on state:

act \.act\.gov\.au/
qld \.qld\.gov\.au/
tas \.tas\.gov\.au/
nsw \.nsw\.gov\.au/

Maps the 'documents' section of a website to gscope documents. Additionally gives '.doc' files in the important subdirectory the gscope importantWordDocuments:

documents www\.company\.com/documents/
importantWordDocuments www\.company\.com/documents/important/.*\.doc
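To see how these two rules overlap, the following sketch checks a few sample URLs against them. The URLs and the helper name are made up for illustration, and Python's re module is used as an approximation of Perl 5 matching, assuming the pattern may match anywhere in the URL.

```python
import re

# The two rules from the example above, as (GSCOPE-ID, pattern) pairs.
RULES = [
    ("documents", r"www\.company\.com/documents/"),
    ("importantWordDocuments", r"www\.company\.com/documents/important/.*\.doc"),
]


def matching_gscopes(url):
    """Return the gscope IDs whose pattern is found in the URL, in rule order."""
    return [gscope_id for gscope_id, pattern in RULES if re.search(pattern, url)]


print(matching_gscopes("http://www.company.com/documents/report.pdf"))
# ['documents']
print(matching_gscopes("http://www.company.com/documents/important/plan.doc"))
# ['documents', 'importantWordDocuments']
print(matching_gscopes("http://www.company.com/about/"))
# []
```

Under the same assumption, the second pattern is not anchored at the end, so a URL ending in plan.docx would also match it; appending $ to the pattern would restrict it to URLs ending in .doc.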

Prefix the regular expression with the (?i) directive to use case-insensitive matching:

documents (?i)www\.company\.com/documents/

This will match URLs containing "Documents", "DOCUMENTS", "DoCuments", etc.
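The effect of the (?i) directive can be checked with a short sketch. Python's re module is used here as a stand-in for Perl 5 matching (both accept a leading (?i)), and the sample URL is invented for illustration.

```python
import re

CASE_SENSITIVE = r"www\.company\.com/documents/"
CASE_INSENSITIVE = r"(?i)www\.company\.com/documents/"

# Hypothetical URL with mixed-case path segment.
url = "http://www.company.com/DoCuments/report.pdf"

print(bool(re.search(CASE_SENSITIVE, url)))    # False
print(bool(re.search(CASE_INSENSITIVE, url)))  # True
```

Note that the directive applies to the whole expression, so the host name portion is matched case-insensitively as well.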
