gscopes.cfg (configuration file)
gscopes.cfg defines a set of generalized scopes that are applied based on URL patterns.
Background
Generalized scopes (gscopes) can be used in numerous ways to narrow searches down to particular sub-parts of a collection. The gscopes.cfg file is the standard place to store mappings from gscopes to the URL patterns that each gscope should be applied to.
Format
A text file, with one gscope definition per line.
GSCOPE-ID REGULAR-EXPRESSION
- GSCOPE-ID: An alphanumeric ASCII string no longer than 64 characters. White space and punctuation are not permitted. GSCOPE-ID values starting with Fun in any upper or lower case form are reserved for internal use only.
- REGULAR-EXPRESSION: A Perl 5 compatible regular expression that is matched against the URL.
- GSCOPE-ID values can be used more than once with different regular expressions. The resulting gscope within the index will include the documents that match any of the supplied regular expressions.
- REGULAR-EXPRESSION values can be used more than once with different GSCOPE-ID values. Any document within the index will be tagged with all matching GSCOPE-ID values.
- GSCOPE-ID values specified in query-gscopes.cfg are combined with any URL pattern based entries from gscopes.cfg. The resulting gscope within the index will include all documents that either have a URL matching a regular expression defined in gscopes.cfg or match a query defined in query-gscopes.cfg.
- Invalid GSCOPE-ID values will be skipped when an update runs and the matching rule will be excluded from the index. This will only raise a warning in the Step-SetGsopes.log.
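The rules above can be sketched in Python as a rough illustration. This is not the product's implementation: the function names (is_valid_gscope_id, parse_gscopes, gscopes_for_url) are hypothetical, Python's re module stands in for a Perl 5 compatible engine, and the sketch assumes a pattern may match anywhere in the URL.

```python
import re

# Assumption: an ID is 1 to 64 ASCII letters or digits, per the format rules.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_gscope_id(gscope_id):
    """Check the documented ID rules, including the reserved 'Fun' prefix."""
    if ID_PATTERN.fullmatch(gscope_id) is None:
        return False
    # IDs starting with "Fun" in any letter case are reserved.
    return not gscope_id.lower().startswith("fun")


def parse_gscopes(text):
    """Return a list of (gscope_id, compiled_pattern), one per valid line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of white space: ID, then the expression.
        parts = line.split(None, 1)
        if len(parts) != 2 or not is_valid_gscope_id(parts[0]):
            continue  # invalid rules are skipped, mirroring the note above
        rules.append((parts[0], re.compile(parts[1])))
    return rules


def gscopes_for_url(rules, url):
    """Return every gscope ID whose pattern matches somewhere in the URL."""
    return {gscope_id for gscope_id, pattern in rules if pattern.search(url)}
```

Because the result is a set built from every matching rule, a URL picks up all gscopes whose expressions match it, and an ID listed on several lines behaves as the union of its expressions.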
Examples
Maps government websites to different gscopes based on state:
act \.act\.gov\.au/
qld \.qld\.gov\.au/
tas \.tas\.gov\.au/
nsw \.nsw\.gov\.au/
Maps the 'documents' section of a website to gscope documents. Additionally gives '.doc' files in the important subdirectory the gscope importantWordDocuments:
documents www\.company\.com/documents/
importantWordDocuments www\.company\.com/documents/important/.*\.doc
Prefix the regular expression with the (?i) directive to use case-insensitive matching:
documents (?i)www\.company\.com/documents/
This will match URLs containing "Documents", "DOCUMENTS", "DoCuments", etc.
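The effect of the (?i) directive can be checked with a short Python sketch. Python's re module is used here only as a stand-in for a Perl 5 compatible engine (it accepts the same inline flag at the start of a pattern), and the sample URLs are made up for illustration.

```python
import re

# Same expression as in the example, with and without the (?i) directive.
case_sensitive = re.compile(r"www\.company\.com/documents/")
case_insensitive = re.compile(r"(?i)www\.company\.com/documents/")

# Hypothetical URLs that differ only in letter case.
urls = [
    "http://www.company.com/documents/report.pdf",
    "http://www.company.com/Documents/report.pdf",
    "http://www.company.com/DOCUMENTS/report.pdf",
]

# For each URL, record whether each variant of the pattern matches.
sensitive_hits = [bool(case_sensitive.search(u)) for u in urls]
insensitive_hits = [bool(case_insensitive.search(u)) for u in urls]
```

Without the directive only the lower-case URL matches; with it all three do.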