gscopes.cfg (configuration file)
gscopes.cfg defines a set of generalized scopes that are applied based on URL patterns.
Generalised scopes can be used in numerous ways to narrow down searches to particular sub-parts of a collection. The
gscopes.cfg file is a standard place to store mappings from gscopes to the URL patterns that they should be applied to.
A text file, with one gscope definition per line.
GSCOPE-ID is an alphanumeric ASCII string no longer than 64 characters. White space and punctuation are not permitted. GSCOPE-ID values starting with Fun, in any upper or lower case form, are reserved for internal use only.
REGULAR-EXPRESSION is a Perl 5 compatible regular expression that is matched against the URL.
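The constraints on GSCOPE-ID described above can be sketched as a small Python check. This is only an illustration, not the product's own parser: the function names are hypothetical, and it assumes the ID and the regular expression are separated by the first run of white space on each line.

```python
import re

# Alphanumeric ASCII only, between 1 and 64 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_gscope_id(gscope_id):
    """Check the documented constraints on a GSCOPE-ID."""
    if not ID_PATTERN.fullmatch(gscope_id):
        return False
    # IDs starting with "Fun" in any letter case are reserved.
    return not gscope_id.lower().startswith("fun")


def parse_gscopes(text):
    """Return (gscope_id, regex) pairs for the valid lines in text."""
    rules = []
    for line in text.splitlines():
        # Assumption: the first run of white space separates the two fields.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or incomplete line
        gscope_id, regex = parts
        if is_valid_gscope_id(gscope_id):
            rules.append((gscope_id, regex))
    return rules
```

Lines with a reserved or malformed ID are simply dropped here, which mirrors the documented behaviour of skipping invalid definitions rather than failing.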
GSCOPE-ID values can be used more than once with different regular expressions. The resulting gscope within the index will include the documents that match any of the supplied regular expressions.
REGULAR-EXPRESSION values can be used more than once with different GSCOPE-ID values. Any document within the index will be tagged with all matching GSCOPE-ID values.
GSCOPE-ID values specified in query-gscopes.cfg are combined with any URL pattern based entries from gscopes.cfg. The resulting gscope within the index will include all documents that either have a URL matching a regular expression defined in gscopes.cfg or match a query defined in
Invalid GSCOPE-ID values will be skipped when an update runs, and the matching rule will be excluded from the index. This will only raise a warning in the
Maps government websites to different gscopes based on state:
    act \.act\.gov\.au/
    qld \.qld\.gov\.au/
    tas \.tas\.gov\.au/
    nsw \.nsw\.gov\.au/
Maps the 'documents' section of a website to the gscope documents. Additionally gives '.doc' files in the important subdirectory the gscope importantWordDocuments:
    documents www\.company\.com/documents/
    importantWordDocuments www\.company\.com/documents/important/.*\.doc
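To see how one document can end up with several gscopes, the two rules above can be tried against a few URLs in Python. This is a rough sketch under stated assumptions: Python's re module stands in for the Perl 5 compatible engine, patterns are assumed to match anywhere in the URL (unanchored), and the function name and test URLs are hypothetical.

```python
import re

# The two rules from the example above, as (GSCOPE-ID, regular expression) pairs.
RULES = [
    ("documents", r"www\.company\.com/documents/"),
    ("importantWordDocuments", r"www\.company\.com/documents/important/.*\.doc"),
]


def gscopes_for(url, rules):
    """Return every gscope whose pattern matches somewhere in the URL."""
    matched = []
    for gscope_id, pattern in rules:
        # Assumption: unanchored matching, as with re.search.
        if re.search(pattern, url) and gscope_id not in matched:
            matched.append(gscope_id)
    return matched
```

Under these assumptions, a URL such as http://www.company.com/documents/important/report.doc matches both patterns and so receives both gscopes, while a page elsewhere under /documents/ receives only documents.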
Prefix the regular expression with the (?i) directive to use case-insensitive matching:
This will match URLs containing "Documents", "DOCUMENTS", "DoCuments", etc.
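The effect of the (?i) prefix can be checked with a short Python sketch. The pattern below is hypothetical (the earlier documents rule with the prefix added), and Python's re module is used as a stand-in for the Perl 5 compatible engine, which accepts the same inline flag at the start of a pattern.

```python
import re

# Hypothetical pattern: the documents rule with the (?i) prefix added.
PATTERN = r"(?i)www\.company\.com/documents/"


def matches(url):
    """Return True if the case-insensitive pattern matches the URL."""
    return re.search(PATTERN, url) is not None
```

Note that the flag applies to the whole expression, so the host name part is matched case-insensitively as well.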