Manually building result collapsing
Background
This article details how to generate result collapsing indexes manually on the command line.
Process
The result collapsing index files are built by the padre-cc program. The documentation says you need to rebuild the index, but you can skip the rebuild and build the collapsing files manually. You may wish to do this if you have a large index that takes a long time to build.
Usage for padre-cc can be seen by running the program without arguments:
$ /opt/funnelback/bin/padre-cc
Purpose: To build an index.collapsig file to permit use of collapsed rankings.
Usage: /opt/funnelback/bin/padre-cc <index_stem> [-collapse_control=<string>] [-debug=on]
Utility for building a .collapsig file of collapsing
signatures. If no control_string is given, a one-column
file is built using the signatures from the .textsig file.
The collapse_control string must consist of sequences of
metadata field characters (ASCII letters or digits) separated by
commas. The characters $ and # may be used as metadata field
characters and represent document summarisable text and
document URL respectively. In future, it is planned to allow
field characters to be followed by a regular expression,
indicating that only the part of the metadata string which matches
the regex should be used in calculating the signature.
Example current control string: '$,ta'. In this case the .collapsig
will have two signatures per document: Column 0 is the normal document
signature and column 1 is a signature derived from the concatenation of
metadata fields t and a, in that order.
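The control string rule quoted in the usage text can be checked before padre-cc is run. The following is a minimal sketch: the function name is_valid_collapse_control is hypothetical, and the pattern is only one reading of the usage text (comma-separated groups of letters, digits, $ or #). It is not padre-cc's own validation, which may accept or reject other forms.

```shell
#!/bin/sh
# Hypothetical helper: checks a collapse_control string against the rule
# quoted in the usage text (comma-separated groups of ASCII letters,
# digits, $ or #). This is an interpretation, not padre-cc's own check.
is_valid_collapse_control() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9$#]+(,[A-Za-z0-9$#]+)*$'
}

# The example control string from the usage text passes the check.
if is_valid_collapse_control '$,ta'; then
    echo "control string looks well formed"
fi
```

An empty string or a string with an empty group (such as '$,,ta') fails the check under this reading.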
You can view the collapsing command that was run automatically during a collection’s update by looking at the collection’s update log.
In the Index section, look for the line Index: COLLAPSIG. This will show the command that was used to build collapsing for the collection.
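Searching the log can also be done from the command line. This is a sketch under assumptions: the function name find_collapsig_command is hypothetical, and the log file path is passed in as an argument because the location of the update log is not given here and depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: prints the lines of an update log that record the
# collapsing command. The log path is an argument because its location
# is not stated in this article.
find_collapsig_command() {
    grep -F 'Index: COLLAPSIG' "$1"
}

# Example call (the path is a placeholder, not a real location):
# find_collapsig_command /path/to/update.log
```

grep -F treats the search text as a fixed string, so the colon and spaces are matched literally.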
To build collapsing on an existing (live) index, run the command manually:
$ $SEARCH_HOME/bin/padre-cc <INDEX-STEM> -collapse_control=<INDEXING-COLLAPSE-FIELDS>
- <INDEX-STEM> is the index stem that you wish to build collapsing for (e.g. /opt/funnelback/data/COLLECTION-ID/live/idx/index).
- <INDEXING-COLLAPSE-FIELDS> is the list of fields you wish to collapse on (this is the value that is normally read from the indexing.collapse_fields collection.cfg setting).
This should create an index.collapsig file in the index folder for the collection.
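The manual build and the check for the resulting file can be combined in a small wrapper. This is a sketch only: rebuild_collapsing and collapsig_path are hypothetical names, and it assumes that SEARCH_HOME is set and that the output file is named after the index stem with a .collapsig suffix, as the index.collapsig example suggests.

```shell
#!/bin/sh
# Sketch only: rebuild_collapsing and collapsig_path are hypothetical names.
# SEARCH_HOME, the index stem and the field list must be supplied by you.

# Derives the expected signature file name from the index stem,
# assuming the file is named <index stem>.collapsig.
collapsig_path() {
    printf '%s.collapsig\n' "$1"
}

# Runs padre-cc with the arguments shown in the article, then reports
# whether the expected file is present.
rebuild_collapsing() {
    stem=$1
    fields=$2
    "$SEARCH_HOME/bin/padre-cc" "$stem" "-collapse_control=$fields" || return 1
    if [ -f "$(collapsig_path "$stem")" ]; then
        echo "collapsing file present: $(collapsig_path "$stem")"
    else
        echo "collapsing file missing for stem: $stem" >&2
        return 1
    fi
}

# Example call (placeholders as in the article):
# rebuild_collapsing /opt/funnelback/data/COLLECTION-ID/live/idx/index '$,ta'
```

Quoting the field list in single quotes keeps the shell from expanding the $ character before it reaches padre-cc.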