Manually building result collapsing

Background

This article details how to generate result collapsing indexes manually on the command line.

Process

The result collapsing index files are built by the padre-cc program. The documentation says that you need to rebuild the index, but you can skip the full rebuild and build the collapsing files manually instead. This is useful if you have a large index that takes a long time to build.

Usage for padre-cc can be seen by running the program without arguments:

$ /opt/funnelback/bin/padre-cc
Purpose: To build an index.collapsig file to permit use of collapsed rankings.
Usage: /opt/funnelback/bin/padre-cc <index_stem> [-collapse_control=<string>] [-debug=on]
   Utility for building a .collapsig file of collapsing
   signatures.  If no control_string is given, a one-column
   file is built using the signatures from the .textsig file.
   The collapse_control string must consist of sequences of
   metadata field characters (ASCII letters or digits) separated by
   commas.  The characters $ and # may be used as metadata field
   characters and represent document summarisable text and
   document URL respectively.  In future, it is planned to allow
   field characters to be followed by a regular expression,
   indicating that only the part of the metadata string which matches
   the regex should be used in calculating the signature.
   Example current control string: '$,ta'.  In this case the .collapsig
   will have two signatures per document: Column 0 is the normal document
   signature and column 1 is a signature derived from the concatenation of
   metadata fields t and a, in that order.
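The control string format described in the usage text can be checked before running padre-cc. The sketch below is a hypothetical helper, not part of Funnelback. It assumes the syntax exactly as shown above (comma-separated groups of ASCII letters, digits, $ and #) and does not accept the planned regular-expression suffixes. Other padre-cc versions may accept a different syntax. It also lists which signature column each group produces, following the '$,ta' example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 looks like a collapse_control string
# as described in the padre-cc usage text above: comma-separated groups of
# metadata field characters (ASCII letters, digits, '$' for summarisable
# text, '#' for the document URL). Regex suffixes are NOT accepted.
is_valid_collapse_control() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9$#]+(,[A-Za-z0-9$#]+)*$'
}

# Hypothetical helper: print one line per comma-separated group, numbered
# from 0, matching the column numbering in the '$,ta' usage example.
describe_collapse_columns() {
    printf '%s\n' "$1" | tr ',' '\n' | awk '{ printf "column %d: %s\n", NR - 1, $0 }'
}
```

For example, describe_collapse_columns '$,ta' prints "column 0: $" followed by "column 1: ta".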

You can view the collapsing command that was run automatically during a collection’s update by looking at the collection’s update log.

In the Index section, look for the line Index: COLLAPSIG. This shows the command that was used to build collapsing for the collection.
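If you have a copy of the update log on disk, the COLLAPSIG step can be found with grep instead of scrolling through the log. This is a minimal sketch: the log path is supplied by you, and because the exact layout of the lines around Index: COLLAPSIG may vary between versions, it prints a few lines of context after the match. The -A option is supported by GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print the "Index: COLLAPSIG" line of an update log
# plus N lines of context after it (default 3). Returns non-zero if the
# line is not found. The log file path is supplied by the caller.
show_collapsig_step() {
    log_file=$1
    context=${2:-3}
    grep -n -A "$context" 'Index: COLLAPSIG' "$log_file"
}
```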

If you wish to build collapsing for an existing (live) index, run the command manually:

$ $SEARCH_HOME/bin/padre-cc <INDEX-STEM> -collapse_control=<INDEXING-COLLAPSE-FIELDS>
  • <INDEX-STEM> is the index stem that you wish to build collapsing for (e.g. /opt/funnelback/data/COLLECTION-ID/live/idx/index).

  • <INDEXING-COLLAPSE-FIELDS> is the list of fields you wish to collapse on (this is the value that is normally read from the indexing.collapse_fields collection.cfg setting).
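The two placeholders can be filled in by a small wrapper that reads indexing.collapse_fields from collection.cfg. This is a sketch under several assumptions: collection.cfg uses simple key=value lines, the setting's value can be passed to -collapse_control unchanged (as described above), and you supply both the collection.cfg path and the index stem. Compare the printed command with the COLLAPSIG line in the update log before running it for real. If the setting is absent, the wrapper omits -collapse_control, which according to the usage text builds a one-column file from the .textsig signatures. The function names and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last indexing.collapse_fields
# line in a key=value collection.cfg file (prints nothing if absent).
read_collapse_fields() {
    sed -n 's/^[[:space:]]*indexing\.collapse_fields[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Hypothetical wrapper: build_collapsig <collection.cfg> <index-stem>
# Uses $SEARCH_HOME/bin/padre-cc. With DRY_RUN=1 the command is printed
# instead of being run.
build_collapsig() {
    cfg=$1
    index_stem=$2
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    fields=$(read_collapse_fields "$cfg")
    if [ -n "$fields" ]; then
        set -- "$SEARCH_HOME/bin/padre-cc" "$index_stem" "-collapse_control=$fields"
    else
        # No setting: per the usage text, a one-column file is built
        # from the .textsig signatures.
        set -- "$SEARCH_HOME/bin/padre-cc" "$index_stem"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running it first with DRY_RUN=1 shows the exact padre-cc command without touching the live index.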

This should create an index.collapsig file in the index folder for the collection.
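Because the output file is named after the index stem (a stem of /opt/funnelback/data/COLLECTION-ID/live/idx/index gives index.collapsig in the same folder), a quick check that the file was written can be sketched as follows. The helper name is hypothetical, and it only confirms that the file exists and is non-empty, not that its contents are correct.

```shell
#!/bin/sh
# Hypothetical helper: check_collapsig <index-stem>
# Succeeds if <index-stem>.collapsig exists and is non-empty.
check_collapsig() {
    f="$1.collapsig"
    if [ -s "$f" ]; then
        echo "ok: $f"
    else
        echo "missing or empty: $f" >&2
        return 1
    fi
}
```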