Manually building result collapsing

Background

This article details how to generate result collapsing indexes manually on the command line.

Process

The result collapsing index files are built by the padre-cc program. The documentation says you need to rebuild the index, but you can skip the full rebuild and build the collapsing file manually against the existing index instead. This is useful if you have a large index that takes a long time to build.

Usage for padre-cc can be seen by running the program without arguments:

$ /opt/funnelback/bin/padre-cc
Purpose: To build an index.collapsig file to permit use of collapsed rankings.
Usage: /opt/funnelback/bin/padre-cc <index_stem> [-collapse_control=<string>] [-debug=on]
   Utility for building a .collapsig file of collapsing
   signatures.  If no control_string is given, a one-column
   file is built using the signatures from the .textsig file.
   The collapse_control string must consist of sequences of
   metadata field characters (ASCII letters or digits) separated by
   commas.  The characters $ and # may be used as metadata field
   characters and represent document summarisable text and
   document URL respectively.  In future, it is planned to allow
   field characters to be followed by a regular expression,
   indicating that only the part of the metadata string which matches
   the regex should be used in calculating the signature.
   Example current control string: '$,ta'.  In this case the .collapsig
   will have two signatures per document: Column 0 is the normal document
   signature and column 1 is a signature derived from the concatenation of
   metadata fields t and a, in that order.
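
The control string syntax described in the usage text can be checked before running padre-cc. The following is a minimal sketch, not part of Funnelback: the function name is_valid_collapse_control is hypothetical, and the pattern only encodes the rule quoted above (groups of ASCII letters, digits, $ or #, separated by commas). Whether padre-cc itself enforces exactly this rule is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: checks a collapse_control string against the syntax
# described in the padre-cc usage text. It does not call padre-cc.
is_valid_collapse_control() {
  # One or more field characters per group, groups separated by single commas.
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9$#]+(,[A-Za-z0-9$#]+)*$'
}

# Example: the control string from the usage text.
if is_valid_collapse_control '$,ta'; then
  echo "control string looks well formed"
fi
```

Remember to single-quote the control string on the command line so that the shell does not try to expand $ or treat # as the start of a comment.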

You can view the collapsing command that was run automatically during a collection’s update by looking at the collection’s update log.

In the Index section, look for the line Index: COLLAPSIG. This line shows the command that was used to build collapsing for the collection.
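
The log lookup described above can be done with grep. This is a small sketch: the function name show_collapsing_command is hypothetical, and the path to the update log is passed as an argument because its location depends on your installation and is not stated in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of an update log that mention COLLAPSIG.
# $1 is the path to the collection's update log (location is installation specific).
show_collapsing_command() {
  grep -F 'COLLAPSIG' "$1"
}
```

Call it as show_collapsing_command followed by the path to the update log. The matching line contains the padre-cc command that the update ran, which you can reuse when building collapsing manually.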

To build collapsing on an existing (live) index, run the command manually:

$ $SEARCH_HOME/bin/padre-cc <INDEX-STEM> -collapse_control=<INDEXING-COLLAPSE-FIELDS>
  • <INDEX-STEM> is the index stem that you wish to build collapsing for (e.g. /opt/funnelback/data/COLLECTION-ID/live/idx/index).

  • <INDEXING-COLLAPSE-FIELDS> is the list of fields you wish to collapse on (this is the value that is normally read from the indexing.collapse_fields setting in collection.cfg).

This should create an index.collapsig file in the index folder for the collection.
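
The manual build and the check for the output file can be combined in a short script. This is a sketch under stated assumptions: the function name build_collapsing is hypothetical, SEARCH_HOME is assumed to be set (for example to /opt/funnelback), and the output file is assumed to be written next to the index as <INDEX-STEM>.collapsig.

```shell
#!/bin/sh
# Hypothetical wrapper: build collapsing for a collection's live index and
# confirm that the .collapsig file exists afterwards.
# $1 = collection ID, $2 = collapse fields (e.g. '$,ta')
build_collapsing() {
  collection="$1"
  fields="$2"
  # Index stem layout taken from the example path in this article.
  stem="$SEARCH_HOME/data/$collection/live/idx/index"

  "$SEARCH_HOME/bin/padre-cc" "$stem" "-collapse_control=$fields" || return 1

  # Assumption: the output file is <stem>.collapsig in the index folder.
  if [ -f "$stem.collapsig" ]; then
    echo "Built $stem.collapsig"
  else
    echo "No $stem.collapsig found" >&2
    return 1
  fi
}
```

For example, build_collapsing COLLECTION-ID '$,ta' runs padre-cc against the live index of that collection using the control string from the usage text. The function returns a non-zero status if padre-cc fails or the file is missing, so it can be used in a larger maintenance script.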