Specify how results are determined to be duplicates within content auditor
Can be set in: profile.cfg, collection.cfg
This setting controls how results are determined to be duplicates within content auditor, based on Funnelback’s result collapsing feature.
By default, content auditor considers two results to be duplicates if they contain the same set of indexed words.
Note that any setting used here must be among the set of collapsing signatures generated at indexing time.
See indexing.collapse_fields for details.
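As a rough sketch, the default behaviour described above might be written out explicitly as shown here. The key name and the `[$]` signature syntax are not given in this text; both are assumptions based on how this option and indexing.collapse_fields are commonly written, so check them against the documentation for your Funnelback version.

```ini
# Assumed key name and value syntax; not confirmed by the text above.
# [$] is assumed to denote the signature built from the document's indexed words.
ui.modern.content-auditor.collapsing-signature=[$]
```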
To base duplicate detection on the binary document content, use the MD5 hash generated by Funnelback’s filecopier system (this hash is mapped to the H class in the default metadata mappings), as in the following setting.
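A setting along these lines would select the H class as the collapsing signature. The key name and bracket syntax are assumptions, as the original example line is not present in this text.

```ini
# Assumed key name and value syntax.
# H is the class that holds the filecopier MD5 hash in the default metadata mappings.
ui.modern.content-auditor.collapsing-signature=[H]
```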
To consider documents to be duplicates if both the title and author are the same, the following setting could be used.
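A possible form of such a setting is sketched here, assuming that title and author are mapped to the metadata classes `t` and `a` and that a combined signature is written as a comma-separated list inside brackets. The key name, class names, and syntax are all assumptions. Because the signature must be one of those generated at indexing time, the sketch also adds it to indexing.collapse_fields; the other values on that line are illustrative only.

```ini
# Assumed: generate the combined title+author signature at indexing time.
# The [$] and [H] entries are illustrative; keep whatever your collection already uses.
indexing.collapse_fields=[$],[H],[t,a]

# Assumed key name; t and a are assumed to be the title and author metadata classes.
ui.modern.content-auditor.collapsing-signature=[t,a]
```

A change to the indexing-time signatures would typically only take effect after the collection has been reindexed.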