vector.indexer_options

Background

This option configures the Funnelback Vector indexer. Its value is a set of indexer flags, set in the data source configuration, that control various aspects of the Vector build.

Supported flags:

  • --set-paragraph-tag sets the HTML tag used for paragraph extraction, e.g. "p" for <p> tags, "div" for <div> tags, "section" for <section> tags. The default is p.

  • --amalgamate-paragraphs sets how many paragraphs are combined into each chunk: 1 = no amalgamation (one paragraph per chunk), 2 or more = amalgamation (that many paragraphs per chunk). The default is 1.

  • --max-elements sets the maximum number of elements (paragraphs) that can be stored in the vector DB. The default is 1,000,000.
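
To make the effect of --amalgamate-paragraphs more concrete, the following Python sketch groups a list of extracted paragraphs into chunks. It is only an illustration: the function name amalgamate is hypothetical, and it assumes that consecutive paragraphs are grouped without overlap and joined with a space, which this page does not confirm about the actual indexer.

```python
from typing import List


def amalgamate(paragraphs: List[str], per_chunk: int = 1) -> List[str]:
    """Group consecutive paragraphs into chunks of `per_chunk` paragraphs.

    Hypothetical illustration only: the real indexer may group or join
    paragraphs differently.
    """
    if per_chunk < 1:
        raise ValueError("per_chunk must be 1 or more")
    # Step through the list in strides of `per_chunk`; the last chunk
    # may hold fewer paragraphs if the count does not divide evenly.
    return [
        " ".join(paragraphs[i:i + per_chunk])
        for i in range(0, len(paragraphs), per_chunk)
    ]


paragraphs = ["First.", "Second.", "Third."]
one_per_chunk = amalgamate(paragraphs, 1)   # no amalgamation
two_per_chunk = amalgamate(paragraphs, 2)   # two paragraphs per chunk
```

With a value of 1 each paragraph stays in its own chunk; with a value of 2 the three paragraphs above become two chunks, the second holding the single leftover paragraph.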

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the vector.indexer_options key, and set the value. This can be set to any valid String value.

Default value

vector.indexer_options=

Examples

Use <div> HTML tags for paragraph extraction:

vector.indexer_options=--set-paragraph-tag=div

Use <div> tags and 2 paragraphs per chunk:

vector.indexer_options=--set-paragraph-tag=div --amalgamate-paragraphs=2

Use <div> tags, 2 paragraphs per chunk, and store at most 500,000 paragraphs:

vector.indexer_options=--set-paragraph-tag=div --amalgamate-paragraphs=2 --max-elements=500000
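
If the value is generated by a script rather than typed into the configuration key editor, a small helper can assemble it. The Python sketch below is one possible approach; build_indexer_options and its parameter names are hypothetical, and it assumes the space-separated --flag=value syntax shown in the examples above.

```python
from typing import Optional


def build_indexer_options(
    paragraph_tag: Optional[str] = None,
    amalgamate_paragraphs: Optional[int] = None,
    max_elements: Optional[int] = None,
) -> str:
    """Assemble a vector.indexer_options line (hypothetical helper).

    Flags left as None are omitted, so the indexer defaults apply.
    """
    parts = []
    if paragraph_tag is not None:
        parts.append(f"--set-paragraph-tag={paragraph_tag}")
    if amalgamate_paragraphs is not None:
        parts.append(f"--amalgamate-paragraphs={amalgamate_paragraphs}")
    if max_elements is not None:
        parts.append(f"--max-elements={max_elements}")
    return "vector.indexer_options=" + " ".join(parts)


line = build_indexer_options("div", 2, 500000)
```

Called with no arguments, the helper returns the empty default value shown under "Default value".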

Notes

  • Indexing will not occur if the indexer is given an invalid option.
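
Because an invalid option stops indexing, it can help to check a value before saving it. The Python sketch below is a rough pre-check, not the indexer's own validation: find_unsupported_flags is a hypothetical name, and it assumes that the three flags listed on this page are the complete supported set and that flags are separated by spaces.

```python
from typing import List

# Flags documented on this page; assumed here to be the full supported set.
SUPPORTED_FLAGS = {
    "--set-paragraph-tag",
    "--amalgamate-paragraphs",
    "--max-elements",
}


def find_unsupported_flags(value: str) -> List[str]:
    """Return the tokens in `value` whose flag name is not documented."""
    unsupported = []
    for token in value.split():
        # Compare only the part before "=", e.g. "--max-elements".
        name = token.split("=", 1)[0]
        if name not in SUPPORTED_FLAGS:
            unsupported.append(token)
    return unsupported


ok = find_unsupported_flags("--set-paragraph-tag=div --amalgamate-paragraphs=2")
bad = find_unsupported_flags("--set-paragraph-tag=div --chunk-size=3")
```

An empty result means every flag name is one of the documented ones; it does not confirm that the flag values themselves are acceptable to the indexer.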