Plugin: Vector document storage
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Vector document storage tile.
- From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.
| The plugin takes effect after the setup steps and an advanced > full update of the data source have completed. |
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
| The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value. |
Document URL format
| Configuration key | |
|---|---|
| Data type | boolean |
| Default value | false |
| Required | This setting is optional |
If true, the S3 object key is the URL-encoded document URL. Otherwise, the key is the Base64 encoding of the document URL.
Fail on error
| Configuration key | |
|---|---|
| Data type | boolean |
| Default value | true |
| Required | This setting is optional |
Defines whether the update should fail with an error or only log a warning when a document is not successfully sent to storage.
Possible values:
- true: the update will fail with an error. (default)
- false: a warning will be logged, but the update will continue.
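The two values can be illustrated with a minimal Python sketch. This is not the plugin's implementation: `store_document`, `DocumentStorageError`, and the `upload` callable are hypothetical names used only to show the behavior described above.

```python
import logging

logger = logging.getLogger("vector-document-storage-sketch")


class DocumentStorageError(Exception):
    """Hypothetical error raised when a document cannot be stored."""


def store_document(url, upload, fail_on_error=True):
    """Call upload(url); return True on success, False on a tolerated failure."""
    try:
        upload(url)
        return True
    except Exception as exc:
        if fail_on_error:
            # true: propagate the failure so the update stops with an error.
            raise DocumentStorageError(f"Failed to store {url}") from exc
        # false: log a warning and let the update continue.
        logger.warning("Could not store %s: %s", url, exc)
        return False
```

With `fail_on_error=False`, the caller can keep processing the remaining documents and rely on the logged warnings to find the ones that were skipped.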
Additional configuration settings
Required Global Configuration
This plugin requires two configuration keys to be set in the global configuration file /conf/collection.cfg so that they are inherited by all data sources:
S3 Bucket Configuration
plugin.vector-document-storage.config.bucket-name
- The name of the S3 bucket where documents will be stored for vector chunking processing. This is a required configuration that must be set globally in /conf/collection.cfg so that data sources can access the S3 bucket without individual configuration.
plugin.vector-document-storage.config.bucket-region
- The AWS region where the S3 bucket is located (e.g., us-east-1, eu-west-1, ap-southeast-2). This is a required configuration for the POC (Proof of Concept) implementation. The bucket region must be explicitly specified because the plugin needs to know the exact region to establish the S3 connection. This setting must be configured globally in /conf/collection.cfg.
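As a rough illustration of the two required keys, the Python sketch below checks a configuration text for both of them. It assumes that collection.cfg uses plain `key=value` lines. The bucket name `example-vector-documents` and the helper functions are hypothetical and are not part of the plugin.

```python
REQUIRED_KEYS = (
    "plugin.vector-document-storage.config.bucket-name",
    "plugin.vector-document-storage.config.bucket-region",
)


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_required_keys(text):
    """Return the required keys that are absent or have an empty value."""
    settings = parse_cfg(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Hypothetical global configuration content; the bucket name is a placeholder.
example_cfg = """
plugin.vector-document-storage.config.bucket-name=example-vector-documents
plugin.vector-document-storage.config.bucket-region=ap-southeast-2
"""
print(missing_required_keys(example_cfg))  # prints []
```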
Filter chain configuration
This plugin uses filters which are used to apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
| Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation. |
Filter classes
This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugin.vectordocumentstorage.VectorDocumentStorageStringFilter
Drag the com.funnelback.plugin.vectordocumentstorage.VectorDocumentStorageStringFilter plugin filter to where you wish it to run in the filter chain sequence.
Examples
S3 key format
Using Base64 encoding (default)
By default, the plugin uses Base64 encoding for S3 object keys. This is the recommended approach for most use cases:
| Configuration key name | Value |
|---|---|
| Document URL format | false |
With this configuration, a document URL like https://example.com/page?param=value will be stored in S3 with a Base64-encoded key.
Using URL encoding
For better readability and debugging, you can use URL encoding for S3 object keys:
| Configuration key name | Value |
|---|---|
| Document URL format | true |
With this configuration, a document URL like https://example.com/page?param=value will be stored in S3 with a URL-encoded key that is more human-readable.
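To make the difference between the two key formats concrete, here is a minimal Python sketch. The exact encoding the plugin applies is not documented here, so the choices below are assumptions: UTF-8 input, the standard Base64 alphabet, and percent-encoding of every reserved character. The function name `s3_key_for` is hypothetical.

```python
import base64
from urllib.parse import quote


def s3_key_for(document_url: str, use_url_encoding: bool = False) -> str:
    """Derive an illustrative S3 object key from a document URL.

    Assumptions (not confirmed by the plugin documentation): UTF-8 input,
    standard Base64 alphabet, percent-encoding of every reserved character.
    """
    if use_url_encoding:
        # safe="" also encodes "/" so the key contains no path separators.
        return quote(document_url, safe="")
    return base64.b64encode(document_url.encode("utf-8")).decode("ascii")


url = "https://example.com/page?param=value"
print(s3_key_for(url))                         # Base64 form (default)
print(s3_key_for(url, use_url_encoding=True))  # URL-encoded form
```

Under these assumptions the URL-encoded key keeps the host and path recognizable, while the Base64 key has to be decoded before it can be read.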
Error handling
Fail on error (default)
By default, the plugin will fail the entire update process if any document fails to upload to S3:
| Configuration key name | Value |
|---|---|
| Fail on error | true |
This is the recommended setting for production environments where data integrity is critical.
Continue on error
For development, or when the update should continue even if some documents fail to upload:
| Configuration key name | Value |
|---|---|
| Fail on error | false |
With this configuration, if a document fails to upload to S3, a warning will be logged but the update process will continue with the remaining documents.