Plugin: SFTP gatherer
Purpose
Use this plugin when you need to download and index content from an SFTP server.
Usage
Enable the plugin on data source
Enable the SFTP gatherer plugin on your data source from the plugins screen in the search dashboard or add the following data source configuration to enable the plugin.
plugin.sftp-gatherer.enabled = true
plugin.sftp-gatherer.version = 1.0.0
| The plugin takes effect after the configuration is published and a full update of the data source has completed. |
Plugin configuration settings
The following options can be set in the data source configuration to configure the plugin:
- plugin.sftp-gatherer.config.hostname: (string) Host name or IP address of the SFTP server, e.g. sftp.example.com
- plugin.sftp-gatherer.config.username: (string) User name used to access the SFTP server.
- plugin.sftp-gatherer.encrypted.password: Password used to access the SFTP server.
- plugin.sftp-gatherer.config.port: (integer) SFTP server port, e.g. 5002. Default value is 22
- plugin.sftp-gatherer.config.file: (string) Name of the file to download from the SFTP server. A path can be included. Multiple files can be downloaded by specifying additional keys with a suffix (see example below).
- plugin.sftp-gatherer.config.mime-type: (string) MIME type that will be set for the downloaded files. The plugin only accepts a single MIME type, which is applied to all downloaded files. Common MIME types are text/xml for XML, application/json for JSON, and text/csv for CSV.
- plugin.sftp-gatherer.config.store-url: (string) Prefix that is attached to the file name, which defines the URL that will be used to store documents. Default value is <data source name>/datafile
- plugin.sftp-gatherer.config.timeout: (integer) The maximum time to wait for the connection to be established, in seconds. If 0, wait as long as needed (but at most 50 seconds). Default value is 0
- plugin.sftp-gatherer.config.max-file-size: (integer) Maximum accepted file size to download, in MB. Default value is 5
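As a rough illustration of the optional settings above, the sketch below sets three of them explicitly. The values 30 and 20 are assumptions chosen for illustration only, not recommendations.
plugin.sftp-gatherer.config.port=22 (1)
plugin.sftp-gatherer.config.timeout=30 (2)
plugin.sftp-gatherer.config.max-file-size=20 (3)
| 1 | Restates the documented default, so it should behave the same as omitting the key. |
| 2 | Illustrative value: wait up to 30 seconds for the connection to be established. |
| 3 | Illustrative value: accept files of up to 20 MB instead of the default 5 MB. |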
Examples
Example: Download a single file
The configuration below sets up the custom data source to download a single document, myfiles/myXMLFileToIndex.xml, from the SFTP host my.ftp.server.com:5022.
The file will be indexed as an XML file and have the URL <data source name>/datafile (where <data source name> is replaced with the ID of your custom data source).
plugin.sftp-gatherer.config.hostname=my.ftp.server.com
plugin.sftp-gatherer.config.port=5022
plugin.sftp-gatherer.config.username=myUserName
plugin.sftp-gatherer.encrypted.password=mySecretPassword (1)
plugin.sftp-gatherer.config.file=myfiles/myXMLFileToIndex.xml
plugin.sftp-gatherer.config.mime-type=text/xml (2)
| 1 | To add your encrypted password via the configuration editor in the administration dashboard, select the add new button and create a plugin.*.encrypted.* key. Set the additional fields as follows: Plugin ID = sftp-gatherer and Secret Key = your password. |
| 2 | This plugin supports the client side of version 3 of the SFTP protocol. As a result, all files are downloaded as binary files, and the MIME type of the downloaded files must be provided in your configuration. All documents downloaded with this configuration will be stored as XML documents. |
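A similar configuration could be used for other formats. The sketch below assumes a hypothetical JSON file, myfiles/myJSONFileToIndex.json, on a server listening on the default port; the host, user name, password and file name are placeholders, not values taken from a real setup.
plugin.sftp-gatherer.config.hostname=my.ftp.server.com
plugin.sftp-gatherer.config.username=myUserName
plugin.sftp-gatherer.encrypted.password=mySecretPassword
plugin.sftp-gatherer.config.file=myfiles/myJSONFileToIndex.json (1)
plugin.sftp-gatherer.config.mime-type=application/json (2)
| 1 | Hypothetical file name used for illustration. The port key is omitted, so the default port of 22 applies. |
| 2 | application/json is one of the common MIME types listed in the plugin configuration settings. It is applied to every file downloaded by this data source. |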
Example: Download multiple files
In this example multiple files are configured to be downloaded.
plugin.sftp-gatherer.config.file=myfiles/myXMLFileToIndex.xml (1)
plugin.sftp-gatherer.config.file.2=myfiles/myXMLFileToIndex2.xml (1)
plugin.sftp-gatherer.config.file.3=myfiles/otherXMLfileToIndex.xml (1)
plugin.sftp-gatherer.config.stored-url=https://example.com (2)
plugin.sftp-gatherer.config.timeout=10 (3)
| 1 | Every file that you wish to download must be configured with a separate plugin.sftp-gatherer.config.file key. The plugin will not download all files contained within a directory. |
| 2 | Funnelback requires all indexed documents to have a URL. If the document will be split later (for example, a JSON file converted to XML and then split along an XPath), each record gets its own URL at that point, so the URL of the overall document doesn’t matter. If not supplied, a dummy URL of the form <data source name>/<datafile name> will be used. This configuration prefixes all the stored documents with https://example.com, resulting in the documents being stored with the following URLs:
|
| 3 | The server connection timeout, in seconds, is set with this configuration key. Here the timeout is set to 10 seconds. |