Plugin: SFTP gatherer

Purpose

Use this plugin when you need to download and index content from an SFTP server.

  • This plugin must be used in conjunction with a custom data source, and must be enabled on that data source.

  • In the Squiz DXP this plugin uses the virus scanning plugin to scan all files before processing. If any file is considered insecure, it will be skipped.

  • This plugin downloads everything as binary and sets a single MIME type for all downloaded files. At least one file must be provided. Downloaded documents are then handled in the same way as any other downloaded document and filtered/indexed accordingly.

  • This plugin uses the Java Secure Channel (JSch) library for SSH operations.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the SFTP gatherer tile.

  2. From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.

The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

SFTP server host

Configuration key

plugin.sftp-gatherer.config.hostname

Data type

string

Required

This setting is required

Host name or IP address of the SFTP server, e.g. sftp.example.com or 201.75.44.21.

SFTP user name

Configuration key

plugin.sftp-gatherer.config.username

Data type

string

Required

This setting is required

User name used to access the SFTP server.

SFTP password

Configuration key

plugin.sftp-gatherer.encrypted.password

Data type

Encrypted string

Required

This setting is required

Password used to access the SFTP server.

SFTP server port

Configuration key

plugin.sftp-gatherer.config.port

Data type

integer

Default value

22

Required

This setting is required

SFTP server port, e.g. 5002.

File name

Configuration key

plugin.sftp-gatherer.config.file.*

Data type

string

Required

This setting is required

File to download from the SFTP server (including path). Set 'Parameter 1' to a unique value for the file (e.g. '1', '2').

Multiple files can be downloaded by adding additional file name entries. Each key defines only a single file to download and must have a unique Parameter 1 value set.
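
As a rough sketch of how two file entries might look when the keys are set manually: this assumes that the Parameter 1 value takes the place of the * in the key name and that keys are entered in key=value form. Both are assumptions, the file paths are hypothetical, and the comment lines are annotations only.

```ini
# Assumption: Parameter 1 replaces the * in plugin.sftp-gatherer.config.file.*
# Hypothetical file paths, for illustration only.
plugin.sftp-gatherer.config.file.1=exports/products.xml
plugin.sftp-gatherer.config.file.2=exports/staff.xml
```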

File MIME type

Configuration key

plugin.sftp-gatherer.config.mime-type

Data type

string

Required

This setting is required

MIME type that will be set for all files downloaded by the plugin.

The plugin only accepts a single MIME type which will be applied to all downloaded files.

Common MIME types are text/xml for XML, application/json for JSON, and text/csv for CSV.

Store URL

Configuration key

plugin.sftp-gatherer.config.store-url

Data type

string

Default value

<data source name>/datafile

Required

This setting is optional

Prefix that is attached to the file name, which defines the URL that will be used to store documents.

SFTP server connection timeout

Configuration key

plugin.sftp-gatherer.config.timeout

Data type

integer

Default value

0

Required

This setting is optional

The maximum time to wait for the connection to be established, in seconds. If set to 0, the plugin waits as long as needed, up to a maximum of 50 seconds.

Maximum download file size

Configuration key

plugin.sftp-gatherer.config.max-file-size

Data type

integer

Default value

5

Required

This setting is optional

Maximum accepted file size to download (in MB).
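
For instance, raising the limit when setting the key manually might look like the line below. The value 20 is hypothetical, and the key=value notation is an assumption about how the key is entered.

```ini
# Hypothetical value: accept files of up to 20 MB instead of the default 5 MB.
plugin.sftp-gatherer.config.max-file-size=20
```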

Examples

Example: Download a single file

The configuration below sets up the custom data source to download a single document, myfiles\myXMLFileToIndex.xml, from the SFTP host my.ftp.server.com:5022.

The file will be indexed as an XML file and have the URL: <data source name>/datafile (where <data source name> is replaced with the ID of your custom data source).

Enter the following into the corresponding fields when setting up your plugin:

Field               Value

SFTP server host    my.ftp.server.com
SFTP server port    5022
SFTP user name      myUserName
SFTP password       ******
File name           myfiles\myXMLFileToIndex.xml (with Parameter 1 set to the value 1)
File MIME type      text/xml

  1. The password you enter is automatically encrypted when you save the value. If you view the configuration via the results page configuration key editor or raw editor you will see a value like ENCRYPTED:AQX7ZRgj4x0xVpOSA4kWIN9UR2tUFjnI8GMK6FfW6 which corresponds to the value you entered.

  2. When defining the file name(s) you need to enter a unique identifier into the Parameter 1 field. This is required to support downloading of more than one file and can just be a number like 1, 2 etc.

  3. This plugin supports the client side of version 3 of the SFTP protocol. As a result, all files are downloaded as binary files, so the MIME type of the downloaded files must be provided in your configuration. All documents downloaded with this configuration will be stored as XML documents.
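
If you configure the plugin manually instead, the field values above might translate into keys along the following lines. This is a sketch rather than verified output: it assumes key=value notation and that Parameter 1 replaces the * in the file key. The password is shown as a placeholder, and the comment lines are annotations only.

```ini
# Manual equivalent of the field values in the table above (a sketch).
plugin.sftp-gatherer.config.hostname=my.ftp.server.com
plugin.sftp-gatherer.config.port=5022
plugin.sftp-gatherer.config.username=myUserName
# Placeholder only: the value is stored in encrypted form once saved.
plugin.sftp-gatherer.encrypted.password=******
# Assumption: Parameter 1 (here 1) replaces the * in the key name.
plugin.sftp-gatherer.config.file.1=myfiles\myXMLFileToIndex.xml
plugin.sftp-gatherer.config.mime-type=text/xml
```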

Example: Download multiple files

In this example multiple files are configured to be downloaded. This example extends the configuration from the above example to download two additional files. It also sets the URL prefix to add when indexing the documents and sets a 10 second timeout for connection to the SFTP server.

Field                             Parameter 1         Value

File name                         2                   myfiles\myXMLFileToIndex2.xml
File name                         3                   myfiles\otherXMLfileToIndex.xml
Store URL                         (not applicable)    https://example.com
SFTP server connection timeout    (not applicable)    10

Funnelback requires all indexed documents to have a URL. If the document will be split later (for example, a JSON file converted to XML and then split along an XPath), each record gets its own URL based on the settings you configure when splitting, so the URL of the overall document doesn’t matter. If a store URL is not supplied, a dummy URL of the form <data source name>/<datafile name> is used.

This configuration prefixes all the stored documents with https://example.com, resulting in the documents being stored with the following URLs:

https://example.com/myfiles/myXMLFileToIndex.xml
https://example.com/myfiles/myXMLFileToIndex2.xml
https://example.com/myfiles/otherXMLfileToIndex.xml
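
Set manually, the additional values in this example might correspond to the keys below, on top of those from the single-file example. As before, this is a sketch that assumes key=value notation and that Parameter 1 replaces the * in the file key. The comment lines are annotations only.

```ini
# Assumption: Parameter 1 (2 and 3) replaces the * in the key name.
plugin.sftp-gatherer.config.file.2=myfiles\myXMLFileToIndex2.xml
plugin.sftp-gatherer.config.file.3=myfiles\otherXMLfileToIndex.xml
# Prefix used when building the stored URL of each document.
plugin.sftp-gatherer.config.store-url=https://example.com
# Connection timeout in seconds.
plugin.sftp-gatherer.config.timeout=10
```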

Change log

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.