Plugin: SFTP gatherer

Purpose

Use this plugin when you need to download and index content from an SFTP server.

  • This plugin must be used in conjunction with a custom data source, and must be enabled on that data source.

  • In the Squiz DXP this plugin uses the virus scanning plugin to scan all files before processing. If any file is considered unsafe, it will be skipped.

  • This plugin downloads everything as binary and sets a single MIME type for all downloaded files. At least one file must be provided. Downloaded documents are then handled in the same way as any other downloaded document and filtered/indexed accordingly.

  • This plugin uses the Java Secure Channel (JSch) library for SSH operations.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the SFTP gatherer tile.

  2. From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.

The plugin will take effect after the setup steps are complete and an advanced > full update of the data source has completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.
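As a minimal sketch of what manual configuration amounts to, the Python snippet below renders a few settings as key=value lines. The key=value line format and the render_config helper are assumptions made for illustration; only the key names and the example host name come from this page.

```python
def render_config(settings):
    """Render plugin settings as key=value lines (assumed format)."""
    # Sort the keys so the output is stable and easy to compare.
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Example values; replace them with the details of your own SFTP server.
settings = {
    "plugin.sftp-gatherer.config.hostname": "sftp.example.com",
    "plugin.sftp-gatherer.config.username": "myUserName",
    "plugin.sftp-gatherer.config.port": 22,
}

print(render_config(settings))
```

Each printed line pairs one configuration key with its value, which is what you would type or paste when setting the keys manually.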

SFTP server host

Configuration key

plugin.sftp-gatherer.config.hostname

Data type

string

Required

This setting is required

Host name or IP address of the SFTP server. e.g. sftp.example.com or 201.75.44.21

SFTP user name

Configuration key

plugin.sftp-gatherer.config.username

Data type

string

Required

This setting is required

User name used to access the SFTP server.

SFTP authentication type

Configuration key

plugin.sftp-gatherer.config.authentication-type

Data type

string

Default value

PASSWORD

Allowed values

PASSWORD,PUBLIC_KEY

Required

This setting is required

Type of authentication used to access the SFTP server.

SFTP password / private key passphrase

Configuration key

plugin.sftp-gatherer.encrypted.password

Data type

Encrypted string

Required

This setting is optional

Password used to access the SFTP server, or the passphrase for the private key if using public key authentication.

Required if using password authentication.
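The relationship between the authentication type and the password setting can be sketched as a small check. This is an illustrative Python helper, not part of the plugin: the function name check_authentication and its messages are hypothetical, and the plugin's own validation may differ.

```python
ALLOWED_AUTH_TYPES = {"PASSWORD", "PUBLIC_KEY"}


def check_authentication(auth_type="PASSWORD", password=None):
    """Return a list of problems with the authentication settings."""
    problems = []
    if auth_type not in ALLOWED_AUTH_TYPES:
        problems.append(f"unsupported authentication type: {auth_type}")
    elif auth_type == "PASSWORD" and not password:
        # The password setting is optional overall, but password
        # authentication cannot work without it.
        problems.append("password is required for PASSWORD authentication")
    return problems


print(check_authentication("PASSWORD"))
print(check_authentication("PUBLIC_KEY"))
```

With PUBLIC_KEY the password field is treated here as an optional passphrase, so leaving it empty is not reported as a problem.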

SFTP server port

Configuration key

plugin.sftp-gatherer.config.port

Data type

integer

Default value

22

Required

This setting is required

SFTP server port. e.g. 5002

File name

Configuration key

plugin.sftp-gatherer.config.file.*

Data type

string

Required

This setting is required

File to download from the SFTP server (including path). Set 'Parameter 1' to a unique value for the file (e.g. '1', '2').

Multiple files can be downloaded by adding additional file name keys. Each key defines a single file to download and must have a unique Parameter 1 value.
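A short Python sketch of the one-key-per-file rule follows. It assumes that the Parameter 1 value replaces the * at the end of the key name; the file_keys helper and the file paths are hypothetical.

```python
FILE_KEY_PREFIX = "plugin.sftp-gatherer.config.file."


def file_keys(paths):
    """Map each file path to its own key, using 1, 2, 3... as Parameter 1."""
    return {
        f"{FILE_KEY_PREFIX}{index}": path
        for index, path in enumerate(paths, start=1)
    }


# Hypothetical paths, used only to show the shape of the resulting keys.
for key, value in file_keys(["reports/a.xml", "reports/b.xml"]).items():
    print(f"{key}={value}")
```

Numbering the keys from a list guarantees that every Parameter 1 value is unique, which is the only requirement stated for it.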

File MIME type

Configuration key

plugin.sftp-gatherer.config.mime-type

Data type

string

Required

This setting is required

MIME type that will be set for all files downloaded by the plugin.

The plugin only accepts a single MIME type which will be applied to all downloaded files.

Common MIME types are text/xml for XML, application/json for JSON, and text/csv for CSV.

Store URL

Configuration key

plugin.sftp-gatherer.config.store-url

Data type

string

Default value

<data source name>/datafile

Required

This setting is optional

Prefix that is attached to the file name, which defines the URL that will be used to store documents.

SFTP server connection timeout

Configuration key

plugin.sftp-gatherer.config.timeout

Data type

integer

Default value

0

Required

This setting is optional

The maximum time to wait for the connection to be established, in seconds. If 0, wait as long as needed (max 50 seconds).

Maximum download file size

Configuration key

plugin.sftp-gatherer.config.max-file-size

Data type

integer

Default value

5

Required

This setting is optional

Maximum accepted file size to download (in MB).
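The default values listed for the settings above can be summarised in a small Python sketch. The with_defaults helper is hypothetical and only illustrates how a value you set replaces the documented default. The Store URL default is left out because it depends on the data source name.

```python
# Defaults as documented on this page.
DEFAULTS = {
    "plugin.sftp-gatherer.config.authentication-type": "PASSWORD",
    "plugin.sftp-gatherer.config.port": 22,
    "plugin.sftp-gatherer.config.timeout": 0,        # seconds
    "plugin.sftp-gatherer.config.max-file-size": 5,  # MB
}


def with_defaults(settings):
    """Return settings with documented defaults filled in for missing keys."""
    merged = dict(DEFAULTS)
    merged.update(settings)
    return merged


effective = with_defaults({"plugin.sftp-gatherer.config.port": 5022})
for key in sorted(effective):
    print(f"{key}={effective[key]}")
```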

Configuration files

This plugin also uses the following configuration files to provide additional configuration.

id_rsa.pub

Description

Public key used to authenticate with the SFTP server.

Configuration file format

pub

This is the public key for the SFTP server that is used to authenticate the connection. It is used in conjunction with the private key stored in the id_rsa file. The public key is used to verify the identity of the SFTP server during the connection process.

id_rsa

Description

Private key used to authenticate with the SFTP server.

Configuration file format

rsa

This is the private key for the SFTP server that is used to authenticate the connection. It is used in conjunction with the public key stored in the id_rsa.pub file. The private key is used to establish a secure connection to the SFTP server.

Examples

Example: Download a single file

The configuration below will configure the custom data source to download a single document, myfiles\myXMLFileToIndex.xml, from the SFTP host my.ftp.server.com on port 5022.

The file will be indexed as an XML file and have the URL: <data source name>/datafile (where <data source name> is replaced with the ID of your custom data source).

Enter the following into the corresponding fields when setting up your plugin:

Field | Value
SFTP server host | my.ftp.server.com
SFTP server port | 5022
SFTP user name | myUserName
SFTP authentication type | PASSWORD
SFTP password / private key passphrase | ******
File name | myfiles\myXMLFileToIndex.xml (with Parameter 1 set to the value 1)
File MIME type | text/xml

  1. The password you enter is automatically encrypted when you save the value. If you view the configuration via the results page configuration key editor or raw editor you will see a value like ENCRYPTED:AQX7ZRgj4x0xVpOSA4kWIN9UR2tUFjnI8GMK6FfW6 which corresponds to the value you entered.

  2. When defining the file name(s) you need to enter a unique identifier into the Parameter 1 field. This is required to support downloading of more than one file and can just be a number like 1, 2 etc.

  3. This plugin supports the client side of SFTP protocol version 3. As a result, all files are downloaded as binary files, and the MIME type of the downloaded files must be provided in your configuration. All documents downloaded in this configuration will be stored as XML documents.

Example: Download multiple files

In this example multiple files are configured to be downloaded. This example extends the configuration from the above example to download two additional files. It also sets the URL prefix to add when indexing the documents and sets a 10 second timeout for connection to the SFTP server.

Field | Parameter 1 | Value
File name | 2 | myfiles\myXMLFileToIndex2.xml
File name | 3 | myfiles\otherXMLfileToIndex.xml
Store URL | (not applicable) | https://example.com
SFTP server connection timeout | (not applicable) | 10

Funnelback requires all indexed documents to have a URL. If the document will be split later (for example, a JSON file converted to XML then split along an XPath), each record will get its own URL based on the settings you configure when splitting, so the URL of the overall document doesn’t matter. If not supplied, a dummy URL will be used: <data source name>/<datafile name>. This configuration prefixes all the stored documents with https://example.com, resulting in the documents being stored with the following URLs:

https://example.com/myfiles/myXMLFileToIndex.xml
https://example.com/myfiles/myXMLFileToIndex2.xml
https://example.com/myfiles/otherXMLfileToIndex.xml
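The way the prefix and file name combine into the URLs listed above can be sketched in Python. This is an assumption about the behaviour inferred from this example, not the plugin's actual code; the stored_url helper is hypothetical, as is the conversion of backslashes to forward slashes.

```python
def stored_url(prefix, file_name):
    """Join the Store URL prefix and a file name into a document URL."""
    # Assumption: backslashes in the configured file name become forward
    # slashes, matching the example URLs listed above.
    path = file_name.replace("\\", "/").lstrip("/")
    return prefix.rstrip("/") + "/" + path


for name in (
    r"myfiles\myXMLFileToIndex.xml",
    r"myfiles\myXMLFileToIndex2.xml",
    r"myfiles\otherXMLfileToIndex.xml",
):
    print(stored_url("https://example.com", name))
```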

Example: Use public key authentication to download a single file

This configuration is similar to the first example, but uses public key authentication instead of password authentication. The private key file must be stored in the conf/plugin-configuration/sftp directory of the data source.

Enter the following into the corresponding fields when setting up your plugin:

Field | Value
SFTP server host | my.ftp.server.com
SFTP server port | 5022
SFTP user name | myUserName
SFTP authentication type | PUBLIC_KEY
SFTP password / private key passphrase | ******
SFTP private key file | Upload the private key file from UI
SFTP public key file | Upload the public key file from UI
File name | myfiles\myXMLFileToIndex.xml (with Parameter 1 set to the value 1)
File MIME type | text/xml

Change log

[1.2.1]

Fixed

  • Corrected the file names from id_rda.pub and id_rda to id_rsa.pub and id_rsa.

[1.2.0]

Added

  • Added the SFTP Public Key Authentication feature, allowing users to authenticate SFTP connections using public key authentication instead of passwords. This enhances security and flexibility in managing SFTP connections.

Changed

  • Replaced the JSch library from com.jcraft to com.github.mwiede to support the latest SSH protocol versions and improve compatibility with various SSH servers.

  • Upgraded JUnit to version 5.9.3 to ensure compatibility with the latest testing features and improvements.

  • Replaced the tests with Testcontainers-based tests to ensure the tests run in a consistent environment, improving reliability and reducing flakiness.

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.