File system (filecopy) data sources
This feature is not available in the Squiz DXP.
A file system (filecopy) data source is used for indexing documents from a file system. The file system can be accessed locally or remotely.
An update copies new or changed files from the source folder into the data source’s offline data directory, and the update then proceeds as normal: binary documents are converted into text, text content is indexed, and the offline view is swapped with the live view.
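The copy step described above can be pictured with a minimal Python sketch. This is not Funnelback's implementation: the function name `copy_new_or_changed` is hypothetical, and treating a file as "changed" when its size differs or its modification time is newer is an assumption made purely for illustration.

```python
import shutil
from pathlib import Path


def copy_new_or_changed(source: Path, offline: Path) -> list:
    """Copy files that are missing from, or differ from, the offline copy.

    Illustrative only: "changed" is approximated by comparing file size and
    modification time, which is an assumption, not documented behaviour.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dest = offline / src.relative_to(source)
        if dest.exists():
            s, d = src.stat(), dest.stat()
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue  # looks unchanged since the last copy
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copied.append(dest)
    return copied
```

Running the function twice over an unchanged source folder copies nothing the second time, which mirrors the idea that only new or changed files are fetched on each update.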
Create the data source
File system data sources are created by following the data source creation steps and selecting filecopy from the list of data source types.
A file system data source is defined by the following properties:
-
A source directory to copy files from, possibly with an associated domain name, username and password.
Supported directories
Funnelback supports indexing several types of directory, including:
Windows file shares
These are file shares that are served using the SMB or CIFS protocols, as is standard in most Windows environments. They can be addressed as UNC paths. How the data source is specified will depend on where the data is located. For example, a file system data source might have:
-
For a local disk:
filecopy.source=/var/documents/shared/
-
For a Windows file share:
filecopy.source=\\fileserver\documents\
or
filecopy.source=smb://fileserver/documents/
Note that on Linux operating systems, the default firewall rules may need to be altered to allow for SMB / CIFS name resolution.
RedHat Linux provides instructions for mounting NFS file shares and also comes with SMB/CIFS support.
File shares mounted on a Windows machine can be indexed in a similar way, and will provide SMB/CIFS support. Please note that drive letter mappings are done on a per-user basis, so paths must be specified as UNC paths (e.g. \\fileserver\directory) for remote file shares.
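The two Windows file share forms shown earlier (a UNC path and an smb:// URL) name the same location. As a rough illustration, the hypothetical helper below converts one into the other. It is a sketch only: it assumes a plain server and share name and does not handle characters that would need URL encoding.

```python
def unc_to_smb_url(unc_path: str) -> str:
    """Convert a UNC path to the equivalent smb:// URL form.

    Hypothetical helper for illustration; not part of Funnelback.
    """
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc_path)
    # Drop the two leading backslashes, then swap the separators.
    return "smb://" + unc_path[2:].replace("\\", "/")
```

For example, passing the UNC path from the Windows file share example yields the smb:// value shown alongside it.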
Serving file system results
File system results are served by the user interface layer, which contacts the file system to retrieve the requested file and downloads it to the search user’s browser.
Document filtering
Apache Tika is used to convert binary document formats to text. Additional filtering can be applied using Funnelback plugins.
Additional file types (if supported by Tika) can be filtered by adding the types to filecopy.filetypes and filter.tika.types.
If you’ve updated the filter chain or how a filter works, you may need to disable the filecopy.cache to ensure the changes are applied to any previously processed documents.
Filecopier log level
This functionality is only available to Funnelback system administrators.
The steps below set the log level for a filecopy data source.
-
Copy $SEARCH_HOME/conf/log4j2.xml.default to $SEARCH_HOME/conf/<collection>/log4j2.xml
-
Edit the file and update the line below to the desired level.
<Logger name="com.funnelback" level="info"/>
<!-- e.g. increase to debug level: -->
<Logger name="com.funnelback" level="debug"/>
-
Save the file and start an update. Debug messages should now appear in filecopier.log.
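The copy and edit steps above could be scripted along the following lines. This is a sketch under stated assumptions: the function name `enable_debug_logging` is hypothetical, the collection's configuration folder is assumed to exist already, and the default file is assumed to contain the logger line exactly as shown in the steps. If the line is not found, the script stops so the file can be edited by hand.

```python
import shutil
from pathlib import Path


def enable_debug_logging(search_home: Path, collection: str) -> Path:
    """Copy the default log4j2 config into a collection and raise the level.

    Hypothetical helper for illustration; not a Funnelback tool.
    """
    conf = search_home / "conf"
    target = conf / collection / "log4j2.xml"
    # Step 1: copy the default configuration into the collection folder.
    shutil.copyfile(conf / "log4j2.xml.default", target)
    # Step 2: switch the com.funnelback logger from info to debug.
    old = '<Logger name="com.funnelback" level="info"/>'
    new = '<Logger name="com.funnelback" level="debug"/>'
    text = target.read_text(encoding="utf-8")
    if old not in text:
        raise ValueError("expected logger line not found; edit the file by hand")
    target.write_text(text.replace(old, new), encoding="utf-8")
    return target
```

It might be called as `enable_debug_logging(Path(os.environ["SEARCH_HOME"]), "<collection>")` after importing `os`, with the placeholder replaced by the real collection name. An update still needs to be started afterwards for the new level to take effect.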