

File-copy collections

Introduction


A file-copy collection indexes documents from a file share or a local disk. It is built from a copy of the documents in a local or remote filesystem directory. If you only need to index text content (no binary formats such as .DOC or .PDF), you can use a local collection instead.

An update will copy new or changed files from the source folder into the collection's offline data directory from where the update will proceed as normal. Binary documents are converted into text, text content is indexed, and the offline view is swapped with the live view.
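
The copy step described above can be pictured with a minimal Python sketch. It is an illustration only, not Funnelback's implementation: the function name copy_new_or_changed and the change-detection rule (same size and an equally recent copy means "unchanged") are assumptions.

```python
import shutil
from pathlib import Path


def copy_new_or_changed(source: Path, offline: Path) -> list[Path]:
    """Copy files that are missing from, or differ from, the offline directory.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = offline / rel
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            # Assumption: same size and a copy at least as recent means "unchanged".
            if (dst_stat.st_size == src_stat.st_size
                    and dst_stat.st_mtime >= src_stat.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Calling the function twice in a row on an unchanged source directory copies nothing the second time, which is the behaviour an incremental update relies on.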


Supported Directories

Funnelback can index several types of directory:

Local Directories
These are located on the search server and are addressed as local paths.
Windows file shares
These are file shares that are served using the SMB or CIFS protocols, as is standard in most Windows environments. They can be addressed as UNC paths.
Netware file shares
These are file shares that are served using Novell Netware protocols and are addressed as a local path on an unused drive letter. Please note that filecopy.novell.mount_point and filecopy.novell.server must be set manually for Netware file shares, and that the "Novell Client for Windows" (which supports only 32-bit Windows systems) must be installed on the Funnelback server.

How the data source is specified depends on where the data is located. For example, a file-copy collection might use one of the following settings.

For a local disk:

filecopy.source=/var/documents/shared/ 
For a Windows file share:

filecopy.source=\\fileserver\documents\
-or-
filecopy.source=smb://fileserver/documents/
For a Netware file share:

filecopy.source=S:\documents\
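
To make the distinction between these forms concrete, here is a small Python sketch that classifies a filecopy.source value by its shape. The function name classify_source and the category labels are hypothetical and are not part of Funnelback. Note that a drive-letter path may be a mapped Netware drive or an ordinary local Windows disk; the path alone cannot tell the two apart.

```python
import re


def classify_source(source: str) -> str:
    """Guess which kind of directory a filecopy.source value points at."""
    if source.lower().startswith("smb://"):
        return "smb-url"        # e.g. smb://fileserver/documents/
    if source.startswith("\\\\"):
        return "unc"            # e.g. \\fileserver\documents\
    if re.match(r"^[A-Za-z]:[\\/]", source):
        return "drive-letter"   # e.g. S:\documents\ (Netware mount or local disk)
    return "local-path"         # e.g. /var/documents/shared/


for value in ("/var/documents/shared/",
              "\\\\fileserver\\documents\\",
              "smb://fileserver/documents/",
              "S:\\documents\\"):
    print(value, "->", classify_source(value))
```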

Note that on Linux operating systems, the default firewall rules may need to be altered to allow SMB / CIFS name resolution.

Red Hat Linux provides instructions for mounting NFS file shares and also supports SMB through Samba (http://www.samba.org). File shares mounted on a Windows machine can be indexed in a similar way, and will provide SMB support. Please note that drive letter mappings are made on a per-user basis, so paths to remote file shares must be specified as UNC paths (e.g. \\afileserver\directory). Also note that local collections cannot use UNC paths or URLs as their data root.

Filecopy options

Option Description
filecopy.cache Enable/disable using the live view as a cache directory where pre-filtered text content can be copied from.
filecopy.domain The domain for the user that will be used to access the source directory.
filecopy.exclude_pattern A regular expression pattern to specify which files NOT to index.
filecopy.filetypes The list of file types that will be indexed by a filecopy collection. Leave blank to index all files.
filecopy.include_pattern If specified, a regular expression pattern to select files to index.
filecopy.inline_filtering Enable/disable inline filtering of files during gathering.
filecopy.max_files_stored Upper limit on the number of documents contained in the filecopy collection.
filecopy.novell.mount_point The volume path on the Netware server on which the Netware fileshare is mounted.
filecopy.novell.server The name of the Netware server on which the Netware fileshare is mounted.
filecopy.request_delay Artificial delay that can be introduced between copying individual files. May reduce load on the file server.
filecopy.passwd The password to be used when accessing the source directory.
filecopy.security_model Sets the plugin to use to collect security information on files (Early binding Document Level Security).
filecopy.source This is the file system path or URI to the source of the data files.
filecopy.source_list File that contains a specific list of files to copy, rather than using a source directory.
filecopy.store_class The local data cache storage class.
filecopy.user The username to use when accessing the source directory.
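
As a rough illustration of how these options fit together, the following Python sketch parses key=value settings and checks two rules taken from this page: a collection needs either filecopy.source or filecopy.source_list, and the two Netware options must be set together. The helper names parse_config and check_filecopy, the treatment of lines starting with # as comments, and the example user name indexer are all assumptions for illustration; this is not a Funnelback tool.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines into a dictionary, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_filecopy(settings: dict[str, str]) -> list[str]:
    """Return a list of problems found in the filecopy settings."""
    problems = []
    if not settings.get("filecopy.source") and not settings.get("filecopy.source_list"):
        problems.append("set filecopy.source or filecopy.source_list")
    novell_keys = ("filecopy.novell.mount_point", "filecopy.novell.server")
    if sum(1 for key in novell_keys if settings.get(key)) == 1:
        problems.append("set both filecopy.novell.mount_point and filecopy.novell.server")
    return problems


example = """
filecopy.source=smb://fileserver/documents/
filecopy.user=indexer
"""
print(check_filecopy(parse_config(example)))
```

A check like this only catches missing settings; it cannot confirm that the credentials or paths are actually valid on the file server.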

