Documentation

Introduction to Funnelback Collections

A collection is a set of data that has been gathered from a data source, indexed and made available for searching. Collection types are based on their data sources:

web
A web site or set of sites. The data is gathered using HTTP or HTTPS, and plain text is extracted from binary document types such as MS-Word and PDF.
local
A local file-system. The data is not gathered, but indexed in place.
filecopy
A file-system. The data is gathered by copying files and the text is extracted from binary documents.
database
A database. The data is gathered by connecting to the database through a JDBC driver and selecting one or more tables. The data is stored locally as XML.
directory
A directory (generally of people). The data is gathered using a JNDI driver to access an Active Directory or LDAP directory. The data is stored locally as XML.
trim
A snapshot of a TRIM database.
connector
A collection using custom application connectors.
push
A collection where data is 'pushed' into the index through an API rather than being gathered by Funnelback.
meta
A grouping of one or more collections that allows a single query to run over all of the data in those collections.

Populating a collection

[Figure: collection update steps (Fb-update-steps.png)]

A collection is populated in the following order:

  1. The data is gathered. For example, for a web collection the web sites are crawled to download all HTML files and other documents.
  2. All "binary" documents are filtered to extract plain text. For example, PDF files are processed to extract their text.
  3. The documents are indexed: word lists and other information are processed into Funnelback indexes. The index is then used to answer user queries.

All of this work occurs in an offline area so that it does not disrupt the current live view, which is being used for query processing. If the update process completes successfully, the live and offline views are swapped, making the new indexes available for querying.
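
The gather, filter, index and swap sequence described above can be illustrated with a small Python sketch. This is not Funnelback code: the `Collection` class, its `update` and `query` methods, and the `gather` and `extract_text` callables are hypothetical names invented for this example, and the "index" is just an in-memory word-to-document map. It only aims to show why building in an offline view and swapping on success leaves the live view usable if an update fails.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for an index: word -> names of documents containing it.
Index = Dict[str, Set[str]]


class Collection:
    """Toy model of a collection with a live view and an offline view."""

    def __init__(self) -> None:
        self.live: Index = {}     # answers user queries
        self.offline: Index = {}  # rebuilt during an update

    def update(
        self,
        gather: Callable[[], Iterable[Tuple[str, bytes]]],
        extract_text: Callable[[bytes], str],
    ) -> bool:
        """Gather, filter and index into a fresh offline view, then swap.

        Returns True if the update succeeded and the views were swapped,
        False if any step raised, in which case the live view is untouched.
        """
        new_index: Index = {}
        try:
            # Step 1: gather the raw documents (e.g. by crawling).
            for name, raw in gather():
                # Step 2: filter the document down to plain text.
                text = extract_text(raw)
                # Step 3: index the words of the document.
                for word in text.lower().split():
                    new_index.setdefault(word, set()).add(name)
        except Exception:
            return False

        # Success: the new index becomes live, the old live index goes offline.
        self.offline = new_index
        self.live, self.offline = self.offline, self.live
        return True

    def query(self, word: str) -> List[str]:
        """Look up a single word in the live view only."""
        return sorted(self.live.get(word.lower(), set()))


def decode_utf8(raw: bytes) -> str:
    # Assumed filter: a real one would extract text from PDF, MS-Word, etc.
    return raw.decode("utf-8")
```

In this sketch a failed `update` returns `False` before the swap, so `query` keeps answering from the previous live index; only a fully built offline index is ever exposed to queries.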


Manage Collections

For details on how to manage Funnelback collections, see the following:
