TRIM collection

Introduction

TRIM is an electronic document/records management system. Funnelback can index the documents and records held by TRIM. Records from the TRIM database are regularly extracted and the contents of electronic documents are filtered and indexed. Results displayed in a web browser can then be linked back into the TRIM client software or optionally linked directly to the electronic documents themselves (See the section on extra collection options below). The TRIM adapter is only available when Funnelback is installed on a Microsoft Windows operating system.

As of v13, the preferred way to index a TRIM repository is to use a TRIMPush collection.

Important Note: If you wish to enforce Document Level Security over a TRIM collection then Funnelback must be deployed on top of the Microsoft IIS web server. See Configuring IIS for Funnelback for more details.

System setup

In order to "crawl" the TRIM database, the Funnelback crawler needs to be able to access TRIM. This involves:

  • Installing the TRIM client and SDK onto the Funnelback server.
  • Creating a TRIM user to query the database.

To install the TRIM client and SDK, see your relevant TRIM documentation (Note: Starting from TRIM 7.1, the SDK is automatically installed; there is no need to check a specific option in the installer).

TRIM users are normal Windows domain users that have been given specific access rights within TRIM. The Funnelback TRIM adapter must be able to log in to TRIM to gather its content. It is highly recommended that a dedicated Windows user be created for this purpose, so that the user can be given exactly the access rights it needs within TRIM to gather the required content.

A Funnelback TRIM user will typically not need update permissions. (If you wish to gather all content from TRIM, then giving the Funnelback TRIM user administrator privileges will increase the speed of the gathering process). Your TRIM administrator should be able to assist with this. It is important to test that this user can log in to TRIM from the Funnelback server -- use the TRIM client to verify that this works.

You also need to ensure that Domain Users are given access to the TRIM SDK's temporary folders on the Funnelback server. These folders are named ServerData and ServerLocalData and usually found under C:\Program Files (x86)\Hewlett-Packard\HP TRIM.

64-bit systems

The TRIM SDK isn't available in the 64-bit version of TRIM. Even though Funnelback requires a 64-bit OS such as Windows 2008, you must install the 32-bit version of the TRIM client. Additionally, Funnelback has shipped with a 64-bit Perl since version 13. To use TRIM collections with Funnelback 13+, you need to separately install a 32-bit version of Perl.

TRIM Working folder

Since the TRIM client by default caches any documents it obtains from the TRIM server on the local machine, you must also switch off document caching in the TRIM client. Failing to do so will result in using twice the required amount of data space on your Funnelback server. To switch off caching, start your TRIM client as the Funnelback user, then navigate to:

  • Tools → options → User configuration → store caching (TRIM version less than 6.2)
  • Tools → options → store caching (TRIM version 6.2 and greater)

and clear the checkbox.

You can also use a temporary directory other than the default one, which is usually located on the C: drive inside the TRIM program folder. To do so, add a new registry String value named WebServerWorkPath under the following registry key:

  • HKLM\SOFTWARE\Wow6432Node\TOWER Software\TRIM5\

Set the value to the full path to the temporary directory, for example D:\Data\TRIM\WebServerWorkPath.
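
As a rough illustration, the registry change above could be captured in a .reg file and imported on the Funnelback server. The Python sketch below only builds the file text. The helper name build_reg_file is hypothetical, the path is the example value used above, and the sketch assumes the standard .reg syntax in which backslashes inside string values are doubled. Check the result against your own environment before importing it.

```python
# Registry key from the section above, written in the long form used by .reg files.
REG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\TOWER Software\TRIM5"


def build_reg_file(work_path: str, key: str = REG_KEY) -> str:
    """Return .reg file text that sets WebServerWorkPath under the given key.

    Hypothetical helper for illustration; it does not touch the registry.
    """
    # Backslashes and double quotes must be escaped inside .reg string values.
    escaped = work_path.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"WebServerWorkPath"="{escaped}"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(r"D:\Data\TRIM\WebServerWorkPath"))
```

Generating the file rather than writing to the registry directly keeps the change reviewable before it is applied.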

Updating TRIM Collections

Updates of TRIM collections must be run as a valid TRIM user. The usual Funnelback administration account (which is created when installing Funnelback) is normally not sufficient for this. Therefore, updates of TRIM collections must be run as a scheduled update (with appropriate user credentials set).

For initial testing and setup the update can also be run manually from a Windows command terminal when logged in as the appropriate TRIM user. See the scheduled update page for sample command syntax.

The initial update needs to run in a special mode in order to gather every existing record in one pass. It can then be changed to the default mode, which collects only the records changed since the last update (See Initial gather mode below). The initial gather mode is automatically disabled once the first gather process completes; be sure to re-enable it if you need to perform a full update from scratch again.

Creating a TRIM collection

AdminUI81CreateTrim.PNG

The screenshot above shows the TRIM "create collection" page in the Funnelback Administration interface. The key fields to note are:

Database ID
is the two-character alphanumeric ID for the TRIM database.
TRIM Workgroup Server
The name of your TRIM server. (Only necessary for TRIM v6.2 and later)
TRIM Workgroup Server Port
The TCP port your TRIM server operates on. (Only necessary for TRIM v6.2 and later)
Gather documents beginning from
documents registered or modified since this date will be gathered.
Stop gathering at
Stop the gathering when a specific date is reached. If blank the gather script will gather everything up to the most recent record.
Document types
are the file types you want extracted from the database. Note that records without electronic documents can still be gathered as well. They will be converted to a simple HTML format.
Request delay
is the time, in milliseconds, between requests for records.
Initial gather mode
Enable this option for the very first gather in order to select records based on their creation date instead of their updated date. This mode ensures that all existing records will be collected once before switching to day-to-day updates. After the initial update this setting will automatically be disabled.

Crawl date

The adapter will gather documents that have been registered or modified since a given date. This date will initially be January 1st 1970, in order to capture all documents in the TRIM database. It will be successively updated with each crawl of your TRIM collection to make sure the adapter only gathers the newest information.

The adapter can also stop at a specific date. While this is usually not needed and can be left blank in regular updates, it can be used during the initial gather to target a specific time frame.
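
The date selection described above can be sketched as follows. This is an illustrative model only, not the adapter's actual code: the Record fields and the in_gather_window helper are hypothetical names, treating the stop date as inclusive is an assumption, and the sketch assumes a record's last-modified date is never earlier than its registration date.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for the dates carried by a TRIM record."""
    created: datetime
    modified: datetime


def in_gather_window(record: Record, start: datetime,
                     stop: Optional[datetime] = None,
                     initial_gather: bool = False) -> bool:
    """Decide whether a record falls inside the gather window."""
    # Initial gather mode selects on the creation date; regular updates
    # select on the last-modified date.
    reference = record.created if initial_gather else record.modified
    if reference < start:
        return False
    # A blank stop date means "gather up to the most recent record".
    return stop is None or reference <= stop


# Default start date for a new collection, as described above.
EPOCH = datetime(1970, 1, 1)
```

With a start date of EPOCH and no stop date, every record is selected, which matches the behaviour described for a first crawl.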

Document types

A TRIM record will be extracted as a simple HTML page displaying the TRIM metadata values in a table. Electronic documents associated with a record will also have their contents extracted and the metadata table mentioned above will be attached to the end of the document contents.

Email messages (VMBX format only) are split into separate portions: the email body and any email attachments. If you wish to index the content of the older TRIM email format (MBX) then you must first upgrade the records to be in VMBX format. TRIM provides tools to do this.

Example HTML table

Title                          GBRMPA - Preparation of Financial Estimates of Expenditure 1982/83
Access Control                 Modify Record Access: Access denied; Destroy Record: Access denied
Barcode                        R45000000V
Creator Location               Peter Abbott
Date the Record was Created    1982-02-07 00:00
Date Registered                1994-05-30 00:00
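
To make the idea of a record rendered as a metadata table more concrete, here is a minimal Python sketch. The exact markup the adapter produces is not shown in this documentation, so the two-column layout and the render_record_table name are assumptions made for illustration.

```python
from html import escape
from typing import Dict


def render_record_table(fields: Dict[str, str]) -> str:
    """Render field name/value pairs as a two-column HTML table.

    Hypothetical helper; the real adapter's markup may differ.
    """
    rows = [
        # Escape both cells so characters such as & or < stay valid HTML.
        f"<tr><th>{escape(name)}</th><td>{escape(value)}</td></tr>"
        for name, value in fields.items()
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(render_record_table({
        "Barcode": "R45000000V",
        "Date Registered": "1994-05-30 00:00",
    }))
```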

Extra collection options

There are some additional settings that affect the TRIM adapter. These can be accessed by selecting the TRIM collection in the Administration Interface, going to the "Administer" tab and clicking on the "Edit Collection Settings" link. You can then click on the "TRIM extras" tab, which will display the following:

AdminUI91TrimExtras.PNG

Default live links
The live links provided by Funnelback can be set to give the user copies of the documents in TRIM (Documents) or a 'TRIM reference' capable of launching the TRIM client and taking the user to the record in question (References). Use the 'References' setting if all of your searchers will have the TRIM client installed on their machines. You must use the 'References' setting if you wish to display results based on records without electronic documents.
TRIM license number
The TRIM license number is required when references are used as live links as it will be used to generate valid references. This number can be found using the TRIM client (Help->About TRIM, System info., Software License->License number) as shown below. The leading zeros can be omitted.

TRIM6LicenseNumber.png

Document limit
sets a limit on how many documents are gathered (useful for initial testing).
Verbosity
increases the amount of information logged.
Number of sub-folders
the adapter creates a number of sub-folders to store record data and tries to divide the data evenly among them.
Web server work path cleanup interval
TRIM uses a temporary folder to store copies of documents during the gathering phase (See the TRIM documentation on the WebServerWorkPath parameter). This folder can grow large during gathering, depending on the size of your TRIM database, and this setting helps to prevent that. Specify an interval (in number of records gathered) at which the temporary folder will be cleaned up during gathering (i.e. if you set 5, the directory will be cleaned up every 5 gathered records).

Slices

If your TRIM server comes under heavy load, you may wish to avoid usage problems by setting the adapter to pause occasionally during the gathering process. You can specify that the adapter should pause for S seconds every N records.

records per slice
is the number of records to process before sleeping (N).
sleep
the number of seconds to sleep (S).

The gather start date will be updated at the end of each slice. This allows you to restart the update from the last successful slice if the gather fails for any reason. To do so, please use the -restart-gather flag on the update script.

It is advisable to use the slice mode for the initial gather phase, as that phase can run for several days and can break if the TRIM server goes down for any reason (backups, scheduled maintenance, etc.).
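
The slicing behaviour described above (pause for S seconds every N records, and record progress at the end of each slice) can be sketched like this. It is a simplified model rather than the adapter's implementation: gather_in_slices and the on_slice_done callback are hypothetical names, and the callback stands in for whatever the adapter does to persist the new gather start date.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def gather_in_slices(records: Iterable[T], records_per_slice: int,
                     sleep_seconds: float,
                     on_slice_done: Callable[[List[T]], None],
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Process records in slices of N, pausing S seconds after each full slice.

    Returns the number of records processed.
    """
    if records_per_slice < 1:
        raise ValueError("records_per_slice must be at least 1")
    current: List[T] = []
    total = 0
    for record in records:
        current.append(record)        # stand-in for gathering one record
        total += 1
        if len(current) == records_per_slice:
            on_slice_done(current)    # e.g. persist the new gather start date
            current = []
            sleep(sleep_seconds)      # give the TRIM server a rest
    if current:
        on_slice_done(current)        # final, partial slice
    return total
```

Because progress is recorded once per slice, a failure loses at most one slice of work, which is why a restart can resume from the last successful slice.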

TRIM record metadata

There are a large number of metadata fields in TRIM that can be used with Funnelback. There is an additional administration page that allows you to identify that a TRIM metadata field is:

  • extracted as part of the HTML table; and/or
  • used for metadata.

This page can be accessed from the administration home page via "Administer" Tab -> "Edit Collection Settings" -> "TRIM Records", which will take you to the following form:

AdminUI7TrimRecords.PNG

Example Metadata

If a TRIM metadata field is selected for use as Funnelback metadata, then the HTML documents generated from each record will contain <meta> tags similar to the following:

<meta name="trim.authorloc" content="ACT Revenue" />
<meta name="trim.datereg" content="1999-08-05 00:00" />
<meta name="trim.number" content="G99/806" />

metamap.cfg

The standard Funnelback metamap.cfg can then be used to map the TRIM metadata to Funnelback metadata classes.

For example, the following lines map the TRIM author and registered date to the Funnelback author (a) and document date (d).

a,1,trim.authorloc
d,1,trim.datereg
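
To show how the two pieces fit together, the following Python sketch reads the <meta> tags from a generated page and groups their values by the metadata class named in metamap.cfg. It is illustrative only and is not how Funnelback itself processes these files: MetaCollector, parse_metamap and classify are hypothetical names, and the middle field of each metamap.cfg line is ignored because its meaning is not described here.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


def parse_metamap(text: str) -> Dict[str, str]:
    """Map each source field name to its class, e.g. trim.authorloc -> a."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like "a,1,trim.authorloc"; the middle field is ignored.
        metadata_class, _middle, source = line.split(",", 2)
        mapping[source] = metadata_class
    return mapping


def classify(html_text: str, metamap_text: str) -> Dict[str, List[str]]:
    """Group <meta> content values by the class assigned in the metamap."""
    collector = MetaCollector()
    collector.feed(html_text)
    mapping = parse_metamap(metamap_text)
    result: Dict[str, List[str]] = {}
    for name, content in collector.meta.items():
        if name in mapping:
            result.setdefault(mapping[name], []).append(content)
    return result
```

Fields that have no line in the metamap, such as trim.number in the example above, are simply left out of the result.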
