Collection.cfg

Introduction

Name: collection.cfg
Collection Location: ~/conf/<collection>/
Collection Defaults: ~/conf/
Description: Main configuration file for a collection.

The collection.cfg configuration file is created when a collection is first created and may be updated whenever the collection is updated.

Format

The format of the file is a simple name=value pair per line. The values $SEARCH_HOME and $COLLECTION_NAME are automatically expanded to the Funnelback installation path and the name of the current collection respectively.
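
For example, a collection.cfg might contain entries like the following (the option names are documented below; the values shown are illustrative only):

    service_name=Example Intranet Search
    collection_root=$SEARCH_HOME/data/$COLLECTION_NAME
    admin_email=search-admin@example.com
    crawler.num_crawlers=10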

Configuration options

The following tables contain descriptions of the options that are used in the configuration file. Note that some are specific to the collection's type, while others are used for every collection.

Standard Funnelback default values for each configuration option are defined in $SEARCH_HOME/conf/collection.cfg.default, and server-wide default values may be configured by adding them to the file at $SEARCH_HOME/conf/collection.cfg.

A

Option Description
access_alternate Switch the user to an alternate collection if access_restriction applies.
access_restriction Restricts access by listing allowable hostnames or IP addresses. Only users with matching hostname or IP address can search.
access_restriction.ignored_ip_ranges Defines all IP ranges in the X-Forwarded-For header to be ignored by Funnelback when applying access restrictions.
access_restriction.prefer_x_forwarded_for Determines if access restrictions should be applied to the last IP address in the X-Forwarded-For header.
admin.undeletable If set to "true" this collection can not be deleted from the Administration interface.
admin_email Email address of administrator to whom an email is sent after each collection update.
analytics.data_miner.range_in_days Length of time range (in days) the analytics data miner will go back from the current date when mining query and click log records.
analytics.outlier.day.minimum_average_count Control the minimum number of occurrences of a query required before a day pattern can be detected.
analytics.outlier.day.threshold Control the day pattern detection threshold.
analytics.outlier.exclude_collection Disable query spike detection for a collection
analytics.outlier.exclude_profiles Disable query spike detection for a profile
analytics.outlier.hour.minimum_average_count Control the minimum number of occurrences of a query required before an hour pattern can be detected.
analytics.outlier.hour.threshold Control the hour pattern detection threshold.
analytics.reports.max_day_resolution_daterange Length of time range (in days) to allow in a custom daterange in the query reports UI.
analytics.reports.max_facts_per_dimension_combination Advanced setting: controls the amount of data that is stored by query reports.
analytics.reports.checkpoint_rate Advanced setting: controls the rate at which the query reports system checkpoints data to disk.
analytics.reports.disable_incremental_reporting Disable incremental reports database updates. If set all existing query and click logs will be processed for each reports update.
analytics.scheduled_database_update Control whether reports for the collection are updated on a scheduled basis
annie.index_opts Specify options for the "annie-a" annotation indexing program.
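
As an illustrative sketch, access restriction and analytics options from this table might be combined as follows (the hostnames, addresses, collection name and values are placeholders):

    access_restriction=intranet.example.com,10.1
    access_alternate=public-web
    admin_email=webmaster@example.com
    analytics.reports.max_day_resolution_daterange=90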

B

Option Description
build_autoc_options Specifies additional configuration options that can be supplied to the query completion builder.

C

Option Description
changeover_percent The new crawl only goes live if the ratio of new vs. old documents gathered is greater than this amount (e.g. 50%).
click_data.archive_dirs The directories that contain archives of click logs to be included in producing indexes.
click_data.num_archived_logs_to_use The number of archived click logs to use from each archive directory
click_data.use_click_data_in_index A boolean value indicating whether or not click information should be included in the index.
click_data.week_limit Optional restriction of click data to a set number of weeks into the past.
click_tracking Enable or disable click tracking.
collection The internal name of a collection.
collection-update.step.StepTechnicalName.run Determines if an update step should be run or not.
collection_group Set the group under which the collection will appear in the selection drop-down menu on the main Administration page.
collection_root Location of a collection's crawl, index, query logs etc
collection_type Type of collection.
crawler.accept_cookies Cookie policy. Default is false i.e. do not accept cookies. Requires HTTPClient if true.
crawler.accept_files Only crawl files with these extensions. Not normally used - default is to accept all valid content.
crawler.allowed_redirect_pattern Specify a regex to allow crawler redirections that would otherwise be disallowed by the current include/exclude patterns.
crawler.cache.DNSCache_max_size Maximum size of internal DNS cache. Upon reaching this size the cache will drop old elements.
crawler.cache.LRUCache_max_size Maximum size of LRUCache. Upon reaching this size the cache will drop old elements.
crawler.cache.URLCache_max_size Maximum size of URLCache. May be ignored by some cache implementations.
crawler.check_alias_exists Check if aliased URLs exist - if not, revert to the original URL
crawler.checkpoint_to Location of crawler checkpoint files.
crawler.classes.Crawler Java class used by crawler - defines top level behaviour, which protocols are supported etc.
crawler.classes.Frontier Java class used for the frontier (a list of URLs not yet visited)
crawler.classes.Policy Java class used for enforcing the include/exclude policy for URLs
crawler.classes.RevisitPolicy Java class used for enforcing the revisit policy for URLs
crawler.classes.statistics List of statistics classes to use during a crawl in order to generate figures for data reports
crawler.classes.URLStore Java class used to store content on disk e.g. create a mirror of files crawled
crawler.eliminate_duplicates Whether to eliminate duplicate documents while crawling (default is true)
crawler.extract_links_from_javascript Whether to extract links from Javascript while crawling (default is true)
crawler.follow_links_in_comments Whether to follow links in HTML comments while crawling (default is false)
crawler.frontier_num_top_level_dirs Optional setting to specify number of top level directories to store disk based frontier files in
crawler.frontier_use_ip_mapping Whether to map hosts to frontiers based on IP address. (default is false)
crawler.frontier_hosts List of hosts running crawlers when performing a distributed web crawl
crawler.frontier_port Port on which the DistributedFrontier will listen
crawler.form_interaction_file Path to optional file which configures interaction with form-based authentication
crawler.form_interaction_in_crawl Specify whether crawler should submit web form login details during crawl rather than in a pre-crawl phase
crawler.header_logging Option to control whether HTTP headers are written out to a separate log file (default is false)
crawler.incremental_logging Option to control whether a list of new and changed URLs should be written to a log file during incremental crawling
crawler.inline_filtering_enabled Option to control whether text extraction from binary files is done "inline" during a web crawl
crawler.link_extraction_group The group in the crawler.link_extraction_regular_expression which should be extracted as the link/URL.
crawler.link_extraction_regular_expression The expression used to extract links from each document. This must be a Perl compatible regular expression.
crawler.logfile The crawler's log path and filename.
crawler.lowercase_iis_urls Whether to lowercase all URLs from IIS web servers (default is false)
crawler.max_dir_depth A URL with more than this many sub directories will be ignored (too deep, probably a crawler trap)
crawler.max_download_size Maximum size of files crawler will download (in MB)
crawler.max_files_per_area Maximum files per "area" e.g. number of files in one directory or generated by one dynamic generator e.g. index.asp?doc=123. This parameter used to be called crawler.max_dir_size
crawler.max_files_per_server Maximum files per server (default is unlimited)
crawler.max_files_stored Maximum number of files to download (the default, or any value less than 1, means unlimited)
crawler.max_individual_frontier_size Maximum size of an individual frontier (unlimited if not defined)
crawler.max_link_distance How far to crawl from the start_url (default is unlimited). e.g. if crawler.max_link_distance = 1, only crawl the links on start_url. NB: Turning this on drops crawler to single-threaded operation.
crawler.max_parse_size Crawler will not parse documents beyond this many megabytes in size
crawler.max_timeout_retries Maximum number of times to retry after a network timeout (default is 0)
crawler.max_url_length A URL with more characters than this will be ignored (too long, probably a crawler trap)
crawler.max_url_repeating_elements A URL with more than this many repeating elements (directories) will be ignored (probably a crawler trap or incorrectly configured web server)
crawler.monitor_authentication_cookie_renewal_interval Optional time interval at which to renew crawl authentication cookies
crawler.monitor_checkpoint_interval Time interval at which to checkpoint (seconds)
crawler.monitor_delay_type Type of delay to use during crawl (dynamic or fixed)
crawler.monitor_halt Checked during a crawl - if set to "true" then crawler will cleanly shutdown
crawler.monitor_preferred_servers_list Optional list of servers to prefer during crawl
crawler.monitor_time_interval Time interval at which to output monitoring information (seconds)
crawler.monitor_url_reject_list Optional parameter listing URLs to reject during a running crawl
crawler.non_html Which non-html file formats to crawl (e.g. pdf, doc, xls etc.)
crawler.num_crawlers Number of crawler threads which simultaneously crawl different hosts
crawler.overall_crawl_timeout Maximum crawl time after which the update continues with indexing and changeover. The units of this parameter depend on the value of the crawler.overall_crawl_units parameter.
crawler.overall_crawl_units The units for the crawler.overall_crawl_timeout parameter. A value of hr indicates hours and min indicates minutes.
crawler.packages.httplib Java library for HTTP/HTTPS support.
crawler.parser.mimeTypes Extract links from these comma-separated or regexp: content-types.
crawler.predirects_enabled Enable crawler predirects. (boolean)
crawler.protocols Crawl URLs via these protocols (comma separated list)
crawler.reject_files Do not crawl files with these extensions
crawler.remove_parameters Optional list of parameters to remove from URLs
crawler.request_delay Milliseconds between HTTP requests (for a particular thread)
crawler.request_header Optional additional header to be inserted in HTTP(S) requests made by the webcrawler.
crawler.request_header_url_prefix Optional URL prefix to be applied when processing the crawler.request_header parameter
crawler.request_timeout Timeout for HTTP page GETs (milliseconds)
crawler.revisit.edit_distance_threshold Threshold for edit distance between two versions of a page when deciding whether it has changed or not
crawler.revisit.num_times_revisit_skipped_threshold Threshold for number of times a page revisit has been skipped when deciding whether to revisit it.
crawler.revisit.num_times_unchanged_threshold Threshold for number of times a page has been unchanged when deciding whether to revisit it.
crawler.robotAgent Matching is case-insensitive over the length of the name in a robots.txt file
crawler.secondary_store_root Location of secondary (previous) store - used in incremental crawling
crawler.server_alias_file Path to optional file containing server alias mappings e.g. www.daff.gov.au=www.affa.gov.au
crawler.sslClientStore Path to a SSL Client certificate store (absolute or relative). Empty/missing means no client certificate store. Certificate stores can be managed by Java's keytool
crawler.sslClientStorePassword Password for the SSL Client certificate store. Empty/missing means no password, and may prevent client certificate validation. Certificate stores can be managed by Java's keytool
crawler.sslTrustEveryone Trust ALL Root Certificates and ignore server hostname verification if true. This bypasses all certificate and server validation by the HTTPS library, so every server and certificate is trusted. It can be used to overcome problems with unresolvable external certificate chains and poor certificates for virtual hosts, but will allow server spoofing.
crawler.sslTrustStore Path to a SSL Trusted Root store (absolute or relative). Empty/missing means use those provided with Java. Certificate stores can be managed by Java's keytool
crawler.start_urls_file Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl. Note that this setting overrides the start_url that the crawler is passed on startup (usually stored in the crawler.start_url configuration option).
crawler.store_all_types If true, override accept/reject rules and crawl and store all file types encountered
crawler.store_empty_content_urls If true, store URLs even if, after filtering, they contain no content.
crawler.store_headers Write HTTP header information at top of HTML files if true. Header information is used by indexer.
crawler.user_agent The browser ID that the crawler uses when making HTTP requests. We use a browser signature so that web servers will return framed content etc. to us.
crawler.use_sitemap_xml Optional parameter specifying whether to process sitemap.xml files during a web crawl.
crawler.verbosity Verbosity level (0-6) of crawler logs. Higher number results in more messages.
crawler The name of the crawler binary.
crawler_binaries Location of the crawler files.
custom.base_template The template used when the collection was created.
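
A minimal web crawl configuration sketch, combining options from this table with the seed and pattern options documented under E, I and S (all values are placeholders):

    start_url=https://www.example.com/
    include_patterns=example.com
    exclude_patterns=/calendar/,/logout
    crawler.num_crawlers=10
    crawler.request_delay=1000
    crawler.max_download_size=10
    crawler.non_html=pdf,doc,xls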

D

Option Description
data_report A switch that can be used to enable or disable the data report stage during a collection update.
data_root The directory under which the documents to index reside
datasource Indicates if the collection is a datasource
db.bundle_storage_enabled Allows storage of data extracted from a database in a compressed form.
db.custom_action_java_class Allows a custom java class to modify data extracted from a database before indexing.
db.full_sql_query The SQL query to perform on a database to fetch all records for searching.
db.incremental_sql_query The SQL query to perform to fetch new or changed records from a database.
db.incremental_update_type Allows the selection of different modes for keeping database collections up to date.
db.jdbc_class The name of the Java JDBC driver to connect to a database.
db.jdbc_url The URL specifying database connection parameters such as the server and database name.
db.password The password for connecting to the database.
db.primary_id_column The primary id (unique identifier) column for each database record.
db.xml_root_element The top level element for records extracted from the database.
db.single_item_sql An SQL command for extracting an individual record from the database
db.update_table_name The name of a table in the database which provides a record of all additions, updates and deletes.
db.username The username for connecting to the database.
db.use_column_labels Flag to control whether column labels are used in JDBC calls in the database gatherer
directory.context_factory Sets the java class to use for creating directory connections.
directory.domain Sets the domain to use for authentication in a directory collection.
directory.exclude_rules Sets the rules for excluding content from a directory collection.
directory.page_size Sets the number of documents to fetch from the directory in each request.
directory.password Sets the password to use for authentication in a directory collection.
directory.provider_url Sets the URL for accessing the directory in a directory collection.
directory.search_base Sets the base from which content will be gathered in a directory collection.
directory.search_filter Sets the filter for selecting content to gather in a directory collection.
directory.username Sets the username to use for authentication in a directory collection.
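
For a database collection, the db.* options above are typically combined along these lines (the driver, connection URL, credentials and SQL are placeholders):

    db.jdbc_class=org.postgresql.Driver
    db.jdbc_url=jdbc:postgresql://dbhost.example.com/products
    db.username=funnelback
    db.password=secret
    db.full_sql_query=SELECT * FROM products
    db.primary_id_column=product_id
    db.xml_root_element=product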

E

Option Description
exclude_patterns The crawler will ignore a URL if it matches any of these exclude patterns

F

Option Description
faceted_navigation.date.sort_mode Specify how to sort date based facets.
faceted_navigation.white_list Include only a list of specific values for a facet (Modern UI only).
faceted_navigation.black_list Exclude specific values for a facet (Modern UI only).
filecopy.cache Enable/disable using the live view as a cache directory where pre-filtered text content can be copied from.
filecopy.domain Filecopy sources that require a username to access files will use this setting as a domain for the user.
filecopy.discard_filtering_errors Whether or not to index the file names of files that failed to filter.
filecopy.exclude_pattern Filecopy collections will exclude files which match this regular expression.
filecopy.filetypes The list of filetypes (i.e. file extensions) that will be included by a filecopy collection.
filecopy.include_pattern If specified, filecopy collections will only include files which match this regular expression.
filecopy.max_files_stored If set, this limits the number of documents a filecopy collection will gather when updating.
filecopy.num_workers Number of worker threads for filtering and storing files in a filecopy collection.
filecopy.num_fetchers Number of fetcher threads for interacting with the fileshare in a filecopy collection.
filecopy.walker_class Main class used by the filecopier to walk a file tree
filecopy.passwd Filecopy sources that require a password to access files will use this setting as a password.
filecopy.request_delay Optional parameter to specify how long to delay between copy requests in milliseconds.
filecopy.source This is the file system path or URL that describes the source of data files.
filecopy.security_model Sets the plugin to use to collect security information on files (Early binding Document Level Security).
filecopy.source_list If specified, this option is set to a file which contains a list of other files to copy, rather than using the filecopy.source. NOTE: Specifying this option will cause the filecopy.source to be ignored.
filecopy.store_class Specifies which storage class to be used by a filecopy collection (e.g. WARC, Mirror).
filecopy.user Filecopy sources that require a username to access files will use this setting as a username.
filter Whether to perform post gathering filtering of files such as doc,pdf,ppt,xls to plain text (for indexing). This operation will delete the original files. Do not use with collection_type=local
filter.classes Optionally specify which java classes should be used for filtering documents.
filter.discard_filtering_errors Controls whether files that failed to filter should be deleted or not.
filter.document_fixer.timeout_ms Controls the maximum amount of time the document fixer may spend on a document.
filter.ignore.mimeTypes Optional list of MIME types for the filter to ignore
filter.jsoup.classes Specify which java/groovy classes will be used for filtering, and operate on JSoup objects rather than byte streams.
filter.jsoup.undesirable_text-source.* Specify sources of undesirable text strings to detect and present within Content Auditor.
filter.num_worker_threads Specify number of parallel threads to use in document filtering (text extraction)
filter.text-cleanup.ranges-to-replace Specify Unicode blocks for replacement during filtering (to avoid 'corrupt' character display).
filter.tika_types Specify which file types to filter using the TikaFilterProvider
ftp_passwd Password to use when gathering content from an FTP server.
ftp_user Username to use when gathering content from an FTP server.
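
A filecopy collection sketch using options from this table (the source, credentials and patterns are placeholders):

    filecopy.source=smb://fileserver.example.com/documents/
    filecopy.domain=EXAMPLE
    filecopy.user=svc-search
    filecopy.passwd=secret
    filecopy.filetypes=doc,docx,pdf,xls,xlsx,txt
    filecopy.exclude_pattern=.*/archive/.*
    filecopy.num_workers=4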

G

Option Description
gather The mechanism used to gather documents for indexing. "crawl" indicates Web retrieval whereas "filecopy" indicates a local or remote file copy.
gather.slowdown.days Days on which gathering should be slowed down.
gather.slowdown.hours.from Start hour for slowdown period.
gather.slowdown.hours.to End hour for slowdown period.
gather.slowdown.threads Number of threads to use during slowdown period.
gather.slowdown.request_delay Request delay to use during slowdown period.
groovy.extra_class_path Specify extra class paths to be used by Groovy when using $GROOVY_COMMAND.
gscopes.options Specify options for the "padre-gs" gscopes program.
gscopes.other_bit_number Specifies the gscope bit to set when no other bits are set.
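
The gather.slowdown.* options work as a group; for example, gathering could be throttled during weekday business hours. The day and hour formats shown here are assumptions for illustration only:

    gather.slowdown.days=Monday,Tuesday,Wednesday,Thursday,Friday
    gather.slowdown.hours.from=9
    gather.slowdown.hours.to=17
    gather.slowdown.threads=1
    gather.slowdown.request_delay=5000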

H

Option Description
http_passwd Password used for accessing password protected content during a crawl
http_proxy The hostname (e.g. proxy.company.com) of the HTTP proxy to use during crawling. This hostname should not be prefixed with 'http://'.
http_proxy_passwd The proxy password to be used during crawling
http_proxy_port Port of HTTP proxy used during crawling
http_proxy_user The proxy user name to be used during crawling
http_source_host IP address or hostname used by crawler, on a machine with more than one available
http_user Username used for accessing password-protected content during a crawl
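
When crawling through a proxy, these options are normally set together (host, port and credentials are placeholders; note that http_proxy must not include an 'http://' prefix):

    http_proxy=proxy.example.com
    http_proxy_port=3128
    http_proxy_user=crawler
    http_proxy_passwd=secret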

I

Option Description
include_patterns URLs matching these patterns are included in the crawl (unless they match exclude_patterns) e.g. usyd.edu.au, anu.edu.au, www.anutech.com.au/ELC/
index A switch that can be used to enable or disable the indexing stage during a collection update.
indexer The name of the indexer program to be used for this collection.
indexer_options Indexer command line options, separated by whitespace; individual options therefore cannot contain embedded whitespace characters.
indexing.additional-metamap-source.* Declare additional sources of metadata mappings to be used when indexing HTML documents.
indexing.collapse_fields Define which fields to consider for result collapsing
indexing.use_manifest Flag to turn on use of a manifest file for indexing

J

Option Description
java_libraries The path where the Java libraries are located.
java_options Command line options to pass to the Java virtual machine when the crawler is launched.

L

Option Description
logging.hostname_in_filename Control whether hostnames are used in log filenames
logging.ignored_x_forwarded_for_ranges Defines all IP ranges in the X-Forwarded-For header to be ignored by Funnelback when choosing the IP address to Log.

M

Option Description
mail.on_failure_only Whether to always send collection update emails or only when an update fails.
matrix_password Password for logging into Matrix and the Squiz Suite Manager
matrix_username Username for logging into Matrix and the Squiz Suite Manager
max_heap_size Heap size used by Funnelback Java processes (in megabytes)
mcf.authority-url URL for contacting a ManifoldCF authority
mcf.domain Default domain for users in the ManifoldCF authority

N

Option Description
noindex_expression Optional regular expression to specify content that should not be indexed

P

Option Description
post_gather_command Optional command to execute after gathering phase finishes.
post_index_command Command to execute after indexing finishes.
post_update_command Command to execute once an update has finished (update email will already have been sent).
pre_gather_command Command to execute before gathering starts.
pre_index_command Command to execute before indexing commences.
pre_reporting_command Command to execute before reports updating commences.
progress_report_interval Interval (in seconds) at which the gatherer will update the progress message for the Admin UI.
push.auto-start Sets whether the Push collection will start with the web server.
push.commit-type The type of commit that push should use.
push.init-mode The initial mode in which push should start.
push.max-generations The maximum number of generations push can use.
push.replication.ignore.data When set, query processors will ignore the data (which is used for cached copies).
push.replication.ignore.delete-lists When set, query processors will ignore the delete lists
push.replication.master.host-name The hostname of the master for a query processor push collection.
push.replication.master.push-api.port The master's push-api port for a query processor push collection.
push.replication.master.webdav.port The master's webdav port for a query processor push collection.
push.scheduler.auto-click-logs-processing-timeout-seconds Number of seconds before a Push collection will automatically trigger processing of click logs.
push.scheduler.auto-commit-timeout-seconds Number of seconds a Push collection should wait before a commit is automatically triggered.
push.scheduler.changes-before-auto-commit Number of changes to a Push collection before a commit is automatically triggered.
push.scheduler.killed-percentage-for-reindex Percentage of killed documents before Push re-indexes.
push.store.always-flush Used to stop a Push collection from performing caching on PUT or DELETE calls.
push.worker-thread-count The number of worker threads Push should use.
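
A sketch of push scheduler tuning using options from this table (the numbers are illustrative, not recommended defaults):

    push.worker-thread-count=4
    push.scheduler.changes-before-auto-commit=1000
    push.scheduler.auto-commit-timeout-seconds=60
    push.scheduler.killed-percentage-for-reindex=20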

Q

Option Description
query_completion Enable or disable query completion.
query_completion.alpha Adjust balance between relevancy and length for query completion suggestions.
query_completion.delay Delay to wait (ms) before triggering query completion.
query_completion.format Set the display format of the suggestions in the search results page.
query_completion.length Minimum length of query term before triggering query completion.
query_completion.program Program to use for query completion.
query_completion.search.enabled Turn on search based query completion.
query_completion.search.program Program to use for search based query completion.
query_completion.show Maximum number of query completion suggestions to show.
query_completion.sort Sets the query completion suggestions sort order.
query_completion.source Sets the source of the data for query completion suggestions
query_completion.source.extra Sets extra sources of data for query completion suggestions
query_completion.standard.enabled Enables the standard query completion feature.
query_processor The name of the query processor executable to use.
query_processor_options Query processor command line options.
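
Query completion is usually tuned with several of these options at once; an illustrative combination (the values, including the enable setting, are assumptions):

    query_completion=enabled
    query_completion.delay=250
    query_completion.length=3
    query_completion.show=10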

R

Option Description
recommender Enables/disables the recommendations system
retry_policy.max_tries Maximum number of times to retry an operation that has failed.
rss.copyright Sets the copyright element in the RSS feed
rss.ttl Sets the ttl element in the RSS feed.

S

Option Description
schedule.incremental_crawl_ratio The number of scheduled incremental crawls that are performed between each full crawl (e.g. a value of '10' results in an update schedule consisting of every ten incremental crawls being followed by a full crawl).
search_user Name of user who runs collection updates
security.earlybinding.user-to-key-mapper Selected security plugin for translating usernames into lists of document keys
security.earlybinding.user-to-key-mapper.cache-seconds Number of seconds for which a user's list of keys may be cached
security.earlybinding.user-to-key-mapper.groovy-class Name of a custom Groovy class to use to translate usernames into lists of document keys
security.earlybinding.locks-keys-matcher.name Name of security plugin library that matches user keys with document locks at query time
security.earlybinding.locks-keys-matcher.ldlibrarypath Full path to security plugin library
service_name Name of collection as displayed to users e.g. Intellectual Property Portal. Please note - This is not the same as the Administration Interface concept of services.
service.thumbnail.max-age Specify how long thumbnails may be cached for.
spelling.suggestion_lexicon_weight Specify weighting to be given to suggestions from the lexicon (list of words from indexed documents) relative to other sources (e.g. annotations)
spelling.suggestion_sources Specify sources of information for generating spelling suggestions.
spelling.suggestion_threshold Threshold which controls how suggestions are made.
spelling_enabled Whether to enable spell checking in the search interface (true or false).
start_url Crawler seed URL. Crawler follows links in this page, and then the links of those pages and so on.
store.push.collection Name of a push collection to push content into (if using a PushStore or Push2Store).
store.push.host Hostname of machine where a specified push collection exists (if using a PushStore).
store.push.password The password to use when authenticating against push (if using a PushStore or Push2Store).
store.push.port Port that Push collections listen on (if using a PushStore).
store.push.url The URL that the push api is located at (if using a Push2Store).
store.push.user The user name to use when authenticating against push (if using a PushStore or Push2Store).
store.raw-bytes.class Fully qualified classname of a raw bytes class to use
store.record.type This parameter defines the type of store that Funnelback uses to store its records.
store.temp.class Fully qualified classname of a class to use for temporary storage.
store.xml.class Fully qualified classname of an XML storage class to use
squizapi.target_url URL of the Squiz Suite Manager for a Matrix collection.

T

Option Description
text_miner_enabled Control whether text mining is enabled or not
trim.collect_containers Whether to collect the container of each TRIM record or not (significantly slows down the crawl)
trim.database The 2-digit identifier of the TRIM database to index
trim.default_live_links Whether search results links should point to a copy of TRIM document, or launch TRIM client.
trim.domain Windows domain for the TRIMPush crawl user
trim.extracted_file_types A list of file extensions that will be extracted from TRIM databases.
trim.filter_timeout Timeout to apply when filtering binary documents
trim.free_space_check_exclude Volume letters to exclude from free space disk check
trim.free_space_threshold Minimum amount of free disk space below which a TRIMPush crawl will stop
trim.gather_direction Whether to go forward or backward when gathering records.
trim.gather_mode Date field to use when selecting records (registered date or modified date)
trim.gather_start_date The date from which newly registered or modified documents will be gathered.
trim.gather_end_date The date at which to stop the gather process.
trim.license_number TRIM license number as found in the TRIM client system information panel.
trim.max_filter_errors The maximum number of filtering errors to tolerate before stopping the crawl
trim.max_size The maximum size of record attachments to process
trim.max_store_errors The maximum number of storage errors to tolerate before stopping the crawl
trim.passwd Password for the TRIMPush crawl user
trim.properties_blacklist List of properties to ignore when extracting TRIM records
trim.push.collection Push collection where to store the extracted TRIM records
trim.request_delay Milliseconds between TRIM requests (for a particular thread)
trim.stats_dump_interval Interval (in seconds) at which statistics will be written to the monitor.log file
trim.store_class Class to use to store TRIM records
trim.timespan Interval to split the gather date range into
trim.timespan.unit Unit of time used for the intervals the gather date range is split into
trim.threads Number of simultaneous TRIM database connections to use
trim.user Username for the TRIMPush crawl user
trim.userfields_blacklist List of user fields to ignore when extracting TRIM records
trim.verbose Define how verbose the TRIM crawl is.
trim.version Configure the version of TRIM to be crawled.
trim.web_server_work_path Location of the temporary folder used by TRIM to extract binary files
trim.workgroup_port The port on the TRIM workgroup server to connect to when gathering content from TRIM.
trim.workgroup_server The name of the TRIM workgroup server to connect to when gathering content from TRIM.
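
A TRIMPush gathering sketch using options from this table (server names, identifiers, credentials and date formats are placeholders):

    trim.workgroup_server=trim.example.com
    trim.workgroup_port=1137
    trim.database=45
    trim.domain=EXAMPLE
    trim.user=svc-search
    trim.passwd=secret
    trim.gather_start_date=2015-01-01
    trim.threads=4
    trim.push.collection=trim-push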

U

Option Description
ui.integration_url URL to use to reach the search service, when wrapped inside another system (e.g. CMS)
ui.modern.authentication Enable Windows authentication on the Modern UI
ui.modern.cache.form.content_type Specify a custom content type header for the cache controller file (Modern UI only).
ui.modern.click_link References the URL used to log result clicks (Modern UI only)
ui.modern.content-auditor.collapsing-signature Define how duplicates are detected within Content Auditor.
ui.modern.content-auditor.count_urls Define how deep into URLs Content Auditor users can navigate using facets.
ui.modern.content-auditor.daat_limit Define how many matching results are scanned for creating Content Auditor reports.
ui.modern.content-auditor.date-modified.ok-age-years Define how many years old a document may be before it is considered problematic.
ui.modern.content-auditor.display-metadata.* Define metadata and labels for use displaying result metadata within Content Auditor.
ui.modern.content-auditor.duplicate_num_ranks Define how many results should be considered in detecting duplicates for Content Auditor.
ui.modern.content-auditor.facet-metadata.* Define metadata and labels for use in reporting and drilling down within Content Auditor.
ui.modern.content-auditor.num_ranks Define how many results are displayed in Content Auditor's search results tab.
ui.modern.content-auditor.max-metadata-facet-categories Define the maximum number of categories to display in Content Auditor's facets.
ui.modern.content-auditor.overview-category-count Define how many category values should be displayed on the Content Auditor overview.
ui.modern.content-auditor.reading-grade.lower-ok-limit Define the reading grade below which documents are considered problematic.
ui.modern.content-auditor.reading-grade.upper-ok-limit Define the reading grade above which documents are considered problematic.
ui.modern.cors.allow_origin Sets the value for the CORS allow origin header for Modern UI.
ui.modern.curator.query-parameter-pattern Controls which URL parameters basic curator triggers will trigger against.
ui.modern.extra_searches Configure extra searches to be aggregated with the main result data, when using the Modern UI.
ui.modern.form.content_type Specify a custom content type header for a form file (Modern UI only).
ui.modern.form.headers.count Specify the count of custom headers for a form file (Modern UI only).
ui.modern.form.headers Specify custom headers for a form file (Modern UI only).
ui.modern.freemarker.display_errors Whether to display form files error messages on the browser or not (Modern UI only).
ui.modern.freemarker.error_format Format of form files error messages displayed on the browser (Modern UI only).
ui.modern.geolocation.enabled Enable/disable location detection from user's IP addresses using MaxMind Geolite. (Modern UI only).
ui.modern.geolocation.set_origin Whether the origin point for the search is automatically set if not specified by the user's request. (Modern UI only).
ui.modern.i18n Disable localisation support on the Modern UI.
ui.modern.form.rss.content_type Sets the content type of the RSS template.
ui.modern.search_link Base URL used by search.html to link to itself e.g. the next page of search results. Allows search.html (or a pass-through script) to have a name other than search.html.
ui.modern.serve.filecopy_link References the URL used to serve filecopy documents (Modern UI only)
ui.modern.serve.trim_link_prefix References the prefix to use for the URL used to serve TRIM documents and references (Modern UI only)
ui.modern.session Enable or disable Search session and history
ui.modern.session.timeout Configures the session timeout
ui.modern.session.search_history.size Configures the size of the search and click history
ui.modern.session.search_history.suggest Enable or disable search history suggestions in query completion
ui.modern.session.search_history.suggest.display_template Template to use to display search history suggestions in query completion
ui.modern.session.search_history.suggest.category Category containing the search history suggestions in query completion
ui.modern.session.set_userid_cookie Assign unique IDs to users in an HTTP cookie
ui.modern.metadata-alias.* Creates aliases for metadata class names.
ui_cache_disabled Prevent the cache controller from accessing any cached documents.
ui_cache_link Base URL used by PADRE to link to the cached copy of a search result. Can be an absolute URL.
update.restrict_to_host Specify that collection updates should be restricted to only run on a specific host.
userid_to_log Controls how logging of IP addresses is performed.
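
Search session and history behaviour on the Modern UI involves several related options; an illustrative combination (the values, including the timeout unit, are assumptions):

    ui.modern.session=true
    ui.modern.session.timeout=3600
    ui.modern.session.search_history.size=20
    ui.modern.session.search_history.suggest=true
    ui.modern.search_link=search.html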

V

Option Description
vital_servers Changeover only happens if vital_servers exist in the new crawl.

W

Option Description
warc.compression Control how content is compressed in a WARC file.
wcag.archive_databases Controls archiving of accessibility check databases.
wcag.check Turns accessibility checks on or off.
wcag.days_between_runs Reduces the frequency of accessibility checks.
wcag.portfolio_metadata_class Metadata class for portfolio information.
workflow.publish_hook Name of the publish hook Perl script
workflow.publish_hook.meta Name of the publish hook Perl script that will be called each time a meta collection is modified

See also