Update task scheduler

Funnelback supports automatic scheduling of data source and analytics updates.

The scheduler supports two kinds of update schedule: updates that run a set time after the previous update, and updates that start at a specified time.

Defining an update schedule

An update schedule is created from the details screen of the data source or search package that you wish to update. On the details screen, click schedule an update in the scheduled tasks section.

When you set up an update schedule you will be guided through the two or three steps required to schedule your update. The steps are:

Task type

This indicates the type of update to run. This is only required when updating a data source.

The available types depend on the type of data source being updated. However, you will almost always want to schedule the default standard update, which is the same as manually clicking the update button for your data source. Other available update types are listed beneath the advanced options menu.

Task interval

This is a choice between running the update at a regular interval (e.g. every 24 hours) and starting the update at a fixed time (if resources are available).

Task schedule

This defines how often, or when, an update should run.

The update is added to the task queue at the configured start time. It may not start immediately if insufficient server resources are available.

Task types

Push data sources are never scheduled for updates because they are updated through the push API.

The selected task type indicates the type of update that should be scheduled for a data source.

Available types are:

Standard update

This uses a mixed full/incremental update scheme: most updates are incremental, but one full update is run for every 10 incremental updates. The ratio can be adjusted by setting schedule.incremental_crawl_ratio in your data source configuration.
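The full/incremental cycle of a standard update can be pictured with a minimal Python sketch. The function names are hypothetical, and the exact counting (for example, whether the sequence starts with a full update) is an assumption; the sketch only illustrates the idea of one full update per 10 incremental updates.

```python
def next_update_type(incrementals_since_full: int, ratio: int = 10) -> str:
    """Return "full" once `ratio` incremental updates have run since the last full update.

    `ratio` stands in for the value of schedule.incremental_crawl_ratio (assumed meaning).
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return "full" if incrementals_since_full >= ratio else "incremental"


def simulate(runs: int, ratio: int = 10) -> list[str]:
    """Simulate a sequence of standard updates, starting just after a full update."""
    history = []
    since_full = 0
    for _ in range(runs):
        kind = next_update_type(since_full, ratio)
        history.append(kind)
        # A full update resets the counter; an incremental update advances it.
        since_full = 0 if kind == "full" else since_full + 1
    return history
```

With the default ratio, simulate(11) yields ten incremental updates followed by one full update.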

Full update

This runs a full data source update. Previously gathered data is discarded and everything is gathered from the target data source.

Incremental update

(web, filecopy and database data sources only) This runs an incremental data source update. Previously gathered data is copied from a local cache if unchanged (based on heuristics such as last modified time and file size). This reduces the time required to gather content as unchanged content is copied from cache instead of being downloaded.

Refresh update

(web data source only) This runs a refresh data source update. New data is gathered and is added to, or replaces, previously gathered content. This refreshes updated content but does not remove anything that has been deleted from the source data. This option is useful for refreshing important content in a very large index.

Reindex live view

This rebuilds the live index. It is designed to apply new metadata mappings to the search index and generally should never need to be scheduled.

Reapply gscopes to live view

This rebuilds the gscopes index. It is designed to apply new gscopes to the search index and generally should never need to be scheduled.

Task interval

This is a choice between running the update at a regular interval (e.g. every 24 hours) and starting the update at a fixed time (if resources are available).

Update at regular interval

The update runs a specified interval after the last successful update. A failed update is retried at increasing intervals.

Update at a fixed time

The update attempts to run at a fixed time on specified days. If the update fails, it waits until the next scheduled time before running again.
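As a rough illustration of the regular-interval behaviour, the following Python sketch computes when the next attempt would be due. The function name, the 15-minute base delay, the doubling rule and the cap are all assumptions made for the example; the documentation only states that failed updates are retried on an increasing interval.

```python
from datetime import datetime, timedelta


def next_interval_attempt(
    last_success: datetime,
    last_attempt: datetime,
    consecutive_failures: int,
    interval_hours: int,
    base_retry_minutes: int = 15,  # hypothetical value, not a Funnelback setting
) -> datetime:
    """Return the time of the next attempt for a regular-interval schedule."""
    interval = timedelta(hours=interval_hours)
    if consecutive_failures == 0:
        # Normal case: wait the configured interval after the last success.
        return last_success + interval
    # Assumed retry rule: double the delay after each failure,
    # but never wait longer than the configured interval.
    delay = timedelta(minutes=base_retry_minutes) * 2 ** (consecutive_failures - 1)
    return last_attempt + min(delay, interval)
```

A fixed-time schedule differs in that a failed update is not retried early; it simply waits for the next scheduled start.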

Task schedule

For updates at a regular interval:

  • you can specify the interval as a number of hours, days or months.

  • you can specify a do not disturb time window as a local start time and a number of hours, e.g. do not run an update in the period starting at 8pm local time and lasting 8 hours. Any update that is scheduled to run within this window is skipped.
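The do not disturb check can be sketched in Python as follows. The function name is hypothetical, times are naive local datetimes, and the sketch assumes a window of at most 24 hours; the point it illustrates is that a window such as 8pm for 8 hours crosses midnight, so the previous day's window has to be considered as well.

```python
from datetime import datetime, time, timedelta


def in_no_update_window(moment: datetime, window_start: time, duration_hours: int) -> bool:
    """Return True if `moment` falls inside the do not disturb window.

    Assumes naive local times and a window no longer than 24 hours.
    """
    if duration_hours <= 0:
        return False
    duration = timedelta(hours=duration_hours)
    # Check the window that opened today and the one that opened yesterday,
    # because a late-evening window can extend past midnight.
    for days_back in (0, 1):
        opened = datetime.combine(moment.date() - timedelta(days=days_back), window_start)
        if opened <= moment < opened + duration:
            return True
    return False
```

For a window starting at 20:00 and lasting 8 hours, 21:00 and 03:59 the next morning are inside the window, while 04:00 is outside it.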

For fixed time updates you specify:

  • the desired update start time.

  • the days of the week on which the update should run.
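A fixed time schedule combines a start time with a set of permitted days. The Python sketch below shows one way the next start could be worked out; the function name is hypothetical, naive local datetimes are assumed, and the day names follow the upper-case style used by the permitted-days-of-week option.

```python
from datetime import datetime, time, timedelta

# Index matches datetime.weekday(), where Monday is 0.
DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def next_fixed_start(now: datetime, start_time: time, permitted_days: set[str]) -> datetime:
    """Return the first start at or after `now` that falls on a permitted day."""
    if not permitted_days:
        raise ValueError("at least one day of the week is required")
    unknown = permitted_days - set(DAY_NAMES)
    if unknown:
        raise ValueError(f"unknown day names: {sorted(unknown)}")
    # Looking 7 days ahead covers the case where today is the only permitted
    # day and today's start time has already passed.
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), start_time)
        if candidate >= now and DAY_NAMES[candidate.weekday()] in permitted_days:
            return candidate
    raise AssertionError("unreachable: a permitted day always occurs within a week")
```

For example, with a 02:00 start permitted on Mondays only, a call made at 03:00 on a Monday returns 02:00 on the following Monday.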

Managing scheduled updates

There are two ways to manage scheduled updates:

All data sources and search packages

To view all of the scheduled updates select scheduled tasks from the tasks section of the main navigation.

A table showing all the scheduled updates will load with options to edit or delete individual schedules.

Current data source or search package

To manage the scheduled updates for the current data source or search package select view scheduled tasks from the scheduled tasks section of the data source or search package details screen. A table of all scheduled updates for the current data source or search package will load providing options to edit or delete individual schedules.

Task scheduler configuration options

The configuration options below are set in your data source or search package configuration when you configure an update schedule.

Data source, search package (collection) options

The following options are set in the data source or search package configuration.

schedule.[taskType].auto.desired-time-between-updates

Specifies the desired time between runs of tasks of the given type for this collection.

schedule.[taskType].auto.no-update-window.duration

Specifies the duration of a window during which tasks of the given type will not be automatically scheduled.

schedule.[taskType].auto.no-update-window.start-time

Specifies the start time of a window during which tasks of the given type will not be automatically scheduled.

schedule.[taskType].fixed.permitted-days-of-week

Specifies a set of days of the week on which fixed start-time tasks for the given type will be automatically scheduled. Default: SUNDAY,MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY.

schedule.[taskType].fixed.start-times

Specifies a set of times at which tasks of the given type will be automatically scheduled.

schedule.timezone

Specifies the timezone in which the configured times are expressed. Default: UTC.
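The [taskType] placeholder in the option names above is replaced by a task type identifier to give the concrete configuration keys. The Python sketch below only demonstrates that substitution; the helper name and the "example-task" identifier are hypothetical, and the identifiers Funnelback actually uses are not listed here.

```python
# Key templates copied from the option list above; "[taskType]" is the placeholder.
KEY_TEMPLATES = [
    "schedule.[taskType].auto.desired-time-between-updates",
    "schedule.[taskType].auto.no-update-window.duration",
    "schedule.[taskType].auto.no-update-window.start-time",
    "schedule.[taskType].fixed.permitted-days-of-week",
    "schedule.[taskType].fixed.start-times",
]


def schedule_keys(task_type: str) -> list[str]:
    """Expand the key templates for one task type identifier."""
    if not task_type or any(ch.isspace() for ch in task_type):
        raise ValueError("task type must be a non-empty identifier without spaces")
    return [template.replace("[taskType]", task_type) for template in KEY_TEMPLATES]


# "example-task" is a made-up identifier used purely for illustration.
for key in schedule_keys("example-task"):
    print(key)
```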

Server options

The following options are set in the server configuration.

scheduler.paused

Can be used to pause all updates on a Funnelback server.

© 2015- Squiz Pty Ltd