Update task scheduler
Funnelback supports automatic scheduling of data source and analytics updates.
The scheduler supports two kinds of update schedule: updates that start once a set time has elapsed since the previous update, and updates that start at fixed times of day.
Update based on time since previous update
The scheduling for a data source is set in the data source configuration, for example:
Configuration key | Value |
---|---|
schedule.timezone | Australia/ACT |
schedule.[task-type].auto.desired-time-between-updates | PT24H |
schedule.[task-type].auto.no-update-window.start-time | 09:00:00 |
schedule.[task-type].auto.no-update-window.duration | PT8H |
[task-type] must be replaced by one of the supported task types listed under Supported task types.
This configures the data source to:
-
update once every 24 hours. The value uses the ISO-8601 duration format, for example PT24H for 24 hours or PT30M for 30 minutes.
-
prevent updates from starting between 9am and 5pm (a window that starts at 09:00:00 and lasts PT8H) in the Australia/ACT timezone. The timezone value is specified as a Java timezone ID.
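The following Python sketch shows one way to evaluate an automatic schedule like the one above. It is not Funnelback's implementation, and parse_duration and next_auto_start are hypothetical names. It assumes the interval is counted from the start of the previous update (the page does not say whether start or finish is used), handles only the day/hour/minute/second subset of ISO-8601 durations, and assumes the no-update window is shorter than 24 hours. To use the Australia/ACT timezone from the example, pass zoneinfo.ZoneInfo("Australia/ACT") as tz (Python 3.9+, with time zone data installed).

```python
import re
from datetime import datetime, time, timedelta, timezone, tzinfo

# Simplified ISO-8601 duration subset: days, hours, minutes, seconds
# (e.g. "PT24H", "P1DT12H"). Years, months and weeks are not handled.
_DURATION_RE = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as "PT24H" into a timedelta."""
    match = _DURATION_RE.match(text)
    # Reject "P", "PT" and "P1DT", which the regex alone would accept.
    if match is None or text.endswith("T") or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


def next_auto_start(
    last_start: datetime,
    interval: timedelta,
    window_start: time,
    window_duration: timedelta,
    tz: tzinfo = timezone.utc,  # schedule.timezone defaults to UTC
) -> datetime:
    """Earliest start time that respects the interval and the no-update window.

    Assumptions: the interval is counted from the start of the previous
    update, and the no-update window is shorter than 24 hours.
    """
    candidate = (last_start + interval).astimezone(tz)
    # A window that began the previous day may still be open (e.g. 22:00
    # for PT8H), so check windows starting yesterday and today.
    for day_offset in (-1, 0):
        day = candidate.date() + timedelta(days=day_offset)
        start = datetime.combine(day, window_start, tzinfo=tz)
        end = start + window_duration
        if start <= candidate < end:
            return end  # defer the update until the window closes
    return candidate
```

With the example values, an update that last started at 10:00 would next be due at 10:00 the following day, which falls inside the 09:00 to 17:00 window, so the sketch defers it to 17:00.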
Notes:
-
The scheduler does not account for any delay introduced by a task picker, which may start the update only once resources are available on the server.
-
Updates that fail are retried after an additional delay, currently hardcoded at 6 hours multiplied by the number of failed updates since the last success. The number of failures counted is capped at ten, so the additional delay never exceeds 60 hours.
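The retry rule above is simple arithmetic, sketched here in Python. retry_delay and next_retry_time are hypothetical names, and next_retry_time assumes the additional delay is added on top of the normal desired-time-between-updates, which this page does not spell out.

```python
from datetime import datetime, timedelta

RETRY_STEP = timedelta(hours=6)  # hardcoded delay per failed update
MAX_COUNTED_FAILURES = 10        # failures beyond ten are not counted


def retry_delay(failures_since_success: int) -> timedelta:
    """Additional delay before retrying after consecutive failed updates."""
    if failures_since_success < 0:
        raise ValueError("failure count cannot be negative")
    return RETRY_STEP * min(failures_since_success, MAX_COUNTED_FAILURES)


def next_retry_time(last_start: datetime, interval: timedelta, failures: int) -> datetime:
    """Assumption: the retry delay is added to the normal interval."""
    return last_start + interval + retry_delay(failures)
```

Three consecutive failures give an additional 18 hours; any count of ten or more gives the maximum of 60 hours.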
Update at a fixed time
The scheduling for a data source is set in the data source configuration, for example:
Configuration key | Value |
---|---|
schedule.timezone | Australia/ACT |
schedule.[task-type].fixed.start-times | 19:30:00,20:00:00 |
schedule.[task-type].fixed.permitted-days-of-week | MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY |
[task-type] must be replaced by one of the supported task types listed in the next section.
This configures the data source to update at 19:30 (7:30pm) and at 20:00 (8:00pm) on weekdays in the Australia/ACT timezone.
Notes:
-
A value of ANY is permitted in place of the comma-separated list of weekdays to indicate no weekday restriction.
-
Be aware of daylight saving changes. Scheduling an update between 2am and 3am in a timezone whose daylight saving transitions happen around that time can give confusing results (the update may be skipped, or run twice within the hour), so it is best to avoid that window.
-
It is permitted, though rarely useful, to give one data source both a schedule.[task-type].fixed.start-times and a schedule.[task-type].auto.desired-time-between-updates.
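A fixed schedule can be evaluated by walking forward day by day to the first permitted day and configured time. The Python sketch below is not Funnelback's implementation; parse_permitted_days, parse_start_times and next_fixed_start are hypothetical names, and the sketch does not handle the daylight saving edge cases noted above.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import FrozenSet, Iterable, List, Optional

# Index matches date.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY")


def parse_permitted_days(value: str) -> FrozenSet[str]:
    """Parse a permitted-days-of-week value; ANY means every day."""
    if value.strip().upper() == "ANY":
        return frozenset(WEEKDAYS)
    days = frozenset(d.strip().upper() for d in value.split(","))
    unknown = days - set(WEEKDAYS)
    if unknown:
        raise ValueError(f"unknown day(s): {sorted(unknown)}")
    return days


def parse_start_times(value: str) -> List[time]:
    """Parse a comma-separated list such as "19:30:00,20:00:00"."""
    return sorted(time.fromisoformat(t.strip()) for t in value.split(","))


def next_fixed_start(
    now: datetime,
    start_times: Iterable[time],
    permitted_days: FrozenSet[str],
    tz: tzinfo = timezone.utc,  # schedule.timezone defaults to UTC
) -> Optional[datetime]:
    """First configured start time strictly after `now` on a permitted day."""
    local_now = now.astimezone(tz)
    times = sorted(start_times)
    # Checking today plus the next seven days reaches every weekday,
    # including today's weekday again one week later.
    for offset in range(8):
        day = local_now.date() + timedelta(days=offset)
        if WEEKDAYS[day.weekday()] not in permitted_days:
            continue
        for start in times:
            candidate = datetime.combine(day, start, tzinfo=tz)
            if candidate > local_now:
                return candidate
    return None  # no permitted days or no start times configured
```

With the weekday example, a check made at 20:15 on a Friday skips the weekend and returns 19:30 on the following Monday.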
Supported task types
The update scheduler currently supports the following task types:
-
full-update: Runs a full update of a data source (all data source types except push).
-
incremental-update: Runs an incremental update of a data source (web and database data sources only).
-
normal-update: Runs a normal update of a data source, applying the incremental crawl ratio for web data sources (all data source types except push).
-
reapply-gscopes-to-live-index: Re-applies gscopes to the live index without re-indexing or running a full update (all data source types except push).
-
rebuild-live-index: Rebuilds the live index for a data source without running a full update (all data source types except push).
-
refresh-update: Runs a refresh update (web data sources only).
-
update-analytics: Runs an incremental update of the search analytics (search packages only).
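The applicability rules in the task type list above can be captured as a lookup table, for example to check a schedule before saving it. In this Python sketch, can_schedule and the target labels (web, database, push, search-package) are hypothetical; they mirror the notes in parentheses above rather than any Funnelback API.

```python
from typing import Callable, Dict

# Hypothetical labels for the kinds of configuration a schedule can target.
WEB = "web"
DATABASE = "database"
PUSH = "push"
SEARCH_PACKAGE = "search-package"


def _data_source_except_push(kind: str) -> bool:
    # Any data source type except push; search packages are not data sources.
    return kind not in (PUSH, SEARCH_PACKAGE)


TASK_RULES: Dict[str, Callable[[str], bool]] = {
    "full-update": _data_source_except_push,
    "incremental-update": lambda kind: kind in (WEB, DATABASE),
    "normal-update": _data_source_except_push,
    "reapply-gscopes-to-live-index": _data_source_except_push,
    "rebuild-live-index": _data_source_except_push,
    "refresh-update": lambda kind: kind == WEB,
    "update-analytics": lambda kind: kind == SEARCH_PACKAGE,
}


def can_schedule(task_type: str, kind: str) -> bool:
    """True if the task type applies to the given kind of configuration."""
    rule = TASK_RULES.get(task_type)
    if rule is None:
        raise ValueError(f"unsupported task type: {task_type!r}")
    return rule(kind)
```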
Example configuration for scheduling a full update:
Configuration key | Value |
---|---|
schedule.timezone | Australia/ACT |
schedule.full-update.fixed.start-times | 19:30:00,20:00:00 |
schedule.full-update.fixed.permitted-days-of-week | MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY |
Pausing the scheduler
The scheduler can be paused by setting the server configuration option scheduler.paused to true on the Funnelback server that is responsible for running the updates. This prevents any new updates from being scheduled until the scheduler is un-paused, either by removing this setting or by setting it to false.
If a fixed update time passes while the scheduler is paused, that update is not run when the scheduler is un-paused; it runs at the next scheduled time instead.
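The pause behaviour can be sketched in Python as a flag check plus a filter that drops, rather than catches up, any fixed start times that fell inside the pause. is_paused and runs_not_skipped are hypothetical names, and matching the value case-insensitively is an assumption.

```python
from datetime import datetime
from typing import Iterable, List, Mapping


def is_paused(server_config: Mapping[str, str]) -> bool:
    """scheduler.paused=true pauses; removing it or setting false resumes.

    Assumption: the value is matched case-insensitively.
    """
    return server_config.get("scheduler.paused", "false").strip().lower() == "true"


def runs_not_skipped(
    scheduled: Iterable[datetime], paused_from: datetime, resumed_at: datetime
) -> List[datetime]:
    """Fixed start times that still run; those during the pause are dropped."""
    return [t for t in scheduled if not (paused_from <= t < resumed_at)]
```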
Task scheduler configuration options
Collection options
The following options are set in the data source or search package configuration.
-
schedule.[task-type].auto.desired-time-between-updates: Specifies the desired time between runs of tasks of the given type for this data source, as an ISO-8601 duration.
-
schedule.[task-type].auto.no-update-window.duration: Specifies the duration, as an ISO-8601 duration, of a window during which tasks of the given type will not be automatically scheduled.
-
schedule.[task-type].auto.no-update-window.start-time: Specifies the start time of a window during which tasks of the given type will not be automatically scheduled.
-
schedule.[task-type].fixed.permitted-days-of-week: Specifies the days of the week on which fixed start-time tasks of the given type will be automatically scheduled. Default: SUNDAY,MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY.
-
schedule.[task-type].fixed.start-times: Specifies the times at which tasks of the given type will be automatically scheduled.
-
schedule.timezone: The timezone in which the configured times are interpreted, specified as a Java timezone ID. Default: UTC.
Server options
The following options are set in the server configuration.
-
scheduler.paused: Set to true to pause the scheduling of all updates on a Funnelback server; remove the setting or set it to false to resume.
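The option names and defaults listed above are regular enough to check mechanically, which can catch typos such as a missing "s" in start-times. In this Python sketch, check_key and effective_options are hypothetical helpers: the key names, the UTC timezone default and the all-days default come from this page, but Funnelback may validate options differently.

```python
import re
from typing import Dict, Mapping

TASK_TYPES = frozenset({
    "full-update", "incremental-update", "normal-update",
    "reapply-gscopes-to-live-index", "rebuild-live-index",
    "refresh-update", "update-analytics",
})
AUTO_OPTIONS = frozenset({
    "desired-time-between-updates",
    "no-update-window.start-time",
    "no-update-window.duration",
})
FIXED_OPTIONS = frozenset({"start-times", "permitted-days-of-week"})
ALL_DAYS = "SUNDAY,MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY"

_KEY_RE = re.compile(
    r"^schedule\.(?P<task>[a-z-]+)\.(?P<mode>auto|fixed)\.(?P<option>.+)$"
)


def check_key(key: str) -> None:
    """Raise ValueError if a schedule.* key is not a documented option."""
    if key == "schedule.timezone":
        return
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a schedule option: {key!r}")
    task, mode, option = match.group("task", "mode", "option")
    if task not in TASK_TYPES:
        raise ValueError(f"unsupported task type in {key!r}")
    allowed = AUTO_OPTIONS if mode == "auto" else FIXED_OPTIONS
    if option not in allowed:
        raise ValueError(f"unknown {mode} option in {key!r}")


def effective_options(config: Mapping[str, str], task: str) -> Dict[str, str]:
    """Options for one task type, with the documented defaults filled in."""
    for key in config:
        if key.startswith("schedule."):
            check_key(key)
    result = {"timezone": config.get("schedule.timezone", "UTC")}
    prefix = f"schedule.{task}."
    for key, value in config.items():
        if key.startswith(prefix):
            result[key[len(prefix):]] = value
    # The permitted-days default only matters when fixed start times are set.
    if "fixed.start-times" in result:
        result.setdefault("fixed.permitted-days-of-week", ALL_DAYS)
    return result
```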