Write a workflow script on Linux using bash

Background

This article describes how to write a workflow script on Linux using bash.

Process

  1. Create a @workflow folder within the collection’s conf folder. This folder is used to store your workflow scripts. Skip this step if the folder already exists.

    mkdir -p $SEARCH_HOME/conf/examplecollection/@workflow

  2. Create your workflow script within this folder, using the bash script template below as a starting point.

    Save the file with a filename that matches the workflow phase, e.g. pre_gather.sh or post_index.sh.
  3. Change the script permissions to ensure it is executable.

    chmod 755 $SEARCH_HOME/conf/examplecollection/@workflow/<FILE-NAME>

  4. Add the script to your collection.cfg so that it runs as part of the collection update, attaching the command to the appropriate workflow phase. For example, for a pre-index workflow script add something similar to the following to collection.cfg:

    pre_index_command=$SEARCH_HOME/conf/$COLLECTION_NAME/@workflow/pre_index.sh -c $COLLECTION_NAME -g $GROOVY_COMMAND -v $CURRENT_VIEW

Bash workflow tips

The following tips will help you minimise issues with your scripts:

  • Use the template below and pass in the appropriate collection variables to ensure that these can be used within the script.

  • Any command that interacts with the index must access the correct index view. It is important to understand whether the live or offline view is the correct one to interact with; this depends on where in the update the command runs and on what the command is doing. Where the view is context dependent, use the $CURRENT_VIEW variable; in other contexts you may need to always access the live view (see the sketch after this list).

  • Ensure any shell commands that are run from the script are as error tolerant as possible (e.g. make use of any options that will assist with this, such as timeouts and retries).

  • Ensure any errors from shell commands that are run from the script are handled appropriately.

  • Ensure that your script does not include any destructive system commands.
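
For example, a workflow command that works with the index would normally resolve the data folder from the view it was passed, give any network calls a timeout, and treat failures as fatal. The sketch below is illustrative only: it assumes the standard $SEARCH_HOME/data/<collection>/<view> layout, and the URL and filenames are placeholders.

    # Resolve the data folder for the view this phase is operating on; commands that must always
    # see the published index would use the live view instead of $CURRENT_VIEW.
    VIEW_DIR="${SEARCH_HOME}/data/${COLLECTION_NAME}/${CURRENT_VIEW}"
    [ -d "${VIEW_DIR}" ] || { echo "$0: Error: view folder ${VIEW_DIR} does not exist." >&2; exit 1; }

    # Give network calls a timeout and retries, and treat a failure as a fatal workflow error.
    if ! curl --connect-timeout 60 --retry 3 --retry-delay 20 --fail --silent --show-error \
        'http://example.com/resources/file.txt' \
        -o "${SEARCH_HOME}/conf/${COLLECTION_NAME}/@workflow/file.txt"; then
      echo "$0: Error: failed to download file.txt, cannot continue." >&2
      exit 1
    fi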

Bash script template

#!/bin/bash
# Description: Funnelback bash workflow script template
# Author:
# Version:

# Usage:

# Allow the collection.cfg variables to be passed in and made available within this script.
# Pass the variables in as -c $COLLECTION_NAME -g $GROOVY_COMMAND -v $CURRENT_VIEW
#
# This enables use of $COLLECTION_NAME, $CURRENT_VIEW and $GROOVY_COMMAND, as well as $SEARCH_HOME which is set within the environment.
#
# Add additional parameters using the same format as below - an additional option letter in the getopts string and an additional case.
#
# Remove any parameters that are not required in the script.
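#
# e.g. to also pass in a hypothetical log folder you could change the getopts string below to ":c:g:v:l:" and add a matching case:
#   l) LOG_FOLDER="$OPTARG"
#   ;;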

while getopts ":c:g:v:" opt; do
  case $opt in
    c) COLLECTION_NAME="$OPTARG"
    ;;
    g) GROOVY_COMMAND="$OPTARG"
    ;;
    v) CURRENT_VIEW="$OPTARG"
    ;;
    \?) echo "Invalid option -$OPTARG" >&2
    ;;
  esac
done

# Check SEARCH_HOME is defined and a folder
[ "${SEARCH_HOME}" ] || { echo -e "\n$0: Error: \$SEARCH_HOME environment variable is not defined, cannot continue.\n" >&2; exit 1; }
[ -d "${SEARCH_HOME}" ] || { echo -e "\n$0: Error: \$SEARCH_HOME folder does not exist, cannot continue.\n" >&2; exit 1; }

# Add your workflow commands below

# e.g. download a file and save it to your collection's web resources folder
# curl --connect-timeout 60 --retry 3 --retry-delay 20 'http://example.com/resources/file.txt' -o ${SEARCH_HOME}/conf/${COLLECTION_NAME}/_default/web/file.txt || exit 1
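
# e.g. run a Groovy helper script using the Groovy command passed in via -g
# (process_data.groovy is a hypothetical script saved in this collection's @workflow folder)
# "${GROOVY_COMMAND}" "${SEARCH_HOME}/conf/${COLLECTION_NAME}/@workflow/process_data.groovy" "${CURRENT_VIEW}" || exit 1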