Upgrading workflow commands

Workflow enabled Funnelback to run arbitrary shell commands between each step in the update cycle.

This is not permitted in the DXP as it is a security risk, and it also prevents automatic upgrades of the search.

This guide outlines the process you need to follow when upgrading workflow.

High level process

The key to successfully upgrading workflow is to break it down into the distinct tasks it performs. When doing this you often need to look at the configuration holistically, because workflow logic often depends on other collection configuration and on how tasks span multiple workflow commands.

Each identified task should perform a discrete operation that can then be replaced with other product functionality.
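
As a starting point for this breakdown, it can help to list every workflow command defined for a collection and split each one into its individual shell commands. The following is a minimal sketch, assuming the workflow settings live in a key=value configuration file under keys of the form pre_<phase>_command or post_<phase>_command. The key pattern and both function names are assumptions for illustration and should be adjusted to match your configuration.

```shell
#!/usr/bin/env bash

# Print the workflow command settings found in a configuration file.
# Assumption: workflow keys look like pre_<phase>_command or post_<phase>_command.
# Lines that are commented out do not match because the key must start the line.
list_workflow_commands() {
  local config_file="$1"
  grep -E '^(pre|post)_[A-Za-z_-]+_command[ \t]*=' "$config_file"
}

# Split one workflow value into its individual shell commands, one per line,
# so that each can be reviewed as a separate task.
# This is a naive split on ';' and '&&'; it does not understand quoting.
split_workflow_tasks() {
  local value="$1"
  printf '%s\n' "$value" | awk '{ gsub(/[ \t]*(&&|;)[ \t]*/, "\n"); print }'
}
```

For example, running list_workflow_commands against a copy of the collection's configuration file (assumed here to be collection.cfg) gives the list of commands to review, and split_workflow_tasks can then be applied to each value to separate chained commands.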

Replacing workflow functionality

Once you have broken down all the workflow functionality into a set of discrete tasks, you then need to work out how each task can be achieved in the DXP without any custom coding.

The high-level options for replacing workflow tasks are:

  • Existing plugins and filters: these implement commonly occurring tasks that were previously implemented as workflow - e.g. generating auto-completion, transforming XML. Become familiar with the plugins that are available and the functions they perform.

  • Other built-in functionality: workflow often replicates behavior that is available through built-in functionality - e.g. applying/generating kill lists and configuration.
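
One way to start mapping tasks to these options is a rough triage of each workflow command line. The sketch below is illustrative only: the function name suggest_replacement and its matching rules are assumptions, loosely based on the common patterns covered in this guide, and anything it cannot match should be reviewed by hand.

```shell
#!/usr/bin/env bash

# Suggest a replacement option for a single workflow command line.
# The patterns are rough assumptions; the first matching rule wins.
suggest_replacement() {
  local cmd="$1"
  case "$cmd" in
    *padre-gs*|*padre-qi*|*padre-fl*)
      echo "built-in: standard gscopes/QIE/kill configuration files" ;;
    *curl*external_metadata.cfg*)
      echo "plugin: external metadata fetcher" ;;
    *curl*auto-completion.csv*)
      echo "config: auto-completion.source.csv.[name].url" ;;
    *curl*offline/data*)
      echo "config: web crawler/web data source" ;;
    *auto-completion*)
      echo "plugin: auto-completion" ;;
    *)
      echo "review manually" ;;
  esac
}
```

A command that downloads a file into the offline data folder would be reported as a candidate for web crawler configuration, while an unrecognised command falls through to "review manually".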

Common patterns and their replacements

Downloading content for indexing

Typical source: pre-gather or pre-index script

Example: shell command/bash script
curl 'https://example.com/feed.xml' > $SEARCH_HOME/data/$COLLECTION_NAME/offline/data/example.xml

Remediation: replace this with configuration of the web crawler/web data source to fetch the content.

Downloading external metadata feeds

Typical source: pre-gather or pre-index script

Example: shell command/bash script
curl 'https://example.com/emfeed.txt' > $SEARCH_HOME/conf/$COLLECTION_NAME/external_metadata.cfg

Remediation: replace this with the external metadata fetcher plugin.

Applying gscopes, QIE or kill configuration

Typical source: post-index script

Example: shell command/bash script
# Apply gscopes to index
$SEARCH_HOME/bin/padre-gs $SEARCH_HOME/data/$COLLECTION_NAME/$CURRENT_VIEW/idx/index $SEARCH_HOME/conf/$COLLECTION_NAME/gscopes.cfg

# Apply QIE to index
$SEARCH_HOME/bin/padre-qi $SEARCH_HOME/data/$COLLECTION_NAME/$CURRENT_VIEW/idx/index $SEARCH_HOME/conf/$COLLECTION_NAME/qie.cfg 0.5

# Apply kill configuration to index
$SEARCH_HOME/bin/padre-fl $SEARCH_HOME/data/$COLLECTION_NAME/$CURRENT_VIEW/idx/index $SEARCH_HOME/conf/$COLLECTION_NAME/kill.cfg -exactmatch -kill

Remediation: ensure gscopes/QIE/kill configuration is set up using the standard configuration files and remove the workflow (rules in the standard configuration files are applied automatically).

Generating auto-completion from the search index

Typical source: post-index script

Example: shell command/bash script
$SEARCH_HOME/conf/$COLLECTION_NAME/@workflow/post_index.sh -c $COLLECTION_NAME -v $CURRENT_VIEW -p auto-completion

Remediation: replace this with the auto-completion plugin.

Downloading an auto-completion CSV from an external source

Typical source: post-index script

Example: shell command/bash script
curl 'https://example.com/autoc.csv' > $SEARCH_HOME/conf/$COLLECTION_NAME/_default/auto-completion.csv

Remediation: replace this by setting the auto-completion.source.csv.[name].url configuration key.
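
As an illustration, the curl command above might translate into a single setting like the one below. This is a sketch only: the source name "example" is a hypothetical value substituted for [name], and the setting is shown in key=value form, which is an assumption about how the value is entered in your configuration.

```ini
# Hypothetical source name "example" replaces [name]; the URL is the one from the curl example.
auto-completion.source.csv.example.url=https://example.com/autoc.csv
```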