Download configuration files via workflow
Background
There is often a need to download configuration files as part of the Funnelback workflow. This can involve downloading an external file (such as an external_metadata.cfg from Squiz Matrix), or accessing Funnelback to produce a configuration file such as auto-completion.csv.
Procedure
curl is the recommended program to use when downloading via workflow. It is preferred to wget and to custom Perl or Python scripts that use internal libraries.
The commands are commonly called in workflow from a bash script or Windows batch file, e.g. pre_gather.sh or pre_gather.bat.
The following curl command should be used when downloading as part of workflow. It sets a 60 second connection timeout, retries up to 3 times with a 20 second delay between attempts, and exits the script with a non-zero status if the download command fails.
pre_gather.sh
# Linux - use single quotes
curl --connect-timeout 60 --retry 3 --retry-delay 20 '<URL-TO-DOWNLOAD>' -o <OUTPUT-FILE> || exit 1
pre_gather.bat
REM Windows - use double quotes (requires cygwin)
c:\cygwin\bin\curl.exe --connect-timeout 60 --retry 3 --retry-delay 20 "<URL-TO-DOWNLOAD>" -o <OUTPUT-FILE> || exit 1
REM Windows - with wget.exe. Note: only use this if curl is unavailable, or you are having problems with the curl command
c:\funnelback\wbin\wget.exe -T 60 -t 3 -w 20 "<URL-TO-DOWNLOAD>" -O <OUTPUT-FILE> || exit 1
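One thing to be aware of with the commands above: by default curl exits with status 0 when the server answers with an HTTP error page (for example a 404), so the error page can end up in the output file. The following is a minimal sketch of a more defensive pre_gather.sh, assuming that behaviour is unwanted. It adds curl's --fail option and downloads to a temporary file that only replaces the target on success. The function name download_config and the .tmp suffix are illustrative and not part of Funnelback.

```shell
#!/bin/bash
# Hypothetical helper (not part of Funnelback): download a URL to a
# temporary file and move it into place only if curl succeeds.
download_config() {
    local url="$1" target="$2"
    local tmp="${target}.tmp"

    # --fail: return a non-zero status on HTTP errors (4xx/5xx)
    # --silent --show-error: hide the progress meter, keep error messages
    if curl --fail --silent --show-error \
            --connect-timeout 60 --retry 3 --retry-delay 20 \
            "$url" -o "$tmp"; then
        # Success: replace the existing file with the fresh download.
        mv "$tmp" "$target"
    else
        # Failure: discard the partial download, leave the old file untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

In a workflow script this would be called as download_config '<URL-TO-DOWNLOAD>' '<OUTPUT-FILE>' || exit 1, so a failed download stops the update and leaves the previous configuration file in place.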
Note:
- Funnelback under Windows ships with a wget binary that can be called; it’s located at \winbin\wget.exe
- If required under Windows, curl can be used by installing Cygwin and using the curl binary that is part of that product (note the native Windows/DOS version of curl is too old and doesn’t support a lot of options).
- curl supports various forms of authentication (e.g. HTTP, Windows integrated), which is sometimes required when downloading files.
- When downloading from a local Funnelback instance use the localhost address, e.g. http://localhost/s/search.html?collection=<COLLECTION-ID>&query=<QUERY> (as opposed to a fully qualified address such as http://<FUNNELBACK-SERVER>.com/s/search.html?collection=<COLLECTION-ID>&query=<QUERY>). Adjust the port to match the Funnelback HTTP port.
- If you are downloading external metadata please make use of the external metadata validator that is built in to Mediator.
- The wget command is equivalent to curl, but curl is preferred for consistency (and also because it has options for session cookies etc.).
- Some versions of Cygwin have a bug with curl.exe that results in errors like: cygwin curl.exe: *** fatal error - couldn't initialize fd 0 /dev/cons0
  If this is happening then fall back to the wget command.
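The notes on authentication and on the localhost address can be sketched as two small shell helpers. This is an illustration under stated assumptions rather than a Funnelback-provided script: the function names local_search_url and fetch_with_auth are hypothetical, the collection and query values are assumed to be URL-encoded already, and the ntlm mode assumes a curl build with NTLM support.

```shell
#!/bin/bash
# Hypothetical helper: build a search URL against the local Funnelback
# instance. Assumes collection and query are already URL-encoded.
local_search_url() {
    local port="$1" collection="$2" query="$3"
    printf 'http://localhost:%s/s/search.html?collection=%s&query=%s\n' \
        "$port" "$collection" "$query"
}

# Hypothetical helper: download with HTTP authentication.
# auth_mode is "basic" or "ntlm"; credentials is "user:password".
fetch_with_auth() {
    local auth_mode="$1" credentials="$2" url="$3" output="$4"
    local auth_flag
    case "$auth_mode" in
        basic) auth_flag="--basic" ;;
        ntlm)  auth_flag="--ntlm" ;;
        *)     echo "unknown auth mode: $auth_mode" >&2; return 2 ;;
    esac
    # Same timeout and retry settings as the recommended command.
    curl "$auth_flag" --user "$credentials" \
         --connect-timeout 60 --retry 3 --retry-delay 20 \
         "$url" -o "$output"
}
```

A workflow script could then combine the two, for example fetch_with_auth basic '<USER>:<PASSWORD>' "$(local_search_url <PORT> <COLLECTION-ID> <QUERY>)" <OUTPUT-FILE> || exit 1, where <PORT> is whatever HTTP port the Funnelback instance listens on.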