Built-in filters - Run workflow filter rules (WorkflowFilter)
This feature is not available in the Squiz DXP.
This feature is deprecated and will be removed in a future version. Please update any existing implementations to use supported features.
The scripted workflow filter lets you define conditions and actions that are executed during content filtering.
Configuring the scripted workflow filter
Enabling
- Edit the filter.classes parameter in your collection configuration and add the following string to the end: com.funnelback.common.filter.WorkflowFilter
Example:
filter.classes=TikaFilterProvider,ExternalFilterProvider:DocumentFixerFilterProvider:com.funnelback.common.filter.WorkflowFilter
- Create a workflow.cfg file using the Configuration file manager. This file will contain the conditions and actions you wish to define.
Configuring scripted workflow rules
The workflow.cfg file contains Groovy code consisting of a number of if statements, each of which performs a specified action when its condition is met.
Syntax
The [CONDITION] and [ACTION] values in the syntax examples below should be replaced with valid conditions and actions (listed in the sections below).
The syntax for each workflow command is as follows:
if ([CONDITION]) {
    [ACTION]
}
Statements can be nested:
if ([CONDITION1]) {
    if ([CONDITION2]) {
        [ACTION]
    }
}
Conditions can be combined using the .and() and .or() commands:
if (([CONDITION1]).and([CONDITION2])) {
    [ACTION1]
}
if (([CONDITION3]).or([CONDITION4])) {
    [ACTION2]
}
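As a rough illustration of how the .and() and .or() forms behave, the standalone Groovy script below uses a hypothetical stand-in for urlContains evaluated against a fixed, made-up URL. It assumes the condition functions return ordinary Boolean values, in which case .and() and .or() are plain method calls on those values. The url and calls variables and the closure are illustrative and are not part of the filter.

```groovy
// Hypothetical stand-in for a condition function; the real functions are
// supplied by the workflow filter at run time.
def url = "http://example.com/publications/report.pdf"
def calls = []
def urlContains = { String regex -> calls << regex; (url =~ regex).find() }

// .and() and .or() are ordinary method calls on Boolean values.
def both = (urlContains("publications")).and(urlContains("\\.pdf"))
def either = (urlContains("missing")).or(urlContains("report"))
def none = (urlContains("missing")).and(urlContains("report"))

assert both
assert either
assert !none

// Unlike && and ||, a method call evaluates both operands every time,
// so all six condition calls above were executed.
assert calls.size() == 6
```

Because both sides are always evaluated, a condition that is expensive to compute is not skipped when the first condition already decides the result. Nesting if statements avoids that extra work.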
Variables can be defined using the def keyword:
def pubs = urlContains("publications");
if (pubs == true) {
    [ACTION]
}
Conditions
| Function | Description |
|---|---|
| urlContains | Returns true if the URL contains the given regular expression, false otherwise. |
| | Returns true if the URL does not contain the given regular expression, false otherwise. |
| | Returns true if the URL starts with the given regular expression, false otherwise. |
| urlDoesNotStartWith | Returns true if the URL does not start with the given regular expression, false otherwise. |
| urlEndsWith | Returns true if the URL ends with the given regular expression, false otherwise. |
| | Returns true if the URL does not end with the given regular expression, false otherwise. |
| contentContains | Returns true if the content contains the given regular expression, false otherwise. |
| | Returns true if the content does not contain the given regular expression, false otherwise. |
| | Returns true if the content starts with the given regular expression, false otherwise. |
| | Returns true if the content does not start with the given regular expression, false otherwise. |
| | Returns true if the content ends with the given regular expression, false otherwise. |
| | Returns true if the content does not end with the given regular expression, false otherwise. |
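The table does not show how these conditions are implemented. As a rough sketch of the semantics it describes, the standalone Groovy script below models the contains, starts-with and ends-with checks using java.util.regex. The helper names (regexContains, regexStartsWith, regexEndsWith) and the sample URL are hypothetical and are not part of the filter.

```groovy
import java.util.regex.Pattern

// Hypothetical helpers that model the described semantics; they are not the
// filter's own functions.
boolean regexContains(String text, String regex) {
    // True if the pattern matches anywhere in the text.
    Pattern.compile(regex).matcher(text).find()
}

boolean regexStartsWith(String text, String regex) {
    // True if the pattern matches at the very start of the text.
    Pattern.compile(regex).matcher(text).lookingAt()
}

boolean regexEndsWith(String text, String regex) {
    // True if some match of the pattern finishes at the very end of the text.
    Pattern.compile("(?:" + regex + ")\\z").matcher(text).find()
}

def url = "http://example.com/publications/report.pdf"
assert regexContains(url, "publications")
assert !regexStartsWith(url, "test")
assert regexEndsWith(url, "\\.pdf")
assert !regexContains(url, "analyst-reviews")
```

Since the arguments are regular expressions, an unescaped dot matches any character. A pattern such as ".pdf" therefore also matches text ending in "xpdf", while "\\.pdf" only matches a literal dot.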
Actions
| Function | Description |
|---|---|
| replaceContent | Modifies the document content by finding all matches for the given regular expression and replacing them with the given replacement text. |
| getMatchingContent | Returns the first section of the document content that matches the given regular expression. |
| insertMetaTag | Inserts a meta tag with the given name and content values into the document. |
Examples
This section gives some examples of the scripting language that might be put in the workflow.cfg file.
if ((contentContains("(?i)ovum")).or(contentContains("Gartner"))) {
    if (urlContains("analyst-reviews")) {
        insertMetaTag("robots", "noindex");
    }
}
In the example above, the content must contain either Ovum or Gartner and the URL must contain analyst-reviews. The (?i) prefix makes the ovum match case-insensitive; the Gartner match remains case-sensitive. If these conditions are met, a robots noindex meta tag is inserted into the content, meaning that the document will not be indexed.
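The effect of the (?i) flag can be checked outside the filter with plain Groovy regular expressions. This is a minimal sketch that assumes the filter's patterns follow standard Java regular expression syntax. The sample content string is made up for illustration.

```groovy
// Made-up sample content for illustration only.
def content = "Quarterly report by OVUM"

// With (?i) the pattern matches regardless of letter case.
def insensitive = (content =~ "(?i)ovum").find()

// Without the flag the lowercase pattern does not match the uppercase text.
def sensitive = (content =~ "ovum").find()

assert insensitive
assert !sensitive
```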
// Example of extraction of content for re-insertion
if ((urlContains("funnelback")).and(urlDoesNotStartWith("test")).and(contentContains("\\w+")).and(urlEndsWith(".pdf"))) {
    def matched = getMatchingContent("original(.*?)text");
    replaceContent "original(.*?)text", "replaced text: middle was [" + matched + "]"
}
In this second example we are extracting content for re-insertion. The def keyword is used to define a variable in the scripting language we use (Groovy).
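This page does not state whether getMatchingContent returns the whole matched section or only the captured group. The standalone sketch below uses java.util.regex directly, not the filter's function, to show what each of those two values would be for the pattern used above. The sample content string is made up for illustration.

```groovy
import java.util.regex.Pattern

// Made-up sample content for illustration only.
def content = "some original middle text here"

// Find the first match of the pattern from the example above.
def matcher = Pattern.compile("original(.*?)text").matcher(content)
assert matcher.find()

def whole = matcher.group(0)  // the entire first match
def inner = matcher.group(1)  // only the part captured by (.*?)

assert whole == "original middle text"
assert inner == " middle "
```

Testing a pattern this way before adding it to workflow.cfg makes it easier to confirm what the lazy (.*?) group actually captures.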
// Example of title replacement
if ((urlContains("amazon")).or(urlDoesNotStartWith("test"))) {
    replaceContent "<title>(.*?)</title>", "<title>New Title</title>"
}
Here we are inserting a new title into the content using the replaceContent action, which takes a regular expression to match and then some replacement text.
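The title pattern can be tried outside the filter with Groovy's String.replaceAll. This is a minimal sketch that assumes replaceContent applies the regular expression in the same way as Java's replaceAll, which this page does not confirm. The sample HTML strings are made up for illustration.

```groovy
// Made-up sample document for illustration only.
def html = "<html><head><title>Old Title</title></head><body>text</body></html>"

// Replace every match of the title pattern with the new title.
def updated = html.replaceAll("<title>(.*?)</title>", "<title>New Title</title>")
assert updated == "<html><head><title>New Title</title></head><body>text</body></html>"

// By default "." does not match a newline, so a title that spans two lines
// is left unchanged unless the (?s) flag is added to the pattern.
def multiline = "<title>Old\nTitle</title>"
def unchanged = multiline.replaceAll("<title>(.*?)</title>", "<title>New Title</title>")
def replaced = multiline.replaceAll("(?s)<title>(.*?)</title>", "<title>New Title</title>")

assert unchanged == multiline
assert replaced == "<title>New Title</title>"
```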
// Example of extracting content and inserting into metadata
if (urlEndsWith(".pdf")) {
    def matched = getMatchingContent("middle(.*?)content");
    if (matched != "") {
        insertMetaTag("my_meta_data", matched);
    }
}
In this last example we extract some matching content and insert it as metadata. It will be inserted into the "…" section of the document if it has one, or after the opening tag otherwise.