Filter example - read a configuration file

This example shows how a filter can read configuration options from a custom configuration file.

The filter used in this example is the source code for v1.0.0 of the Add extra metadata to content plugin provided in the SXC.

Custom configuration files

The plugin framework supports custom configuration files, supplied either as a plain .txt file or as a structured .json file.

This example uses a JSON file to capture a series of rules used to apply metadata to documents.

Step 1: Define the JSON structure

The JSON format below is defined to capture a metadata rule:

[
  {
    "name": "<rule name>",
    "description": "<rule description>",
    "patternType": "<rule pattern type>",
    "pattern": "<URL pattern>",
    "metadata": {
      "<metadata name1>": "<metadata value1>",
      "<metadata name2>": "<metadata value2>"
    }
  }
]

The configuration file is a JSON array that contains one or more of these rules.
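
For illustration, a configuration file with two rules might look like the following. The rule names, the example.com URL patterns, and the metadata names and values are hypothetical; only the field names and the patternType values come from this example.

```json
[
  {
    "name": "science-courses",
    "description": "Tag everything under the science courses section",
    "patternType": "LEFT_MATCH",
    "pattern": "https://example.com/courses/science/",
    "metadata": {
      "faculty": "Science",
      "contentType": "course"
    }
  },
  {
    "name": "pdf-documents",
    "description": "Tag PDF documents anywhere on the site",
    "patternType": "REGEX_PATTERN",
    "pattern": ".*\\.pdf",
    "metadata": {
      "format": "pdf"
    }
  }
]
```

Note that the backslash in the regular expression is doubled because it has to be escaped inside a JSON string.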

Step 2: Define the configuration file name

Add public static final String PLUGIN_CONFIG_FILE = "external-metadata.json"; to PluginUtils.java to define the name of the configuration file that the plugin will load:

PluginUtils.java
package com.funnelback.plugin.addmetadatatourl;

public class PluginUtils {

    public static final String PLUGIN_NAME = "add-metadata-to-url";

    public static final String PLUGIN_CONFIG_FILE = "external-metadata.json"; (1)
}
1 Sets the name of the configuration file we will load.

Step 3: Define a Java class to capture a configuration item

Once you know which fields the configuration file will contain, create a Java class to hold the fields of a rule. We will call our class AddMetadataToUrlRule:

AddMetadataToUrlRule.java
package com.funnelback.plugin.addmetadatatourl;

import lombok.*;

import java.util.HashMap;
import java.util.Map;

@NoArgsConstructor
@AllArgsConstructor
@ToString
public class AddMetadataToUrlRule {

    enum PatternType {
        REGEX_PATTERN, LEFT_MATCH, SUBSTRING (1)
    }

    @Getter @Setter private String description; (2)
    @Getter @Setter private String name;
    @Getter @Setter private String pattern;
    @Getter @Setter private PatternType patternType;
    @Getter @Setter private Map<String, String> metadata = new HashMap<String, String>();
}
1 Defines the values accepted by the patternType field.
2 Sets up variables of appropriate types to hold the configuration contained in each rule.
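
To make the Lombok annotations less opaque, the following is a rough, hand-written approximation of what they provide, using only the JDK. The class name PlainRule and the helper parsePatternType are hypothetical and are not part of the plugin; the all-arguments constructor is omitted for brevity, and the toString format is only approximate. The helper also illustrates the assumption that the patternType string in the JSON file has to match an enum constant name exactly, which is how Jackson maps enums by default.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written approximation of what the Lombok annotations on
// AddMetadataToUrlRule provide. The name PlainRule is hypothetical.
class PlainRule {

    // Allowed values for the "patternType" field in the JSON file.
    enum PatternType { REGEX_PATTERN, LEFT_MATCH, SUBSTRING }

    private String description;
    private String name;
    private String pattern;
    private PatternType patternType;
    private Map<String, String> metadata = new HashMap<>();

    // Equivalent of @NoArgsConstructor.
    PlainRule() { }

    // Equivalents of the @Getter / @Setter pairs.
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getPattern() { return pattern; }
    public void setPattern(String pattern) { this.pattern = pattern; }
    public PatternType getPatternType() { return patternType; }
    public void setPatternType(PatternType patternType) { this.patternType = patternType; }
    public Map<String, String> getMetadata() { return metadata; }
    public void setMetadata(Map<String, String> metadata) { this.metadata = metadata; }

    // Rough equivalent of @ToString (the exact Lombok format differs).
    @Override
    public String toString() {
        return "PlainRule(description=" + description + ", name=" + name
                + ", pattern=" + pattern + ", patternType=" + patternType
                + ", metadata=" + metadata + ")";
    }

    // Hypothetical helper: converts text such as "LEFT_MATCH" to the enum
    // constant. Enum.valueOf is case-sensitive and throws
    // IllegalArgumentException for any name that is not a constant.
    static PatternType parsePatternType(String raw) {
        return PatternType.valueOf(raw);
    }
}
```

With this sketch, parsePatternType("LEFT_MATCH") returns PatternType.LEFT_MATCH, while parsePatternType("left_match") throws an IllegalArgumentException, so rule authors should write the pattern type in upper case exactly as listed.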

Step 4: Write Java code into the filter to read the configuration file

Write logic into the main filter class for the plugin to open the configuration file and read all of the configuration into the object defined in step 3.

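Here we add logic to the canFilter method, which returns a PreFilterCheck, to check for the existence of the configuration file as a pre-condition to running the filter.

If the file exists, we create a JSON parser for its content and check that the content starts with a top-level array.

When the filter processes a document, we read each rule from the parser into an AddMetadataToUrlRule object and apply it to the document's URL.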

AddMetadataToUrlStringFilter.java
package com.funnelback.plugin.addmetadatatourl;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.funnelback.filter.api.FilterContext;
import com.funnelback.filter.api.FilterResult;
import com.funnelback.filter.api.documents.NoContentDocument;
import com.funnelback.filter.api.documents.StringDocument;
import com.funnelback.filter.api.filters.PreFilterCheck;
import com.funnelback.filter.api.filters.StringDocumentFilter;
import java.io.IOException;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Pattern;

import com.google.common.collect.ListMultimap;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AddMetadataToUrlStringFilter implements StringDocumentFilter {
    private static final Logger log = LogManager.getLogger(AddMetadataToUrlStringFilter.class);
    private JsonParser parser;

    @Override
    public PreFilterCheck canFilter(NoContentDocument noContentDocument, FilterContext filterContext) {
        Optional<String> config = filterContext.pluginConfigurationFile(PluginUtils.PLUGIN_CONFIG_FILE); (1)
        if (config.isEmpty()) { (2)
            log.warn("No '{}' plugin configuration file is provided so the plugin will not run.", PluginUtils.PLUGIN_CONFIG_FILE);
            return PreFilterCheck.SKIP_FILTER;
        } else {
            try {
                parser = new JsonFactory().createParser(config.get()); (3)
                parser.setCodec(new ObjectMapper());
                if (parser.nextToken() != JsonToken.START_ARRAY) { // null-safe: nextToken() returns null for empty input
                    throw new IllegalArgumentException("Plugin configuration file expects to have a top-level array");
                }
            } catch (IOException e) {
                throw new IllegalArgumentException("Plugin configuration file couldn't be processed", e);
            }
        }
        return PreFilterCheck.ATTEMPT_FILTER;
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument stringDocument, FilterContext filterContext) {
        ListMultimap<String, String> metadata = stringDocument.getCopyOfMetadata();
        try {
            while (parser.nextToken() == JsonToken.START_OBJECT) { // null-safe: stops at END_ARRAY or end of input
                AddMetadataToUrlRule rule = parser.readValueAs(AddMetadataToUrlRule.class); (4)
                applyRule(rule, stringDocument.getURI().toString(), metadata);  (5)
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to read next rule from configuration file: " + e.getMessage(), e);
        }
        return FilterResult.of(stringDocument.cloneWithMetadata(metadata));
    }

    private void applyRule(AddMetadataToUrlRule rule, String url, ListMultimap<String, String> metadata) {
        Function<Boolean, Boolean> setMetadata = toApply -> {
            if (toApply) {
                rule.getMetadata().forEach((k, v) -> {
                    metadata.put(k, v);
                    log.debug("Add extra metadata '{}: {}' from rule '{}'", k, v , rule.getName());
                });
                return true;
            }
            log.debug("URL {} hasn't matched provided pattern for rule '{}'", url, rule.getName());
            return false;
        };

        log.debug("Apply rule: {} to URL {}", rule.toString(), url);
        switch (rule.getPatternType()) {
            case LEFT_MATCH: setMetadata.apply(url.startsWith(rule.getPattern())); break;
            case SUBSTRING: setMetadata.apply(url.contains(rule.getPattern())); break;
            case REGEX_PATTERN: {
                Pattern urlPattern = Pattern.compile(rule.getPattern(), Pattern.CASE_INSENSITIVE);
                setMetadata.apply(urlPattern.matcher(url).matches());
                break;
            }
            default: log.debug("Invalid pattern type was provided. Expected values are: {}", java.util.Arrays.toString(AddMetadataToUrlRule.PatternType.values()));
        }
    }
}
1 Loads the content of the configuration file from the filterContext, using the configuration file name defined in PluginUtils.
2 Skips the filter if no configuration file is provided.
3 Creates a JSON parser for the configuration and checks that it starts with a top-level array.
4 Reads the next rule from the JSON into an AddMetadataToUrlRule object.
5 Applies the metadata rule to the document.
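
The three pattern types behave differently, and the regular expression case is easy to get wrong because Matcher.matches() only succeeds when the pattern matches the whole URL. The following is a minimal, JDK-only sketch of the same matching logic as applyRule, without the logging and metadata handling. The class name UrlMatchSketch and the example.com URL are hypothetical.

```java
import java.util.regex.Pattern;

// Stand-alone sketch of the URL matching performed in applyRule.
// UrlMatchSketch is a hypothetical name; it is not part of the plugin.
class UrlMatchSketch {

    enum PatternType { REGEX_PATTERN, LEFT_MATCH, SUBSTRING }

    // Returns true when the URL satisfies the pattern for the given type.
    static boolean matches(PatternType type, String pattern, String url) {
        switch (type) {
            case LEFT_MATCH:
                // Case-sensitive prefix comparison.
                return url.startsWith(pattern);
            case SUBSTRING:
                // Case-sensitive "contains" comparison.
                return url.contains(pattern);
            case REGEX_PATTERN:
                // Case-insensitive, and matches() requires the pattern to
                // cover the entire URL, not just part of it.
                return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE)
                        .matcher(url)
                        .matches();
            default:
                return false;
        }
    }
}
```

For the URL https://example.com/Courses/science/physics.html, the LEFT_MATCH pattern https://example.com/Courses/ and the SUBSTRING pattern /science/ both match, while the SUBSTRING pattern /SCIENCE/ does not, because those comparisons are case-sensitive. The REGEX_PATTERN value /courses/ does not match, since it does not cover the whole URL, whereas .*/courses/.* does match, helped by the case-insensitive flag.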