Jsoup Filter example - read data source configuration options

Basic example - reading standalone keys

This example shows how to define and load a couple of standalone configuration keys into a jsoup filter. The keys we’ll be processing are defined in the data source configuration:

  • plugin.plugin-examples.config.debug - a boolean argument that can be set to true or false.

  • plugin.plugin-examples.config.library - a string containing the name of a library.

Configuration key prefixes

When writing a plugin, all configuration keys you define must share a common prefix in the following format:

plugin.<PLUGIN-NAME>.config

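This prefix is defined as the KEY_PREFIX constant in the PluginUtils.java file that is generated when the plugin is created. The generated value ends with a dot, so a key name such as debug is appended directly: PluginUtils.KEY_PREFIX + "debug".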
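The following is a minimal, standalone sketch of how full key names are composed from the prefix. ConfigKeyNames and its key() helper are hypothetical names used only for illustration. The two constants copy the values shown in PluginUtils.java on this page.

```java
public class ConfigKeyNames {
    // Same values as PluginUtils: the prefix already ends with a trailing dot
    public static final String PLUGIN_NAME = "plugin-examples";
    public static final String KEY_PREFIX = "plugin." + PLUGIN_NAME + ".config.";

    // Hypothetical helper: builds a full key name from a short name such as "debug"
    public static String key(String name) {
        return KEY_PREFIX + name;
    }

    public static void main(String[] args) {
        System.out.println(key("debug"));   // plugin.plugin-examples.config.debug
        System.out.println(key("library")); // plugin.plugin-examples.config.library
    }
}
```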

Reading the configuration keys

Plugin configuration should be read in the setup() method of the jsoup filter.

This ensures that the configuration is only loaded once, at the time the plugin is initialized.

A private variable should be set up for each configuration item that is being loaded. Additional imports may be required depending on your variable types.

For example, the code below needs an additional import because it uses Optional.ofNullable to set a default value for the string variable.

package com.example.pluginexamples;

import com.funnelback.common.filter.jsoup.IJSoupFilter;
import com.funnelback.common.filter.jsoup.SetupContext;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.util.Optional; (1)

public class JsoupFilterReadConfigKeys implements IJSoupFilter {

    private static final Logger log = LogManager.getLogger(JsoupFilterReadConfigKeys.class);

    private SetupContext setup;
    private Boolean debug; (2)
    private String library;

    public void setup(SetupContext setup) {
        this.setup = setup;

        // Read debug flag (DEF: false)
        this.debug = Boolean.parseBoolean(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "debug")); (3)

        // Read library name (DEF: British Library)
        this.library = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "library")).orElse("British Library"); (4)
    }
1 Import required by Optional.ofNullable
2 Private variables that will be used to hold the configuration values.
3 Demonstrates how to read in a boolean variable with a default value of false. The default comes from Boolean.parseBoolean(), which returns false when the setting is missing (null) or contains anything other than "true" (ignoring case).
4 Demonstrates how to read in a string variable and set a default value.
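To see both defaults in isolation, here is a small sketch that runs without Funnelback. A plain Map stands in for the data source configuration, and the hypothetical getConfigSetting() helper assumes, as the Optional.ofNullable call above implies, that an unset key returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ConfigDefaultsDemo {
    // Hypothetical stand-in for SetupContext.getConfigSetting(): null when a key is not set
    static String getConfigSetting(Map<String, String> config, String key) {
        return config.get(key);
    }

    static boolean readDebug(Map<String, String> config) {
        // parseBoolean is true only for "true" (case-insensitive); null or any other text gives false
        return Boolean.parseBoolean(getConfigSetting(config, "plugin.plugin-examples.config.debug"));
    }

    static String readLibrary(Map<String, String> config) {
        // A missing key gives null, which Optional.ofNullable(...).orElse(...) replaces with the default
        return Optional.ofNullable(getConfigSetting(config, "plugin.plugin-examples.config.library"))
                .orElse("British Library");
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println(readDebug(config) + " / " + readLibrary(config)); // false / British Library

        config.put("plugin.plugin-examples.config.debug", "TRUE");
        config.put("plugin.plugin-examples.config.library", "Example Library");
        System.out.println(readDebug(config) + " / " + readLibrary(config)); // true / Example Library
    }
}
```

Note that a value such as yes or 1 is treated as false, so document the accepted values for boolean keys.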

Accessing the configuration from your filter logic

The private variables can be accessed directly from the methods within your filter class.

For example, in the jsoup filter’s processDocument() method below:

    @Override
    public void processDocument(FilterContext filterContext) {
        // This code is executed for each document
        log.info("Processing document");

        if (debug) { (1)
            log.debug("Debug mode is enabled");

            log.debug("Library: " + library); (2)
        }
    }
1 The debug variable is used in a conditional.
2 The value of the library variable is logged.

Example - Grouped configuration keys

It is not uncommon to define a set of configuration keys that must be grouped by a common ID.

In this example we will extend the basic example above to add additional configuration keys that define books. Each book will have two configuration keys defined, capturing the title and author.

The keys share the common prefix plugin.plugin-examples.config.book. and are grouped by ISBN.

  • plugin.plugin-examples.config.book.<ISBN>.title - the title of the book. e.g. plugin.plugin-examples.config.book.237411512X.title=Writing plugins

  • plugin.plugin-examples.config.book.<ISBN>.author - the author of the book. e.g. plugin.plugin-examples.config.book.237411512X.author=Jonathan Doe

Configuration key prefixes

To assist with loading the configuration, we’ll define an additional prefix in the plugin utilities that selects only the book keys.

Jsoup filter - plugin utils
package com.example.pluginexamples;

public class PluginUtils {

    public static final String PLUGIN_NAME = "plugin-examples"; (1)

    public static final String KEY_PREFIX = "plugin." + PLUGIN_NAME + ".config."; (2)

    public static final String CONFIG_BOOK_PREFIX = KEY_PREFIX + "book."; (3)
}
1 This variable contains the plugin’s name, and is populated automatically when the plugin is generated.
2 This variable contains the configuration key prefix that should be applied to all plugin configuration keys. This value is populated automatically when the plugin is generated.
3 This variable defines a prefix that will allow selection of a subset of the plugin’s configuration keys. This is a user-defined value specific to this plugin.
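As a rough illustration of how the book prefix narrows the key set, the sketch below filters a plain list of key names. It assumes that getConfigKeysWithPrefix() behaves like a startsWith filter over all configured keys; that is an assumption about SetupContext, and keysWithPrefix() is a hypothetical stand-in.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BookKeySelection {
    // Same values as PluginUtils.KEY_PREFIX and PluginUtils.CONFIG_BOOK_PREFIX
    static final String KEY_PREFIX = "plugin.plugin-examples.config.";
    static final String CONFIG_BOOK_PREFIX = KEY_PREFIX + "book.";

    // Hypothetical stand-in for getConfigKeysWithPrefix(): keep keys that start with the prefix
    static List<String> keysWithPrefix(List<String> allKeys, String prefix) {
        return allKeys.stream()
                .filter(key -> key.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "plugin.plugin-examples.config.debug",
                "plugin.plugin-examples.config.library",
                "plugin.plugin-examples.config.book.237411512X.title",
                "plugin.plugin-examples.config.book.237411512X.author");

        // Prints only the two book keys; debug and library are not selected
        keysWithPrefix(keys, CONFIG_BOOK_PREFIX).forEach(System.out::println);
    }
}
```

If the prefix were built as KEY_PREFIX + ".book." instead, it would contain two consecutive dots (plugin.plugin-examples.config..book.) and match none of these keys.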

Helper class to capture the book configuration

In order to group the keys together, we need to define another class that captures the linked properties.

To do this we’ll define a Book class with three variables: the ISBN, the title and the author.

Create a new file, Book.java in the same folder as your plugin filter, containing the following code.

Book.java
package com.example.pluginexamples;

import lombok.AllArgsConstructor; (1)
import lombok.Getter;
import lombok.Setter;

@AllArgsConstructor
public class Book {
    /* Defines a configuration object to collect the settings relating to a book */
    @Getter @Setter private String isbn; (2)
    @Getter @Setter private String title;
    @Getter @Setter private String author;
}
1 This class relies on Lombok to generate the constructor and the methods for accessing the variables. Lombok needs to be added as a dependency of the project (which adds an entry to the project’s pom.xml).
2 Private variables should be defined for each of the configuration items that will be captured. Choose appropriate types for the variables.
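If you would rather not add the Lombok dependency, a hand-written class along these lines provides roughly the same constructor, getters and setters that @AllArgsConstructor, @Getter and @Setter generate. This is a sketch of the equivalent code, not Lombok’s exact generated output.

```java
public class Book {
    // Defines a configuration object to collect the settings relating to a book
    private String isbn;
    private String title;
    private String author;

    // Equivalent of @AllArgsConstructor: one parameter per field, in declaration order
    public Book(String isbn, String title, String author) {
        this.isbn = isbn;
        this.title = title;
        this.author = author;
    }

    // Equivalent of @Getter on each field
    public String getIsbn() { return isbn; }
    public String getTitle() { return title; }
    public String getAuthor() { return author; }

    // Equivalent of @Setter on each field
    public void setIsbn(String isbn) { this.isbn = isbn; }
    public void setTitle(String title) { this.title = title; }
    public void setAuthor(String author) { this.author = author; }
}
```

Because the method names match what Lombok generates, the filter code that calls getIsbn(), getTitle() and getAuthor() works unchanged with either version.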

Reading the configuration keys

As in the basic example, the configuration keys must be read in the setup() method of the filter.

In this example we extract the keys that start with plugin.plugin-examples.config.book., using the prefix we defined in the plugin utils.

The values are then collected into a list of Book objects, using the class you defined in the previous step.

Jsoup filter - main filter class
package com.example.pluginexamples;

import com.funnelback.common.filter.jsoup.IJSoupFilter;
import com.funnelback.common.filter.jsoup.SetupContext;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.util.List; (1)
import java.util.Optional;
import java.util.stream.Collectors;

/**
 * Demonstrates a jsoup plugin that reads some configuration options from the data source configuration.
 */
public class JsoupFilterReadConfigKeys implements IJSoupFilter {

    private static final Logger log = LogManager.getLogger(JsoupFilterReadConfigKeys.class);

    private SetupContext setup;
    private Boolean debug;
    private String library;
    private List<Book> configEntries; (2)

    public void setup(SetupContext setup) {
        this.setup = setup;
        this.debug = Boolean.parseBoolean(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "debug"));
        this.library = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "library")).orElse("British Library");

        // Select the book keys from the data source configuration and load them into Book objects
        configEntries = this.setup.getConfigKeysWithPrefix(PluginUtils.CONFIG_BOOK_PREFIX).stream() (3)
                .map(key -> key.substring(PluginUtils.CONFIG_BOOK_PREFIX.length()).replace(".title", "").replace(".author", "")) (4)
                // Each ISBN has both a .title and an .author key, so keep one entry per ISBN
                .distinct()
                .map(isbn -> {
                    String title = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.CONFIG_BOOK_PREFIX + isbn + ".title")).orElse("Unknown title"); (5)
                    String author = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.CONFIG_BOOK_PREFIX + isbn + ".author")).orElse("Unknown author"); (6)
                    log.debug("isbn: " + isbn + "; title: " + title + "; author: " + author);
                    return new Book(isbn, title, author);
                })
                .collect(Collectors.toList());
    }

}
1 Additional imports required for reading in and accessing the grouped variables.
2 Defines a variable to capture the grouped configuration. The variable is a list of books. One book item will be created for each isbn that is defined.
3 getConfigKeysWithPrefix(PluginUtils.CONFIG_BOOK_PREFIX) selects just the book configuration keys, which we then iterate over.
4 Extracts the ISBN from the key name by removing the book prefix and the .title or .author suffix.
5 Sets the title, defaulting to Unknown title.
6 Sets the author, defaulting to Unknown author.
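The grouping step can be tried outside Funnelback with a sketch like the one below. Under these assumptions, a Map stands in for the data source configuration and a Java 16+ record stands in for the Lombok Book class; loadBooks() is a hypothetical name. The sorted() call is added only to make the output order predictable, because Map.of does not guarantee iteration order.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class GroupedConfigDemo {
    static final String CONFIG_BOOK_PREFIX = "plugin.plugin-examples.config.book.";

    // Minimal stand-in for the Lombok-based Book helper class
    record Book(String isbn, String title, String author) {}

    static List<Book> loadBooks(Map<String, String> config) {
        return config.keySet().stream()
                .filter(key -> key.startsWith(CONFIG_BOOK_PREFIX))
                // "…book.237411512X.title" becomes "237411512X"
                .map(key -> key.substring(CONFIG_BOOK_PREFIX.length()).replace(".title", "").replace(".author", ""))
                // Each ISBN appears once per key (.title and .author), so keep one entry per ISBN
                .distinct()
                .sorted()
                .map(isbn -> new Book(
                        isbn,
                        Optional.ofNullable(config.get(CONFIG_BOOK_PREFIX + isbn + ".title")).orElse("Unknown title"),
                        Optional.ofNullable(config.get(CONFIG_BOOK_PREFIX + isbn + ".author")).orElse("Unknown author")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample configuration: the second book has no author key, so it gets the default
        Map<String, String> config = Map.of(
                "plugin.plugin-examples.config.library", "British Library",
                "plugin.plugin-examples.config.book.237411512X.title", "Writing plugins",
                "plugin.plugin-examples.config.book.237411512X.author", "Jonathan Doe",
                "plugin.plugin-examples.config.book.0000000000.title", "A book with no author key");
        loadBooks(config).forEach(System.out::println);
    }
}
```

Without distinct(), each ISBN would produce two identical Book entries, one from its title key and one from its author key.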

Accessing the configuration from your filter logic

As in the basic example, the private variables can be accessed directly from the methods within your filter class.

The grouped variables can be accessed by iterating over the book list.

For example, in the jsoup filter’s processDocument() method below:

    @Override
    public void processDocument(FilterContext filterContext) {
        if (debug) {
            log.debug("Debug mode is enabled");
            log.debug("Library: "+library);
        }

        Document doc = filterContext.getDocument();
        configEntries.forEach(e -> processMetadata(doc, e.getIsbn(), e.getTitle(), e.getAuthor(), library)); (1)
    }

    public void processMetadata(Document doc, String isbn, String title, String author, String library){
        // This code is run for each Book (grouped in config keys using the isbn)
        log.debug("Processing: isbn:"+isbn+" title: "+title+" author: "+author+" library: "+library); (2)

    }
1 Iterates over each of the books, passing the isbn, title and author for the book as well as the library variable.
2 This method is where you implement whatever logic you want to apply to the document for each book. Here we just log a message echoing back the variables.

Example - the full Java files

Here are the complete Java files from the example above.

Jsoup filter class - JsoupFilterReadConfigKeys.java
package com.example.pluginexamples;

import com.funnelback.common.filter.jsoup.FilterContext;
import com.funnelback.common.filter.jsoup.IJSoupFilter;
import com.funnelback.common.filter.jsoup.SetupContext;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.jsoup.nodes.Document;

import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/**
 * Demonstrates a jsoup plugin that reads some configuration options from the data source configuration.
 */
public class JsoupFilterReadConfigKeys implements IJSoupFilter {

    private static final Logger log = LogManager.getLogger(JsoupFilterReadConfigKeys.class);

    private SetupContext setup;
    private Boolean debug;
    private String library;
    private List<Book> configEntries;

    public void setup(SetupContext setup) {
        // This code is run when the filter chain is initialized
        this.setup = setup;

        // Read debug flag (DEF: false)
        this.debug = Boolean.parseBoolean(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "debug"));

        // Read library name (DEF: British Library)
        this.library = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.KEY_PREFIX + "library")).orElse("British Library");

        // Select the book keys from the data source configuration and load them into Book objects
        configEntries = this.setup.getConfigKeysWithPrefix(PluginUtils.CONFIG_BOOK_PREFIX).stream()
                // Get the ISBN from the config key by removing the key prefix and the .title/.author suffix
                .map(key -> key.substring(PluginUtils.CONFIG_BOOK_PREFIX.length()).replace(".title", "").replace(".author", ""))
                // Each ISBN has both a .title and an .author key, so keep one entry per ISBN
                .distinct()
                .map(isbn -> {
                    // Read the title and author for this ISBN, falling back to a default if a key is missing
                    String title = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.CONFIG_BOOK_PREFIX + isbn + ".title")).orElse("Unknown title");
                    String author = Optional.ofNullable(this.setup.getConfigSetting(PluginUtils.CONFIG_BOOK_PREFIX + isbn + ".author")).orElse("Unknown author");
                    log.debug("isbn: " + isbn + "; title: " + title + "; author: " + author);
                    return new Book(isbn, title, author);
                })
                .collect(Collectors.toList());
    }

    @Override
    public void processDocument(FilterContext filterContext) {
        // This code is executed for each document
        log.info("Processing document");

        if (debug) {
            log.debug("Debug mode is enabled");

            log.debug("Library: "+library);
        }

        Document doc = filterContext.getDocument();
        configEntries.forEach(e -> processMetadata(doc, e.getIsbn(), e.getTitle(), e.getAuthor(), library));
    }

    public void processMetadata(Document doc, String isbn, String title, String author, String library){
        // This code is run for each Book (grouped in config keys using the isbn)
        log.debug("Processing: isbn:"+isbn+" title: "+title+" author: "+author+" library: "+library);

    }
}
Plugin utils - PluginUtils.java
package com.example.pluginexamples;

public class PluginUtils {
    public static final String PLUGIN_NAME = "plugin-examples";
    public static final String KEY_PREFIX = "plugin." + PLUGIN_NAME + ".config.";
    public static final String CONFIG_BOOK_PREFIX = KEY_PREFIX + "book.";
}
Book helper class - Book.java
package com.example.pluginexamples;

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.Setter;

@AllArgsConstructor
public class Book {
    /* Defines a configuration object to collect the settings relating to a book */
    @Getter @Setter private String isbn;
    @Getter @Setter private String title;
    @Getter @Setter private String author;
}