Read a file from a filter

Description

This document outlines how to read the contents of a file from within a filter.

Example: standard document filter

In this example, a configuration file (custom.mappings) stored in the collection’s configuration folder is read into a map that the filter can then use.

For a non-Jsoup filter, this is done in the filter’s constructor, so the map is initialised only once and is then available for every document the filter processes.

package com.funnelback.customfilter;

import java.util.*;
import com.funnelback.filter.api.*;
import com.funnelback.filter.api.documents.*;
import com.funnelback.filter.api.filters.*;
import static com.funnelback.filter.api.DocumentType.*;

/**
 * Opens a custom configuration file and reads some values into a map which can then be used while filtering.
 */

@groovy.util.logging.Log4j2
public class applyMapping implements StringDocumentFilter {

    // Key/value pairs read from custom.mappings, shared by every document this filter processes
    def mappings = [:]

    // Constructor: runs once when the filter is created, so anything initialised here is reused for every filtered document
    public applyMapping(File searchHome, String collectionName) {

        // Read the mappings and load them into the filter.
        // The file has one key and value per line, separated by a tab (assumed to be UTF-8 encoded).
        def mFile = new File(searchHome, "conf/" + collectionName + "/custom.mappings")
        mFile.readLines("UTF-8").each { line ->
            // Split on the first tab only, so a value may itself contain tabs
            def m = line.split("\t", 2)
            // Skip blank or malformed lines rather than failing with an index error
            if (m.length == 2) {
                mappings[m[0]] = m[1]
            }
        }
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext context) throws RuntimeException, FilterException {

        // Process the document, using values from mappings as needed
        // ...

        return FilterResult.of(document);
    }
}
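The parsing step can be checked on its own, outside Funnelback. The following is a minimal sketch of the same tab-delimited parsing in plain Java; the class name `MappingParser`, the decision to skip blank or tab-less lines, and the UTF-8 encoding are assumptions for illustration, not part of the Funnelback filter API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of the Funnelback API) that turns
 * tab-delimited "key<TAB>value" lines into a map.
 */
public class MappingParser {

    /** Parses mapping lines; blank lines and lines without a tab are skipped. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> mappings = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            // Split on the first tab only, so a value may itself contain tabs
            String[] parts = line.split("\t", 2);
            if (parts.length < 2 || parts[0].isEmpty()) {
                continue; // skip blank or malformed lines instead of throwing
            }
            mappings.put(parts[0], parts[1]); // a repeated key keeps its last value
        }
        return mappings;
    }

    /** Reads a mappings file from disk, assuming UTF-8 encoding. */
    public static Map<String, String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("colour\tcolor", "", "no-tab-here", "centre\tcenter");
        System.out.println(parse(lines)); // prints {colour=color, centre=center}
    }
}
```

Keeping the parsing in a pure function that takes a list of lines makes it easy to unit test with in-memory data, while `load` handles only the file access.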

Example: Jsoup filter

In this example, the same configuration file (custom.mappings) stored in the collection’s configuration folder is read into a map that the Jsoup filter can then use.

For a Jsoup filter, this is done in the filter’s setup method, so the map is initialised only once and is then available for every document the filter processes.

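import com.funnelback.common.filter.jsoup.*;

@groovy.util.logging.Log4j2
public class CheckContent implements IJSoupFilter {

    // Key/value pairs read from custom.mappings, shared by every document this filter processes
    def mappings = [:]

    @Override
    public void setup(SetupContext context) {

        // Read the mappings and load them into the filter.
        // The file has one key and value per line, separated by a tab (assumed to be UTF-8 encoded).
        def mFile = new File(context.getSearchHome(), "conf/" + context.getCollectionName() + "/custom.mappings")
        mFile.readLines("UTF-8").each { line ->
            // Split on the first tab only, so a value may itself contain tabs
            def m = line.split("\t", 2)
            // Skip blank or malformed lines rather than failing with an index error
            if (m.length == 2) {
                mappings[m[0]] = m[1]
            }
        }
    }

    @Override
    void processDocument(FilterContext context) {
        // Do some filtering, using values from mappings as needed
        // ...
    }
}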
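Both examples follow the same pattern: read the file once in a one-time hook (the constructor or `setup`), then only read from the in-memory map while each document is processed. The sketch below models that lifecycle in plain Java so it can run outside Funnelback. `MappingFilterSketch`, its `setup(Path, String)` and `lookup` methods, and the choice to return the key unchanged when no mapping exists are all hypothetical; only the `conf/<collection>/custom.mappings` layout comes from the examples above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a filter lifecycle (not the Funnelback API):
 * setup() runs once, lookup() is called for every document.
 */
public class MappingFilterSketch {

    private Map<String, String> mappings = Collections.emptyMap();
    private int loadCount = 0; // counts file reads, to show loading happens once

    /** One-time initialisation, analogous to the constructor or setup(SetupContext). */
    public void setup(Path searchHome, String collectionName) {
        // Same layout as the examples: <searchHome>/conf/<collection>/custom.mappings
        Path file = searchHome.resolve("conf").resolve(collectionName).resolve("custom.mappings");
        Map<String, String> loaded = new HashMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2 && !parts[0].isEmpty()) {
                    loaded.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + file, e);
        }
        mappings = Collections.unmodifiableMap(loaded); // read-only after setup
        loadCount++;
    }

    /** Per-document work only reads the map; it never touches the file. */
    public String lookup(String key) {
        return mappings.getOrDefault(key, key); // assumption: unmapped keys pass through
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway search home containing conf/example/custom.mappings
        Path home = Files.createTempDirectory("search-home");
        Path conf = Files.createDirectories(home.resolve("conf").resolve("example"));
        Files.write(conf.resolve("custom.mappings"),
                Arrays.asList("colour\tcolor", "centre\tcenter"), StandardCharsets.UTF_8);

        MappingFilterSketch filter = new MappingFilterSketch();
        filter.setup(home, "example");
        // Each lookup stands in for one document being processed
        for (String term : Arrays.asList("colour", "centre", "other")) {
            System.out.println(term + " -> " + filter.lookup(term));
        }
        System.out.println("file loaded " + filter.getLoadCount() + " time(s)");
    }
}
```

However many documents are processed, the file is read only once; the per-document cost is a map lookup.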