Filter example - change document type

This example shows a simple filter plugin that detects an XML document by looking for an XML declaration and then ensures that the correct MIME type is set.

The example below shows a simple filter implementation and corresponding tests.

Although this example implements a StringDocumentFilter, the other filter types (ByteDocumentFilter and Filter) can also be used to change the document type.

If you need to update the encoding of a document’s content (e.g. because it was sent with incorrect information about the content encoding), you will need to use a bytes document filter, because string document filters always work with UTF-8 encoded content.
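The transcoding step that a bytes-based filter would perform can be sketched with the Java standard library alone. This is a minimal illustration, not part of the filter API: the class name `TranscodeSketch` and the method `toUtf8` are hypothetical, and the sketch assumes the real charset of the content is already known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the filter API.
final class TranscodeSketch {

    private TranscodeSketch() {
    }

    /**
     * Decodes the raw bytes with the charset the content was really written in,
     * then re-encodes the resulting text as UTF-8.
     */
    static byte[] toUtf8(byte[] rawContent, Charset actualCharset) {
        String text = new String(rawContent, actualCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, encoded as ISO-8859-1 (e-acute is the single byte 0xE9).
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        // In UTF-8 the e-acute takes two bytes, so the array grows from 4 to 5 bytes.
        System.out.println(utf8.length);
    }
}
```

Working on bytes matters here because decoding with the wrong charset first would already have damaged the text. How the re-encoded bytes and the corrected charset are written back to the document depends on the filter API and is not shown.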

Example

In this example we inspect the document content and set the document type to XML if the document looks like an XML document. The example implements the StringDocumentFilter interface, which requires two methods. The first is canFilter(), which here always returns ATTEMPT_FILTER because the document content must be inspected before we can decide whether the document should be skipped. The second is filterAsStringDocument(), which contains the logic for the filter.

DocumentFilterChangeDocumentType.java
package com.example.pluginexamples;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.filter.api.FilterContext;
import com.funnelback.filter.api.FilterResult;
import com.funnelback.filter.api.documents.NoContentDocument;
import com.funnelback.filter.api.documents.StringDocument;
import com.funnelback.filter.api.filters.PreFilterCheck;
import com.funnelback.filter.api.filters.StringDocumentFilter;
import com.funnelback.filter.api.DocumentType;

public class DocumentFilterChangeDocumentType implements StringDocumentFilter {

    private static final Logger log = LogManager.getLogger(DocumentFilterChangeDocumentType.class);

    @Override
    public PreFilterCheck canFilter(NoContentDocument noContentDocument, FilterContext filterContext) {
        return PreFilterCheck.ATTEMPT_FILTER; // (1)
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext filterContext) {
        log.debug("Assume documents which start with <?xml e.g. <?xml version=\"1.0\" encoding=\"UTF-8\"?> "
                + "are XML documents.");

        if (document.getContentAsString().trim().startsWith("<?xml ")) { // (2)

            StringDocument filteredDocument = document.cloneWithStringContent(DocumentType.MIME_XML_TEXT, document.getContentAsString()); // (3)

            return FilterResult.of(filteredDocument);
        }

        log.debug("{} does not appear to be an XML document.", document.getURI());

        return FilterResult.skipped(); // (4)
    }
}
(1) Always run this filter because we need to analyze the document to correct the content type.
(2) Assume documents which start with <?xml e.g. <?xml version="1.0" encoding="UTF-8"?> are XML documents.
(3) Change the document type to XML.
(4) Return a filter skipped status so that a choice filter can try fixing the document with a different filter.
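The detection in callout 2 relies on trim(), which only removes characters at or below U+0020, so content that begins with a byte order mark (U+FEFF) would not match. The following is a minimal sketch of a stricter check written as a plain helper with no dependency on the filter API. The class name `XmlDeclarationSketch` and the method `looksLikeXml` are hypothetical, and tolerating a leading byte order mark is an assumption about the input rather than documented behaviour of the example above.

```java
// Hypothetical helper, not part of the filter API.
final class XmlDeclarationSketch {

    private static final String XML_DECLARATION_START = "<?xml";
    private static final char BYTE_ORDER_MARK = '\uFEFF';

    private XmlDeclarationSketch() {
    }

    /**
     * Returns true when the content begins with "<?xml" followed by whitespace,
     * ignoring an optional leading byte order mark and leading whitespace.
     */
    static boolean looksLikeXml(String content) {
        if (content == null) {
            return false;
        }
        int start = 0;
        // Skip a single leading byte order mark, which trim() would leave in place.
        if (!content.isEmpty() && content.charAt(0) == BYTE_ORDER_MARK) {
            start = 1;
        }
        // Skip leading whitespace, mirroring the trim() call in the filter.
        while (start < content.length() && Character.isWhitespace(content.charAt(start))) {
            start++;
        }
        if (!content.startsWith(XML_DECLARATION_START, start)) {
            return false;
        }
        // Require whitespace after "<?xml" so that "<?xml-stylesheet" is not matched.
        int next = start + XML_DECLARATION_START.length();
        return next < content.length() && Character.isWhitespace(content.charAt(next));
    }
}
```

A filter could call such a helper in place of the inline startsWith() check. Keeping the check in a static method also lets it be unit tested without mock documents.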
DocumentFilterChangeDocumentTypeTest.java
package com.example.pluginexamples;

import org.junit.Assert;
import org.junit.Test;

import com.funnelback.filter.api.DocumentType;
import com.funnelback.filter.api.FilterResult;
import com.funnelback.filter.api.documents.StringDocument;
import com.funnelback.filter.api.mock.MockDocuments;
import com.funnelback.filter.api.mock.MockFilterContext;

public class DocumentFilterChangeDocumentTypeTest {

    @Test
    public void fixXMLDocumentTypeTest() {
        // Create an input document where the content looks like XML but the document type is unknown

        StringDocument inputDoc = MockDocuments.mockEmptyStringDoc()
                .cloneWithStringContent(DocumentType.MIME_UNKNOWN,
                        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                                + "<foo>bar</foo>");
        FilterResult filterResult = new DocumentFilterChangeDocumentType().filter(inputDoc, MockFilterContext.getEmptyContext());

        Assert.assertFalse("Filter should not have been skipped", filterResult.isSkipped());

        StringDocument filteredDoc = (StringDocument) filterResult.getFilteredDocuments().get(0);

        Assert.assertTrue("Document type should have been changed to xml", filteredDoc.getDocumentType().isXML());
    }

    @Test
    public void skipsNonXMLDocumentTest() {
        // Create a document which does not look like XML

        StringDocument inputDoc = MockDocuments.mockEmptyStringDoc()
                .cloneWithStringContent(DocumentType.MIME_UNKNOWN,
                        "This doesn't look like XML!");
        FilterResult filterResult = new DocumentFilterChangeDocumentType().filter(inputDoc, MockFilterContext.getEmptyContext());

        Assert.assertTrue("Filter should have been skipped, as the document does not look like XML",
                filterResult.isSkipped());
    }
}