Filter example - change document type
This example shows a simple filter plugin that detects XML documents by looking for an XML declaration and then sets the correct MIME type.
It consists of a filter implementation and corresponding tests.
Although this example implements a StringDocumentFilter, the other filter types (ByteDocumentFilter and Filter) can also be used to change the document type.
If you need to update the encoding of a document’s content (for example, because it was sent with incorrect information about the content encoding), you will need to use a bytes document filter, because string document filters are always UTF-8 encoded.
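To illustrate why an encoding fix has to work on the raw bytes, here is a minimal sketch in plain Java. It does not use the Funnelback API: the class name EncodingSketch and its decode() helper are hypothetical and exist only for this illustration. It shows that bytes decoded with the wrong charset lose information, while the same bytes can still be decoded correctly if the real charset is known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the Funnelback API.
class EncodingSketch {

    /** Decodes raw content bytes using the charset they were really written in. */
    static String decode(byte[] rawContent, Charset actualCharset) {
        return new String(rawContent, actualCharset);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset replaces the invalid byte with U+FFFD.
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // Decoding the original bytes with the right charset recovers the text.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals("caf\u00e9")); // prints "false"
        System.out.println(right.equals("caf\u00e9")); // prints "true"
    }
}
```

Once content has been decoded into a string with the wrong charset, the original bytes are no longer available, which is why this kind of correction belongs in a filter that still has access to them.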
Example
In this example we inspect the document content and set the document type to XML if the document looks like an XML document. The class implements StringDocumentFilter, which requires two methods. canFilter() always returns ATTEMPT_FILTER in this example, because we must inspect the document content before we can decide whether the document should be skipped. filterAsStringDocument() contains the logic for the filter.
DocumentFilterChangeDocumentType.java
package com.example.pluginexamples;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.filter.api.DocumentType;
import com.funnelback.filter.api.FilterContext;
import com.funnelback.filter.api.FilterResult;
import com.funnelback.filter.api.documents.NoContentDocument;
import com.funnelback.filter.api.documents.StringDocument;
import com.funnelback.filter.api.filters.PreFilterCheck;
import com.funnelback.filter.api.filters.StringDocumentFilter;

public class DocumentFilterChangeDocumentType implements StringDocumentFilter {

    private static final Logger log = LogManager.getLogger(DocumentFilterChangeDocumentType.class);

    @Override
    public PreFilterCheck canFilter(NoContentDocument noContentDocument, FilterContext filterContext) {
        return PreFilterCheck.ATTEMPT_FILTER; // (1)
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext filterContext) {
        log.debug("Assume documents which start with <?xml e.g. <?xml version=\"1.0\" encoding=\"UTF-8\"?> "
            + "are XML documents.");
        if (document.getContentAsString().trim().startsWith("<?xml ")) { // (2)
            StringDocument filteredDocument = document.cloneWithStringContent(
                DocumentType.MIME_XML_TEXT, document.getContentAsString()); // (3)
            return FilterResult.of(filteredDocument);
        }
        log.debug(document.getURI() + " does not appear to be an XML document.");
        return FilterResult.skipped(); // (4)
    }
}
(1) Always run this filter because we need to analyze the document to correct the content type.
(2) Assume documents which start with <?xml e.g. <?xml version="1.0" encoding="UTF-8"?> are XML documents.
(3) Change the document type to XML.
(4) Return a filter skipped status so that a choice filter can try fixing the document with a different filter.
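The last callout relies on the idea that a skipped result lets another filter have a turn. The sketch below is a simplified model of that fallback behaviour in plain Java, not Funnelback's implementation. Everything in it is hypothetical: ChoiceSketch, TypeFixer, the JSON fixer, and the use of plain strings in place of document types are assumptions made for illustration, with an empty Optional standing in for a skipped result.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model of "try filters in order until one does not skip".
class ChoiceSketch {

    /** Stand-in for a filter: returns a new type, or empty to signal "skipped". */
    interface TypeFixer extends Function<String, Optional<String>> {}

    /** Tries each fixer in order and returns the first non-skipped result. */
    static Optional<String> firstMatch(List<TypeFixer> fixers, String content) {
        for (TypeFixer fixer : fixers) {
            Optional<String> result = fixer.apply(content);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty(); // every fixer skipped
    }

    // Same rule as the example filter: content starting with "<?xml " is XML.
    static final TypeFixer XML = c -> c.trim().startsWith("<?xml ")
            ? Optional.of("text/xml") : Optional.empty();

    // A second, purely illustrative fixer that gets a turn when XML skips.
    static final TypeFixer JSON = c -> c.trim().startsWith("{")
            ? Optional.of("application/json") : Optional.empty();
}
```

With both fixers in the list, a document that does not start with an XML declaration falls through to the second fixer. This is the reason the example filter returns a skipped status for non-XML content.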
DocumentFilterChangeDocumentTypeTest.java
package com.example.pluginexamples;

import org.junit.Assert;
import org.junit.Test;

import com.funnelback.filter.api.DocumentType;
import com.funnelback.filter.api.FilterResult;
import com.funnelback.filter.api.documents.StringDocument;
import com.funnelback.filter.api.mock.MockDocuments;
import com.funnelback.filter.api.mock.MockFilterContext;

public class DocumentFilterChangeDocumentTypeTest {

    @Test
    public void fixXMLDocumentTypeTest() {
        // Create an input document where the content looks like XML but the document type is unknown
        StringDocument inputDoc = MockDocuments.mockEmptyStringDoc()
            .cloneWithStringContent(DocumentType.MIME_UNKNOWN,
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<foo>bar</foo>");

        FilterResult filterResult = new DocumentFilterChangeDocumentType()
            .filter(inputDoc, MockFilterContext.getEmptyContext());

        Assert.assertFalse("Filter should not have been skipped", filterResult.isSkipped());
        StringDocument filteredDoc = (StringDocument) filterResult.getFilteredDocuments().get(0);
        Assert.assertTrue("Document type should have been changed to xml", filteredDoc.getDocumentType().isXML());
    }

    @Test
    public void skipsNonXMLDocumentTest() {
        // Create a document which does not look like XML
        StringDocument inputDoc = MockDocuments.mockEmptyStringDoc()
            .cloneWithStringContent(DocumentType.MIME_UNKNOWN,
                "This doesn't look like XML!");

        FilterResult filterResult = new DocumentFilterChangeDocumentType()
            .filter(inputDoc, MockFilterContext.getEmptyContext());

        Assert.assertTrue("Filter should have been skipped, as the document does not look like XML",
            filterResult.isSkipped());
    }
}
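The tests above cover the two basic cases. As a hedged aside, the detection rule itself can be isolated into a small helper in plain Java, which makes edge cases easier to reason about. XmlSniffer and looksLikeXml() are hypothetical names, not part of the Funnelback API. This variant differs from the example filter in two ways: it tolerates a leading byte order mark character (U+FEFF), which String.trim() does not remove, and it accepts any whitespace after <?xml rather than only a space. Whether a string document's content can actually begin with a byte order mark is an assumption here.

```java
// Hypothetical helper, shown only to illustrate the detection rule.
class XmlSniffer {

    private static final char BOM = '\uFEFF';
    private static final String DECLARATION_START = "<?xml";

    /** Returns true if the content appears to begin with an XML declaration. */
    static boolean looksLikeXml(String content) {
        if (content == null) {
            return false;
        }
        int start = 0;
        // Skip a single leading byte order mark, if present.
        if (!content.isEmpty() && content.charAt(0) == BOM) {
            start = 1;
        }
        // Skip leading whitespace, mirroring the trim() in the example filter.
        while (start < content.length() && Character.isWhitespace(content.charAt(start))) {
            start++;
        }
        int afterPrefix = start + DECLARATION_START.length();
        // Require whitespace after "<?xml" so "<?xml-stylesheet" does not match.
        return content.startsWith(DECLARATION_START, start)
                && afterPrefix < content.length()
                && Character.isWhitespace(content.charAt(afterPrefix));
    }
}
```

A filter could call such a helper in place of the inline startsWith() check, and the helper can be unit tested without any mock documents.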