Plugin: Force XML

Purpose

This plugin forces all downloaded documents to be processed as XML.

Use this plugin when you are indexing XML documents that are not correctly detected as XML. This usually happens when an XML document is returned with an incorrect MIME type (e.g. text/html) or when the document is missing its XML declaration.

This plugin applies the following to all documents that are downloaded for the data source:

  • sets the document’s MIME type to text/xml, which filters and the indexer require in order to process a document as XML.

  • removes any leading whitespace from the document content, because an XML declaration must be the very first thing in the document.

  • checks whether the document starts with an XML declaration and adds one if it is missing.
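
The three steps above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the plugin's actual implementation: the function name force_xml, the way a declaration is detected, and the default declaration text (version 1.0, UTF-8) are all assumptions made for the example.

```python
import re
from typing import Tuple

# Assumption: a declaration is recognised by the prefix "<?xml" followed by whitespace.
XML_DECLARATION = re.compile(r"<\?xml\s")
# Assumption: the declaration that gets added; the real plugin may use a different one.
DEFAULT_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def force_xml(content: str, mime_type: str) -> Tuple[str, str]:
    """Return (content, mime_type) after applying the three steps (hypothetical helper)."""
    # Step 1: always report the document as XML, whatever MIME type was received.
    mime_type = "text/xml"
    # Step 2: strip leading whitespace so that a declaration, if present, comes first.
    content = content.lstrip()
    # Step 3: prepend a declaration only when the document does not start with one.
    if not XML_DECLARATION.match(content):
        content = DEFAULT_DECLARATION + content
    return content, mime_type
```

For example, calling force_xml("  \n<root/>", "text/html") in this sketch yields a text/xml document whose content starts with the added declaration, while a document that already starts with a declaration only has its leading whitespace removed.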

This plugin should only be used if all the files being processed are XML files. Use with other file types may result in unexpected behavior or filter errors.

  • This plugin supersedes the ForceXMLMime filter. If ForceXMLMime is set in your filter chain, remove it because it is now redundant.

  • The indexer also provides an option, -forcexml, that forces it to process all documents as XML. This can be used as an alternative when no custom filtering that processes XML files is being performed.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Force XML tile.

  2. From the Location section, use the Select a data source select list to choose the data source on which you would like to enable this plugin.

The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.

When configuring this plugin, ensure that the Force XML filter is applied close to the start of your set of filters. For most use cases it should be the first filter that runs.

Filter chain configuration

This plugin uses filters, which apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
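
To illustrate why the position of the Force XML filter matters, the following Python sketch models a filter chain as a list of functions applied in order. It is a simplified model, not the product's filter API: the document dictionary, run_chain, force_xml_filter and the downstream xml_only_filter are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Filter = Callable[[Document], Document]


def force_xml_filter(doc: Document) -> Document:
    # Stand-in for the Force XML filter: marks the document as XML.
    return {**doc, "mime_type": "text/xml"}


def xml_only_filter(doc: Document) -> Document:
    # Hypothetical downstream filter that skips anything not flagged as XML.
    if doc["mime_type"] != "text/xml":
        return doc
    return {**doc, "processed_as_xml": "yes"}


def run_chain(doc: Document, chain: List[Filter]) -> Document:
    # Filters run in sequence; each one receives the previous filter's output.
    for apply_filter in chain:
        doc = apply_filter(doc)
    return doc


# An XML document that was served with the wrong MIME type.
doc = {"mime_type": "text/html", "content": "<root/>"}

force_first = run_chain(doc, [force_xml_filter, xml_only_filter])
force_last = run_chain(doc, [xml_only_filter, force_xml_filter])
```

In this model, force_first contains the processed_as_xml key because the downstream filter sees text/xml, whereas force_last does not: the XML-only filter ran before the MIME type was corrected and skipped the document.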

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugin.forcexml.ForceXmlStringFilter

Drag the com.funnelback.plugin.forcexml.ForceXmlStringFilter plugin filter to where you wish it to run in the filter chain sequence.

Examples

Change log