Plugin: Force XML
Purpose
This plugin forces all downloaded documents to be processed as XML.
Use this filter when you are indexing XML documents that are not correctly detected as XML. This usually happens when an XML document is returned with an incorrect MIME type (e.g. text/html) or the document is missing the XML declaration.
This plugin applies the following to all documents that are downloaded for the data source:
- sets the document’s MIME type to text/xml, which is required by filters and the indexer to process a document as XML.
- removes any leading whitespace from the document content, as an XML declaration must be the very first thing in the document.
- checks whether the document starts with an XML declaration and adds one if required.
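The three steps above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the plugin's actual implementation: the function name `force_xml`, the constant `XML_DECLARATION`, and the choice of a UTF-8 declaration are all assumptions made for the example.

```python
import re
from typing import Tuple

# Assumed declaration for illustration; the plugin may add a different one.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def force_xml(content: str) -> Tuple[str, str]:
    """Return (mime_type, content) after applying the three steps."""
    # Step 1: the MIME type is always forced to text/xml.
    mime_type = "text/xml"

    # Step 2: remove leading whitespace so a declaration can come first.
    body = content.lstrip()

    # Step 3: add a declaration only if the document lacks one.
    # The trailing \s avoids matching other instructions such as
    # <?xml-stylesheet ...?>.
    if not re.match(r"<\?xml\s", body):
        body = XML_DECLARATION + body

    return mime_type, body


mime, body = force_xml("  \n<root><item>1</item></root>")
print(mime)
print(body)
```

A document that already starts with an XML declaration passes through with only its MIME type changed.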
| This filter should only be used if all the files being processed are XML files. Use with other file types may result in unexpected behavior or filter errors. |
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Force XML tile.
- From the Location section, use the Select a data source list to choose the data source on which to enable this plugin.
| The plugin will take effect once the setup steps and an advanced > full update of the data source have completed. |
| When configuring this plugin ensure that the Force XML filter is applied close to the start of your set of filters. For most use cases it should be the first filter that runs. |
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
| Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation. |
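To see why the Force XML filter usually belongs at the start of the chain, consider the following minimal sketch. It is a toy model of a sequential filter chain, not the product's filter API: `force_xml_filter`, `xml_only_filter`, and `run_chain` are hypothetical names, and documents are represented as plain dictionaries.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Filter = Callable[[Document], Document]


def force_xml_filter(doc: Document) -> Document:
    # Stand-in for the Force XML filter: only the MIME type change is modeled.
    return {**doc, "mime_type": "text/xml"}


def xml_only_filter(doc: Document) -> Document:
    # Hypothetical downstream filter that ignores anything not marked as XML.
    if doc["mime_type"] != "text/xml":
        return doc
    return {**doc, "xml_processed": "yes"}


def run_chain(doc: Document, filters: List[Filter]) -> Document:
    # Filters run in sequence; each receives the previous filter's output.
    for apply_filter in filters:
        doc = apply_filter(doc)
    return doc


mislabeled = {"mime_type": "text/html", "content": "<root/>"}

first = run_chain(mislabeled, [force_xml_filter, xml_only_filter])
last = run_chain(mislabeled, [xml_only_filter, force_xml_filter])

print("xml_processed" in first)  # Force XML ran before the XML-only filter
print("xml_processed" in last)   # Force XML ran too late to help
```

In the second ordering the XML-only filter sees the document while it is still labeled text/html and skips it, which is the situation the ordering advice above is meant to avoid.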