Plugin: Wrap XML element in HTML tags
Other versions of this plugin may exist. Please ensure you are viewing the documentation for the version that you are currently using. If you are not running the latest version of your plugin, we recommend upgrading. See the list of all available versions of this plugin.
Purpose
This plugin wraps the content of specific XML elements in <html>...</html>
tags so that PADRE can index that content as inner HTML documents. This is useful when an XML or JSON feed provided by a client contains nested HTML.
When to use this plugin
Use this plugin when you are indexing an XML file in HTML inner document mode and the XML field you want to index contains bare HTML code (without surrounding <html>
tags). Without the plugin, PADRE will fail to index the inner document as HTML.
Usage
Enable the plugin
Enable the xml-element-html-wrapper-filter plugin on your data source from the Extensions screen in the administration dashboard or add the following data source configuration to enable the plugin.
plugin.xml-element-html-wrapper-filter.enabled=true
plugin.xml-element-html-wrapper-filter.version=1.0.0
This plugin requires a full update of the data source to take effect.
Plugin configuration settings
The XmlElementHtmlWrapperFilter filter must be added to the filter chain for the plugin to work correctly. Add the filter to filter.classes in the data source configuration:
filter.classes=<OTHER-FILTERS>:com.funnelback.plugins.xmlElementHtmlWrapperFilter.XmlElementHtmlWrapperFilter:<OTHER-FILTERS>
The filter should be placed at an appropriate position in the filter chain. In most circumstances this should be located towards the end of the filter chain.
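As a rough illustration of the placement advice, here is a minimal Python sketch that adds the filter to an existing filter.classes value. It assumes, as in the example above, that filters are separated by colons. The helper name add_wrapper_filter, its keep_last parameter and the FilterA/FilterB names are hypothetical and not part of Funnelback.

```python
# Fully qualified class name taken from the filter.classes example above.
WRAPPER_FILTER = (
    "com.funnelback.plugins.xmlElementHtmlWrapperFilter.XmlElementHtmlWrapperFilter"
)


def add_wrapper_filter(filter_classes: str, keep_last: int = 0) -> str:
    """Return a filter.classes value that includes the wrapper filter.

    Hypothetical helper: the filter is inserted so that `keep_last` existing
    filters still run after it (0 means it becomes the last filter). If the
    filter is already present, the chain is returned without adding it again.
    """
    # Assumption: filters in the chain are separated by colons.
    filters = [f for f in filter_classes.split(":") if f]
    if WRAPPER_FILTER not in filters:
        position = max(len(filters) - keep_last, 0)
        filters.insert(position, WRAPPER_FILTER)
    return ":".join(filters)


print(add_wrapper_filter("FilterA:FilterB", keep_last=1))
```

With keep_last=1 the wrapper filter lands between FilterA and FilterB; with the default it is appended to the end of the chain, which matches the usual recommendation.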
The following option must be set in the data source configuration to configure the plugin:
- plugin.xml-element-html-wrapper-filter.config.xpath=/xpath/to/html/to/wrap: Defines the XPath of the XML element containing HTML that should have its content wrapped in <html>...</html> tags.
When defining the XPath, take into account any changes to the XML structure introduced by earlier filters. For example, if you split the XML file into individual XML documents that you then filter, the XPath must be adjusted to reflect the structure of an individual XML record.
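The following Python sketch illustrates why the XPath must match the shape of the document the filter actually receives. It uses the standard library's xml.etree.ElementTree, which evaluates paths relative to an element, so the hypothetical select helper checks the root step itself and supports only simple child-step paths. It is not how the plugin evaluates XPath.

```python
import xml.etree.ElementTree as ET


def select(xml_text: str, xpath: str) -> list:
    """Return elements matched by a simple absolute path such as /files/file/doc.

    ElementTree only evaluates paths relative to an element, so the first
    step is compared with the root tag and the rest is passed to findall().
    """
    root = ET.fromstring(xml_text)
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # the path starts from a different root element
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


whole_file = "<files><file><doc>one</doc></file><file><doc>two</doc></file></files>"
single_record = "<file><doc>one</doc></file>"

print(len(select(whole_file, "/files/file/doc")))     # 2
print(len(select(single_record, "/files/file/doc")))  # 0
print(len(select(single_record, "/file/doc")))        # 1
```

After a split, the path that worked on the whole file no longer matches anything, while the record-level path does.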
Example
Consider the following XML document:
<files>
<file>
<title>Example title 1</title>
<description>This is an example document</description>
<url>http://example.com/example-files/file.html</url>
<doc><p>An example HTML document.</p></doc>
</file>
<file>
<title>Example title 2</title>
<description>This is another example document</description>
<url>http://example.com/example-files/file2.html</url>
<doc><![CDATA[<p>Another example HTML document.</p>]]></doc>
</file>
</files>
The doc element contains HTML code that you wish to index as an HTML document.
Configuring the plugin with:
plugin.xml-element-html-wrapper-filter.config.xpath=/files/file/doc
produces the following XML for PADRE to index:
<files>
<file>
<title>Example title 1</title>
<description>This is an example document</description>
<url>http://example.com/example-files/file.html</url>
<doc><html><p>An example HTML document.</p></html></doc>
</file>
<file>
<title>Example title 2</title>
<description>This is another example document</description>
<url>http://example.com/example-files/file2.html</url>
<doc><html><p>Another example HTML document.</p></html></doc>
</file>
</files>
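To illustrate the transformation, here is a minimal Python sketch using xml.etree.ElementTree. It is not the plugin's implementation. It assumes that the element's text (including text read from a CDATA section) is HTML markup, re-parses the wrapped content as XML when it is well-formed, and otherwise stores it as escaped text. The function name wrap_in_html is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_in_html(elem: ET.Element) -> None:
    """Replace the content of elem with <html>...</html> around that content."""
    # Text (including text read from a CDATA section) is kept verbatim so that
    # "<p>...</p>" is treated as markup; child elements are serialized back to
    # XML (ET.tostring also writes each child's tail text).
    inner = (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )
    wrapped = "<html>" + inner + "</html>"
    for child in list(elem):
        elem.remove(child)
    try:
        # Well-formed content becomes a real <html> child element.
        elem.append(ET.fromstring(wrapped))
        elem.text = None
    except ET.ParseError:
        # HTML that is not well-formed XML (e.g. a bare <br>) is kept as text,
        # which is escaped on output, much like a CDATA section.
        elem.text = wrapped


source = (
    "<files>"
    "<file><url>http://example.com/example-files/file.html</url>"
    "<doc><p>An example HTML document.</p></doc></file>"
    "<file><url>http://example.com/example-files/file2.html</url>"
    "<doc><![CDATA[<p>Another example HTML document.</p>]]></doc></file>"
    "</files>"
)
root = ET.fromstring(source)
# "file/doc" is /files/file/doc expressed relative to the <files> root element.
for doc in root.findall("file/doc"):
    wrap_in_html(doc)
print(ET.tostring(root, encoding="unicode"))
```

For this input, both doc elements are printed wrapped as in the output above. Real-world HTML is often not well-formed XML, which is why the sketch falls back to escaped text; how the plugin itself serializes such content is not covered here.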
Caveats
- Applying this plugin can have unintended consequences when used in conjunction with the Inner HTML or XML documents feature of the Funnelback special XML configuration. If the Inner HTML or XML documents feature is used, the field containing the Document URL must be above all fields containing inner HTML or XML documents.