Plugin: Wrap XML element in HTML tags

Purpose

Use this plugin when you need to index an XML field containing HTML as an HTML inner document, and the field isn’t correctly detected as HTML.

This plugin lets you wrap the contents of a specific XML field in <html> tags, so that the field can be indexed as an inner HTML document. This is useful when an XML or JSON feed containing HTML can’t be modified at the source.

When to use this plugin

Use this plugin if you are indexing an XML file using the HTML inner document mode, and the XML field you want to index contains an HTML fragment (without surrounding <html> tags). Without the plugin, the field contents will not be correctly detected as HTML.

Caveats

  • Applying this plugin can have unintended consequences when used in conjunction with the inner HTML or XML documents feature in the Funnelback special XML configuration. If that feature is used, the field containing the document URL must appear above all fields containing inner HTML or XML documents.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Wrap XML element in HTML tags tile.

  2. From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.

The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

XML element to wrap in HTML tags

Configuration key

plugin.xml-element-html-wrapper-filter.config.xpath

Data type

string

Required

This setting is required

Specifies an XPath to the XML element containing the HTML fragment that should have its content wrapped in '<html>' tags.

When defining the XPath, ensure you take into account any changes in the XML structure that might have been introduced by previous filters. For example, if you split the XML file into individual XML documents that you then filter, the XPath will need to be adjusted to reflect the individual XML record structure.
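The effect of an earlier splitting step on the XPath can be illustrated with a small Python sketch. This is only an illustration using the standard library, not the plugin's own XPath engine; the `matches` helper, the sample feed and the per-record split are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, shaped like the example used elsewhere on this page.
FEED = "<files><file><doc>a</doc></file><file><doc>b</doc></file></files>"


def matches(xml_text: str, xpath: str) -> list[str]:
    """Return the text of elements matched by a simple absolute path."""
    root = ET.fromstring(xml_text)
    steps = xpath.strip("/").split("/")
    # The first step must name the root element; the rest is relative to it.
    if steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root.text or ""]
    return [el.text or "" for el in root.findall("./" + "/".join(steps[1:]))]


# Against the whole feed, the path starts at <files>.
whole_feed = matches(FEED, "/files/file/doc")

# Assumed result of an earlier filter that splits the feed into one
# document per <file> element: each record now has <file> as its root.
records = [ET.tostring(el, encoding="unicode") for el in ET.fromstring(FEED)]
old_path = [matches(record, "/files/file/doc") for record in records]
new_path = [matches(record, "/file/doc") for record in records]
```

In this sketch the original path matches both doc elements in the whole feed, matches nothing once the records have been split, and matches again when shortened to /file/doc.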

Filter chain configuration

This plugin uses filters, which apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugins.xmlElementHtmlWrapperFilter.XmlElementHtmlWrapperFilter

Drag the com.funnelback.plugins.xmlElementHtmlWrapperFilter.XmlElementHtmlWrapperFilter plugin filter to where you wish it to run in the filter chain sequence.

Examples

Consider the following XML document:

<files>
  <file>
    <title>Example title 1</title>
    <description>This is an example document</description>
    <url>http://example.com/example-files/file.html</url>
    <doc>&lt;p&gt;An example HTML document.&lt;/p&gt;</doc>
  </file>
  <file>
    <title>Example title 2</title>
    <description>This is another example document</description>
    <url>http://example.com/example-files/file2.html</url>
    <doc><![CDATA[<p>Another example HTML document.</p>]]></doc>
  </file>
</files>

The doc element contains HTML code that you wish to index as an HTML inner document.

Configuring the plugin with:

Configuration key name              Value
XML element to wrap in HTML tags    /files/file/doc

will result in the following modification to the downloaded XML, making it suitable for indexing.

<files>
  <file>
    <title>Example title 1</title>
    <description>This is an example document</description>
    <url>http://example.com/example-files/file.html</url>
    <doc>&lt;html&gt;&lt;p&gt;An example HTML document.&lt;/p&gt;&lt;/html&gt;</doc>
  </file>
  <file>
    <title>Example title 2</title>
    <description>This is another example document</description>
    <url>http://example.com/example-files/file2.html</url>
    <doc>&lt;html&gt;&lt;p&gt;Another example HTML document.&lt;/p&gt;&lt;/html&gt;</doc>
  </file>
</files>
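The transformation shown above can be approximated with a short Python sketch. It is a rough illustration under stated assumptions, not the plugin's implementation: the `wrap_in_html` helper is hypothetical, the sample is a trimmed copy of the example feed, and the path is written relative to the root element because that is what Python's ElementTree expects.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example feed: one entity-encoded field, one CDATA field.
SAMPLE = """<files>
  <file>
    <title>Example title 1</title>
    <doc>&lt;p&gt;An example HTML document.&lt;/p&gt;</doc>
  </file>
  <file>
    <title>Example title 2</title>
    <doc><![CDATA[<p>Another example HTML document.</p>]]></doc>
  </file>
</files>"""


def wrap_in_html(xml_text: str, path: str) -> str:
    """Wrap the text of every element matching `path` in <html> tags."""
    root = ET.fromstring(xml_text)
    # ElementTree paths are relative to the root element, so the absolute
    # XPath /files/file/doc is written as ./file/doc here.
    for element in root.findall(path):
        element.text = "<html>" + (element.text or "") + "</html>"
    # Serialising escapes the markup, so CDATA and entity-encoded input
    # both come out entity-encoded.
    return ET.tostring(root, encoding="unicode")


wrapped = wrap_in_html(SAMPLE, "./file/doc")
print(wrapped)
```

Both doc elements end up with their content enclosed in escaped <html> tags, whether the source used entity encoding or a CDATA section, which mirrors the output listed above.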

Change log

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.