Plugin: Transform XML data

Purpose

Use this plugin if you need to apply a transformation to XML data before indexing.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Transform XML data tile.

  2. From the Location section, choose the data source on which you want to enable this plugin from the Select a data source list.

The plugin takes effect once the setup steps are complete and an advanced > full update of the data source has finished.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are needed only if you configure this plugin manually. In that case, set the keys in the data source configuration by typing (or copying and pasting) each key name and value.

MIME type of transformed XML

Configuration key: plugin.transform-xml.config.transformedMimeType

Data type: string

Default value: XML

Allowed values: XML, JSON, HTML, Plain text, CSV

Required: This setting is optional

Sets the MIME type that will be associated with the XML document after the XSL Transformation has been applied.

For example, if the transform outputs XML, set this to XML (text/xml); if it converts the XML to HTML, set this to HTML (text/html). The output MIME type matters to downstream filters and affects how the document is indexed.

Outputting non-XML content

If you apply an XSLT that transforms your XML into another format then you should set the MIME type of transformed XML to the corresponding output format.

This is important because any chained filters rely on a correct document format being set in order to run, and it’s also important for the indexer, as HTML, XML and plain text documents are treated differently at index time.
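
As an illustration of non-XML output, the following sketch produces CSV text rather than XML. It assumes a source document with a catalog root element containing book elements, each with an id attribute and a price child (the structure used in the examples further down this page); the choice of columns is purely illustrative. It also assumes the plugin honors the xsl:output instruction, which this page does not confirm.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- method="text" asks the processor to emit raw text instead of XML markup -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Header row; &#10; is a newline character -->
    <xsl:text>id,price&#10;</xsl:text>
    <!-- One CSV row per book (assumed structure: catalog/book) -->
    <xsl:for-each select="catalog/book">
      <xsl:value-of select="@id"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With a template like this you would set MIME type of transformed XML to CSV, so that chained filters and the indexer do not treat the output as XML.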

Filter chain configuration

This plugin uses filters to apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be moved to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugin.transformXml.TransformXmlStringFilter

Drag the com.funnelback.plugin.transformXml.TransformXmlStringFilter plugin filter to where you wish it to run in the filter chain sequence.

Configuration files

This plugin also uses the following configuration files to provide additional configuration.

xml-transform.txt

Description: XSLT transformation template.

Configuration file format: xslt

Configuration file: XSLT transformation template

The plugin requires a configuration file named xml-transform.txt, containing a valid XSL Transformation template, which you upload when configuring the plugin. The plugin supports XSLT 1.0, 2.0 and 3.0.

This template is applied to the XML records that are processed by the plugin with the resulting transformed XML being passed on to subsequent filters for further processing, or to be indexed.
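
Because XSLT 2.0 and later are supported, templates can use instructions that are not available in XSLT 1.0. The following is a minimal sketch, not taken from the plugin itself, that groups books by genre with xsl:for-each-group. It assumes the catalog/book structure used in the examples further down this page; the genre wrapper element and its name attribute are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <catalog>
      <!-- xsl:for-each-group is an XSLT 2.0 instruction; it does not exist in 1.0 -->
      <xsl:for-each-group select="catalog/book" group-by="genre">
        <!-- Hypothetical output element: one <genre> per distinct genre value -->
        <genre name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <title><xsl:value-of select="title"/></title>
          </xsl:for-each>
        </genre>
      </xsl:for-each-group>
    </catalog>
  </xsl:template>

</xsl:stylesheet>
```

Applied to a catalog with one Computer book and two Fantasy books, this would yield two genre elements, each listing the titles that belong to it.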

This file can’t currently be created, viewed or edited from within the administration dashboard.

Updating your XSLT

If you wish to update the XSLT that is applied by the plugin you need to edit the configuration for your plugin and upload an updated copy of the xml-transform.txt file.

You should use a validator on your XSLT template before uploading, and test the template using an online XSLT tester utility. Funnelback will assume the template is valid. If the template doesn’t match anything in the source data then the original XML data is returned unmodified. An invalid template may result in your update failing.

Examples

Example: Transform XML

Consider the following XML file.

<?xml version="1.0"?>
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book>
    <book id="bk103">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-11-17</publish_date>
        <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
    </book>
</catalog>

Say you wish to transform the XML, moving the id attribute on book records into an <id> field and stripping out some other fields within each book record.

You could achieve this by applying the following XSLT template:

xml-transform.txt
<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <catalog>
      <!-- Emit one simplified <book> per source book record -->
      <xsl:for-each select="catalog/book">
        <book>
          <!-- Promote the id attribute to an <id> element -->
          <id><xsl:value-of select="@id"/></id>
          <title><xsl:value-of select="title"/></title>
          <author><xsl:value-of select="author"/></author>
        </book>
      </xsl:for-each>
    </catalog>
  </xsl:template>

</xsl:stylesheet>

This file must be named xml-transform.txt and uploaded to the XSLT transformation template option when configuring your plugin.

This results in the following XML being output:

<?xml version="1.0"?>
<catalog>
    <book>
        <id>bk101</id>
        <title>XML Developer's Guide</title>
        <author>Gambardella, Matthew</author>
    </book>
    <book>
        <id>bk102</id>
        <title>Midnight Rain</title>
        <author>Ralls, Kim</author>
    </book>
    <book>
        <id>bk103</id>
        <title>Maeve Ascendant</title>
        <author>Corets, Eva</author>
    </book>
</catalog>

In general, when you are working with XML, you will want to split it into individual records first using the Split XML or HTML plugin, because this allows simpler XSLT templates to be used. Note that the XML splitting option on the XML index configuration screen will not work with the transformation filter.

Example: Split, then transform XML

Consider the same XML as in the example above.

  1. Configure the Split XML or HTML filter to split the XML file on the /catalog/book XPath.

  2. Configure the XML transformation plugin so that the XML transformation filter runs after the Split XML or HTML filter (by dragging the filter into position in the filter order when configuring the plugin).

  3. When the update runs, the XML split operation produces XML records similar to the following:

<?xml version="1.0"?>
<book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
</book>

The XML transformation plugin can then be configured with the following XSLT template to achieve the same result as in the first example.

xml-transform.txt
<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- After splitting, the root element of each record is <book> -->
  <xsl:template match="/book">
    <book>
      <id><xsl:value-of select="@id"/></id>
      <title><xsl:value-of select="title"/></title>
      <author><xsl:value-of select="author"/></author>
    </book>
  </xsl:template>

</xsl:stylesheet>
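
For the bk102 record shown above, this template would be expected to produce output along the following lines. This is a hand-derived sketch rather than captured plugin output: the indentation is added for readability, and the exact whitespace and XML declaration depend on how the processor serializes the result.

```xml
<?xml version="1.0"?>
<book>
    <id>bk102</id>
    <title>Midnight Rain</title>
    <author>Ralls, Kim</author>
</book>
```

Each split record is transformed independently, so the other two books yield the same structure with their own values.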

Example: Transform XML into HTML

Consider the XML from the first example.

You can apply an XSL Transformation to reformat this as HTML:

<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>Books</h2>
    <table>
      <tr>
        <th>Title</th>
        <th>Author</th>
      </tr>
      <xsl:for-each select="catalog/book">
        <tr>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="author"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

This will output the XML as the following HTML:

<html>
   <body>
      <h2>Books</h2>
      <table>
         <tr>
            <th>Title</th>
            <th>Author</th>
         </tr>
         <tr>
            <td>XML Developer's Guide</td>
            <td>Gambardella, Matthew</td>
         </tr>
         <tr>
            <td>Midnight Rain</td>
            <td>Ralls, Kim</td>
         </tr>
         <tr>
            <td>Maeve Ascendant</td>
            <td>Corets, Eva</td>
         </tr>
      </table>
   </body>
</html>

When outputting to an alternative format, ensure you set the output format correctly (in this case, HTML) so that the document is processed correctly by chained filters and by the indexer.

This is done by setting the plugin MIME type of transformed XML option.

Change log

[2.0.0]

Added

  • Added support for XSLT 2.0 and XSLT 3.0 templates.

Changed

  • Changed the Java library used to process the XML to Saxon. Existing XSLT 1.0 templates should continue to work, although line breaks in the transformed XML may differ slightly.