Plugin: Transform XML data
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Transform XML data tile.
- From the Location section, select the data source for which you would like to enable this plugin from the Select a data source select list.
The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only used if you are configuring this plugin manually, by setting the keys in the data source configuration. When setting the keys manually, you need to type in (or copy and paste) the key name and value.
MIME type of transformed XML
Configuration key |
Data type | string
Default value |
Allowed values | XML, JSON, HTML, Plain text, CSV
Required | This setting is optional
Sets the MIME type that will be associated with the XML document after the XSL Transformation has been applied.
For example, if you are outputting XML, set this to 'text/xml'; if you are converting to HTML, set this to 'text/html'. The output MIME type is important for downstream filters and for how the document is indexed.
Outputting non-XML content
If you apply an XSLT that transforms your XML into another format, you should set the MIME type of transformed XML option to the corresponding output format.
This is important because chained filters rely on the correct document format being set in order to run, and because the indexer treats HTML, XML and plain text documents differently at index time.
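As a sketch of non-XML output, the following template writes one line of plain text per book. It assumes input shaped like the book catalog used in the examples on this page (the `catalog/book`, `title` and `author` names come from that sample). `xsl:output method="text"` tells the XSLT processor to emit only text, with no tags and no XML declaration. With a template like this, the MIME type of transformed XML option would be set to Plain text; it is not stated whether the plugin reads `xsl:output` itself, so treat the plugin option as the setting that matters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Text output: no tags and no XML declaration are written -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/book">
      <xsl:value-of select="title"/>
      <xsl:text> by </xsl:text>
      <xsl:value-of select="author"/>
      <!-- Newline after each book -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the first book in the sample catalog this should produce the line `XML Developer's Guide by Gambardella, Matthew`.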
Filter chain configuration
This plugin uses filters to apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The filter(s) supplied by the plugin (as indicated in the listing) should be moved to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Configuration files
This plugin also uses the following configuration files to provide additional configuration.
Configuration file: XSLT transformation template
The plugin requires a configuration file named xml-transform.txt, containing a valid XSL Transformation template, to be uploaded as part of configuring the plugin. The plugin supports XSLT versions 1.0 to 3.0.
This template is applied to each XML record processed by the plugin, and the resulting transformed XML is passed on to subsequent filters for further processing, or to the indexer.
This file can’t currently be created, viewed or edited from within the administration dashboard.
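A minimal sketch of a valid xml-transform.txt is the standard XSLT identity transform, which copies every element, attribute and text node through unchanged. This is a general XSLT idiom rather than a template supplied with the plugin, but it can be a convenient starting point: upload it to confirm the plugin is in place in the filter chain, then add more specific templates that override the identity rule for the elements you want to change.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy each attribute or node, then recurse into
       its attributes and children. Any more specific template you
       add later takes priority over this one for the nodes it matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```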
Updating your XSLT
If you wish to update the XSLT that is applied by the plugin, you need to edit the configuration for your plugin and upload an updated copy of the xml-transform.txt file.
You should validate your XSLT template before uploading it, and test it using an online XSLT tester utility. Funnelback assumes the template is valid, and an invalid template may cause your update to fail. If the template doesn’t match anything in the source data, the original XML data is returned unmodified.
Example: Transform XML
Consider the following XML file.
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
  </book>
  <book id="bk103">
    <author>Corets, Eva</author>
    <title>Maeve Ascendant</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-11-17</publish_date>
    <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
  </book>
</catalog>
Say you wish to transform the XML by moving the id attribute of each book record into an <id> field and stripping out the genre, price, publish_date and description fields from each book record.
You could achieve this by applying the following XSLT template:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <catalog>
      <xsl:for-each select="catalog/book">
        <book>
          <id><xsl:value-of select="@id"/></id>
          <title><xsl:value-of select="title"/></title>
          <author><xsl:value-of select="author"/></author>
        </book>
      </xsl:for-each>
    </catalog>
  </xsl:template>
</xsl:stylesheet>
This file must be named xml-transform.txt and uploaded to the XSLT transformation template option when configuring your plugin.
This results in the following XML being output:
<?xml version="1.0"?>
<catalog>
  <book>
    <id>bk101</id>
    <title>XML Developer's Guide</title>
    <author>Gambardella, Matthew</author>
  </book>
  <book>
    <id>bk102</id>
    <title>Midnight Rain</title>
    <author>Ralls, Kim</author>
  </book>
  <book>
    <id>bk103</id>
    <title>Maeve Ascendant</title>
    <author>Corets, Eva</author>
  </book>
</catalog>
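The same result can also be sketched in a template-matching ("push") style instead of `xsl:for-each`. Because the book rule matches a `book` element wherever it appears, this version relies only on standard XSLT 1.0 built-in template behaviour to handle both the whole catalog and an individual `<book>` record produced by splitting, as in the split-then-transform example on this page. Whitespace and indentation in the output may differ slightly from the listing above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Whole catalog: keep the <catalog> wrapper and process only the book children -->
  <xsl:template match="/catalog">
    <catalog>
      <xsl:apply-templates select="book"/>
    </catalog>
  </xsl:template>

  <!-- One book: promote @id to an element and keep only title and author.
       A split record whose root element is <book> is reached through the
       built-in root rule and lands here as well. -->
  <xsl:template match="book">
    <book>
      <id><xsl:value-of select="@id"/></id>
      <title><xsl:value-of select="title"/></title>
      <author><xsl:value-of select="author"/></author>
    </book>
  </xsl:template>
</xsl:stylesheet>
```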
In general, when working with XML, you will want to split your XML into individual records first using the Split XML and HTML plugin. This results in simpler XSLT templates being required. (Note: using the XML splitting option on the XML index configuration screen will not work with the transformation filter.)
Example: Split, then transform XML
Consider the same XML as in the example above.
- Configure the Split HTML and XML filter to split the XML file on the /catalog/book XPath.
- Configure the XML transformation plugin so that the XML transformation filter runs after the Split XML or HTML filter (by dragging the filters into the correct order when configuring the plugin).
- When the update runs, the XML split operation will produce XML records similar to the following.
<?xml version="1.0"?>
<book id="bk102">
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
</book>
The XML transformation plugin can then be configured with the following XSLT template to achieve the same result as in the first example.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/book">
    <book>
      <id><xsl:value-of select="@id"/></id>
      <title><xsl:value-of select="title"/></title>
      <author><xsl:value-of select="author"/></author>
    </book>
  </xsl:template>
</xsl:stylesheet>
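Because the plugin supports XSLT up to 3.0, per-record templates can also use XPath 2.0+ functions such as `format-date()`. The following sketch extends the template above with two hypothetical output fields, `<year>` and `<published>`, derived from `publish_date`; the field names are illustrative only. It checks `castable as xs:date` first, so a record without a valid `YYYY-MM-DD` date simply omits both fields instead of failing the transformation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:template match="/book">
    <book>
      <id><xsl:value-of select="@id"/></id>
      <title><xsl:value-of select="title"/></title>
      <author><xsl:value-of select="author"/></author>
      <!-- Only emit date fields when publish_date parses as an ISO date -->
      <xsl:if test="publish_date castable as xs:date">
        <xsl:variable name="date" select="xs:date(publish_date)"/>
        <!-- Hypothetical field: four-digit publication year -->
        <year><xsl:value-of select="year-from-date($date)"/></year>
        <!-- Hypothetical field: day, month name and year -->
        <published><xsl:value-of select="format-date($date, '[D] [MNn] [Y]')"/></published>
      </xsl:if>
    </book>
  </xsl:template>
</xsl:stylesheet>
```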
Example: Transform XML into HTML
Consider the XML from the first example.
You can apply an XSL Transformation to reformat this as HTML:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Books</h2>
        <table>
          <tr>
            <th>Title</th>
            <th>Author</th>
          </tr>
          <xsl:for-each select="catalog/book">
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="author"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
This will output the XML as the following HTML:
<html>
  <body>
    <h2>Books</h2>
    <table>
      <tr>
        <th>Title</th>
        <th>Author</th>
      </tr>
      <tr>
        <td>XML Developer's Guide</td>
        <td>Gambardella, Matthew</td>
      </tr>
      <tr>
        <td>Midnight Rain</td>
        <td>Ralls, Kim</td>
      </tr>
      <tr>
        <td>Maeve Ascendant</td>
        <td>Corets, Eva</td>
      </tr>
    </table>
  </body>
</html>
When outputting an alternate format, make sure you set the output format correctly (in this case, HTML) so that the document is processed correctly by chained filters and when it is indexed.
You do this by setting the plugin's MIME type of transformed XML option.
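The same approach can be sketched for the other non-XML options, such as CSV. This example assumes the book catalog from the first example and uses the XSLT 2.0 `replace()` function, which should be available given the plugin's XSLT 3.0 support. Text fields are wrapped in double quotes, with any embedded quotes doubled, because values such as "Gambardella, Matthew" contain commas. The column selection is illustrative, and the MIME type of transformed XML option would be set to CSV.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Text output: no tags and no XML declaration are written -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Header row, then one row per book -->
    <xsl:text>id,title,author,price&#10;</xsl:text>
    <xsl:for-each select="catalog/book">
      <xsl:value-of select="@id"/>
      <xsl:text>,</xsl:text>
      <!-- Quote text fields and double any embedded quotes, as CSV requires -->
      <xsl:value-of select="concat('&quot;', replace(title, '&quot;', '&quot;&quot;'), '&quot;')"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="concat('&quot;', replace(author, '&quot;', '&quot;&quot;'), '&quot;')"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the first book this should produce the row `bk101,"XML Developer's Guide","Gambardella, Matthew",44.95`.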