XML documents


Funnelback can index XML documents, and some additional configuration files apply specifically to indexing XML.

Example: Electronic books

Let's say you have a number of XML files representing electronic books, similar to:

  <title> The Adventures Of Sherlock Holmes </title>
  <author> Arthur Conan Doyle </author>

    <chapter>A Scandal in Bohemia</chapter>
    <chapter>The Red-headed League</chapter>
    <chapter>The Adventure of the Copper Beeches</chapter>

Because the data consists of plain XML files, it doesn't need any text conversion (unlike PDFs), so you could use a local collection.


To map this XML structure to metadata classes for the author (a), title (t) and chapters (x), create the xml.cfg file containing:


When this data is indexed, the text of each mapped element is assigned to the specified metadata class.
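Funnelback performs this mapping itself at index time. The Python sketch below only illustrates the idea: it collects the text of each mapped element under its metadata class (author to `a`, title to `t`, chapter to `x`). The `<book>` and `<chapters>` wrapper elements are hypothetical, since the example above does not show the root or container element names, and `ELEMENT_TO_CLASS` and `extract_metadata` are illustrative names, not part of Funnelback or the xml.cfg format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element-to-class mapping mirroring this example:
# author -> a, title -> t, chapter -> x. This is NOT the xml.cfg syntax.
ELEMENT_TO_CLASS = {"author": "a", "title": "t", "chapter": "x"}

# Hypothetical <book> and <chapters> wrappers: the original example does
# not show the root or container element names.
SAMPLE = """<book>
  <title> The Adventures Of Sherlock Holmes </title>
  <author> Arthur Conan Doyle </author>
  <chapters>
    <chapter>A Scandal in Bohemia</chapter>
    <chapter>The Red-headed League</chapter>
    <chapter>The Adventure of the Copper Beeches</chapter>
  </chapters>
</book>"""


def extract_metadata(xml_text: str) -> dict[str, list[str]]:
    """Collect the trimmed text of each mapped element under its metadata class."""
    root = ET.fromstring(xml_text)
    metadata = {}
    for element, meta_class in ELEMENT_TO_CLASS.items():
        # iter() visits the root and all descendants, so nesting depth does not matter.
        for node in root.iter(element):
            text = (node.text or "").strip()
            if text:
                metadata.setdefault(meta_class, []).append(text)
    return metadata


if __name__ == "__main__":
    for meta_class, values in extract_metadata(SAMPLE).items():
        print(meta_class, values)
```

Running it prints one line per class, for example `a ['Arthur Conan Doyle']`; the three chapter titles all end up under `x`, which is why a repeated element maps naturally onto a single metadata class.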


Because this is a local collection, two configuration changes will help present the XML in search results:

  1. Create a template.xsl stylesheet to convert the XML into HTML.
  2. Change the collection's search forms to use the cache_url instead of the live_url.
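The template.xsl stylesheet itself is written in XSLT. The Python sketch below only illustrates the kind of XML-to-HTML mapping such a stylesheet performs, using the element names from the example (title, author, chapter). The HTML layout, the `<book>`/`<chapters>` wrapper names and the `book_to_html` function are assumptions for illustration, not Funnelback's template.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record; the <book>/<chapters> wrapper names are assumptions.
SAMPLE = """<book>
  <title> The Adventures Of Sherlock Holmes </title>
  <author> Arthur Conan Doyle </author>
  <chapters>
    <chapter>A Scandal in Bohemia</chapter>
    <chapter>The Red-headed League</chapter>
    <chapter>The Adventure of the Copper Beeches</chapter>
  </chapters>
</book>"""


def book_to_html(xml_text: str) -> str:
    """Render one e-book record as a minimal HTML page (illustrative layout)."""
    root = ET.fromstring(xml_text)
    # ".//name" searches all descendants; default="" covers a missing element.
    title = root.findtext(".//title", default="").strip()
    author = root.findtext(".//author", default="").strip()
    chapters = [(c.text or "").strip() for c in root.iter("chapter")]
    # escape() converts &, < and > (and quotes) so source text cannot break the HTML.
    items = "\n".join(f"      <li>{escape(ch)}</li>" for ch in chapters)
    return "\n".join([
        "<html>",
        f"  <head><title>{escape(title)}</title></head>",
        "  <body>",
        f"    <h1>{escape(title)}</h1>",
        f"    <p>by {escape(author)}</p>",
        "    <ol>",
        items,
        "    </ol>",
        "  </body>",
        "</html>",
    ])


if __name__ == "__main__":
    print(book_to_html(SAMPLE))
```

An XSLT version would express the same structure declaratively, with templates matching `title`, `author` and `chapter`; the point is that the cached XML is turned into readable HTML before it is shown to users.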

Crawling XML files

To crawl XML files, make sure the crawler.parser.mimeTypes parameter includes text/xml as one of the MIME types the web crawler will accept.
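If you manage collection.cfg with scripts, this check can be automated. The Python sketch below assumes the parameter is stored as a `key=value` line whose value is a comma-separated list of MIME types; confirm the exact format and default value in your Funnelback version's documentation. `ensure_mime_type` is a hypothetical helper, not a Funnelback tool.

```python
CFG_KEY = "crawler.parser.mimeTypes"


def ensure_mime_type(cfg_line: str, mime: str = "text/xml") -> str:
    """Return the line with `mime` appended to the accepted MIME types if absent.

    Assumes a collection.cfg-style ``key=value`` line whose value is a
    comma-separated list of MIME types (illustrative assumption).
    """
    key, sep, value = cfg_line.partition("=")
    if not sep or key.strip() != CFG_KEY:
        raise ValueError(f"expected a {CFG_KEY}=... line, got {cfg_line!r}")
    # Drop empty entries and surrounding spaces, keeping the existing order.
    types = [t.strip() for t in value.split(",") if t.strip()]
    if mime not in types:
        types.append(mime)
    return f"{CFG_KEY}={','.join(types)}"


if __name__ == "__main__":
    print(ensure_mime_type("crawler.parser.mimeTypes=text/html,text/plain"))
```

The helper is idempotent: if text/xml is already listed, the line comes back unchanged apart from normalised spacing, so it is safe to run on every deployment.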