Define XML indexer settings using a plugin

The plugin indexing interface provides a set of methods that let a plugin set the XML indexer settings. Setting them from a plugin is equivalent to setting the corresponding fields on the data source's XML indexing screen in the search dashboard.

The primary use case is to set up any XML indexer settings required to support a custom gatherer plugin.

Prerequisite

To set XML indexing configuration, your plugin must be configured to provide indexing functionality.

Adding XML indexing configuration

To set an XML indexing configuration, implement the xmlIndexingConfig() method in your plugin's class that implements IndexingConfigProvider:

XmlIndexingConfig xmlIndexingConfig(IndexConfigProviderContext context)

Within this method, call one or more of the following methods on the XmlIndexingConfig object to set the relevant aspects of the XML indexing configuration:

getDocumentPaths().add()

Adds an additional XPath to use for splitting of XML documents.

getUrlPaths().add()

Adds an additional XPath to use for an element that indicates a document’s URL.

getFileTypePaths().add()

Adds an additional XPath to use for an element that indicates a document’s file type.

getInnerDocumentPaths().add()

Adds an additional XPath to use for an inner document path.

getContentPaths().add()

Adds an additional XPath to use for indexable document content.

getWhenNoContentPathsAreSet()

Reads the flag indicating what to do when no content paths are set.

setWhenNoContentPathsAreSet()

Sets the flag indicating what to do when no content paths are set. Acceptable values are defined in the WhenNoContentPathsAreSet enum.
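To make the role of document paths and content paths more concrete, the following sketch uses the JDK's javax.xml.xpath API (not the Funnelback plugin API) to show which nodes such XPaths select in a small XML file. The sample XML, its element names (items, item, title, body, note), and the class and method names are hypothetical and chosen only for illustration; the sketch does not reproduce how the Funnelback indexer evaluates these paths.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSelectionSketch {

    // Hypothetical input: a single XML file holding two records.
    static final String SAMPLE_XML =
        "<items>"
      + "<item><title>First</title><body>Alpha text</body><note>internal</note></item>"
      + "<item><title>Second</title><body>Beta text</body><note>internal</note></item>"
      + "</items>";

    // Returns the text content of every node that the XPath expression selects.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // A document path such as /items/item selects one node per record.
        System.out.println("records: " + select(SAMPLE_XML, "/items/item").size());
        // A content path such as /items/item/body selects only the body text of each record.
        System.out.println("content: " + select(SAMPLE_XML, "/items/item/body"));
    }
}
```

Running main prints the record count (2) followed by the two body values; the title and note elements are not selected by the content path.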

Example: Set XML indexing configuration

This example demonstrates how to set XML indexing configuration for a data source using a plugin.

ExampleIndexingConfigProvider.java
package com.funnelback.plugin.example;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.plugin.index.IndexConfigProviderContext;
import com.funnelback.plugin.index.IndexingConfigProvider;
import com.funnelback.plugin.index.model.indexingconfig.*;

public class ExampleIndexingConfigProvider implements IndexingConfigProvider {

    @Override
    public XmlIndexingConfig xmlIndexingConfig(IndexConfigProviderContext context) {

        XmlIndexingConfig xmlIndexingConfig = new XmlIndexingConfig(); (1)
        xmlIndexingConfig.getDocumentPaths().add(new DocumentPath("/items/item")); (2)
        xmlIndexingConfig.getFileTypePaths().add(new FileTypePath("/items/item/type")); (3)
        xmlIndexingConfig.getUrlPaths().add(new UrlPath("/items/item/url")); (4)

        // This plugin requires that unmapped content is not indexed as document content.
        xmlIndexingConfig.setWhenNoContentPathsAreSet(WhenNoContentPathsAreSet.DONT_INDEX_UNMAPPED_AS_CONTENT); (5)

        return xmlIndexingConfig;
    }

}
1 Creates a new XmlIndexingConfig object so we can set the configuration.
2 Sets the XPath which will be used to split the XML document that this plugin processes.
3 Sets the XPath of the element that indicates the file type for the item.
4 Sets the XPath of the element containing the URL to use for the item.
5 Configures the indexer to ignore unmapped XPaths.
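The following sketch, again using only the JDK's javax.xml.xpath API, shows which values the three XPaths in the example would pick out of a small feed. Each record selected by /items/item is paired with its url and type children, which is what /items/item/url and /items/item/type select within a single item. The sample feed, its values, and the class and method names are hypothetical, and pairing the paths per item like this is an assumption made for illustration; the sketch does not reproduce Funnelback's indexer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExampleFeedSketch {

    // Hypothetical feed shaped to match the XPaths used in ExampleIndexingConfigProvider.
    static final String FEED =
        "<items>"
      + "<item><url>https://example.com/a</url><type>pdf</type></item>"
      + "<item><url>https://example.com/b</url><type>html</type></item>"
      + "</items>";

    // Returns "url (type)" for each record selected by the document path /items/item.
    static List<String> describeItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate("/items/item", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Relative to one item, "url" and "type" select the same elements
                // as /items/item/url and /items/item/type do for that item.
                String url = xpath.evaluate("url", item);
                String type = xpath.evaluate("type", item);
                result.add(url + " (" + type + ")");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        describeItems(FEED).forEach(System.out::println);
    }
}
```

Running main prints one line per item, pairing each URL with its file type.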

See also