Configuring metadata

Metadata is configured by creating a series of metadata classes to which metadata sources such as HTML metadata field names, HTTP headers and XML XPaths are mapped.
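The following Python sketch shows how classes and sources relate: it models a mapping and applies it to one HTML page and its HTTP headers. The mapping format, the class names and the `map_metadata` helper are hypothetical. They are not Funnelback's configuration format or indexer behaviour, and matching source names case-insensitively is also an assumption.

```python
from html.parser import HTMLParser

# Hypothetical mapping: metadata class name -> list of (source type, source name).
# Class and source names are illustrative only.
MAPPINGS = {
    "title": [("meta", "dc.title"), ("meta", "og:title")],
    "author": [("meta", "author"), ("header", "X-Author")],
}


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> values, keyed by lower-cased name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if name and content is not None:
            self.fields.setdefault(name.lower(), []).append(content)


def map_metadata(html, headers, mappings=MAPPINGS):
    """Return {class name: [values]} for each class with at least one matching source."""
    collector = MetaCollector()
    collector.feed(html)
    # Assumption: header and meta names are compared case-insensitively.
    lower_headers = {k.lower(): v for k, v in headers.items()}
    result = {}
    for class_name, sources in mappings.items():
        values = []
        for source_type, source_name in sources:
            key = source_name.lower()
            if source_type == "meta":
                values.extend(collector.fields.get(key, []))
            elif source_type == "header" and key in lower_headers:
                values.append(lower_headers[key])
        if values:
            result[class_name] = values
    return result
```

For example, a page with `<meta name="DC.Title" content="Annual report">` and a response header `X-Author: Web team` would populate both the `title` and `author` classes, because several sources can feed one class.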

The mappings are configured using the search dashboard by first selecting the collection for which to set up metadata, then selecting customize metadata mappings. This takes you to the metadata configuration screen, where you can define the metadata classes and the source field mappings.

Managing metadata classes

The metadata management screen is accessed by selecting configure metadata mappings from the settings panel on the data source details screen.

The metadata management screen will load and display any existing metadata mappings, sorted by the metadata class name.

Choosing an editing context

Most collections contain metadata supplied either by HTTP headers and HTML content or by XML elements, but not both. Before managing the metadata mappings, it can be helpful to select an editing context, which pre-selects the appropriate filters and focuses the screen on one type of data source.

  • HTTP headers, HTML tags or metadata fields: Only classes containing sources which are HTTP headers, HTML tags or metadata fields will be shown.

  • XML XPaths: Only classes containing XML based sources will be shown.
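An editing context acts as a filter over the class list. The sketch below assumes each class is stored as a list of (source type, source name) pairs, using hypothetical type labels `header`, `meta`, `tag` and `xpath`. A class appears in a context if any of its sources belongs to that context's family, so a class that maps both HTML and XML sources appears in both.

```python
# Hypothetical source-type labels; the dashboard's internal names may differ.
HTML_FAMILY = {"header", "meta", "tag"}
XML_FAMILY = {"xpath"}

# Illustrative classes only.
CLASSES = {
    "author": [("meta", "author"), ("header", "X-Author")],
    "price": [("xpath", "/product/price")],
    "title": [("tag", "title"), ("xpath", "/record/title")],
}


def classes_for_context(classes, family):
    """Return class names (sorted, as on the management screen) with a source in `family`."""
    return sorted(
        name
        for name, sources in classes.items()
        if any(source_type in family for source_type, _ in sources)
    )
```

Here `classes_for_context(CLASSES, HTML_FAMILY)` keeps `author` and `title`, while the XML context keeps `price` and `title`.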

Managing the mappings

Existing mappings can be edited by clicking on their entry, and new mappings can be added by clicking the add new button.

metadata-admin-2.png

Filter controls are provided to show only classes containing HTML or XML mappings, and also to filter the class and source names by keyword.

The tools menu provides the ability to clear all existing metadata mappings.

The table shows summary information for each entry:

  • the metadata class name

  • the list of sources that are mapped to the entry

  • the type of the metadata class and, if applicable, its search behaviour

  • the number of documents that currently match this mapping in the index. Note: this count is based on the current index and changes to the metadata configuration won’t be reflected in the counts until the index is rebuilt.

  • comments entered about the class.
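The document count can be thought of as a tally over a snapshot recorded when the index was last built. In this sketch, `INDEX_SNAPSHOT` is a hypothetical stand-in for that index. Editing a mapping does not alter the snapshot, which illustrates why the counts only change after the index is rebuilt.

```python
from collections import Counter

# Hypothetical snapshot of the last index build: for each document,
# the class names and the values extracted for them at indexing time.
INDEX_SNAPSHOT = [
    {"title": ["Annual report"], "author": ["J. Smith"]},
    {"title": ["Media release"]},
    {"author": ["Web team"]},
]


def document_counts(snapshot):
    """Count how many documents hold at least one value for each class."""
    counts = Counter()
    for doc in snapshot:
        for class_name, values in doc.items():
            if values:
                counts[class_name] += 1
    return counts
```

A class that has no matches in the snapshot, such as a newly added one, reports a count of zero until the next build.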

Hovering over the summary exposes three additional options:

  • preview: shows three matching items from the index and the metadata field content. This will be unavailable if the collection has not yet been indexed.

  • edit: opens the editing screen. This is the same as clicking anywhere on the row in the table.

  • delete: allows the metadata class definition to be deleted after confirmation.

Add or edit a metadata class

To add a new metadata class, click the add new button. To edit an existing metadata class, click its entry in the table.

metadata-admin-3.png

At a minimum, a metadata class requires a unique metadata class name, a type and at least one metadata source mapped to the class. The following fields are available when configuring metadata:

  • Class name is a unique ASCII alphanumeric string of up to 64 characters that identifies the metadata class. New collections include some predefined mappings, and some class names are reserved by Funnelback.

  • Type indicates how Funnelback will interpret the metadata. Five metadata class types are available.

  • Search behaviour (text type metadata only) defines whether text type metadata is used for display purposes only or also contributes to the document’s content.

  • Sources lists all the HTML and XML sources that will be mapped to this metadata class if they exist within the document being indexed. Existing sources can be edited, and new sources added by clicking the add new button. The metadata sources can also be filtered by source name.

  • Comments allows the administrator to add an annotation to the metadata class.

Once these fields are populated the metadata class can be saved.
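The minimum requirements can be expressed as a pre-save check. In this sketch, `MetadataClassDraft` and `validate` are hypothetical names. The reserved-name set is supplied by the caller because Funnelback's actual reserved list is not given here, and treating names as case-sensitive is an assumption. When editing an existing class, leave its own name out of `existing_names`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# ASCII letters and digits only, 1 to 64 characters.
CLASS_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,64}")


@dataclass
class MetadataClassDraft:
    name: str
    class_type: str
    sources: List[Tuple[str, str]] = field(default_factory=list)
    comments: str = ""


def validate(draft, existing_names, reserved_names):
    """Return a list of problems; an empty list means the draft could be saved."""
    problems = []
    if not CLASS_NAME_PATTERN.fullmatch(draft.name):
        problems.append("class name must be 1 to 64 ASCII letters or digits")
    if draft.name in existing_names:
        problems.append("class name is already in use")
    if draft.name in reserved_names:
        problems.append("class name is reserved")
    if not draft.class_type:
        problems.append("a type must be selected")
    if not draft.sources:
        problems.append("at least one source must be mapped")
    return problems
```

A draft named `dc.title` with no type and no sources would fail three checks: the dot is not alphanumeric, the type is empty and nothing is mapped.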

Add or edit a metadata source in an existing metadata class

Clicking the add new button in the sources panel opens a screen that allows metadata sources to be added to the metadata class.

metadata-admin-4.png

The source editor screen consists of the following fields:

  • Source name: Depending on the type of source selected, this field will be identified as header or name, tag or XPath. Typing into this field filters the list of suggestions in the main panel. Custom field names can also be typed into this field and then added from the main panel. Custom names are required if you wish to set up a field mapping before an indexing run has completed, or if you wish to set up an un-anchored XPath (only absolute XPaths are suggested).

  • Type of source: filters the list of suggestions displayed in the main panel to the selected type. Funnelback supports several source types: HTML metadata field names, HTML tags, HTTP headers and XML XPaths. This is not to be confused with the metadata class type, which determines how Funnelback interprets the content.

  • Suggestions: The suggestions panel displays metadata sources detected during indexing (from HTML metadata fields, HTTP headers and XPaths), along with an indication of how frequently each field occurs in the content being indexed. Common HTML tags are also suggested; however, these are not discovered by the indexer, so they may not match anything in the index.
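The difference between an absolute XPath, which the suggestions offer, and an un-anchored XPath, which must be typed in as a custom name, can be seen with Python's standard `xml.etree.ElementTree` module. The helpers `absolute_matches` and `unanchored_matches` are hypothetical and handle only simple element paths without predicates. They illustrate XPath semantics, not how Funnelback evaluates XPaths.

```python
import xml.etree.ElementTree as ET

DOC = """<record>
  <title>Top-level title</title>
  <section>
    <title>Nested title</title>
  </section>
</record>"""


def absolute_matches(root, path):
    """Evaluate a simple absolute path such as /record/title."""
    steps = path.strip("/").split("/")
    if root.tag != steps[0]:
        return []  # the first step must name the document element
    if len(steps) == 1:
        return [root.text or ""]
    return [el.text or "" for el in root.findall("/".join(steps[1:]))]


def unanchored_matches(root, path):
    """Evaluate //name: every element with that name, at any depth."""
    if not path.startswith("//"):
        raise ValueError("expected an un-anchored path such as //title")
    return [el.text or "" for el in root.iter(path[2:])]
```

With `root = ET.fromstring(DOC)`, the absolute path `/record/title` matches only the top-level title, while `//title` also picks up the title nested inside `section`.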

Sources are added by checking their entries in the suggestions panel and clicking save.

The suggestions list can also be filtered to show internal Funnelback sources (such as X-Funnelback-* HTTP headers and FUN* metadata fields generated for content auditing) and to hide items that are not selected.
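The suggestions list can be modelled as a frequency count over the sources the indexer observed, with the two optional filters applied on top. In this sketch, the observation list, the example entries `X-Funnelback-Example` and `FUNexample`, and the function names are hypothetical. Only the `X-Funnelback-*` and `FUN*` prefixes come from the description above.

```python
from collections import Counter

# Hypothetical indexer observations: one (source type, name) entry per occurrence.
OBSERVED = [
    ("header", "Content-Type"), ("header", "X-Funnelback-Example"),
    ("meta", "author"), ("meta", "author"), ("meta", "FUNexample"),
    ("meta", "description"),
]


def is_internal(source_type, name):
    """Internal sources: X-Funnelback-* headers and FUN* metadata fields."""
    if source_type == "header":
        return name.lower().startswith("x-funnelback-")  # header names are case-insensitive
    if source_type == "meta":
        return name.startswith("FUN")
    return False


def suggestions(observed, show_internal=False, selected=frozenset(), hide_unselected=False):
    """Return (type, name, frequency) rows, most frequent first."""
    rows = []
    # most_common() keeps first-seen order for equal counts.
    for (source_type, name), freq in Counter(observed).most_common():
        if not show_internal and is_internal(source_type, name):
            continue
        if hide_unselected and (source_type, name) not in selected:
            continue
        rows.append((source_type, name, freq))
    return rows
```

By default the two internal entries are hidden, leaving `author` (seen twice), `Content-Type` and `description`. Passing `show_internal=True` restores them.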