Metadata classes - multiple and alternative values

Introduction

Funnelback’s metadata classes support multiple and alternative metadata values.

  • multiple values is when a metadata class includes more than one fielded value, such as a delimited HTML keywords meta field that contains many keywords (e.g. keywords=red,blue,green) or an XML record containing multiple keyword fields (e.g. <kw>red</kw><kw>blue</kw><kw>green</kw>). Each of the values is matched when a search is run.

  • alternative values is when a metadata class includes several alternative values of which only one will be displayed, e.g. a price field that contains the price in US dollars, Pounds Sterling, Euros, Australian dollars and New Zealand dollars. Depending on the store or location, only one of these alternative metadata values will be shown in the metadata summary.

Multiple metadata values

There are two ways in which a Funnelback metadata class will end up containing multiple values:

  • there are multiple metadata sources mapped to a single metadata class, and the source document matches more than one of the sources.

  • any of the matching metadata sources for a metadata class contain multiple values, either as delimited data within a single field, or as repeated fields. The delimiter is a vertical bar symbol (|) by default but this can be changed on a per-data source basis (the delimiters that are set will apply to all metadata classes for the purpose of splitting them into multiple values). The metadata delimiters plugin allows you to set the delimiter on a per-field basis.

The metadata is returned from Funnelback within listMetadata in each result. The result’s listMetadata[metadataClass] element holds all the metadata values of the metadataClass field as a list of strings, which can be iterated across within the search template or from hook scripts.

Classes containing multiple values are important and useful because they allow filtering (using faceted navigation) on the unique values contained within a metadata class. For the colors example below (multiple values from a single field), a filter based on color could be attached to the search, allowing the results to be filtered by color. The record in the example would appear if blue, red or orange were applied in the filter.
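As a rough illustration of why list-valued classes suit filtering, the sketch below works on hypothetical result data shaped like the listMetadata element described above (each metadata class mapped to a list of strings). The result records and the function names are assumptions made for this example; they are not part of Funnelback.

```python
# Hypothetical results: only the listMetadata shape (class -> list of strings)
# is taken from the description above; the records themselves are invented.
results = [
    {"title": "Paint set", "listMetadata": {"productColors": ["blue", "red", "orange"]}},
    {"title": "Marker", "listMetadata": {"productColors": ["green"]}},
]


def filter_by_value(results, metadata_class, wanted):
    """Keep the results whose metadata class contains the wanted value."""
    return [
        r for r in results
        if wanted in r.get("listMetadata", {}).get(metadata_class, [])
    ]


def unique_values(results, metadata_class):
    """Collect the distinct values of a class, much as a facet would list them."""
    values = set()
    for r in results:
        values.update(r.get("listMetadata", {}).get(metadata_class, []))
    return sorted(values)


red_titles = [r["title"] for r in filter_by_value(results, "productColors", "red")]
all_colors = unique_values(results, "productColors")
```

Here red_titles contains only "Paint set", and all_colors lists each color once regardless of how many results carry it.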

Changing the delimiter

Setting the delimiter for individual fields

This applies only to HTML documents.

The metadata delimiters plugin allows you to set what character is used as a delimiter on a per-field basis (e.g. dc.subject = ; and keywords=,) for a data source.

This is preferable to using facet_item_sepchars (see below), which sets one or more delimiters that are applied to all fields within a data source.

Changing the field delimiter for all fields in a data source

The delimiter (|) used to split an XML field can be changed by setting the -facet_item_sepchars indexer option.

This option allows multiple (single character) delimiters to be specified (e.g. -facet_item_sepchars=|,; will split all fields using a vertical bar, comma or semi-colon as the delimiter).

Use this option with caution to avoid unintentionally splitting fields, e.g. setting the delimiter to a comma will often result in description fields being split in the middle of a sentence.
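The effect of specifying several single-character delimiters, and the risk noted above, can be sketched in a few lines of Python. This is only an approximation of the splitting behaviour described here, not the indexer's implementation; the function name split_values is hypothetical, and trimming whitespace and dropping empty parts are assumptions.

```python
import re


def split_values(raw, sepchars="|"):
    """Split a raw field value on any one of the single-character delimiters."""
    # Build a character class such as [\|,;] from the delimiter string.
    pattern = "[" + re.escape(sepchars) + "]"
    # Assumption: surrounding whitespace is trimmed and empty parts are dropped.
    return [part.strip() for part in re.split(pattern, raw) if part.strip()]


keywords = split_values("red|blue,green", "|,;")
# keywords == ["red", "blue", "green"]

description = split_values("A sturdy, waterproof jacket", ",")
# description == ["A sturdy", "waterproof jacket"]  (split mid-sentence)
```

The second call shows the unintended split: a comma delimiter breaks a sentence-like description into two separate values.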

Example: Multiple values from several fields.

An HTML document might include the following in the document header:

<title>Hamlet - the complete works of William Shakespeare</title>
<meta name="dc.title" content="Hamlet"/>

If both of these (<title> and dc.title) are listed as metadata sources for a class called docTitle the index will contain both the values delimited with a vertical bar:

docTitle: Hamlet - the complete works of William Shakespeare|Hamlet
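The combination step can be pictured with a small Python sketch. The MAPPINGS dictionary and the build_class_value function are hypothetical names used only for illustration; they do not reflect how Funnelback stores its metadata mappings.

```python
# Hypothetical mapping: metadata class -> metadata sources that feed it.
MAPPINGS = {"docTitle": ["title", "dc.title"]}

# Values assumed to have been extracted from the document, keyed by source.
extracted = {
    "title": "Hamlet - the complete works of William Shakespeare",
    "dc.title": "Hamlet",
}


def build_class_value(metadata_class, extracted, mappings, delimiter="|"):
    """Join the values of every matching source with the delimiter."""
    values = [extracted[s] for s in mappings[metadata_class] if s in extracted]
    return delimiter.join(values)


doc_title = build_class_value("docTitle", extracted, MAPPINGS)
# doc_title == "Hamlet - the complete works of William Shakespeare|Hamlet"
```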

Example: Multiple values from a single field.

Consider an HTML document containing:

<meta name="colors" content="blue|red|orange">

or an equivalent XML document:

...
	<colors>
		<color>blue</color>
		<color>red</color>
		<color>orange</color>
	</colors>
...

Assuming the default delimiter (|) is being used and there are mappings for an HTML source colors or XML source //color to a metadata class called productColors, then the index will contain:

productColors: blue|red|orange
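For the XML case, the following sketch uses Python's standard xml.etree.ElementTree to gather the repeated fields into the same delimited form. The enclosing product element is a hypothetical stand-in for the parts of the record elided in the example, and the code only illustrates the outcome, not the indexer's own processing.

```python
import xml.etree.ElementTree as ET

# The <product> wrapper is an assumption; the source record elides its context.
xml_doc = """
<product>
  <colors>
    <color>blue</color>
    <color>red</color>
    <color>orange</color>
  </colors>
</product>
""".strip()

root = ET.fromstring(xml_doc)

# ".//color" is ElementTree's way of writing the //color source path.
values = [el.text.strip() for el in root.findall(".//color") if el.text]

# Join the repeated fields with the default delimiter.
product_colors = "|".join(values)
# product_colors == "blue|red|orange"
```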

Alternative metadata values

If you need to update a metadata field containing variant values to be consistent, this can be achieved with the metadata normalizer filter.

Funnelback supports the storing of multiple alternative values within a single metadata string. When presenting values from a metadata string, a particular value can be selected by specifying a key, with fallback to a default value if the key is not present.

If a metadata field containing multiple alternative values is accessed without using the special options in the table below, the whole string will be used. To use the default value you must use the special options with a non-existent key, such as 'default'.

Documents containing alternative metadata values should publish this metadata in the following form:

<meta name="FIELD" content="DEFAULT_VALUE;NUM_EXCEPTIONS;(KEY;VALUE)...(KEY;VALUE)" />

  • Keys may contain spaces and commas but not semicolons, double-quotes or parentheses.

  • Values may include semicolons, double-quotes and parentheses but only within double-quotes. To include a double-quote within a quoted part of a value, use double double-quotes. If a value is just a double-quote, you will need to represent it using four consecutive double-quotes.

  • Semicolons are used to separate keys and values and also to terminate the default value and number of fields (currently ignored).

  • Values do not have to be numeric.

  • A maximum of ten fields may be made selectable.

  • Only one selector can be specified per field. In an e-commerce example, the price of Vegemite could be made to depend either on the size of the jar or on the store, but not both.
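To make the format and quoting rules above concrete, here is a minimal Python sketch of a parser for such a content string. It is an interpretation of the rules as written, not Funnelback's parser: the function names are hypothetical, the exception count is read and ignored, stray characters between pairs are skipped, and malformed input (for example a pair with no semicolon) is not handled gracefully.

```python
def read_value(text, pos, terminators):
    """Read from pos up to an unquoted terminator; return (value, new_pos)."""
    out = []
    in_quotes = False
    while pos < len(text):
        ch = text[pos]
        if in_quotes:
            if ch == '"':
                if text[pos + 1:pos + 2] == '"':
                    # A doubled quote inside a quoted part is a literal quote.
                    out.append('"')
                    pos += 2
                    continue
                in_quotes = False  # closing quote
            else:
                out.append(ch)  # semicolons and parentheses are literal here
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch in terminators:
            break
        else:
            out.append(ch)
        pos += 1
    return "".join(out), pos


def parse_alternatives(content):
    """Return (default_value, {key: value}) for a selectable metadata string."""
    default, pos = read_value(content, 0, ";")
    pos += 1  # skip the semicolon that ends the default value
    _count, pos = read_value(content, pos, ";")  # number of exceptions, ignored
    pos += 1
    alternatives = {}
    while pos < len(content):
        if content[pos] != "(":
            pos += 1  # assumption: tolerate stray characters between pairs
            continue
        end_key = content.index(";", pos)  # keys cannot contain semicolons
        key = content[pos + 1:end_key]
        value, pos = read_value(content, end_key + 1, ")")
        alternatives[key] = value
        pos += 1  # skip the closing parenthesis
    return default, alternatives


def select(content, key):
    """Pick the value for key, falling back to the default value."""
    default, alternatives = parse_alternatives(content)
    return alternatives.get(key, default)


quoted = select('plain;1;(note;"a;b (c) ""d""")', "note")
# quoted == 'a;b (c) "d"'
fallback = select('plain;1;(note;"a;b (c) ""d""")', "default")
# fallback == 'plain'
```

Selecting with a key that is not present, such as 'default', yields the default value, which mirrors the fallback behaviour described earlier in this section.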

Querying selectable metadata

The selectable metadata mechanism can be controlled via CGI parameters:

CGI parameter    Values   Description

selector_class   string   Specifies the key to use when accessing the given metadata class
slt_class        float    Performs a "Less than" operation on metadata class, accessed by the key
sle_class        float    Performs a "Less than or equals" operation on metadata class, accessed by the key
sgt_class        float    Performs a "Greater than" operation on metadata class, accessed by the key
sge_class        float    Performs a "Greater than or equals" operation on metadata class, accessed by the key
seq_class        float    Performs an "Equals" operation on metadata class, accessed by the key

Example search strings

Return items whose price in 'mystore' is no greater than 4.20:

&selector_price=mystore&sle_price=4.20

Display search results with French versions of the 'category' and 'description' metadata:

&SM=meta&SF=[description,category]&selector_category=FR&selector_description=fr
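If these parameters are assembled programmatically, a helper along the following lines could build the selector portion of the query string. This is a sketch only: selector_params is a hypothetical name, and it covers just the selector and comparison parameters from the table above, not other parameters such as SM or SF.

```python
from urllib.parse import urlencode

# Comparison prefixes taken from the parameter table above.
COMPARISONS = {"slt", "sle", "sgt", "sge", "seq"}


def selector_params(metadata_class, key, comparison=None, operand=None):
    """Build the selector (and optional comparison) part of a query string."""
    params = {f"selector_{metadata_class}": key}
    if comparison is not None:
        if comparison not in COMPARISONS:
            raise ValueError(f"unknown comparison: {comparison}")
        # The operand is passed as a string so that "4.20" keeps its zero.
        params[f"{comparison}_{metadata_class}"] = operand
    return urlencode(params)


query = selector_params("price", "mystore", "sle", "4.20")
# query == "selector_price=mystore&sle_price=4.20"
```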

Example 1: e-commerce

A large on-line retailer sells the same item for different prices, depending upon the location of the customer’s nearest store. Using Funnelback’s Selectable Metadata, only one document is needed for each item available for sale. In that document the price metadata is stored in the form of a string such as:

<meta name="price" content="4.10;5;(London;2.50)(Canberra;4.99)(Sydney;4.50)(Brisbane;4.63)(Szczecin;12.80)"/>

where 4.10 is the default price, 5 specifies that there are 5 exceptions, and the pairs of entries in parentheses show the prices which apply for the five different cities. When a person searches from a city, the city name can be inserted into the query string as a selector and the price shown and used in numerical range searches will be the one applicable to that city. For Melbourne, where no exception price is shown, the default price of 4.10 will be used.
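The selection and range behaviour described for this example can be mimicked with a short Python sketch. It is a simplified illustration rather than Funnelback's implementation: it ignores the quoting rules (the prices contain no quoted parts), and the names price_for and matches are hypothetical.

```python
import operator
import re

# Comparison prefixes from the CGI parameter table, mapped to Python operators.
OPERATORS = {
    "slt": operator.lt,
    "sle": operator.le,
    "sgt": operator.gt,
    "sge": operator.ge,
    "seq": operator.eq,
}

PRICE = "4.10;5;(London;2.50)(Canberra;4.99)(Sydney;4.50)(Brisbane;4.63)(Szczecin;12.80)"


def price_for(content, city):
    """Return the price for a city, falling back to the default price."""
    default = content.split(";", 1)[0]
    # Simplification: unquoted (KEY;VALUE) pairs only.
    exceptions = dict(re.findall(r"\(([^;()]+);([^()]*)\)", content))
    return float(exceptions.get(city, default))


def matches(content, city, comparison, operand):
    """Apply a comparison such as sle to the price selected for the city."""
    return OPERATORS[comparison](price_for(content, city), operand)


london = price_for(PRICE, "London")        # 2.5
melbourne = price_for(PRICE, "Melbourne")  # 4.1, the default price
canberra_ok = matches(PRICE, "Canberra", "sle", 4.20)    # False, 4.99 > 4.20
melbourne_ok = matches(PRICE, "Melbourne", "sle", 4.20)  # True, 4.10 <= 4.20
```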

Example 2: multi-lingual environment

An online collection for a Swiss museum contains images of artefacts along with applicable metadata. Some of the metadata is language independent (e.g. catalogue number) but other metadata such as the description of the artefact needs to exist in more than one language, for example:

<meta name="artefactDescription" content="Steinaxt;3;(FR;hache de pierre)(EN;stone axe)(IT;ascia di pietra)"/>

Here the default description is in German, but alternatives are available for French, English and Italian.