Metadata classes - multiple and alternative values
Introduction
Funnelback’s metadata classes support multiple and alternative metadata values.
- Multiple values: a metadata class includes more than one fielded value, such as a delimited HTML keywords meta field that contains many keywords (e.g. keywords=red,blue,green) or an XML record containing multiple keyword fields (e.g. <kw>red</kw><kw>blue</kw><kw>green</kw>). Each of the values is matched when a search is run.
- Alternative values: a metadata class includes several alternative values of which only one will be displayed, e.g. a price field that contains the price in US dollars, Pounds Sterling, Euros, Australian dollars and New Zealand dollars. Depending on the store or location, only one of these alternative metadata values will be shown in the metadata summary.
Multiple metadata values
There are two ways in which a Funnelback metadata class will end up containing multiple values:
- There are multiple metadata sources mapped to a single metadata class, and the source document matches more than one of the sources.
- Any of the matching metadata sources for a metadata class contains multiple values, either as delimited data within a single field or as repeated fields. The delimiter is a vertical bar symbol (|) by default, but this can be changed on a per-data source basis (the delimiters that are set apply to all metadata classes when splitting field content into multiple values). The metadata delimiters plugin allows you to set the delimiter on a per-field basis.
The metadata is returned from Funnelback within listMetadata in each result. The result’s listMetadata[metadataClass] element holds all the metadata values of the metadataClass field as a list of strings, which can be iterated across within the search template or from hook scripts.
Classes containing multiple values are useful because they allow filtering (using faceted navigation) on the unique values contained within a metadata class. In the colors example below (multiple values from a single field), a filter based on color could be attached to the search so that results can be filtered by color, and the record in the example would appear if blue, red or orange were applied in the filter.
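As a rough illustration of the filtering idea, the sketch below models each result as a plain Python dictionary whose listMetadata entry maps a metadata class to a list of strings. The dictionary layout, the sample URLs and the filter_by_value helper are assumptions made for this example only; they are not Funnelback's API, and in a real search the filtering is performed by faceted navigation.

```python
# Hypothetical stand-ins for search results. Only the idea that
# listMetadata maps a metadata class to a list of strings comes from the text.
results = [
    {"url": "https://example.com/a",
     "listMetadata": {"productColors": ["blue", "red", "orange"]}},
    {"url": "https://example.com/b",
     "listMetadata": {"productColors": ["green"]}},
]


def filter_by_value(results, metadata_class, value):
    """Keep the results whose metadata class contains the given value."""
    return [
        r for r in results
        if value in r["listMetadata"].get(metadata_class, [])
    ]


# Iterate across every value of a class, as a template or hook script might.
for result in results:
    for color in result["listMetadata"].get("productColors", []):
        print(result["url"], color)

# A record matches the filter if any one of its values matches.
red_urls = [r["url"] for r in filter_by_value(results, "productColors", "red")]
print(red_urls)
```

Because each value is held separately in the list, the first record matches a filter on blue, red or orange, while a class that is absent from a result simply yields no match.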
Changing the delimiter
Setting the delimiter for individual fields
This applies only to HTML documents.
The metadata delimiters plugin allows you to set which character is used as a delimiter on a per-field basis (e.g. dc.subject = ; and keywords = ,) for a data source.
This is preferable to using facet_item_sepchars (see below), which sets one or more delimiters that are applied to all fields within a data source.
Changing the field delimiter for all fields in a data source
The delimiter (|) used to split an XML field can be changed by setting the -facet_item_sepchars indexer option.
This option allows multiple (single character) delimiters to be specified (e.g. -facet_item_sepchars=|,; will split all fields using a vertical bar, comma or semicolon as the delimiter).
Use this option with caution to avoid unintentionally splitting fields. For example, setting the delimiter to a comma will often result in description fields being split in the middle of a sentence.
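The effect of applying several single-character delimiters to every field can be sketched in a few lines of Python. This is an approximation for illustration only: split_values is a hypothetical helper, and trimming whitespace and dropping empty parts are choices made in this sketch, not documented indexer behaviour.

```python
import re


def split_values(raw, sepchars="|"):
    """Split a raw field value on any single character listed in sepchars."""
    # Build a character class such as [\|,;] from the delimiter characters.
    pattern = "[" + re.escape(sepchars) + "]"
    return [part.strip() for part in re.split(pattern, raw) if part.strip()]


# A keywords field splits as intended with the delimiters | , ;
keywords = split_values("red,blue;green|orange", "|,;")

# The same delimiters also cut a description in the middle of a sentence.
description = split_values("A small, sturdy axe", "|,;")

# With the default delimiter only, the description stays intact.
intact = split_values("A small, sturdy axe")

print(keywords)
print(description)
print(intact)
```

The second call shows why a comma delimiter applied to all fields is risky: the description is broken into two unrelated fragments.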
Example: Multiple values from several fields.
An HTML document might include the following in the document header:
<title>Hamlet - the complete works of William Shakespeare</title>
<meta name="dc.title" content="Hamlet"/>
If both of these (<title> and dc.title) are listed as metadata sources for a class called docTitle, the index will contain both values delimited with a vertical bar:
docTitle: Hamlet - the complete works of William Shakespeare|Hamlet
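A minimal sketch of how values from several sources could end up in one class is shown below. The doc and mappings dictionaries and the combine_sources helper are hypothetical, and the sketch assumes that values are joined in the order the sources are listed; the actual ordering used by the indexer is not stated here.

```python
def combine_sources(source_values, delimiter="|"):
    """Join the values found in each matching source, skipping missing ones."""
    return delimiter.join(value for value in source_values if value)


# Hypothetical view of the metadata extracted from the document header.
doc = {
    "title": "Hamlet - the complete works of William Shakespeare",
    "dc.title": "Hamlet",
}

# Hypothetical mapping of metadata sources to a metadata class.
mappings = {"docTitle": ["title", "dc.title"]}

doc_title = combine_sources(doc.get(source) for source in mappings["docTitle"])
print("docTitle:", doc_title)
```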
Example: Multiple values from a single field.
Consider an HTML document containing
<meta name="colors" content="blue|red|orange">
or an equivalent XML document:
...
<colors>
<color>blue</color>
<color>red</color>
<color>orange</color>
</colors>
...
Assuming the default delimiter (|) is being used and there are mappings for an HTML source colors or XML source //color to a metadata class called productColors, then the index will contain:
productColors: blue|red|orange
Alternative metadata values
If you need to update a metadata field containing variant values to be consistent, this can be achieved with the metadata normalizer filter.
Funnelback supports the storing of multiple alternative values within a single metadata string. When presenting values from a metadata string, a particular value can be selected by specifying a key, with fallback to a default value if the key is not present.
If a metadata field containing multiple alternative values is accessed without using the special options in the table below, the whole string will be used. To use the default value you must use the special options with a non-existent key, such as 'default'.
Documents containing alternative metadata values should publish this metadata in the following form:
<meta name="FIELD" content="DEFAULT_VALUE;NUM_EXCEPTIONS;(KEY;VALUE)...(KEY;VALUE)" />
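To make the layout of the content string concrete, here is a small Python sketch that parses it and selects a value by key, falling back to the default when the key is absent. The function names and the sample string are invented for this example, and the sketch assumes that neither the default value nor the keys contain semicolons or parentheses; it is not the parser Funnelback uses.

```python
import re

# One "(KEY;VALUE)" group: the key stops at the first semicolon.
PAIR = re.compile(r"\(([^;()]*);([^()]*)\)")


def parse_alternatives(content):
    """Return (default_value, {key: value}) for a selectable metadata string."""
    default, _, tail = content.partition(";")
    count, _, rest = tail.partition(";")
    pairs = dict(PAIR.findall(rest))
    if len(pairs) != int(count):
        raise ValueError("NUM_EXCEPTIONS does not match the number of pairs")
    return default, pairs


def select(content, key):
    """Pick the value for key, or the default if the key is not present."""
    default, pairs = parse_alternatives(content)
    return pairs.get(key, default)


# Hypothetical content string in the documented form.
content = "10.00;2;(AUD;15.50)(EUR;9.20)"

print(select(content, "EUR"))      # value stored for the EUR key
print(select(content, "default"))  # non-existent key, so the default is used
```

Passing a key that does not exist, such as 'default', is what returns the default value, which mirrors the behaviour described above.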
Querying selectable metadata
The selectable metadata mechanism can be controlled via CGI parameters:
CGI parameter | Values | Description |
---|---|---|
selector_class | string | Specifies the key to use when accessing the given metadata class |
slt_class | float | Performs a "less than" operation on the metadata class, accessed by the key |
sle_class | float | Performs a "less than or equals" operation on the metadata class, accessed by the key |
sgt_class | float | Performs a "greater than" operation on the metadata class, accessed by the key |
sge_class | float | Performs a "greater than or equals" operation on the metadata class, accessed by the key |
seq_class | float | Performs an "equals" operation on the metadata class, accessed by the key |
Example 1: e-commerce
A large on-line retailer sells the same item for different prices, depending upon the location of the customer’s nearest store. Using Funnelback’s Selectable Metadata, only one document is needed for each item available for sale. In that document the price metadata is stored in the form of a string such as:
<meta name="price" content="4.10;5;(London;2.50)(Canberra;4.99)(Sydney;4.50)(Brisbane;4.63)(Szczecin;12.80)"/>
where 4.10 is the default price, 5 specifies that there are 5 exceptions, and the pairs of entries in parentheses show the prices which apply for the five different cities. When a person searches from a city, the city name can be inserted into the query string as a selector and the price shown and used in numerical range searches will be the one applicable to that city. For Melbourne, where no exception price is shown, the default price of 4.10 will be used.
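The interaction between the selector and the numerical comparisons can be sketched as follows. The prices are taken from the example string above, while the dictionaries and the price_for and matches helpers are hypothetical names for this illustration; the comparison keys only borrow the prefixes of the CGI parameters and do not show how Funnelback evaluates them internally.

```python
import operator

# Values taken from the example price string.
DEFAULT_PRICE = 4.10
CITY_PRICES = {
    "London": 2.50,
    "Canberra": 4.99,
    "Sydney": 4.50,
    "Brisbane": 4.63,
    "Szczecin": 12.80,
}

# Comparison named after each parameter prefix (assumed mapping).
COMPARISONS = {
    "slt": operator.lt,
    "sle": operator.le,
    "sgt": operator.gt,
    "sge": operator.ge,
    "seq": operator.eq,
}


def price_for(city):
    """Price selected by the city key, or the default if there is no exception."""
    return CITY_PRICES.get(city, DEFAULT_PRICE)


def matches(city, comparison, bound):
    """Apply a range comparison to the price selected for the city."""
    return COMPARISONS[comparison](price_for(city), bound)


print(price_for("Canberra"))
print(price_for("Melbourne"))
print(matches("London", "slt", 4.0))
print(matches("Melbourne", "slt", 4.0))
```

The same item therefore passes a "less than 4.00" filter for a London customer but not for a Melbourne customer, because the comparison is made against the price selected by the key.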
Example 2: multi-lingual environment
An online collection for a Swiss museum contains images of artefacts along with applicable metadata. Some of the metadata is language independent (e.g. catalogue number) but other metadata such as the description of the artefact needs to exist in more than one language, for example:
<meta name="artefactDescription" content="Steinaxt;3;(FR;hache de pierre)(EN;stone axe)(IT;ascia di pietra)"/>
Here the default description is in German, with alternatives available for French, English and Italian.