Metadata class types

Introduction

Funnelback supports five types of metadata classes:

  • Text: The content of this class is a string of text.

  • Number: The content of this field is a numeric value, which Funnelback interprets as a number. Use this type only when a search needs numeric operators (e.g. X > 2050). If the field is only required for display in the search results, a text type metadata class is sufficient.

  • Date: Funnelback supports a single date class and will use the values mapped to this class to determine a date for the document for the purpose of ranking, sorting and also date range search. If additional dates are required they should be configured as either text (e.g. 2017-09-24) or number (e.g. 20170924) type metadata classes.

  • Geospatial x/y coordinate: The content of this field is a decimal lat/long value in the following format: geo-x;geo-y (e.g. 40.6976684;-74.260555). Use this type only when there is a need to perform a geospatial search (e.g. this point is within X km of another point). If the geospatial coordinate is only required for plotting items on a map, a text type metadata class is sufficient.

  • Document permissions: The content of this field is a security lock string defining the document permissions. This type should only be used when working with an enterprise collection that includes document level security and specifies the requirement of a document permissions metadata field.

Metadata class types: text

The values of a text type metadata class are interpreted as text strings.

The text can include code such as HTML tags and these will be returned as is by Funnelback. It is the responsibility of the user interface layer to interpret or escape the field content.
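
As a minimal sketch of the escaping step, assuming a Python-based presentation layer, a text metadata value can be escaped with the standard library before it is written into an HTML page. The helper name render_metadata_value is hypothetical and is not part of Funnelback.

```python
import html


def render_metadata_value(value: str) -> str:
    """Escape a raw text metadata value for safe inclusion in HTML.

    Hypothetical helper: the value is returned as is by the search
    engine, so the presentation layer decides whether to escape it.
    """
    return html.escape(value, quote=True)


raw = '<b>Annual report</b> & "summary"'
print(render_metadata_value(raw))
# &lt;b&gt;Annual report&lt;/b&gt; &amp; &quot;summary&quot;
```

A layer that deliberately wants to render the stored HTML would skip this step, which is why the choice is left to the user interface.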

Metadata class types: date

Funnelback supports a single date-type metadata class using the reserved d metadata class. The value of this field is interpreted as a date and is assigned as the document’s date for the purposes of recency in the ranking algorithm, and also for sort and presentation.

Only a single date value will be assigned to the document. If multiple date metadata fields exist in the document the assigned date is chosen based on the date precedence rules below.

Supported date formats

| Name | Format | Example | Notes |
|---|---|---|---|
| RFC1123 | See RFC1123 and RFC2822 | Wed Mar 08 14:11:00 EST 2000 | |
| ISO-8601 | YYYY-MM-DD | 2001-01-31 or 2001-31-01 12:53:01Z or 2001-31-01T12:53:01Z | January 31st 2001 |
| 14 digits | YYYYMMDDHHmmss | 20091110083016 | November 10th 2009, 8:30:16 am |
| 6 digits | YYMMDD | 010131 | January 31st 2001 |
| Short ISO-8601 | YYYY-MM | 2001-01 | January 2001 |
| Very short ISO-8601 | YYYY | 2001 | 2001 |
| Non compliant ISO-8601 | YYYY-DD-MM | 2001-31-01 | Although this format is not standards compliant, dates with a middle component greater than 12 are treated this way. Take care though, ambiguous dates (e.g. May 2nd) will be interpreted in YYYY-MM-DD format. |
| Abbreviated date | YYMMMDD | 31jan01 | January 31st 2001 |
| Long form date | DD MMMM YYYY | 31st january, 2001 or 31 Jan 2001 | Long or short form months accepted, punctuation and 'st' 'nd' optional - "31 January 2001" is also acceptable. |
| Long form date, month first | MMMM DD YYYY | January 31st, 2001 | Long or short form months accepted, punctuation and 'st' 'nd' optional - "January 31 2001" is also acceptable. |
| Pre-2000 dates | DD MM YY | 31/1/01 or 31-01-01 | Punctuation ignored. The indexer interprets years less than 80 as post 2000, and years greater or equal to 80 as 1980 onwards. It is not recommended. |
| A TRIM format | DD/MM/YYYY at h:mm a | 13/6/2007 at 6:51 AM, or 06/12/2007 at 4:51 PM | Used by TRIM record management system |
| Non-standard | DD-MM-YYYY | 13-06-2007 or 13/06/2007 | Avoid if possible |
| Non-standard | Day, DD Mon YYYY | Wed, 13 Jun 2007 17:26:08 +1000 | At least there is no ambiguity here. |
| 19 character UTC | yyyyMMddHHmmss.SSSZ | 19970705071122.123Z | The indexer will convert this date from UTC to the server’s local time zone. |

Notes:

  • All date formats are case insensitive.

  • There is no locale support for dates. Month names and abbreviations must be in English.
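
Two of the disambiguation rules in the table above can be sketched in Python: the two-digit year pivot at 80, and the fallback to YYYY-DD-MM when the middle component is greater than 12. This only illustrates the rules as stated and is not the indexer's code. The function names are hypothetical.

```python
from datetime import date


def expand_two_digit_year(yy: int) -> int:
    """Apply the pivot described above: 00-79 -> 2000s, 80-99 -> 1900s."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < 80 else 1900 + yy


def parse_dashed_date(value: str) -> date:
    """Parse YYYY-MM-DD, reading it as YYYY-DD-MM when the middle part > 12."""
    year, middle, last = (int(part) for part in value.split("-"))
    if middle > 12:
        # The middle component cannot be a month, so treat it as the day.
        return date(year, last, middle)
    # Ordinary and ambiguous values are read as YYYY-MM-DD.
    return date(year, middle, last)


print(parse_dashed_date("2001-31-01"))  # 2001-01-31
print(parse_dashed_date("2001-02-05"))  # 2001-02-05 (5 February, not 2 May)
print(expand_two_digit_year(1), expand_two_digit_year(85))  # 2001 1985
```

The second call shows why the ambiguous formats are best avoided: a value intended as May 2nd in day-first order is silently read as February 5th.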

Date precedence order

When multiple dates are encountered for a document the following precedence order applies:

  1. External metadata (highest priority)

  2. The first occurrence in the document of dc.date or any metadata source mapped to the d metadata class.

  3. dc.date.modified

  4. dc.date.created

  5. dc.date.issued

  6. HTTP last modified date (lowest priority)
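
The precedence order above amounts to a first-match lookup over an ordered list of sources. A minimal sketch, assuming the candidate dates have already been collected into a dictionary. The dictionary keys and the function name are hypothetical labels chosen for this example, not Funnelback identifiers.

```python
from typing import Mapping, Optional

# Sources in the precedence order listed above, highest priority first.
# The string keys are hypothetical labels chosen for this sketch.
DATE_PRECEDENCE = (
    "external_metadata",
    "d_class",  # first dc.date or any source mapped to the d class
    "dc.date.modified",
    "dc.date.created",
    "dc.date.issued",
    "http_last_modified",
)


def choose_document_date(candidates: Mapping[str, str]) -> Optional[str]:
    """Return the value from the highest-priority source that has one."""
    for source in DATE_PRECEDENCE:
        value = candidates.get(source)
        if value:
            return value
    return None


found = {
    "http_last_modified": "Wed, 13 Jun 2007 17:26:08 +1000",
    "dc.date.created": "2001-01-31",
}
print(choose_document_date(found))  # 2001-01-31
```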

Metadata class types: number

Defining a metadata class as a number tells Funnelback to interpret the contents of the field as a number. This allows numeric comparisons (==, !=, >=, >, <, <=) to be run against the field, and for numeric ranges to be defined as faceted navigation using the class.

Numeric metadata is only required if you wish to make use of these range comparisons or for numeric range facets. Numbers for the purpose of display in the search results should be defined as text metadata.

The value of a numeric field can be an integer or a floating-point number, and Funnelback interprets it as an 8-byte double. This limits the precision of very large and very small values when a range search is applied against a specific number. The lt_x and gt_x operators compare against the exact value specified. Other operators allow a small tolerance, which follows from the limited accuracy of 8-byte doubles.
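
The precision limits of an 8-byte double can be seen in any language that uses the same representation. The Python snippet below is a general illustration of double arithmetic, not of Funnelback's comparison logic.

```python
import struct

# An 8-byte double has a 53-bit significand, so not every integer
# above 2**53 can be represented exactly.
big = 2**53
print(float(big) == float(big + 1))  # True: both round to the same double

# Decimal fractions are also stored approximately.
print(0.1 + 0.2 == 0.3)  # False

# Confirm the storage size of a double.
print(struct.calcsize("d"))  # 8
```

In practice this means identifiers or very large counters are safer as text metadata unless range comparisons are genuinely needed.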

Metadata class types: geospatial x/y coordinate

Defining a field as geospatial type metadata tells Funnelback to interpret the contents of the field as a decimal lat/long coordinate (e.g. -31.95516;115.85766). Funnelback uses this to assign a geospatial coordinate to an indexed item, effectively pinning it to a single point on a map. A geospatial metadata field is useful for location-based search constraints, such as showing items within a specified distance of an origin point, or for sorting results by proximity (closeness) to a specific point.

A geospatial metadata coordinate is not required if you only want to plot the item on a map in the search results. A text type value is sufficient in that case, because it is simply passed to the mapping API service that generates the map. The geospatial type is only required for distance-related searching (e.g. find results near this location).
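
To make the idea of a distance constraint concrete, the sketch below parses two coordinates in the lat;long format and checks whether one lies within a given distance of the other using the haversine formula. This is an assumption-laden illustration: the source does not say how Funnelback computes distance, and the function names and the Earth radius constant are chosen for this example only.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

Coordinate = Tuple[float, float]


def parse_coordinate(value: str) -> Coordinate:
    """Split a 'latitude;longitude' string into two floats."""
    lat_text, lon_text = value.split(";")
    return float(lat_text), float(lon_text)


def haversine_km(origin: Coordinate, point: Coordinate) -> float:
    """Great-circle distance in kilometres between two (lat, long) pairs."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, point)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_km(origin: str, point: str, max_km: float) -> bool:
    """True if 'point' is no further than max_km from 'origin'."""
    return haversine_km(parse_coordinate(origin), parse_coordinate(point)) <= max_km


perth = "-31.95516;115.85766"
new_york = "40.6976684;-74.260555"
print(within_km(perth, perth, 1))       # True
print(within_km(perth, new_york, 100))  # False
```

Sorting by proximity follows the same pattern: compute the distance from the origin to each item and order the results by that value.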

Metadata class types: document permissions

Funnelback interprets the value contained in a document permissions type metadata class as a document lock string describing the access controls that apply to the document.

This is used for enterprise search collections that enforce document level security.

The format of the lockstring is determined by the connector that is used for the repository that is being indexed.

Defining a document permissions type metadata field will prevent all results from the index from being returned unless an appropriate security plugin has been defined. This is to enforce a minimum level of security over the collection when document level security is enabled. For this reason, metadata fields of this type should only be defined when indexing a supported repository type that requires a document permissions metadata field to be defined.

See: document level security for further information.

Searching metadata

Metadata can be searched via the Funnelback query language or metadata specific CGI parameters.

See: Funnelback query language help for further information.

© 2015- Squiz Pty Ltd