Metadata

Metadata is information about a document. It can come from a number of sources: it can be defined in HTML <meta> tags within a web page (such as description metadata) or in other HTML tags (such as the document’s <title> element), embedded as document metadata in PDFs or Microsoft Office documents, or inferred from other information about an item (such as the size of the document in bytes, or the URL of the document).
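The following is a minimal Python sketch of how metadata from these different sources could be gathered for a single web page. It is an illustration only and does not show how Funnelback extracts metadata internally. The class MetadataExtractor, the function extract_metadata, and the key names url and sizeBytes are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects <meta name="..." content="..."> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if tag == "meta" and name:
            # Metadata defined explicitly in an HTML <meta> tag.
            self.metadata[name] = attributes.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # Metadata taken from another HTML element (<title>).
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html, url):
    """Return declared and inferred metadata for one HTML document."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    metadata = dict(parser.metadata)
    # Metadata inferred from other information about the item.
    metadata["url"] = url
    metadata["sizeBytes"] = len(html.encode("utf-8"))
    return metadata


sample_page = """<html><head>
<title>Annual report</title>
<meta name="description" content="Summary of the year.">
</head><body><p>Hello</p></body></html>"""

page_metadata = extract_metadata(sample_page, "https://example.com/report.html")
```

Here page_metadata holds the declared description, the title taken from the <title> element, and the two inferred values (URL and size in bytes).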

In Funnelback, metadata is also used to hold fielded information sourced from XML, CSV or databases.

Why use metadata?

Historically, metadata has been of little value in a web context. This is partly due to abuse by spammers in the early days of the Internet.

As a general rule, there is no benefit in including HTML metadata tags within a web page unless they are written for a specific purpose.

Funnelback provides a good reason to define metadata. Metadata can be used in a number of different ways by Funnelback to enhance your search results and provide users with a much richer search experience.

In general metadata can be used in the following ways:

  • To provide a fielded search.

  • To provide additional keyword data for the purposes of improving ranking.

  • To provide information that can be used for display purposes (such as for custom result summaries).

  • As a way of classifying documents, allowing for enhanced functionality (such as filtering search results by fields of particular values).

  • As additional information that Funnelback can use for other features (such as to generate structured auto-completion).
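To make some of the uses above concrete, the sketch below runs a fielded search, a filter by field value, and a count of values per field over a small in-memory list of records. It is plain Python and not Funnelback query syntax or a Funnelback API. The sample records, the field names, and the functions fielded_search, filter_by_value and facet_counts are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical documents, each represented by its metadata fields.
documents = [
    {"title": "Annual report 2023", "author": "Finance team", "documentType": "report"},
    {"title": "Travel policy", "author": "HR team", "documentType": "policy"},
    {"title": "Leave policy", "author": "HR team", "documentType": "policy"},
]


def fielded_search(docs, field, term):
    """Return documents whose given field contains the term (case-insensitive)."""
    term = term.lower()
    return [doc for doc in docs if term in doc.get(field, "").lower()]


def filter_by_value(docs, field, value):
    """Return documents whose given field exactly equals the value."""
    return [doc for doc in docs if doc.get(field) == value]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


policy_titles = [doc["title"] for doc in fielded_search(documents, "title", "policy")]
hr_documents = filter_by_value(documents, "author", "HR team")
type_counts = facet_counts(documents, "documentType")
```

The search is restricted to one field rather than the whole document text, which is what distinguishes a fielded search from a general keyword search.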

Metadata can also be incorporated from an external source (such as a database) by producing external metadata.

Metadata best practices

  • Remove any and all default metadata mappings when implementing a new project. The default mappings should not be used.

  • Only map the metadata fields you will use either for display, sorting, faceting or ranking. Less is more.

  • Unless mapping dates (when you use ‘d’), do not use single-character alphanumeric field names; use fully qualified names instead.

  • Using a camelCase naming convention is recommended, but be consistent whichever convention you choose.

  • When mapping XML fields, ensure all fields are mapped and ensure the document content flag (treat as document content or display-only content) is set appropriately.
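The naming recommendations above could be checked with a small script before mappings are configured. The following Python sketch is one possible approach, not a Funnelback tool. The function check_field_name and the regular expression are assumptions for this example, and the pattern encodes only one reasonable reading of "camelCase".

```python
import re

# Assumed definition of camelCase: starts with a lowercase letter,
# contains only letters and digits, with no separators.
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def check_field_name(name):
    """Return a list of problems found with a proposed metadata field name."""
    problems = []
    if not name:
        problems.append("empty name")
    elif len(name) == 1:
        # 'd' is the one single-character name the guidance allows (dates).
        if name != "d":
            problems.append("single-character name; use a descriptive name")
    elif not CAMEL_CASE.match(name):
        problems.append("not camelCase")
    return problems


proposed_names = ["d", "t", "publicationDate", "Publication_Date"]
report = {name: check_field_name(name) for name in proposed_names}
```

In this example, report maps each proposed name to its list of problems; names with an empty list follow the recommendations as interpreted here.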

Metadata classes

Funnelback makes use of metadata and fielded information by mapping these to internal metadata classes, which are common across all types of documents.
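The idea of mapping differently named source fields to a common set of classes can be sketched in Python as below. This is a conceptual illustration only and does not reflect Funnelback's configuration format or internal implementation. The source field names, the class names, CLASS_MAPPINGS and map_to_classes are hypothetical.

```python
# Hypothetical mapping: (source type, source field) -> metadata class name.
CLASS_MAPPINGS = {
    ("html", "title"): "title",
    ("html", "meta:description"): "description",
    ("html", "meta:author"): "author",
    ("pdf", "Title"): "title",
    ("xml", "/record/summary"): "description",
    ("xml", "/record/author"): "author",
}


def map_to_classes(source_type, fields):
    """Translate source-specific fields into the shared metadata classes."""
    mapped = {}
    for field, value in fields.items():
        metadata_class = CLASS_MAPPINGS.get((source_type, field))
        if metadata_class is None:
            continue  # Unmapped fields are ignored in this sketch.
        mapped.setdefault(metadata_class, []).append(value)
    return mapped


html_classes = map_to_classes(
    "html",
    {"title": "Annual report", "meta:author": "Finance team", "meta:keywords": "money"},
)
pdf_classes = map_to_classes("pdf", {"Title": "Annual report"})
```

Because both the HTML <title> and the PDF Title field map to the same title class, the two documents can be searched and displayed through one common field regardless of their original format.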

See: Configuring metadata classes for information on using the search dashboard to configure data source metadata.