Numerical Metadata

Introduction

Funnelback supports search over numeric data stored in metadata fields. These fields can be defined in HTML (or XML) meta elements, in XML elements, or in attributes of XML elements.

Numeric fields can be queried using CGI parameters.

The CGI parameters are:

In each parameter name, <x> is a metadata class.

CGI Parameter   Values    Description
lt_<x>          <float>   Performs a "Less than" operation on metadata class <x>
le_<x>          <float>   Performs a "Less than or equals" operation on metadata class <x>
gt_<x>          <float>   Performs a "Greater than" operation on metadata class <x>
ge_<x>          <float>   Performs a "Greater than or equals" operation on metadata class <x>
eq_<x>          <float>   Performs an "Equals" operation on metadata class <x>
ne_<x>          <float>   Performs a "Not Equals" operation on metadata class <x>
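
The semantics of the six prefixes can be illustrated locally with a short Python sketch. This is only an illustration of what each parameter is meant to test, not PADRE's implementation: the function name `matches` and the dictionary layout are hypothetical, and the comparisons here are exact, with no tolerance applied.

```python
import operator

# Prefix -> comparison, mirroring the parameter table (local illustration only;
# this is not how PADRE evaluates the parameters).
OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "eq": operator.eq,
    "ne": operator.ne,
}


def matches(params, metadata):
    """Return True if `metadata` satisfies every <op>_<class> entry in `params`.

    params:   CGI-style mapping, e.g. {"query": "BMW", "le_P": "100000"}
    metadata: numeric values keyed by metadata class, e.g. {"P": 65800.0}
    """
    for name, raw in params.items():
        prefix, sep, meta_class = name.partition("_")
        if not sep or prefix not in OPERATORS:
            continue  # not a numeric parameter, e.g. "query"
        if meta_class not in metadata:
            return False  # assumption: a missing field never satisfies a constraint
        if not OPERATORS[prefix](metadata[meta_class], float(raw)):
            return False
    return True
```

For example, `matches({"le_P": "100000"}, {"P": 65800.0})` is true because 65800 is less than or equal to 100000.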

Assumptions

The following assumptions are made by the PADRE indexer and query processor:

  • PADRE assumes that a numeric field will not contain any characters other than whitespace before the numeric quantity.
  • PADRE stores all numeric quantities as an 8-byte double. It is assumed that this is sufficiently accurate.
  • PADRE doesn't currently have any understanding of the semantics of numeric quantities and does no conversion of units. If the raw data mixes litres, cubic inches and cubic centimetres, the data will have to be converted prior to indexing with padre-iw.
  • The lt_<x> and gt_<x> operators compare against the exact value specified. The other operators allow a small tolerance, imposed by the precision of 8-byte doubles.
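
Because PADRE does no unit conversion, mixed units have to be normalised before the data is indexed. A minimal Python sketch of such a pre-processing step follows, assuming the raw data tags each volume with one of three hypothetical unit labels (`cc`, `l`, `cu_in`); the labels and the function name are illustrative and not part of Funnelback.

```python
# Conversion factors to cubic centimetres.
# 1 litre = 1000 cm^3; 1 cubic inch = 2.54^3 = 16.387064 cm^3.
TO_CC = {
    "cc": 1.0,
    "l": 1000.0,
    "cu_in": 16.387064,
}


def to_cubic_centimetres(value, unit):
    """Convert a volume to cubic centimetres before it is written for indexing.

    `value` may be a number or a string; float() accepts surrounding
    whitespace, so " 2293 " is parsed as 2293.0.
    """
    return float(value) * TO_CC[unit]
```

With this sketch, an engine capacity recorded as 5.5 litres would be written to the indexed field as 5500.0, so that it compares correctly against values already in cubic centimetres.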

How to index numerical data

The numerical range metadata can be represented in three different ways:

  1. via meta elements in HTML (or XML),
  2. via XML elements,
  3. via attributes of XML elements.

Example

Example metamap.cfg

# Numerical metadata fields relating to cars
W,3,weight
A,3,acceleration
C,3,engine_capacity
P,3,price

Example xml.cfg

PADRE XML Mapping Version: 2
#Supports numerical metadata either through elements or attributes.
document,/car
docurl,/car/url
t,1,,//title
c,0,,//description
W,3,,//weight
A,3,,//acceleration
C,3,,//engine_capacity
P,3,,//price
W,3,,/car@weight 
A,3,,/car@acceleration 
C,3,,/car@engine_capacity 
P,3,,/car@price

No special settings are needed for indexing, but the appropriate query_processor_options (-SF=<numeric metadata classes> and an -SM mode that returns metadata, such as -SM=both or the -SM=meta used below) will need to be set in collection.cfg to ensure that the numeric fields appear in the padre-sw result packet. For the example above:

query_processor_options=-SM=meta -SF=WCAP -SBL=2000

Example XML document which the above xml.cfg applies to:


<car>
  <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000079&amp;ModelCategoryID=10</url>
  <title>BMW model X95</title>
  <meta name='description' content='The only BMW sports car with the ability to plough a field!'/>
  <weight>1056.9</weight>
  <acceleration>30.9</acceleration>
  <engine_capacity>5500</engine_capacity>
  <price>165300</price>
</car>

<car weight='1312.8' acceleration='15.2' engine_capacity='2293' price='65800'> 
  <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000116&amp;ModelCategoryID=10&amp;Screen=LaunchPage</url>
  <title>BMW model X100</title>
  <meta name='description' content='The only BMW sports car which does not seem out of place when shopping for groceries.'/>
</car>
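
Before indexing, it can be useful to check that every numeric field in a record actually parses as a number. The following Python sketch does this for a single <car> record using the standard library's ElementTree, reading each value from a child element if present and from an attribute otherwise. It assumes each record is well-formed XML; the function name and the field tuple are illustrative, not part of Funnelback.

```python
import xml.etree.ElementTree as ET

# Field names taken from the car example; adjust for other data.
NUMERIC_FIELDS = ("weight", "acceleration", "engine_capacity", "price")


def numeric_fields(car_xml):
    """Return {field: float} for one <car> record.

    Values are read from child elements first, then from attributes of
    the root element. float() raises ValueError if a value is not numeric.
    """
    car = ET.fromstring(car_xml)
    values = {}
    for field in NUMERIC_FIELDS:
        child = car.find(field)
        raw = child.text if child is not None else car.get(field)
        if raw is not None:
            values[field] = float(raw)
    return values
```

A record whose field contains something like "approx. 1000" raises a ValueError here, which flags data that would violate the assumption that nothing but whitespace precedes the numeric quantity.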

Composing a search

To find all the BMW cars costing less than or equal to one hundred thousand dollars with acceleration between 10 and 20, you would require a CGI query string as follows:

query=BMW&le_P=100000&ge_A=10&le_A=20
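
A query string like this one can be assembled programmatically. The sketch below uses Python's standard `urllib.parse.urlencode`; the helper name `numeric_query` is hypothetical, and the search endpoint to which the string is appended is deliberately left out because it depends on the installation.

```python
from urllib.parse import urlencode


def numeric_query(query, **constraints):
    """Build a CGI query string from a query term and <op>_<class> constraints.

    Keyword arguments keep their order, so the constraints appear in the
    string in the order they were passed.
    """
    params = [("query", query)]
    params.extend(constraints.items())
    return urlencode(params)


# Reproduces the example: price <= 100000, acceleration between 10 and 20.
print(numeric_query("BMW", le_P=100000, ge_A=10, le_A=20))
```

Using urlencode rather than string concatenation also takes care of escaping if the query term contains spaces or other reserved characters.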

Caveats

  • This capability is not currently available via the >, < and = operators in the query language.
  • The CGI parameters currently work only as scoping operators. There must be a query to define a results set which is then scoped by lt_x etc. If there is no query there will be no results.
  • If the collection is part of a meta collection, you must configure the other collections to have the same numeric metadata classes; otherwise, incompatible indexes will be produced. For example, if N is defined as type 3 metadata in one collection, all other collections that are part of the meta collection must also have the N field defined as type 3.
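
The last caveat can be checked mechanically before a meta collection is built. The following Python sketch compares the metadata mappings of several collections and reports every class that is type 3 in one collection but missing or typed differently in another. It assumes each mapping consists of simple `class,type,source` lines with `#` comments, as in the example metamap.cfg; the function names are hypothetical, and how the file contents are obtained is left to the caller.

```python
def class_types(metamap_text):
    """Parse metamap.cfg-style text into {metadata class: type}."""
    types = {}
    for line in metamap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = [p.strip() for p in line.split(",")]
        if len(parts) >= 2:
            types[parts[0]] = parts[1]
    return types


def numeric_conflicts(metamaps):
    """Find classes that are type 3 in some collections but not in all.

    metamaps: {collection name: metamap text}
    Returns {class: [collections that do not declare it as type 3]}.
    """
    parsed = {name: class_types(text) for name, text in metamaps.items()}
    numeric = {c for types in parsed.values() for c, t in types.items() if t == "3"}
    conflicts = {}
    for c in sorted(numeric):
        bad = sorted(name for name, types in parsed.items() if types.get(c) != "3")
        if bad:
            conflicts[c] = bad
    return conflicts
```

An empty result means every numeric class is declared as type 3 in all of the collections supplied.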
