Funnelback supports search over numeric data stored in metadata fields. These fields can be defined in either:
Numeric fields can be queried using CGI parameters.
The CGI parameters are:
|CGI Parameter||Value Type||Description|
|lt_class||float||Performs a "Less than" operation on metadata class|
|le_class||float||Performs a "Less than or equals" operation on metadata class|
|gt_class||float||Performs a "Greater than" operation on metadata class|
|ge_class||float||Performs a "Greater than or equals" operation on metadata class|
|eq_class||float||Performs an "Equals" operation on metadata class|
|ne_class||float||Performs a "Not Equals" operation on metadata class|
The following assumptions are made by the indexer and query processor:
- a numeric field will not contain any characters other than whitespace before the numeric quantity.
- all numeric quantities are stored as an 8-byte double. It is assumed that this is sufficiently accurate.
- there is no understanding of the semantics of numeric quantities and no conversion of units. If the raw data mixes litres, cubic inches and cubic centimetres, the data will have to be converted prior to indexing.
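Since units are not converted at index time, values have to be normalised beforehand. The following is a minimal sketch of such a pre-indexing step, assuming engine capacities arrive tagged with a unit label. The labels `cc`, `l` and `cu_in` and the function name are hypothetical and are not part of Funnelback.

```python
# Conversion factors to cubic centimetres.
# The unit labels used as keys are illustrative assumptions.
CC_PER_UNIT = {
    "cc": 1.0,
    "l": 1000.0,         # 1 litre = 1000 cubic centimetres
    "cu_in": 16.387064,  # 1 cubic inch = 2.54^3 cubic centimetres
}


def to_cubic_centimetres(value, unit):
    """Convert a capacity to cubic centimetres before it is indexed."""
    try:
        return value * CC_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


# Mixed raw data, normalised to one unit so numeric comparisons are meaningful.
raw = [(5.5, "l"), (140.0, "cu_in"), (2293.0, "cc")]
normalised = [round(to_cubic_centimetres(v, u), 1) for v, u in raw]
print(normalised)
```

The normalised values are what would be written into the numeric metadata field, so that a single threshold applies to every record.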
gt_x operators compare against the exact value specified. Other operators allow a small tolerance, enforced by the accuracy of 8-byte doubles.
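The need for a tolerance follows from how 8-byte doubles represent decimal fractions. The sketch below illustrates the general effect in Python. The tolerance value is an assumption chosen for illustration; the tolerance Funnelback actually applies is not stated here.

```python
import math

stored = 0.1 + 0.2   # a value as it might be produced upstream
requested = 0.3      # the value supplied in an equality test

# Exact comparison fails because neither number is exactly representable.
print(stored == requested)

# A tolerant comparison treats the two as equal.
# rel_tol=1e-9 is an illustrative assumption, not Funnelback's setting.
print(math.isclose(stored, requested, rel_tol=1e-9))
```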
How to index numerical data
The numerical range metadata can be represented in three different ways:
- via meta elements in HTML (or XML),
- via XML elements,
- via attributes of XML elements.
# Numerical metadata fields relating to cars
weight,3,weight
acceleration,3,acceleration
capacity,3,engine_capacity
price,3,price
PADRE XML Mapping Version: 2
#Supports numerical metadata either through elements or attributes.
document,/car
docurl,/car/url
t,1,,//title
description,0,,//description
weight,3,,//weight
acceleration,3,,//acceleration
capacity,3,,//engine_capacity
price,3,,//price
weight,3,,/car@weight
acceleration,3,,/car@acceleration
capacity,3,,/car@engine_capacity
price,3,,/car@price
No special settings are needed for indexing, but the appropriate query_processor_options (-SF=<numeric metadata classes> and -SM=both) will need to be set in collection.cfg to ensure that the numeric fields appear in the result packet. For the example above:
query_processor_options=-SM=meta -SF=[weight,capacity,acceleration,price] -SBL=2000
Example XML document which the above xml.cfg applies to:
<p>
  <car>
    <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000079&amp;ModelCategoryID=10</url>
    <title>BMW model X95</title>
    <meta name='description' content='The only BMW sports car with the ability to plough a field!'/>
    <weight>1056.9</weight>
    <acceleration>30.9</acceleration>
    <engine_capacity>5500</engine_capacity>
    <price>165300</price>
  </car>
  <car weight='1312.8' acceleration='15.2' engine_capacity='2293' price='65800'>
    <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000116&amp;ModelCategoryID=10&amp;Screen=LaunchPage</url>
    <title>BMW model X100</title>
    <meta name='description' content='The only BMW sports car which does not seem out of place when shopping for groceries.'/>
  </car>
</p>
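The two record layouts, numeric values as child elements and as attributes, can be illustrated with a short Python sketch. This is not how the indexer works internally. It only shows that both layouts carry the same numeric values. The `<cars>` wrapper element, the trimmed sample data and the helper name `numeric_values` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NUMERIC_FIELDS = ("weight", "acceleration", "engine_capacity", "price")

# Trimmed sample: one record uses child elements, the other uses attributes.
SAMPLE = """
<cars>
  <car>
    <title>BMW model X95</title>
    <weight>1056.9</weight>
    <acceleration>30.9</acceleration>
    <engine_capacity>5500</engine_capacity>
    <price>165300</price>
  </car>
  <car weight='1312.8' acceleration='15.2' engine_capacity='2293' price='65800'>
    <title>BMW model X100</title>
  </car>
</cars>
"""


def numeric_values(car):
    """Collect numeric fields from child elements, falling back to attributes."""
    values = {}
    for name in NUMERIC_FIELDS:
        text = car.findtext(name)   # element form, e.g. <weight>1056.9</weight>
        if text is None:
            text = car.get(name)    # attribute form, e.g. weight='1312.8'
        if text is not None:
            values[name] = float(text)  # float() ignores surrounding whitespace
    return values


root = ET.fromstring(SAMPLE)
records = [numeric_values(car) for car in root.iter("car")]
for car, values in zip(root.iter("car"), records):
    print(car.findtext("title"), values)
```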
Composing a search
To find all the BMW cars costing less than or equal to one hundred thousand dollars with acceleration between 10 and 20, you would require a CGI query string as follows:
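Based on the parameter table above, such a query string could be assembled as in the following sketch. It assumes that the metadata classes are named `price` and `acceleration`, that "between 10 and 20" is inclusive, and that the collection is called `cars`. These are illustrative choices, not values taken from a real installation.

```python
from urllib.parse import urlencode

params = [
    ("collection", "cars"),    # hypothetical collection name
    ("query", "BMW"),          # a query is needed to define the result set
    ("le_price", 100000),      # price <= 100000
    ("ge_acceleration", 10),   # acceleration >= 10 (inclusive bound assumed)
    ("le_acceleration", 20),   # acceleration <= 20 (inclusive bound assumed)
]

query_string = urlencode(params)
print(query_string)
```

Using `gt_` and `lt_` in place of `ge_` and `le_` would make the acceleration bounds exclusive.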
- This capability is not currently available via the = operators in the query language.
- The CGI parameters currently work only as scoping operators. There must be a query to define a result set, which is then scoped by lt_x etc. If there is no query there will be no results.
- If the collection is part of a meta collection, you must ensure that the other collections are configured with the same numeric metadata classes; otherwise incompatible indexes will be produced. For example, if N is defined as type 3 metadata in one collection, all other collections that are part of the meta collection must also have the N field defined as type 3.
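The meta collection requirement can be checked before indexing. The sketch below is one possible approach, assuming each collection's mapping is available as text in the `class,type,...` line format shown earlier. The function names and collection names are hypothetical. It flags any class that is type 3 in some collections but not in all of them, which is slightly stricter than the rule stated above.

```python
def numeric_classes(config_text):
    """Return the metadata classes declared as type 3 in a mapping file."""
    classes = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(",")
        if len(parts) >= 2 and parts[1].strip() == "3":
            classes.add(parts[0].strip())
    return classes


def mismatched_classes(configs):
    """Classes that are numeric in at least one collection but not in all."""
    per_collection = [numeric_classes(text) for text in configs.values()]
    if not per_collection:
        return set()
    return set.union(*per_collection) - set.intersection(*per_collection)


# Hypothetical component collections of one meta collection.
configs = {
    "cars-a": "weight,3,weight\nprice,3,price\n",
    "cars-b": "# price is not numeric here\nweight,3,weight\nprice,1,price\n",
}
problems = mismatched_classes(configs)
print(sorted(problems))
```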