Padre cooler ranking options

Background

This page describes the options available for tuning ranking with the cool query processor option. For more information about how ranking works, see Funnelback ranking algorithms.

These options can be set either in the query processor options (collection.cfg) or as CGI parameters (e.g. ...&cool.2=12&cool.3=34...).
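As a rough illustration of the CGI form, the sketch below builds a search URL that carries cool weights as parameters. The host, path, collection name and query term are placeholders, and the helper name cool_params is hypothetical; only the cool.N=value parameter shape comes from this page.

```python
from urllib.parse import urlencode


def cool_params(weights):
    """Turn {option number: weight} into CGI pairs such as ("cool.2", "12")."""
    return [(f"cool.{number}", str(weight)) for number, weight in sorted(weights.items())]


# Placeholder URL and parameters, for illustration only.
base_url = "https://search.example.com/s/search.html"
params = [("collection", "example"), ("query", "funnelback")]
params += cool_params({2: 12, 3: 34})

url = f"{base_url}?{urlencode(params)}"
print(url)
```

Sorting by option number keeps the generated URL stable, which can make it easier to compare requests when experimenting with different weights.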

List of cooler options

Number  Description
0       content: content weight
1       onlink: onsite link weight
2       offlink: offsite link weight
3       urllen: URL length weight
4       qie: external evidence (qie) weight
5       date_proximity: proximity to current date weight
6       urltype: URL attractiveness (homepages are favoured; copyright pages and URLs with lots of punctuation are penalised)
7       annie: annotation weight (annie)
8       domain_weight: weight associated with this domain
9       geoprox: geographical proximity to origin
10      nonbin: non-binariness (1 for HTML, XML and TXT documents, 0 otherwise)
11      no_ads: freedom from ads
12      imp_phrase: implicit phrase match score
13      consistency: consistency of evidence (extra reward for documents with non-zero scores on both content and annie)
14      log_annie: logarithm of annotation weight (log(annie))
15      anlog_annie: absolute-normalised logarithm of annotation weight
16      annie_rank: annotation rank = (k - rank) / k, where k = 2 × highest rank requested; if rank > k, rank is set to k
17      BM25F: field-weighted Okapi score
18      an_okapi: absolute-normalised Okapi score
19      BM25F_rank: field-weighted Okapi rank
20      mainhosts: bias in favour of principal servers (web search only)
21      comp_wt: component collection weighting (meta collections only)
22      document_number: document number in the crawl (an early position in the crawl may correlate with importance)
23      host_incoming_link_score
24      host_click_score
25      host_linking_hosts_score
26      host_linked_hosts_score
27      host_rank_in_crawl_order_score
28      host_domain_shallowness_score
29      doc_matches_regex: document matches administrator-supplied regex
30      doc_does_not_match_regex: document does not match administrator-supplied regex
31      titleWords: number of words in title
32      contentWords: number of indexed words in document
33      compressionFactor: compressibility of document text
34      entropy: entropy of document
35      stopwordFraction: fraction of stopwords in the document
36      stopwordCover: fraction of stopword list present in the document
37      averageTermLen: average term length
38      distinctWords: number of distinct words in the document
39      maxFreq: frequency of most frequently occurring term
40      titleWords_neg: Neg number of words in title
41      contentWords_neg: Neg number of indexed words in document
42      compressionFactor_neg: Neg compressibility of document text
43      entropy_neg: Neg entropy of document
44      stopwordFraction_neg: Neg fraction of stopwords in the document
45      stopwordCover_neg: Neg fraction of stopword list present in the document
46      averageTermLen_neg: Neg average term length
47      distinctWords_neg: Neg number of distinct words in the document
48      maxFreq_neg: Neg frequency of most frequently occurring term
49      titleWords_abs: Abs number of words in title
50      contentWords_abs: Abs number of indexed words in document
51      compressionFactor_abs: Abs compressibility of document text
52      entropy_abs: Abs entropy of document
53      stopwordFraction_abs: Abs fraction of stopwords in the document
54      stopwordCover_abs: Abs fraction of stopword list present in the document
55      averageTermLen_abs: Abs average term length
56      distinctWords_abs: Abs number of distinct words in the document
57      maxFreq_abs: Abs frequency of most frequently occurring term
58      titleWords_abs_neg: Neg abs number of words in title
59      contentWords_abs_neg: Neg abs number of indexed words in document
60      compressionFactor_abs_neg: Neg abs compressibility of document text
61      entropy_abs_neg: Neg abs entropy of document
62      stopwordFraction_abs_neg: Neg abs fraction of stopwords in the document
63      stopwordCover_abs_neg: Neg abs fraction of stopword list present in the document
64      averageTermLen_abs_neg: Neg abs average term length
65      distinctWords_abs_neg: Neg abs number of distinct words in the document
66      maxFreq_abs_neg: Neg abs frequency of most frequently occurring term
67      lexical_span_score
68      doc_matches_cgscope1: documents which match the gscope defined by -cgscope1 (if defined)
69      doc_matches_cgscope2: documents which match the gscope defined by -cgscope2 (if defined)
70      doc_does_not_match_cgscope1: documents which do not match the gscope defined by -cgscope1 (if defined)
71      doc_does_not_match_cgscope2: documents which do not match the gscope defined by -cgscope2 (if defined)
72      raw_annie: untransformed annie score linearly scaled to 0..1
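The formula given for option 16 (annie_rank) can be read as the small function below. This is only a sketch of the arithmetic as stated in the list, not Padre's actual implementation; the function and parameter names are hypothetical, and it assumes the highest rank requested is a positive number.

```python
def annie_rank_score(rank, highest_rank_requested):
    """Compute (k - rank) / k with k = 2 * highest rank requested.

    Ranks beyond k are clamped to k, so the score never drops below 0.
    """
    k = 2 * highest_rank_requested
    clamped_rank = min(rank, k)
    return (k - clamped_rank) / k


# With 10 results requested, k is 20.
print(annie_rank_score(0, 10))   # 1.0
print(annie_rank_score(10, 10))  # 0.5
print(annie_rank_score(25, 10))  # 0.0 (rank clamped to k)
```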

Values

Values are unbounded, but typical weights range from 0 to 100.

Example

To set the query processor to ignore URL length, but give a high weight to phrase matches implied by the query:

query_processor_options=-cool.3=0 -cool.12=100
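If weights are managed in a script, a line like the one above could be generated from option names rather than numbers. The sketch below is a hypothetical helper, not part of Funnelback; its name-to-number mapping covers only a few entries copied from the list of cooler options.

```python
# Subset of the cooler option list: name -> option number.
COOL_NUMBERS = {
    "content": 0,
    "onlink": 1,
    "offlink": 2,
    "urllen": 3,
    "imp_phrase": 12,
}


def query_processor_options(weights):
    """Render {option name: weight} as a query_processor_options line."""
    flags = [f"-cool.{COOL_NUMBERS[name]}={weight}" for name, weight in weights.items()]
    return "query_processor_options=" + " ".join(flags)


print(query_processor_options({"urllen": 0, "imp_phrase": 100}))
# query_processor_options=-cool.3=0 -cool.12=100
```

An unknown option name raises a KeyError, which surfaces typos before the line reaches collection.cfg.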

© 2015- Squiz Pty Ltd