Stemming
Stemming reduces words to a common stem so that a search can match different variants of the same word.
The default Funnelback configuration supports automatic stemming so that queries match closely related words, e.g. "parties" may also match "party". However, stemming can sometimes harm retrieval effectiveness, e.g. by returning documents containing "Hawk" or "Hawkins" for the query "Hawking".
Stemming is controlled with the query processor option -stem.
Light stemming
Light stemming is the Funnelback default. It treats the singular and plural forms of the same word as equivalent, e.g. dog/dogs, worry/worries. English and French words are supported.
Light stemming is applied by setting the query processor option:
query_processor_options= -stem=2
Heavy stemming
Heavy stemming is designed as a limited extension that covers subject/professional matching, e.g. science/scientist, biology/biologist. It does not stem participles, so "bullying" is not considered equivalent to "bully", in the same way that "Hawking" is not equivalent to "Hawk" or "Hawks".
Heavy stemming is applied by setting the query processor option:
query_processor_options= -stem=3
Disable stemming
Stemming is disabled by setting the query processor option:
query_processor_options= -stem=0
-stem=1 is a discontinued option and has the same effect as setting -stem=0.
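The supported settings can be summarised in a small helper. This is only an illustrative sketch in Python: the function name stemming_option and the mode names off, light and heavy are hypothetical, and only the -stem values are taken from the options described in this section.

```python
# Values of -stem described in this section.
# -stem=1 is discontinued and behaves like -stem=0, so it is not offered here.
STEM_LEVELS = {"off": 0, "light": 2, "heavy": 3}


def stemming_option(mode: str) -> str:
    """Return the configuration line for a (hypothetical) mode name."""
    if mode not in STEM_LEVELS:
        raise ValueError(f"unknown stemming mode: {mode!r}")
    return f"query_processor_options= -stem={STEM_LEVELS[mode]}"


print(stemming_option("heavy"))  # prints: query_processor_options= -stem=3
```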
Limit stemming to lower case query terms only
To avoid stemming proper names and acronyms, stemming can be restricted to lowercase query words by setting the query processor option:
query_processor_options= -stem_lconly=true
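The effect of this option can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not Funnelback's implementation: toy_light_stem is a deliberately crude plural-to-singular rule and stem_query is a hypothetical helper. It only shows that terms containing uppercase characters are passed through unstemmed.

```python
from typing import List


def toy_light_stem(word: str) -> str:
    """Crude plural-to-singular rule for illustration only (not Funnelback's stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # parties -> party
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # dogs -> dog
    return word


def stem_query(query: str, lowercase_only: bool = True) -> List[str]:
    """Stem each query term, skipping non-lowercase terms when lowercase_only is set."""
    result = []
    for term in query.split():
        if lowercase_only and not term.islower():
            result.append(term)         # proper name or acronym: left as typed
        else:
            result.append(toy_light_stem(term))
    return result


print(stem_query("Hawkins parties"))                        # ['Hawkins', 'party']
print(stem_query("Hawkins parties", lowercase_only=False))  # ['Hawkin', 'party']
```

With the lowercase-only rule in place, the proper name "Hawkins" is left untouched while "parties" is still reduced to "party".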
Excluding specific words from stemming (stemming blacklist)
The Funnelback query language includes an operator that indicates a query term should not be stemmed (if stemming is enabled), or should be stemmed (if stemming is disabled).
This is done by adding a # to the end of the word.
For example, to prevent the word pie from being stemmed, enter it as pie#
A stemming blacklist can be created using the synonyms feature to specify a list of words that should have this operator appended.
e.g. add the following synonyms to prevent the words cat and dog from being stemmed:
When these keywords are submitted | Transform them to | Apply the transformation if |
---|---|---|
cat | cat# | All the words must be present in the search keyword(s), in any order |
dog | dog# | All the words must be present in the search keyword(s), in any order |
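The rewrite that such a blacklist performs can be sketched in a few lines of Python. This is a hypothetical illustration of the transformation only, not how Funnelback applies synonyms internally: the names NO_STEM and protect_terms are invented for the example, and case-insensitive matching is an assumption.

```python
# Hypothetical blacklist of words that should never be stemmed.
NO_STEM = {"cat", "dog"}


def protect_terms(query: str, blacklist=NO_STEM) -> str:
    """Append the # operator to every blacklisted term that does not already carry it."""
    rewritten = []
    for term in query.split():
        # Assumption: blacklist matching ignores case.
        if term.lower() in blacklist and not term.endswith("#"):
            rewritten.append(term + "#")
        else:
            rewritten.append(term)
    return " ".join(rewritten)


print(protect_terms("cat food"))   # cat# food
print(protect_terms("dog# cat"))   # dog# cat#
```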