Stop words
Stop words are commonly used words that are removed from a user’s query before a search is run. These words (for example "a", "and" and "the") add noise to the search results, and including them in the query does not produce better results.
How are stop words applied?
When a search is run, any words within the query that are in the stop words list are removed from the query. See: Default stop words list (English)
There are some exceptions to this, which are outlined below.
- By default, stop words are only removed when the query contains two or more terms that are not stop words. This behaviour can be altered with the ras option.
- In addition to the stop words list, single digit ASCII characters are always treated as stop words.
- Stop words that are within a phrase operator (between double quotes in your query) are never removed.
The ras setting switches between three levels of stop word processing. The options are:
- ras=0: Never remove stop words
- ras=1: Remove stop words when there are two or more terms that are not stop words in the query
- ras=2: Always remove stop words
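The rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Funnelback's implementation: the function names, the tiny stop word subset and the tokenisation are all hypothetical. It also assumes that "single digit ASCII characters" means the digits 0 to 9, and that only terms outside a phrase count towards the "two or more terms" threshold.

```python
import re

# Tiny illustrative subset of the English list; the real list is much longer.
STOP_WORDS = {"a", "and", "the", "of", "to", "in"}


def is_stop_word(term):
    """Assumption: a single ASCII digit (0-9) is always a stop word."""
    t = term.lower()
    return t in STOP_WORDS or (len(t) == 1 and t.isascii() and t.isdigit())


def remove_stop_words(query, ras=1):
    """Return the query tokens that remain after stop word processing."""
    # Split into quoted phrases and bare terms; phrases are kept intact.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    bare_terms = [t for t in tokens if not t.startswith('"')]
    content_terms = [t for t in bare_terms if not is_stop_word(t)]

    if ras == 0:
        strip = False                    # never remove stop words
    elif ras == 1:
        strip = len(content_terms) >= 2  # default behaviour
    elif ras == 2:
        strip = True                     # always remove stop words
    else:
        raise ValueError("ras must be 0, 1 or 2")

    if not strip:
        return tokens
    # Quoted phrases are never touched, whatever the ras level.
    return [t for t in tokens if t.startswith('"') or not is_stop_word(t)]
```

With this sketch, a query such as "the rome" is left unchanged at ras=1 because it contains only one term that is not a stop word, but is reduced to "rome" at ras=2.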
The ras setting can be set in your results page configuration as a query processor option (query_processor_options), or by adding a request parameter to your URL.
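As a rough illustration of the request parameter approach, the following Python sketch appends ras to a search URL. The base URL is a placeholder and the query parameter name is an assumption; only the ras parameter itself comes from this page.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, ras=None):
    """Build a search URL, optionally overriding ras for this request.

    base_url and the "query" parameter name are assumptions made for
    this example; substitute the values used by your own search endpoint.
    """
    params = {"query": query}
    if ras is not None:
        if ras not in (0, 1, 2):
            raise ValueError("ras must be 0, 1 or 2")
        params["ras"] = ras
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; keeps the stop word "the" in the query "the who".
url = build_search_url("https://search.example.com/s/search.html", "the who", ras=0)
```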
Non-English language stop word lists
The default stop words list is localized based on the value of the lang parameter, defaulting to English if the parameter is not set. The lang parameter is set either within the results page configuration query_processor_options or as a request parameter.
Stop word lists are supplied for the following languages, and are used when the lang parameter is set to the corresponding language code. The lang parameter can also be specified with a sub-variant appended after an underscore. For example, lang=en_US applies the English stop words list.
Language code | Language |
---|---|
ar | Arabic |
bg | Bulgarian |
bn | Bengali |
cs | Czech |
de | German |
en | English |
es | Spanish |
fa | Persian |
fi | Finnish |
fr | French |
hi | Hindi |
hu | Hungarian |
it | Italian |
mr | Marathi |
pl | Polish |
pt | Portuguese |
ro | Romanian |
ru | Russian |
sv | Swedish |
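The mapping from a lang value to a stop words list can be sketched as follows. The function name is hypothetical, and returning None for a language that is not in the table is an assumption made for this example: this page only states that English is used when the parameter is not set.

```python
# Language codes taken from the table above.
SUPPORTED_LANGUAGES = {
    "ar", "bg", "bn", "cs", "de", "en", "es", "fa", "fi", "fr",
    "hi", "hu", "it", "mr", "pl", "pt", "ro", "ru", "sv",
}


def stop_word_language(lang=None):
    """Return the language code whose stop words list would apply."""
    if not lang:
        return "en"  # documented default when lang is not set
    # Drop any sub-variant after the underscore, e.g. en_US -> en.
    base = lang.split("_", 1)[0].lower()
    # Assumption: None signals "no supplied list" for unlisted languages.
    return base if base in SUPPORTED_LANGUAGES else None
```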
Default stop words list (English)
The following words are stripped from a user’s query (subject to the stop word removal rules defined by the ras query processor option).
The English stop words list is located within the Funnelback installation at: INSTALL_DIRECTORY/share/lang/en_stopwords. Stop words lists for other languages can also be viewed by inspecting the appropriate file within the same folder.
a a's able about above according accordingly across actually after afterwards again against ain't all allow allows almost alone along already also although always am among amongst an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate are aren't around as aside ask asking associated at available away awfully b be became because become becomes becoming been before beforehand behind being believe below beside besides best better between beyond both brief but by c c'mon c's came can can't cannot cant cause causes certain certainly changes clearly co com come comes concerning consequently consider considering contain containing contains corresponding could couldn't course currently d definitely described despite did didn't different do does doesn't doing don't done down downwards during e each edu eg eight either else elsewhere enough entirely especially et etc even ever every everybody everyone everything everywhere ex exactly example except f far few fifth first five followed following follows for former formerly forth four from further furthermore g get gets getting given gives go goes going gone got gotten greetings h had hadn't happens hardly has hasn't have haven't having he he's hello help hence her here here's hereafter hereby herein hereupon hers herself hi him himself his hither hopefully how howbeit however i i'd i'll i'm i've ie if ignored immediate in inasmuch inc indeed indicate indicated indicates inner insofar instead into inward is isn't it it'd it'll it's its itself j just k keep keeps kept know knows known l last lately later latter latterly least less lest let let's like liked likely little look looking looks ltd m mainly many may maybe me mean meanwhile merely might more moreover most mostly much must my myself n name namely nd near nearly necessary need needs neither never nevertheless new next nine no nobody non none noone nor normally not nothing novel now nowhere o obviously of off often oh 
ok okay old on once one ones only onto or other others otherwise ought our ours ourselves out outside over overall own p particular particularly per perhaps placed please plus possible presumably probably provides q que quite qv r rather rd re really reasonably regarding regardless regards relatively respectively right s said same saw say saying says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several shall she should shouldn't since six so some somebody somehow someone something sometime sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t t's take taken tell tends th than thank thanks thanx that that's thats the their theirs them themselves then thence there there's thereafter thereby therefore therein theres thereupon these they they'd they'll they're they've think third this thorough thoroughly those though three through throughout thru thus to together too took toward towards tried tries truly try trying twice two u un under unfortunately unless unlikely until unto up upon us use used useful uses using usually uucp v value various very via viz vs w want wants was wasn't way we we'd we'll we're we've welcome well went were weren't what what's whatever when whence whenever where where's whereafter whereas whereby wherein whereupon wherever whether which while whither who who's whoever whole whom whose why will willing wish with within without won't wonder would would wouldn't x y yes yet you you'd you'll you're you've your yours yourself yourselves z zero
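To inspect a stop words file programmatically, a small helper along these lines may be enough. It assumes the file is UTF-8 text with words separated by whitespace (spaces or newlines); the actual file format is not described on this page, so check a file before relying on this.

```python
from pathlib import Path


def load_stop_words(path):
    """Read a stop words file into a set of lower-cased words.

    Assumption: the file is UTF-8 text and words are separated by
    whitespace. Adjust the parsing if your files differ.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {word.lower() for word in text.split()}
```

For example, load_stop_words("INSTALL_DIRECTORY/share/lang/en_stopwords"), with INSTALL_DIRECTORY replaced by your installation path, would return the English list as a set.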
Custom stop words list
This feature is not available in the Squiz DXP.
A custom stop words list can be used instead of the default list by defining the -STOP query processor option. The value should be set to either the absolute path of the text file containing the stop words, or a path relative to the $SEARCH_HOME/share/lang folder.
Only a single stop words list is applied. A custom stop words list is not combined with the locale-specific default list, so it must include every word that should be treated as a stop word.
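The path rule for the -STOP value can be illustrated with a short sketch. The function name and the example paths are hypothetical, and POSIX-style paths are assumed; this shows how the two documented cases (absolute path, or path relative to $SEARCH_HOME/share/lang) could be resolved, not how Funnelback resolves them internally.

```python
import os
from pathlib import PurePosixPath


def resolve_stop_file(value, search_home=None):
    """Resolve a -STOP value to the file it would refer to.

    Absolute paths are used as given; anything else is treated as
    relative to $SEARCH_HOME/share/lang.
    """
    if search_home is None:
        # Falls back to the SEARCH_HOME environment variable if it is set.
        search_home = os.environ.get("SEARCH_HOME", "")
    path = PurePosixPath(value)
    if path.is_absolute():
        return path
    return PurePosixPath(search_home) / "share" / "lang" / path


# Hypothetical installation path and file names, for illustration only.
relative = resolve_stop_file("my_stopwords", search_home="/opt/funnelback")
absolute = resolve_stop_file("/data/lists/my_stopwords", search_home="/opt/funnelback")
```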