Stop words

Stop words are commonly used words that are removed from a user’s query before the search is run. These words (such as “a”, “and” or “the”) add noise to the search results, and including them in the query does not produce better results.

How are stop words applied?

When a search is run, any words within the query that are in the stop words list are removed from the query. See: Default stop words list (English)

There are some exceptions to this, which are outlined below.

  • By default, stop words are only removed when the query contains two or more terms that are not stop words. This behaviour can be altered with the ras option.

  • In addition to the stop words list, single digit ASCII characters are always treated as stop words.

  • Stop words that are within a phrase operator (between double quotes in your query) are never removed.

The ras setting switches between three levels of stop word processing. The options are:

  • ras=0: never remove stop words.

  • ras=1: remove stop words when the query contains two or more terms that are not stop words (the default behaviour).

  • ras=2: always remove stop words.
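The rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the search engine's actual implementation. The function names and the tiny word set are hypothetical, and two readings are assumed: that "single digit ASCII characters" covers any single ASCII letter or digit, and that only terms outside quoted phrases are counted when ras=1 checks for two or more non-stop-word terms.

```python
import re

# Tiny illustrative subset; the real English list is much longer.
STOP_WORDS = {"a", "and", "the", "of", "in", "to", "for"}


def is_stop_word(term):
    """True if the term is on the list or is a single ASCII letter or digit.

    The single-character rule is an assumed reading of the documentation.
    """
    t = term.lower()
    return t in STOP_WORDS or (len(t) == 1 and t.isascii() and t.isalnum())


def remove_stop_words(query, ras=1):
    """Return the query tokens that remain after stop word processing."""
    # Split into quoted phrases and bare terms; phrases are never altered.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    if ras == 0:
        return tokens  # never remove stop words

    bare_terms = [t for t in tokens if not t.startswith('"')]
    content_terms = [t for t in bare_terms if not is_stop_word(t)]
    if ras == 1 and len(content_terms) < 2:
        return tokens  # fewer than two non-stop-word terms: keep everything

    return [t for t in tokens if t.startswith('"') or not is_stop_word(t)]


print(remove_stop_words("the history of the university"))    # ['history', 'university']
print(remove_stop_words("the university"))                   # ['the', 'university']
print(remove_stop_words("the university", ras=2))            # ['university']
print(remove_stop_words('"to be or not to be" the play', 2)) # ['"to be or not to be"', 'play']
```

With the default level, the query "the university" is left intact because it contains only one term that is not a stop word.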

The ras setting can be set in your results page configuration as a query_processor_option or by adding a request parameter to your URL.
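As a rough illustration of the request parameter approach, the following Python sketch appends ras to a search URL. The host, path and the query parameter name are placeholders chosen for the example and are not taken from this page; substitute the values used by your own search endpoint.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, **options):
    """Build a search URL with extra request parameters such as ras."""
    # "query" as the parameter name is an assumption made for this example.
    params = {"query": query, **options}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; ras=2 asks for stop words to always be removed.
url = build_search_url(
    "https://search.example.com/s/search.html",
    "the history of art",
    ras=2,
)
print(url)
# https://search.example.com/s/search.html?query=the+history+of+art&ras=2
```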

Non-English language stop word lists

The default stop words list that is applied is localized based on the value of the lang parameter, and defaults to English if the parameter is not set. The lang parameter is set either within the results page configuration query_processor_options or as a request parameter.

Stop word lists are supplied for the following languages, and are used when the lang parameter is set to the corresponding language code. The lang parameter can also include a sub-variant appended with an underscore. For example, lang=en_US applies the English stop words list.

Language code    Language
ar               Arabic
bg               Bulgarian
bn               Bengali
cs               Czech
de               German
en               English
es               Spanish
fa               Persian
fi               Finnish
fr               French
hi               Hindi
hu               Hungarian
it               Italian
mr               Marathi
pl               Polish
pt               Portuguese
ro               Romanian
ru               Russian
sv               Swedish
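The mapping from a lang value to a stop words list can be sketched as follows. This is a minimal illustration with a hypothetical function name. This page does not say what happens when lang names a language without a supplied list, so the sketch returns None in that case rather than guessing.

```python
# Language codes with a supplied stop words list, taken from the table above.
SUPPORTED_LANGUAGES = {
    "ar", "bg", "bn", "cs", "de", "en", "es", "fa", "fi", "fr",
    "hi", "hu", "it", "mr", "pl", "pt", "ro", "ru", "sv",
}


def stop_words_language(lang=None):
    """Return the language code whose stop words list applies to a lang value."""
    if not lang:
        return "en"  # English is used when lang is not set
    # Drop any sub-variant appended with an underscore, e.g. en_US -> en.
    base = lang.split("_", 1)[0].lower()
    # Behaviour for unsupported languages is not documented here, so signal it.
    return base if base in SUPPORTED_LANGUAGES else None


print(stop_words_language("en_US"))  # en
print(stop_words_language("pt_BR"))  # pt
print(stop_words_language())         # en
```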

Default stop words list (English)

The following words are stripped from a user’s query (subject to the stop word removal rules defined by the ras query processor option).

The English stop words list is located within the Funnelback installation at: INSTALL_DIRECTORY/share/lang/en_stopwords. Stop words lists for other languages can also be viewed by inspecting the appropriate file within the same folder.

a
a's
able
about
above
according
accordingly
across
actually
after
afterwards
again
against
ain't
all
allow
allows
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
aside
ask
asking
associated
at
available
away
awfully
b
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
c
c'mon
c's
came
can
can't
cannot
cant
cause
causes
certain
certainly
changes
clearly
co
com
come
comes
concerning
consequently
consider
considering
contain
containing
contains
corresponding
could
couldn't
course
currently
d
definitely
described
despite
did
didn't
different
do
does
doesn't
doing
don't
done
down
downwards
during
e
each
edu
eg
eight
either
else
elsewhere
enough
entirely
especially
et
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
exactly
example
except
f
far
few
fifth
first
five
followed
following
follows
for
former
formerly
forth
four
from
further
furthermore
g
get
gets
getting
given
gives
go
goes
going
gone
got
gotten
greetings
h
had
hadn't
happens
hardly
has
hasn't
have
haven't
having
he
he's
hello
help
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
hi
him
himself
his
hither
hopefully
how
howbeit
however
i
i'd
i'll
i'm
i've
ie
if
ignored
immediate
in
inasmuch
inc
indeed
indicate
indicated
indicates
inner
insofar
instead
into
inward
is
isn't
it
it'd
it'll
it's
its
itself
j
just
k
keep
keeps
kept
know
knows
known
l
last
lately
later
latter
latterly
least
less
lest
let
let's
like
liked
likely
little
look
looking
looks
ltd
m
mainly
many
may
maybe
me
mean
meanwhile
merely
might
more
moreover
most
mostly
much
must
my
myself
n
name
namely
nd
near
nearly
necessary
need
needs
neither
never
nevertheless
new
next
nine
no
nobody
non
none
noone
nor
normally
not
nothing
novel
now
nowhere
o
obviously
of
off
often
oh
ok
okay
old
on
once
one
ones
only
onto
or
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
own
p
particular
particularly
per
perhaps
placed
please
plus
possible
presumably
probably
provides
q
que
quite
qv
r
rather
rd
re
really
reasonably
regarding
regardless
regards
relatively
respectively
right
s
said
same
saw
say
saying
says
second
secondly
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sensible
sent
serious
seriously
seven
several
shall
she
should
shouldn't
since
six
so
some
somebody
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specified
specify
specifying
still
sub
such
sup
sure
t
t's
take
taken
tell
tends
th
than
thank
thanks
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
theres
thereupon
these
they
they'd
they'll
they're
they've
think
third
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
to
together
too
took
toward
towards
tried
tries
truly
try
trying
twice
two
u
un
under
unfortunately
unless
unlikely
until
unto
up
upon
us
use
used
useful
uses
using
usually
uucp
v
value
various
very
via
viz
vs
w
want
wants
was
wasn't
way
we
we'd
we'll
we're
we've
welcome
well
went
were
weren't
what
what's
whatever
when
whence
whenever
where
where's
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
who's
whoever
whole
whom
whose
why
will
willing
wish
with
within
without
won't
wonder
would
wouldn't
x
y
yes
yet
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves
z
zero

Custom stop words list

This feature is not available in the Squiz DXP.

A custom stop words list can be used instead of the default list by defining the -STOP query processor option. The value should be set to the absolute path of the text file containing the stop words, or to a path relative to the $SEARCH_HOME/share/lang folder.

Only a single stop words list is applied. A custom stop words list is not combined with the locale-specific default list, so it must include every word that should be treated as a stop word.
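Because the custom list replaces the default list, one practical way to extend the defaults is to generate a file that contains both. The Python sketch below shows one way to do that. It assumes the stop words files contain one word per line, as the listing on this page suggests, and the function name and file paths are hypothetical.

```python
from pathlib import Path


def build_custom_stop_words(default_list, extra_words, output):
    """Write a stop words file containing the default words plus extra ones.

    Assumes one stop word per line. Returns the number of words written.
    """
    lines = Path(default_list).read_text(encoding="utf-8").splitlines()
    words = {line.strip().lower() for line in lines if line.strip()}
    words.update(w.strip().lower() for w in extra_words if w.strip())
    Path(output).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
    return len(words)


# Example call with placeholder paths:
# build_custom_stop_words(
#     "/path/to/share/lang/en_stopwords",
#     ["university", "campus"],
#     "/path/to/custom_stopwords.txt",
# )
```

The generated file can then be referenced with the -STOP option.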

Default value

-STOP=$SEARCH_HOME/share/lang/en_stopwords

Example

Set the stop words list to custom_stopwords.txt, stored in the collection’s configuration folder, using either an absolute path or a path relative to $SEARCH_HOME/share/lang:

query_processor_options= -STOP=$SEARCH_HOME/conf/$COLLECTION_NAME/custom_stopwords.txt

or

query_processor_options= -STOP=../../conf/$COLLECTION_NAME/custom_stopwords.txt
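The two forms above point at the same file because the relative form is resolved from $SEARCH_HOME/share/lang. The following Python sketch checks that equivalence. The install location /opt/funnelback and the collection name example are placeholders, and the resolution logic is an illustration of the rule described on this page rather than the product's own code.

```python
import posixpath


def resolve_stop_path(value, search_home):
    """Resolve a -STOP value to an absolute path.

    Absolute paths are used as given; relative paths are taken to be
    relative to <search_home>/share/lang.
    """
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    return posixpath.normpath(posixpath.join(search_home, "share/lang", value))


search_home = "/opt/funnelback"  # placeholder install location
absolute = resolve_stop_path(
    f"{search_home}/conf/example/custom_stopwords.txt", search_home
)
relative = resolve_stop_path(
    "../../conf/example/custom_stopwords.txt", search_home
)
print(absolute)              # /opt/funnelback/conf/example/custom_stopwords.txt
print(absolute == relative)  # True
```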