Getting started - what is search?
A search engine’s job is to compile a unified index of content, which can come from many sources.
This index is similar to the index at the back of a book: it provides a way of quickly finding the information relevant to the keywords a user enters when running a search.
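The book-index analogy corresponds to a data structure widely used in search engines, the inverted index: a lookup from each word to the items that contain it. The Python sketch below is a minimal illustration. It assumes plain-text documents keyed by hypothetical IDs and uses naive whitespace tokenisation. It is not Funnelback’s implementation.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lower-cased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():  # naive whitespace tokenisation
            index[word].add(doc_id)
    return index

# Hypothetical documents keyed by illustrative IDs.
docs = {
    "page1": "Funnelback search training",
    "page2": "Search engine basics",
}
index = build_index(docs)
print(sorted(index["search"]))  # ['page1', 'page2']
```

Looking up a word is then a single dictionary access, so the engine does not need to rescan every document for each search.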
To create the index, a search engine such as Funnelback must first scan through all the content that needs to be indexed. How this content is gathered depends on the type of content being indexed; for web content, a tool called a web crawler is used.
A web crawler gathers content by loading a start page (or set of start pages) and following all the links it finds. Each linked page is downloaded and the process is repeated until all the content has been gathered. Rules about which links can be followed are set when configuring the search (for example, a rule might only allow pages from www.mysite.com to be gathered).
This means that a page must be linked from a page the crawler can reach in order for it to be discovered by the search engine.
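The crawl loop described above can be sketched in Python as follows. This is a simplified sketch under several assumptions. A hypothetical in-memory `SITE` dictionary stands in for real HTTP downloads. The `allowed` function is an illustrative single-host rule that mirrors the www.mysite.com example. Breadth-first order is one common choice but is not necessarily what Funnelback uses. The unlinked `/orphan` page is never gathered, which shows why a page must be linked in order to be discovered.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real downloads: URL -> links found on that page.
SITE = {
    "https://www.mysite.com/": ["https://www.mysite.com/about", "https://other.org/"],
    "https://www.mysite.com/about": ["https://www.mysite.com/contact"],
    "https://www.mysite.com/contact": ["https://www.mysite.com/"],
    "https://www.mysite.com/orphan": [],  # no page links here
}

def allowed(url, host="www.mysite.com"):
    """Illustrative crawl rule: only follow links on a single host."""
    return urlparse(url).netloc == host

def crawl(start_pages):
    seen = set(start_pages)
    queue = deque(start_pages)
    gathered = []
    while queue:
        url = queue.popleft()
        gathered.append(url)                # "download" and store the page
        for link in SITE.get(url, []):      # follow every link found on it
            if allowed(link) and link not in seen:
                seen.add(link)
                queue.append(link)
    return gathered

pages = crawl(["https://www.mysite.com/"])
print(pages)
```

The `seen` set prevents pages that link to each other from being downloaded repeatedly, and the rule check keeps the crawl from wandering onto other sites such as other.org.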
Gathering other types of content might simply involve supplying a list of items to gather and index.
A copy of each item is stored as it is discovered. Once all the content has been gathered, the search engine analyses it to build the index. A lot of information about each item is recorded, from the count of each keyword in a document to the number of items that link to that document. All of this information forms part of the index and is used to evaluate how relevant an item might be when a search is run.
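The two statistics mentioned above, keyword counts per item and the number of items linking to each item, can be computed from the stored copies as shown below. This is a minimal sketch. The `items` data is hypothetical, and a real index records far more than this.

```python
from collections import Counter

# Hypothetical stored items: URL -> (text, outgoing links).
items = {
    "a": ("search engines index content", ["b", "c"]),
    "b": ("search results", ["c"]),
    "c": ("content content content", []),
}

# Count of each keyword within each item.
term_counts = {url: Counter(text.split()) for url, (text, _) in items.items()}

# Number of links pointing at each item, gathered across all items.
inbound_links = Counter(link for _, links in items.values() for link in links)

print(term_counts["c"]["content"])  # 3
print(inbound_links["c"])           # 2
```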
Modern search engines have to return results that are relevant to the end user very quickly, often with very little information from the person making the search.
The first thing a search engine does is to take the words entered by the user and simplify them. This includes removing punctuation and common words such as “and”, “or” and “of”.
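This simplification step can be sketched as a small normalisation function. It assumes ASCII letters and digits only, and it uses a short, hypothetical stop-word list. Real engines typically use longer, configurable lists and smarter tokenisation.

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"and", "or", "of", "the"}

def normalise_query(query):
    """Lower-case the query, strip punctuation and drop stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())  # punctuation is discarded
    return [w for w in words if w not in STOP_WORDS]

print(normalise_query("History of Funnelback, and search!"))
# ['history', 'funnelback', 'search']
```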
It then takes this shorter list of words, finds the items that match each of the words and compiles these into a list of search results.
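One way to find items matching each word is to intersect the sets of item IDs stored against each word in an inverted index. The sketch below assumes AND semantics, where an item must contain every remaining word. That is an assumption; engines can also return items matching only some of the words and let ranking sort them out. The `index` contents are hypothetical.

```python
def search(index, words):
    """Return the IDs of items that contain every query word (AND matching)."""
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only items that also contain this word
    return results

# Hypothetical inverted index: word -> IDs of items containing it.
index = {
    "funnelback": {"p1", "p2"},
    "search": {"p1", "p2", "p3"},
    "training": {"p2"},
}
print(sorted(search(index, ["funnelback", "search"])))  # ['p1', 'p2']
print(sorted(search(index, ["search", "training"])))    # ['p2']
```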
The order of the search results is determined by many factors, and these vary from search engine to search engine. Funnelback allows you to change which factors are most important in determining how relevant an item is.
Examples of the factors considered include:
How many times each word appears in the content of an item
How many times the item is linked to by other items
Whether the words appear in the item’s title
When the item was last modified
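To show how the factors listed above might be combined, the sketch below uses a simple weighted sum. The `WEIGHTS` values, the `Item` fields and the freshness formula are all hypothetical, chosen only for illustration. This is a crude linear model and not Funnelback’s ranking algorithm. Adjusting the weights is one way to change which factors matter most.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    title: str
    body: str
    inbound_links: int
    last_modified: date

# Hypothetical weights for each ranking factor.
WEIGHTS = {"term_freq": 1.0, "links": 0.5, "title": 2.0, "freshness": 1.0}

def score(item, words, today):
    body_words = item.body.lower().split()
    title_words = item.title.lower().split()
    term_freq = sum(body_words.count(w) for w in words)       # word occurrences in content
    title_hits = sum(1 for w in words if w in title_words)    # words found in the title
    age_days = (today - item.last_modified).days
    freshness = 1 / (1 + age_days / 365)                      # shrinks as the item ages
    return (WEIGHTS["term_freq"] * term_freq
            + WEIGHTS["links"] * item.inbound_links
            + WEIGHTS["title"] * title_hits
            + WEIGHTS["freshness"] * freshness)

today = date(2024, 1, 1)
a = Item("Search basics", "search search index", 4, date(2023, 12, 1))
b = Item("About us", "we also mention search", 1, date(2020, 1, 1))
ranked = sorted([a, b], key=lambda it: score(it, ["search"], today), reverse=True)
print([it.title for it in ranked])  # ['Search basics', 'About us']
```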
Funnelback is a search platform that not only produces search results, but also includes a suite of tools that can be used to transform, audit and analyse site content.
As a search engine, Funnelback’s role is similar to that of other search engines: to find relevant documents based on very basic keywords. These keywords are matched against the text inside all of the documents that it finds and retrieves.
Funnelback excels at searching unstructured content with simple keywords. This differs from an SQL database search, which operates on a highly structured (or fielded) repository and uses complex queries to return very specific matches within the data.
The Funnelback platform also provides a number of other tools to help you understand and audit your content, such as: