Improve your search result relevance without altering the weightings in the ranking algorithm
There are many factors that affect the quality of ranking for a search, many of which do not involve adjustments to the weightings used by Funnelback’s ranking algorithm. This guide provides a general overview of what should be considered when attempting to improve search result rankings.
Check the following things before diving into a deeper analysis and attempting to change ranking parameters.
Are pages missing from the index?
One of the most common causes of ranking problems is pages missing from the index. This can occur for a range of reasons such as:
- misconfigured crawler include or exclude patterns,
- robots.txt and metadata directives preventing pages from being crawled or indexed,
- discontinuities in the crawl path (e.g. JavaScript links),
- authenticated pages/sites,
- filter errors (e.g. password protected PDF documents), and
- crawler configuration heuristics limiting the crawl (e.g. maximum link distance, URL length, pages per directory).
See: search results is missing a particular URL for a comprehensive guide on troubleshooting missing URLs.
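As a quick first check for the robots.txt case, the sketch below tests a list of URLs against a set of robots.txt rules using Python's standard `urllib.robotparser`. The user agent name `ExampleCrawler`, the sample rules, and the URLs are placeholders, not values taken from Funnelback; substitute the user agent your crawler actually sends.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
robots = """\
User-agent: *
Disallow: /private/
"""
print(blocked_urls(robots, "ExampleCrawler", [
    "https://example.com/about",
    "https://example.com/private/report.pdf",
]))
```

This only covers robots.txt rules; robots meta tags, include/exclude patterns, and the other causes listed above need to be checked separately.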
Is there a lot of useless content in the index?
The search results can only be as good as the content that is included within the index. If there is too much noise in the search index then the result quality will drop. A good technique that can improve ranking is to ensure that only suitable content is included within the search index.
- Add crawler exclude patterns or robots directives to eliminate low quality search results. Low quality content can include things like dynamically generated calendars, outdated manuals, and category listing pages.
- Sometimes organizations such as universities have authenticated sub-sites that need to be found in their website public search so that users can find the login page. In these cases, it’s best to add either an unauthenticated landing page or a curation to help users find the login page.
- Define canonical URLs in your web content to ensure pages are stored with the correct URLs and eliminate potential duplicate content.
- Use Funnelback noindex tags in your website templates to hide non-content parts of web pages (such as the header/footer and site navigation).
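To preview how much of a template would remain indexable once noindex tags are in place, a rough sketch like the following can help. It assumes the tags take the form of paired HTML comments, `<!--noindex-->` and `<!--endnoindex-->`; confirm the exact syntax against the documentation for your Funnelback version. The function name and sample page are hypothetical.

```python
import re

# Assumed tag syntax: <!--noindex--> ... <!--endnoindex-->.
# Verify this against your Funnelback version before relying on it.
NOINDEX_REGION = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*endnoindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_html(html: str) -> str:
    """Return the HTML with every noindex-wrapped region removed."""
    return NOINDEX_REGION.sub("", html)


# Hypothetical page: the navigation is wrapped, the main content is not.
page = (
    "<!--noindex--><nav>Home | About</nav><!--endnoindex-->"
    "<main>Course fees for 2024</main>"
)
print(indexable_html(page))
```

This is only an approximation for reviewing templates; the indexer's own handling of the tags is authoritative.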
Use synonyms, curator and query blending to improve your ranking
There’s often a difference between the words a user knows, and the words you use in your content. Use tools such as synonyms to overcome these language barriers.
- Add synonyms for any searches that are failing due to mismatches in terminology (e.g. jobs, careers, employment).
- Add curations for any searches that don’t have content in the index. For example, a bank that doesn’t offer total and permanent disability insurance might want to direct users to their employment insurance product (which may lie outside of the search index).
- Use Funnelback’s query blending feature to fix mismatches between US and UK spelling.
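The idea behind synonyms can be illustrated with a small conceptual sketch: each query term is expanded to the set of terms that should be treated as equivalent. This is not how Funnelback implements synonyms or query blending; the `SYNONYM_GROUPS` structure and `expand_query` function are hypothetical and exist only to show the effect.

```python
# Hypothetical synonym groups, based on the terminology example above.
SYNONYM_GROUPS = [{"jobs", "careers", "employment"}]


def expand_query(query: str, groups=SYNONYM_GROUPS) -> list[set[str]]:
    """For each query term, return the set of terms treated as equivalent."""
    expanded = []
    for term in query.lower().split():
        alternatives = {term}
        for group in groups:
            if term in group:
                # The term belongs to a synonym group, so any member may match.
                alternatives |= group
        expanded.append(alternatives)
    return expanded


print(expand_query("graduate jobs"))
```

A search for "graduate jobs" would then also match content that only mentions "careers" or "employment".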
Use of non-default ranking settings
Another cause of ranking problems is ranking settings that were set incorrectly, or that were set on an earlier Funnelback version and are no longer optimal for your version.
Ranking is configured using query processor options, which can be set as a configuration setting (query_processor_options), passed as CGI parameters in the HTML search form itself, or both.
Check the query processor options for any ranking related settings that indicate a non-default ranking configuration is in use. If you find any such settings, it might be worth checking whether the default ranking (i.e. all query processor options removed) improves search quality. A common custom ranking setting to look for is Same-Site Suppression (SSS), which shows search results over a broad range of website paths rather than search results all from the same path.
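When the options string is long, a small helper can pull out the entries worth reviewing. The sketch below assumes the options are written as space-separated `-name=value` tokens and that ranking-related option names start with `SSS` or `cool`; both the format and the prefix list are assumptions to adjust for your version, and the sample string is made up for illustration.

```python
import shlex

# Assumed prefixes for ranking-related options; extend or correct for your version.
RANKING_PREFIXES = ("SSS", "cool")


def ranking_options(qpo: str, prefixes=RANKING_PREFIXES) -> dict[str, str]:
    """Return the options in a query_processor_options string that look ranking-related."""
    found = {}
    for token in shlex.split(qpo):
        # Split "-name=value" into its name and value parts.
        name, _, value = token.lstrip("-").partition("=")
        if name.startswith(prefixes):
            found[name] = value
    return found


# Illustrative string only, not a recommended configuration.
print(ranking_options("-stem=2 -SSS=10 -cool.4=0.5"))
```

Any option this reports is a candidate to remove temporarily while you compare results against the default ranking.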
User click feedback misconfiguration
Funnelback’s click feedback system ensures the ranking system continues to improve with each search. If ranking appears to be suboptimal, it could be that click feedback is not operating correctly. This generally happens due to one of these conditions:
- The search results are not using the click links.
- Click feedback is disabled. Check that the ‘click_tracking’ setting in the results page configuration is not set to ‘false’ (it is enabled by default).
- A misconfiguration in the click feedback settings. Check the configuration for the click_data.week_limit, click_data.num_archived_logs_to_use, and click_data_archive_dirs_collection_cfg settings.
Once click feedback is enabled, it will generally require a couple of months of search traffic to learn the optimal ranking.
See: using click data to improve rankings for further information.
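The configuration checks listed above can be sketched as a short review script. It assumes the configuration is available as plain `key=value` text, which may not match how your installation stores or exposes it; the function names are hypothetical, and the setting names are copied from the list above.

```python
# Setting names as given in the text above.
CLICK_KEYS = (
    "click_data.week_limit",
    "click_data.num_archived_logs_to_use",
    "click_data_archive_dirs_collection_cfg",
)


def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def click_feedback_review(text: str) -> list[str]:
    """List click feedback settings that deserve a manual look."""
    settings = parse_config(text)
    notes = []
    # Click tracking is enabled by default, so only an explicit "false" is a problem.
    if settings.get("click_tracking", "true").lower() == "false":
        notes.append("click_tracking is disabled")
    for key in CLICK_KEYS:
        if key in settings:
            notes.append(f"{key} is overridden: {settings[key]}")
    return notes


# Made-up configuration text for illustration.
print(click_feedback_review("click_tracking=false\nclick_data.week_limit=4\n"))
```

The script only flags settings for review; whether an overridden value is actually wrong depends on your search traffic and log retention.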