Tuning search ranking
Tuning is a process that can be used to determine which attributes of a document are indicative of relevance and adjust the ranking algorithm to match these attributes.
The default settings in Funnelback are designed to provide relevant results for the majority of websites. Funnelback uses a ranking algorithm that scores each document in the index when a search is run, and this score is influenced by many different weighted factors. These individual weightings can be adjusted, and tuning is the recommended way to achieve this.
The actual attributes that inform relevance will vary from site to site and can depend on the way in which the content is written and structured on the website, how often content is updated and even the technologies used to deliver the website.
The following are examples of attributes that can inform relevance:
- How many times the search keywords appear within the document content
- If the keywords appear in the URL
- If the keywords appear in the page title, or headings
- How large the document is
- How recently the document has been updated
- How deep the document is within the website's structure
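To make the idea of weighted factors concrete, the following is a minimal sketch of combining several signals into a single document score. The feature names, weights and URLs are hypothetical and chosen only for illustration; they are not Funnelback's actual ranking parameters or formula.

```python
# Hypothetical feature names and weights, for illustration only.
# These are NOT Funnelback's real ranking parameters.
WEIGHTS = {
    "content_matches": 0.4,  # how often the keywords appear in the content
    "url_match": 0.2,        # keywords appear in the URL
    "title_match": 0.3,      # keywords appear in the title or headings
    "recency": 0.1,          # how recently the document was updated
}


def score(features, weights=WEIGHTS):
    """Combine per-document feature values (each in [0, 1]) into one score."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


# Two toy documents with assumed feature values.
docs = {
    "https://example.com/a": {"content_matches": 0.5, "title_match": 1.0},
    "https://example.com/b": {"content_matches": 1.0, "url_match": 1.0},
}

# Rank documents from highest to lowest score.
ranked = sorted(docs, key=lambda url: score(docs[url]), reverse=True)
```

Changing a weight changes the order of the results, which is the kind of adjustment that tuning automates.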
Tuning allows for the automatic detection of attributes that influence ranking. The tuning process requires training data from the content owners. This training data is made up of a list of possible search keywords, each paired with the URL that the content owners deem to be the best answer for that keyword.
A training set of 50-100 queries is a good size for most search implementations. Too few queries will not provide adequately broad coverage and will skew the optimal ranking settings suggested by tuning. Too many queries will place considerable load on the server for a sustained length of time, as the tuning tool runs each query with different combinations of ranking settings. It is not uncommon for a tuning run to execute in excess of 1 million queries.
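The shape of the training data and the arithmetic behind the query volume can be sketched as follows. The dictionary layout, function names, example URLs and the figure of 10,000 setting combinations are all assumptions made for illustration; Funnelback stores and processes its training data in its own way.

```python
# Hypothetical in-memory form of the training data: each query maps to the
# URL(s) that the content owners consider the best answer.
training_data = {
    "enrolment dates": ["https://example.com/study/enrol"],
    "campus map": ["https://example.com/about/maps"],
}


def check_training_set_size(n_queries, low=50, high=100):
    """Compare a training set size with the recommended 50-100 range."""
    if n_queries < low:
        return "too few: coverage may be too narrow"
    if n_queries > high:
        return "large: expect sustained server load"
    return "ok"


def total_queries(n_training_queries, n_setting_combinations):
    """Every training query is run once per combination of settings."""
    return n_training_queries * n_setting_combinations


# 100 training queries against an assumed 10,000 combinations of settings
# already amounts to one million searches.
estimated = total_queries(100, 10_000)
```

This is why a larger training set translates directly into a longer and heavier tuning run.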
Funnelback uses this list of searches to optimise the ranking algorithm, by running each of the searches with different combinations of ranking settings and analysing the results for the settings that provide the closest match to the training data.
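The optimisation described above can be pictured as a search over candidate settings, each evaluated against the training data. The sketch below uses an exhaustive grid search over two made-up weights and a toy two-document index; the search strategy, the evaluation measure (best answer ranked first) and all names are assumptions, not a description of how Funnelback implements tuning.

```python
from itertools import product

# Toy index with hypothetical per-document feature values.
INDEX = {
    "https://example.com/a": {"title": 1.0, "url": 0.0},
    "https://example.com/b": {"title": 0.0, "url": 1.0},
}

# Training data: query -> set of best-answer URLs.
TRAINING = {"example query": {"https://example.com/b"}}


def run_query(query, weights):
    """Rank every toy document by its weighted feature sum.

    The query text is ignored here to keep the sketch small.
    """
    def doc_score(url):
        return sum(weights[f] * v for f, v in INDEX[url].items())

    return sorted(INDEX, key=doc_score, reverse=True)


def evaluate(weights):
    """Fraction of training queries whose best answer is ranked first."""
    hits = sum(run_query(q, weights)[0] in best for q, best in TRAINING.items())
    return hits / len(TRAINING)


# Try every combination of the candidate weight values.
candidates = [
    dict(zip(("title", "url"), combo))
    for combo in product((0.0, 0.5, 1.0), repeat=2)
]
best = max(candidates, key=evaluate)
```

Here the winning settings weight the URL match above the title match, because that is what the single training example rewards. With a realistic training set the chosen settings are a compromise across all queries.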
Note: It is very important to understand that tuning does not guarantee that any of the searches provided in the training data will return the specified best answer as the top result. The training data should, however, result in improved results for all searches.
The tuning tool consists of two components - the training data editor and the components to run tuning.
Any user with access to the marketing dashboard has the ability to edit the tuning data.
Only an administrator can run tuning and apply the optimal settings to a search.
Note: The running of tuning is restricted to administrators as the tuning process can place a heavy load on the server and the running of tuning needs to be managed.
Editing training data for tuning
The training data editor is accessed from the marketing dashboard by clicking on the tuning tile, or by selecting tuning from the left hand menu.
A blank training data editor is displayed if tuning has not previously been configured.
Clicking the add new button opens the editor screen.
Tuning requires 50-100 examples of desirable searches. Each desirable search requires the search query and one or more URLs that represent the best answer for the query.
Two methods are available for specifying the query:
- Enter the query directly into the keyword(s) field, or
- Click the suggest keyword(s) button, then click on one of the suggestions that appear in a panel below the keyword(s) form field. The suggestions are randomised based on popular queries in the analytics. Clicking the button multiple times will generate different lists of suggestions.
Once a query has been input the URLs of the best answer(s) can be specified.
URLs for the best answers are added by clicking either the suggest URLs to add button or the manually add a URL button.
Clicking the suggest URLs to add button opens a panel of the top results (based on current rankings).
Clicking on a suggested URL adds the URL as a 'best answer'.
Additional URLs can be optionally added to the best URLs list - however the focus should be on providing additional query/best URL combinations over a single query with multiple best URLs.
A manual URL can be entered by clicking the manually add a URL button. Manually added URLs are checked as they are entered.
Clicking the save button adds the query to the training data. The tuning screen updates to show the available training data. If an error status icon is displayed against an entry, hovering over it shows the cause, for example an invalid URL (a manually added URL that is not present in the search index).
Once all the training data has been added tuning can be run.
Tuning is run from the tuning history page. This is accessed by clicking the history sub-item in the menu, or by clicking the tuning runs button that appears in the start a tuning run message.
The tuning history shows the previous tuning history for the service and also allows users with sufficient permissions to start the tuning process.
Recall that only certain users are granted the permissions required to run tuning.
Clicking the start tuning button initiates the tuning run and the history table provides updates on the possible improvement found during the process. These numbers will change as more combinations of ranking settings are tested.
When the tuning run completes, the score over time graph is updated and the tuning runs table holds the final values for the tuning run.
Once tuning has been run a few times, additional data is added to both the score over time chart and the tuning runs table.
The tuning tile on the marketing dashboard main page also updates to provide information on the most recent tuning run.
Note: The improved ranking is not automatically applied to the search. An administrator must log in to apply the optimal settings as found by the tuning process.
Understanding search tuning results
After a tuning run, additional details on the results can be viewed within the administration interface and the tuned ranking parameters can be applied to the search system.
From Funnelback's administration home page, once a collection is selected, the tune tab allows access to the view tuning results page. This page displays the details of the most recent (or currently running) tuning process.
The information displayed includes:
- Success rate: The percentage of queries which return any correct URL in the top 20 results. The difference between the tuned settings and the collection as configured at the beginning of the tuning process in terms of success rate is displayed after tuning.
- Search quality score: A rating of the search quality which takes into account how prominently the correct answers are displayed, and the difference between the tuned settings and the collection as configured at the beginning of the tuning process.
- Query processor options: The best query processor options identified for the collection by the tuning process. An 'apply' button is included which allows these query processor options to be applied immediately to the preview mode of the collection. Assuming the change is then verified as acceptable, it can be made public by publishing the padre_opts.cfg file from the browse collection configuration files page.
- Queries performed: The total number of queries performed by the tuning process.
Results of past tuning processes can also be viewed by selecting a time-stamp from the "Other tuning runs" menu at the bottom of the page.
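The success rate listed above follows directly from its definition: the percentage of queries for which any correct URL appears in the top 20 results. A minimal sketch of that calculation is shown below; the function name, argument layout and example data are assumptions for illustration and do not reflect Funnelback's internal code.

```python
def success_rate(results_by_query, correct_by_query, cutoff=20):
    """Percentage of queries with at least one correct URL in the top results.

    results_by_query: query -> ranked list of result URLs
    correct_by_query: query -> set of URLs marked as best answers
    """
    if not correct_by_query:
        return 0.0
    hits = 0
    for query, correct_urls in correct_by_query.items():
        top = results_by_query.get(query, [])[:cutoff]
        if any(url in correct_urls for url in top):
            hits += 1
    return 100.0 * hits / len(correct_by_query)


# Hypothetical example: one of the two queries finds its best answer.
results = {
    "campus map": ["https://example.com/about/maps", "https://example.com/"],
    "enrolment dates": ["https://example.com/news"],
}
correct = {
    "campus map": {"https://example.com/about/maps"},
    "enrolment dates": {"https://example.com/study/enrol"},
}
rate = success_rate(results, correct)
```

Note that this measure only records whether a correct URL appears within the cutoff, not how high it ranks. The search quality score is the figure that takes prominence into account.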
Additional details about the final results of the tuning process can be viewed by clicking the details... link. Please note that the information provided in the details section requires a deep understanding of search engine ranking to interpret.
Details available include a histogram of query scores (note that they are displayed between 0 and 1 rather than 0 and 100%), as well as details about each of the specified URLs (including the position at which the URL was found within the results). For each query, the explain link provides further explanation of the ranking of results for the query, and the compare link shows the tuned results alongside the original results for comparison. Each correct URL found within the search collection also has an A link, which displays information about the anchor text pointing to the URL (anchor text is an important ranking factor for web collections).