Tuning search ranking
Tuning is a process that can be used to determine which attributes of a document are indicative of relevance and adjust the ranking algorithm to match these attributes.
The default settings in Funnelback are designed to provide relevant results for the majority of websites. Funnelback uses a ranking algorithm, influenced by many weighted factors, that scores each document in the index when a search is run. These individual weightings can be adjusted and tuning is the recommended way to achieve this.
The actual attributes that inform relevance will vary from site to site and can depend on the way in which the content is written and structured on the website, how often content is updated and even the technologies used to deliver the website.
For example, the following concepts can inform relevance:
How many times the search keywords appear within the document content
If the keywords appear in the URL
If the keywords appear in the page title, or headings
How large the document is
How recently the document has been updated
How deep the document is within the website’s structure
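As a rough illustration of how weighted factors might combine into a single score, the following Python sketch computes a weighted sum over a few of the signals listed above. The feature names, the weights, and the linear form are all assumptions made for illustration; this is not Funnelback's actual ranking formula.

```python
# Illustrative only: hypothetical feature names and weights, not Funnelback's ranking formula.

def score_document(features, weights):
    """Return the weighted sum of a document's feature values (each in the range 0..1)."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


# Hypothetical weights for three of the signals listed above.
weights = {"content_match": 0.5, "title_match": 0.3, "recency": 0.2}

documents = {
    "https://example.com/a": {"content_match": 0.9, "title_match": 0.0, "recency": 0.5},
    "https://example.com/b": {"content_match": 0.4, "title_match": 1.0, "recency": 0.5},
}

# Rank URLs from highest to lowest score.
ranked = sorted(documents, key=lambda url: score_document(documents[url], weights), reverse=True)
print(ranked)
```

Changing the weights changes the order of the results, which is the kind of adjustment that tuning automates.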
Tuning allows for the automatic detection of attributes that influence ranking in the data that is being tuned. The tuning process requires training data from the content owners. This training data is a list of possible searches: each entry pairs a query with the URL that the content owners consider the best answer for that query.
A training set of 50-100 queries is a good size for most search implementations. Too few queries will not provide broad enough coverage and will skew the optimal ranking settings suggested by tuning. Too many queries will place considerable load on the server for a sustained period, because the tuning tool runs each query with many different combinations of ranking settings. It is not uncommon for a tuning run to execute more than 1 million queries.
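The shape of the training data can be sketched in Python as follows. The class, the function, and the example URLs are hypothetical and do not represent a Funnelback file format; the sketch only shows a query paired with its best-answer URLs, plus a check against the 50-100 guideline.

```python
# Illustrative only: the structure and names below are hypothetical, not a Funnelback file format.
from dataclasses import dataclass, field


@dataclass
class TrainingExample:
    query: str                                      # keywords a user might search for
    best_urls: list = field(default_factory=list)   # URL(s) the content owners consider the best answer


def check_training_set(examples, minimum=50, maximum=100):
    """Return a list of warnings about the size and completeness of a training set."""
    warnings = []
    if len(examples) < minimum:
        warnings.append(f"only {len(examples)} queries; coverage may be too narrow")
    if len(examples) > maximum:
        warnings.append(f"{len(examples)} queries; tuning may load the server for a long time")
    for example in examples:
        if not example.best_urls:
            warnings.append(f"query '{example.query}' has no best-answer URL")
    return warnings


training_set = [
    TrainingExample("carrot", ["https://example.com/food/carrot"]),
    TrainingExample("opening hours"),
]
for warning in check_training_set(training_set):
    print(warning)
```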
Funnelback uses this list of searches to optimize the ranking algorithm, by running each of the searches with different combinations of ranking settings and analysing the results for the settings that provide the closest match to the training data.
|Tuning does not guarantee that any given search in the training data will return its specified URL as the top result. Its purpose is to optimize the algorithm by detecting important traits found within the content, which should result in improved results for all searches.|
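The optimization loop described above can be sketched as a toy grid search. This is a minimal sketch under stated assumptions, not Funnelback's tuning algorithm: the two-document index, the feature names, the candidate weight values, and the use of mean reciprocal rank as the measure of closeness to the training data are all hypothetical.

```python
# Illustrative only: a toy grid search, not Funnelback's tuning algorithm.
from itertools import product

# Hypothetical per-document feature values for a tiny index.
INDEX = {
    "https://example.com/carrot-cake": {"content": 0.9, "title": 0.2},
    "https://example.com/carrot": {"content": 0.6, "title": 1.0},
}

# Training data: query -> URL the content owners consider the best answer.
TRAINING = {"carrot": "https://example.com/carrot"}


def search(query, weights):
    """Rank every document by weighted feature sum (the query is ignored in this toy index)."""
    def score(url):
        return sum(weights[name] * value for name, value in INDEX[url].items())
    return sorted(INDEX, key=score, reverse=True)


def evaluate(weights):
    """Mean reciprocal rank of the best-answer URLs across the training queries (an assumed measure)."""
    total = 0.0
    for query, best_url in TRAINING.items():
        results = search(query, weights)
        total += 1.0 / (results.index(best_url) + 1)
    return total / len(TRAINING)


def tune(candidate_values=(0.0, 0.5, 1.0)):
    """Try every combination of candidate weights and keep the best-scoring one."""
    best_weights, best_score, queries_run = None, -1.0, 0
    for content_w, title_w in product(candidate_values, repeat=2):
        weights = {"content": content_w, "title": title_w}
        current = evaluate(weights)
        queries_run += len(TRAINING)
        if current > best_score:
            best_weights, best_score = weights, current
    return best_weights, best_score, queries_run


best_weights, best_score, queries_run = tune()
print(best_weights, best_score, queries_run)
```

In this toy run, 9 weight combinations are each evaluated against 1 training query, for 9 queries in total. The same multiplication explains how real tuning runs grow so large: 100 training queries evaluated against 10,000 combinations would already amount to 1,000,000 queries.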
The tuning tool consists of two components - the training data editor and the components to run tuning.
Any user with access to the insights dashboard has the ability to edit the tuning data.
Only an administrator can run tuning and apply the optimal settings to a search.
|Running tuning is restricted to administrators because the process can place a heavy load on the server and needs to be managed.|
The training data editor is accessed from the insights dashboard by clicking on the tuning tile, or by selecting tuning from the left-hand menu.
A blank training data editor is displayed if tuning has not previously been configured.
Clicking the add new button opens the editor screen.
Tuning requires 50-100 examples of desirable searches. Each desirable search requires the search query and one or more URLs that represent the best answer for the query.
Two methods are available for specifying the query:
Enter the query directly into the keyword(s) field, or
Click the suggest keyword(s) button, then click on one of the suggestions that appear in a panel below the keyword(s) form field. The suggestions are randomised based on popular queries in the analytics. Clicking the button multiple times will generate different lists of suggestions.
Once a query has been input the URLs of the best answer(s) can be specified.
URLs for the best answers are added by clicking either the suggest URLs to add button or the manually add a URL button.
Clicking the suggest URLs to add button opens a panel of the top results (based on current rankings).
Clicking on a suggested URL adds the URL as a best answer.
Additional URLs can be optionally added to the best URLs list - however the focus should be on providing additional query/best URL combinations over a single query with multiple best URLs.
A manual URL can be entered by clicking the manually add a URL button. Manually added URLs are checked as they are entered.
Clicking the save button adds the query to the training data. The tuning screen updates to show the available training data. Hovering over the error status icon shows that there is an invalid URL (the URL that was manually added above is not present in the search index).
Once all the training data has been added tuning can be run.
Tuning is run from the tuning history page. This is accessed by clicking the history sub-item in the menu, or by clicking the tuning runs button that appears in the start a tuning run message.
The tuning history page shows previous tuning runs for the service and also allows users with sufficient permissions to start the tuning process.
|Recall that only certain users are granted the permissions required to run tuning.|
Clicking the start tuning button initiates the tuning run and the history table provides updates on the possible improvement found during the process. These numbers will change as more combinations of ranking settings are tested.
When the tuning run completes a score over time graph will be updated and the tuning runs table will hold the final values for the tuning run.
Once tuning has been run a few times additional data is added to both the score over time chart and tuning runs table.
The tuning tile on the insights dashboard main page also updates to provide information on the most recent tuning run.
|The improved ranking is not automatically applied to the search. An administrator must log in to apply the optimal settings as found by the tuning process.|
After a tuning run, additional details on the results can be viewed within the search dashboard and the tuned ranking parameters can be applied to the search system.
The details of the most recent (or currently running) tuning process can be viewed from the tuning section of the results page configuration.
The information displayed includes:
Success rate: The percentage of queries which return any correct URL in the top 20 results. The difference between the tuned settings and the collection as configured at the beginning of the tuning process in terms of success rate is displayed after tuning.
Search quality score: A rating of the search quality which takes into account how prominently the correct answers are displayed, and the difference between the tuned settings and the results page as configured at the beginning of the tuning process.
Query processor options: The best query processor options identified for the results page by the tuning process. An 'apply' button is included which allows these query processor options to be applied immediately to the preview mode of the results page. Once the change has been verified as acceptable, it can be made public by publishing the updated query_processor_options in the results page configuration.
Queries performed: The total number of queries performed by the tuning process. Results of past tuning processes can also be viewed by selecting a time-stamp from the "Other tuning runs" menu at the bottom of the page.
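The success rate defined above can be computed as in the following sketch. The function name, the data shapes, and the example URLs are hypothetical; only the definition (any correct URL within the top 20 results, compared before and after tuning) comes from the description above.

```python
# Illustrative only: the data shapes are hypothetical; the success-rate definition follows the text above.

def success_rate(results_by_query, best_urls_by_query, cutoff=20):
    """Percentage of queries that return any correct URL within the top `cutoff` results."""
    if not best_urls_by_query:
        return 0.0
    hits = 0
    for query, best_urls in best_urls_by_query.items():
        top_results = results_by_query.get(query, [])[:cutoff]
        if any(url in top_results for url in best_urls):
            hits += 1
    return 100.0 * hits / len(best_urls_by_query)


best_urls_by_query = {
    "carrot": ["https://example.com/carrot"],
    "soup": ["https://example.com/soup"],
}
# Ranked result URLs per query with the original settings and with the tuned settings.
before = {
    "carrot": ["https://example.com/cake", "https://example.com/carrot"],
    "soup": ["https://example.com/stew"],
}
after = {
    "carrot": ["https://example.com/carrot", "https://example.com/cake"],
    "soup": ["https://example.com/soup", "https://example.com/stew"],
}

rate_before = success_rate(before, best_urls_by_query)
rate_after = success_rate(after, best_urls_by_query)
print(f"before: {rate_before:.0f}%, after: {rate_after:.0f}%, change: {rate_after - rate_before:+.0f}")
```

Note that success rate ignores where in the top 20 the correct URL appears. In the example, the carrot query counts as a success both before and after tuning even though its best answer moves from second to first place. The search quality score exists to capture that kind of movement.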
Additional details about the final results of the tuning process can be viewed by clicking the details… link. Please note that the information provided in the details section requires a deep understanding of search engine ranking to interpret.
Details available include a histogram of query scores (note that they are displayed between 0 and 1 rather than 0 and 100%), as well as details about each of the specified URLs (including the position at which the URL was found within the results). For each query, the explain link provides further explanation of the ranking of results for the query, and the compare link shows the tuned results alongside the original results for comparison. Each correct URL found within the search results also has an A link, which displays the anchor text pointing to the URL (an important ranking factor for web results).
Tuning settings are applied by copying the optimal set of tuning parameters produced by running tuning and setting these as query_processor_options in your results page settings.
Tutorial: Apply tuning settings
To apply the optimal tuning settings, return to the search dashboard and manage the Foodista search results page. Select view tuning results from the tuning panel.
The tuning results screen will be displayed showing the optimal set of ranking settings found for the training data set.
To apply these tuning options click the copy options button. These options need to be added to the query processor options for the results page. Open the Foodista results page management screen and click the edit results page configuration item from the customize panel.
Click the add new button and add a query_processor_options key, adding the tuning settings to the -stem=2 item that is set by default, then click the save button (but don’t publish your changes yet).
Return to the results page management screen for the Foodista results page and run a search for carrot against the live version of the results page. This will run the search with current ranking settings.
Observe the results, noting the first few.
Click the switch to preview mode link on the green toolbar to run the same search against the preview version of the results page. Alternatively, return to the Foodista results page management screen and rerun the search for carrot, this time against the preview version of the results page. This will run the search with the tuned ranking settings.
Observe the results, noting the first few and checking that the URL you selected previously has moved up in the results.
To make the ranking settings live, return to the Foodista results page management screen and edit the results page configuration. Publish the query_processor_options setting. Retest the live search to ensure that the settings have been applied successfully.
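The merge performed in the tutorial above (adding the copied tuning settings to the default -stem=2 value) can be sketched as a small Python helper. The helper is hypothetical and not part of Funnelback, and the `-option_a` and `-option_b` names are placeholders standing in for whatever string the copy options button produces. The sketch assumes options are whitespace-separated `-name=value` tokens.

```python
# Illustrative only: the helper and the sample tuned options are hypothetical.

def merge_query_processor_options(existing, tuned):
    """Append tuned options to the existing ones; a tuned option replaces an existing one with the same name."""
    merged = {}
    for option in existing.split() + tuned.split():
        name, _, value = option.partition("=")
        merged[name] = value
    return " ".join(f"{name}={value}" if value else name for name, value in merged.items())


existing = "-stem=2"
tuned = "-option_a=0.4 -option_b=0.2"   # placeholder values; use the string copied from the tuning results
print(merge_query_processor_options(existing, tuned))
```

Keeping the existing options and letting tuned values override duplicates avoids setting the same option twice with conflicting values.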