Using click data to improve rankings


Result quality can be improved in some situations by utilising click data. Click data are records of which results users have clicked on in response to particular queries. The idea is that if users select a particular result from a list of results, that result is more likely to be an important resource than the others.

Funnelback keeps a record of all click data for each collection, and this can be incorporated into the Funnelback ranking algorithms to improve result quality.

Including click data

Setting up Funnelback to take click data into account for your collections is a simple procedure that requires editing a small number of collection.cfg options. The main option is:

  • click_data.use_click_data_in_index This option should be set to true to enable the inclusion of click data, i.e. click_data.use_click_data_in_index=true.

Click data is included during the indexing phase of a collection update, so it will only take effect after your collection has been updated. See updating collections for more details on updating your collections.

Once the main option has been set, the following parameters control which click data is used:

  • click_data.num_archived_logs_to_use This option should be a number indicating how many logs to use from each archive directory listed. For example, setting this option to 5 means that click data from the last 5 logs in your archive directories will be taken into consideration when calculating query results (typically each log covers the time between two collection updates). This option can also be set to all to indicate that every available click data log should be used.

  • click_data.week_limit This option, if set, limits the inclusion of click data to clicks that have occurred in the previous n weeks, where n is the value of this option. Setting it is useful for websites that change regularly, to make sure that the click data used does not represent clicks on documents that have since been changed or moved.
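
As an illustration, the three options described above might be combined in collection.cfg as follows. This is only a sketch: the value 5 is taken from the example above, while the value 12 for click_data.week_limit is an assumption chosen purely for illustration. Suitable values will depend on how often your collection is updated and how quickly its content changes.

```
click_data.use_click_data_in_index=true
click_data.num_archived_logs_to_use=5
click_data.week_limit=12
```

As noted above, these settings only take effect once the collection has been updated.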

Weighting click data

Like all sources of new information, click data can have a varying degree of impact on the quality of your search results. It is important to weight the information appropriately for your collection in order to obtain the best results. This should be done as part of the normal tuning of Funnelback to suit your unique collection.

Weighting of click data can be achieved with the wmeta.K option, supplied either as part of a search URL or as part of the query_processor_options configuration parameter in the collection.cfg file. For example, to set the click data weight to 0.7 (the default is 0.5) you might include the option as a parameter in a search URL, or set a configuration option in collection.cfg:

query_processor_options= -wmeta.K=0.7

© 2015- Squiz Pty Ltd