Using click data to improve rankings

Result quality can sometimes be improved by using click data. Click data are records of which results users have clicked on in response to particular queries. The idea is that if users repeatedly select a particular result from a list of results, that result is more likely to be an important resource than the other results in the list.
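To make the idea concrete, the sketch below counts clicks per query and turns them into each URL's share of that query's clicks. It is a minimal illustration only: the (query, clicked_url) record format and the function name click_share are hypothetical and do not describe how the search engine actually stores or scores click logs.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (query, clicked_url) pairs.
# This is NOT the real click log format, just an illustration.
clicks = [
    ("stuff", "http://example.com/a"),
    ("stuff", "http://example.com/a"),
    ("stuff", "http://example.com/b"),
    ("things", "http://example.com/c"),
]


def click_share(records):
    """Return, per query, the fraction of that query's clicks each URL received."""
    per_query = defaultdict(Counter)
    for query, url in records:
        per_query[query][url] += 1
    return {
        query: {url: n / sum(counts.values()) for url, n in counts.items()}
        for query, counts in per_query.items()
    }


shares = click_share(clicks)
print(shares["things"])  # {'http://example.com/c': 1.0}
```

In this toy data, http://example.com/a receives two thirds of the clicks for "stuff", which is the kind of signal the ranking can take into account.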

Adjusting the click data scope

The following configuration settings can be used to limit the amount of click data that is considered for ranking.

  • click_data.num_archived_logs_to_use This option should be a number indicating how many logs to use from each listed archive directory. For example, setting this option to 5 means that click data from the last 5 logs in your archive directories (typically, each log covers the period between data source updates) is taken into account when calculating query results. This option can also be set to all to indicate that every available click data log should be used.

  • click_data.week_limit This option, if set, limits the click data used to clicks that occurred in the previous n weeks, where n is the value of this option. It is useful for frequently changing websites, because it ensures that the click data used does not reflect clicks on documents that may since have been changed or moved.
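The following sketch shows one plausible way the two limits could combine: first keep only the newest num_archived_logs_to_use logs (or all of them), then drop any clicks older than week_limit weeks. It assumes a single archive directory and a hypothetical ClickLog structure holding timestamped clicks; it illustrates the filtering logic described above, not the product's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Union


@dataclass
class ClickLog:
    # Hypothetical structure for one archived click log.
    created: datetime
    clicks: list  # list of (query, url, clicked_at) tuples


def select_clicks(logs, num_archived_logs_to_use: Union[int, str] = "all",
                  week_limit: Optional[int] = None,
                  now: Optional[datetime] = None):
    """Apply both scope limits to a single (assumed) archive directory."""
    now = now or datetime.now()
    # Newest logs first, then keep only the requested number of logs.
    ordered = sorted(logs, key=lambda log: log.created, reverse=True)
    if num_archived_logs_to_use != "all":
        ordered = ordered[:num_archived_logs_to_use]
    selected = [click for log in ordered for click in log.clicks]
    # Optionally drop clicks older than week_limit weeks.
    if week_limit is not None:
        cutoff = now - timedelta(weeks=week_limit)
        selected = [click for click in selected if click[2] >= cutoff]
    return selected


now = datetime(2024, 6, 1)
# Ten weekly logs, each holding one click made when the log was created.
logs = [
    ClickLog(now - timedelta(weeks=i),
             [("stuff", f"http://example.com/{i}", now - timedelta(weeks=i))])
    for i in range(10)
]
recent = select_clicks(logs, num_archived_logs_to_use=5, week_limit=2, now=now)
```

Here the log limit keeps the 5 newest logs, and the week limit then keeps only the clicks from the last 2 weeks, leaving 3 clicks.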

Weighting click data

Like any new source of information, click data can have more or less impact on the quality of your search results, so it is important to weight it appropriately for your results page to obtain the best results.

Weighting of click data can be achieved with the sco=2 and -wmeta.K options, included either as part of a search URL or as part of the query_processor_options configuration parameter in the results page configuration. For example, to set the click data weight to 0.7 (the default is 0.5), you might include it in a search URL:

http://example.com/s/search.html?collection=example&query=stuff&sco=2[K]&wmeta.K=0.7

or set it as a configuration option in the results page configuration:

query_processor_options=-wmeta.K=0.7
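If you generate search links programmatically, the URL form above can be built with Python's standard urllib.parse module. This is a minimal sketch: the helper name search_url is hypothetical, and the host, collection and query are the placeholder values from the example URL. Note that urlencode percent-encodes the brackets in sco=2[K] as %5B and %5D, which decodes back to the same parameter value.

```python
from urllib.parse import urlencode


def search_url(base, collection, query, click_weight):
    """Build a search URL that sets the click data weight (hypothetical helper)."""
    params = {
        "collection": collection,
        "query": query,
        "sco": "2[K]",             # scoring option from the example URL
        "wmeta.K": click_weight,   # click data weight, e.g. 0.7
    }
    return base + "?" + urlencode(params)


url = search_url("http://example.com/s/search.html", "example", "stuff", 0.7)
print(url)
# http://example.com/s/search.html?collection=example&query=stuff&sco=2%5BK%5D&wmeta.K=0.7
```

Setting the weight in query_processor_options instead applies it to every query on the results page, without changing individual URLs.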

Disabling click data

Click data is enabled by default.

To disable click data, set click_data.use_click_data_in_index to false.

Click data is included in the indexing phase of a data source update, so changes to these settings only take effect after your data source has been updated. See updating a data source for more details.