Recommendations

Funnelback has the ability to recommend a list of items that are believed to be related to a given seed item. For example, it might recommend a list of URLs that were clicked on in the same search sessions as the seed URL, or recommend a list of products that were frequently purchased in the same session as a given product.

Recommendations are designed to be accessed via an API and used within individual content pages to provide features such as "people who were interested in this item were also interested in…".

Recommendations are not enabled by default. To enable recommendations:

  • Edit the search package configuration and set the following configuration option: recommender=true.

  • Update the search package indexes by running an update of any of the attached data sources (this will trigger an update to rebuild the indexes for the search package).

Recommendation sources

The main source of information for the recommender system is Funnelback’s query and click logs, which record which queries users submitted and which results they clicked on. These logs are processed automatically as part of the search package analytics update, so no further configuration is needed once recommendations are enabled. By default, only query and click records from the last 60 days are taken into account when processing the logs.

This processing produces a number of sources, which the system takes into account in decreasing order of preference:

  1. CO_CLICKS: items that co-occur as clicks in the same (time-limited) search session as the original seed URL.

  2. RELATED_CLICKS: items that were clicked on for the top N queries associated with the seed URL.

  3. RELATED_RESULTS: the most frequently occurring search results that are returned for the top N queries associated with the seed URL.

  4. EXPLORE_RESULTS: results from running an explore query. These suggestions are based on how similar their textual content is to the seed URL.

There is another source that does not have data of its own, and this is the default used if no explicit source setting is given in a request to the system:

  • DEFAULT: Query all the sources listed above and combine the results.

If recommendations come from more than one source, we combine the lists and sort them using the following comparison chain:

ComparisonChain.start()
    .compare(x.getSource(), y.getSource())
    .compare(y.getFrequency(), x.getFrequency())
    .compare(x.getRank(), y.getRank())
    .result();

  • The sources above are listed in order of preference, so items from the CO_CLICKS source will always win out over other sources and be listed first.

  • If two items have the same source then their frequency of occurrence (across all sources) will be compared next. This means that the more often an item appears the higher it will rise in the list.

  • Finally, if two items have the same source and frequency, we will compare their rank in the source they came from.

  • There should be no items with duplicate IDs (e.g. the same URL) in the final sorted list.
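The combination rules above can be illustrated with a minimal JavaScript sketch. It is not Funnelback's implementation: it assumes each source supplies an ordered array of item IDs, counts an item's frequency as the number of times it appears across the source lists, and, for an item seen in several sources, keeps its most-preferred source and the rank it had there. The names SOURCE_ORDER and combineRecommendations are hypothetical.

```javascript
// Source names in decreasing order of preference, as listed above.
const SOURCE_ORDER = ['CO_CLICKS', 'RELATED_CLICKS', 'RELATED_RESULTS', 'EXPLORE_RESULTS'];

// Hypothetical helper: `lists` maps a source name to an ordered array of item IDs.
function combineRecommendations(lists) {
  const byId = new Map(); // keyed by item ID, so the result has no duplicate IDs
  for (const source of SOURCE_ORDER) {
    (lists[source] || []).forEach((itemID, rank) => {
      const existing = byId.get(itemID);
      if (existing) {
        // Assumption: frequency counts occurrences across all source lists.
        existing.frequency += 1;
      } else {
        // Sources are visited in preference order, so the first source seen
        // is the most preferred one; keep the (0-based) rank from that source.
        byId.set(itemID, { itemID, source, frequency: 1, rank });
      }
    });
  }
  const sourceIndex = (s) => SOURCE_ORDER.indexOf(s);
  // Same order as the comparison chain: source ascending (by preference),
  // then frequency descending, then rank ascending.
  return [...byId.values()].sort((x, y) =>
    sourceIndex(x.source) - sourceIndex(y.source) ||
    y.frequency - x.frequency ||
    x.rank - y.rank);
}
```

For example, with CO_CLICKS ['/b', '/a'], RELATED_CLICKS ['/a', '/c'] and EXPLORE_RESULTS ['/c', '/d'], the combined order is /a, /b, /c, /d: /a overtakes /b within CO_CLICKS because it appears in two sources, even though it was ranked lower there.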

Other sources of recommendations outside those processed as standard might include:

  1. Social media "likes" (e.g. from the Facebook API, Twitter mentions etc.)

  2. Purchase data (e.g. from client’s e-commerce database system)

  3. Data from web analytics software (e.g. Google Analytics)

  4. Web server access logs

To support these other sources their data would need to be exported and converted into a 'pseudo' click log format, for processing by Funnelback. For example, the fact that someone purchased a product would be recorded as a click on the product URL in the generated click log.
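As an illustration of the conversion step, the sketch below turns purchase records into click-like records, treating each order as a session and each purchased product as a click on the product URL. The input shape (orderId, productUrl, purchasedAt), the output field names and the function name purchasesToPseudoClicks are all assumptions; the sketch does not reproduce Funnelback's actual click log format, which the records would still need to be written out in.

```javascript
// Hypothetical input rows, e.g. exported from an e-commerce database:
//   { orderId: 'A17', productUrl: 'https://shop.example.com/p/1', purchasedAt: '2024-05-01T10:00:00Z' }
// Output: one click-like record per purchase. The field names are illustrative
// only and are NOT Funnelback's click log format.
function purchasesToPseudoClicks(purchases) {
  return purchases
    .map((p) => ({
      sessionId: 'order-' + p.orderId,   // one order is treated as one session
      clickedUrl: p.productUrl,          // a purchase is recorded as a click on the product URL
      timestamp: new Date(p.purchasedAt).toISOString(),
    }))
    // Keep each session's records together, in time order. ISO 8601 strings
    // in the same format sort correctly as plain strings.
    .sort((a, b) =>
      a.sessionId.localeCompare(b.sessionId) ||
      a.timestamp.localeCompare(b.timestamp));
}
```

Grouping by order mirrors the way co-occurring clicks in one search session feed the CO_CLICKS source: products bought together end up in the same pseudo session.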

Architecture

The diagram below shows the architecture of the Recommender System:

[Figure: Recommender system architecture]

On the bottom left-hand side, the standard Funnelback click logs are processed by the recommender, which then produces a suggestions database. The section above it shows implementation-specific purchase data and social media data being exported and converted into pseudo click logs for use by the recommender system. As noted in the diagram, this requires integration work, i.e. a conversion script or program has to be written.

Once the suggestions database is available for a particular data source, the recommender endpoint can respond to RESTful HTTP requests from callers, returning a JSON response.

RESTful API

The following example URL is one that a caller might request to get recommendations for the given seed item and search package:

http://example.com/s/recommender/similarItems.json?seedItem=http://example.com/jobs/graduate/&collection=sample&maxRecommendations=10&scope=example.com/jobs/&source=default

The parameters for the request are:

  • seedItem: the item (URL) to get recommendations for.

  • collection: the Funnelback search package that the URL is expected to be in.

  • maxRecommendations: the maximum number of recommendations to return (optional; fewer may be returned if fewer are available). If not specified, the system will attempt to return as many recommendations as it can.

  • scope: a comma-separated list of scopes to match (optional).

  • source: the source of recommendations, one of default, co_clicks, related_clicks, related_results or explore_results (optional). Here default specifies that all sources will be queried and the results blended.

Note that if you are trying to support cross-domain requests from within browsers, you will need to use JSONP. For example, if you are using jQuery, setting dataType to jsonp adds an extra callback parameter to the request URL that names the callback function to be invoked.
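Because seedItem is itself a URL, it is safest to percent-encode it when building the request URL. The sketch below uses the standard URL and URLSearchParams classes (available in browsers and Node.js); the function name buildRecommenderUrl is hypothetical, and it assumes the server's base URL is passed in separately.

```javascript
// Hypothetical helper that builds a similarItems.json request URL.
// `server` is the Funnelback server's base URL, e.g. 'http://example.com'.
function buildRecommenderUrl(server, params) {
  const url = new URL('/s/recommender/similarItems.json', server);
  for (const name of ['seedItem', 'collection', 'maxRecommendations', 'scope', 'source']) {
    let value = params[name];
    if (value === undefined) {
      continue; // parameters that are not supplied are simply left out
    }
    if (Array.isArray(value)) {
      value = value.join(','); // scope is a comma-separated list
    }
    // searchParams.set() percent-encodes the value, so a seed URL containing
    // '?', '&' or '=' cannot corrupt the rest of the query string.
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}
```

For example, buildRecommenderUrl('http://example.com', { seedItem: 'http://example.com/jobs/graduate/', collection: 'sample', maxRecommendations: 10 }) returns http://example.com/s/recommender/similarItems.json?seedItem=http%3A%2F%2Fexample.com%2Fjobs%2Fgraduate%2F&collection=sample&maxRecommendations=10.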

JSON Response

A sample JSON response is shown below:

{
    "RecommendationResponse": {
        "status": "OK",
        "seedItem": "http://example.com/jobs/graduate/",
        "collection": "sample",
        "scope": "example.com/jobs/",
        "maxRecommendations": 10,
        "sourceCollection": "sample",
        "source": "DEFAULT",
        "timeTaken": 37,
        "recommendations": [
            {
                "itemID": "http://example.com/jobs/graduate/how-to-apply/",
                "source": "CO_CLICKS",
                "title": "Graduate Jobs Application Process",
                "date": 1379944800000,
                "qieScore": 4.679,
                "metaData": {
                    "f": ["text/html"],
                    "d": ["2013-04-10"],
                    "t": ["Graduate Jobs Application Process"],
                    "b": ["http://example.com/legal/"],
                    "s": ["Careers, Jobs, Graduates"],
                    "c": ["This document provides information on how to apply for graduate-level jobs at example.com"],
                    "a": ["Information Management Services (Phone: 9265 2876)"],
                    "l": ["en"]
                },
                "description": "This document provides information on how to apply for graduate-level jobs at example.com",
                "format": "text/html",
                "frequency": 1
            }
        ]
    }
}

All of the input parameters are echoed back at the start of the JSON response. The meanings of the other fields are as follows:

Main response fields:

  • status: the status of the response. This will be one of OK, SEED_NOT_FOUND (the seed item is not known in this search package) or NO_SUGGESTIONS_FOUND.

  • numRecommendations: the actual number of recommendations returned in this response.

  • sourceCollection: the search package that the recommendations come from.

  • source: the requested data source of the recommendations (the same as the source parameter in the request).

  • timeTaken: the time taken to generate this response, in milliseconds.

Details on the fields in the individual recommendations are as follows:

  • itemID: the ID of this recommended item (usually a URL, but it could be a unique product ID etc.).

  • source: the source of this individual item (see Recommendation sources above).

  • title: the title of the item (e.g. the HTML title of a web page).

  • date: the last modified date of the item, expressed as a Unix timestamp in milliseconds since the epoch.

  • qieScore: the QIE (query-independent evidence) score for this item.

  • metaData: values for individual metadata classes, keyed by class name, with each value given as an array.

  • description: the description as extracted from document metadata.

  • format: the format (MIME type) of this item.

  • frequency: the frequency of occurrence of this item across all queried data sources.

The example above shows only one recommendation; in practice there will usually be more than one item in the list.
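A short sketch of how a caller might consume a parsed response: check status first, then map each recommendation to the fields it needs, converting the millisecond date into a JavaScript Date. The function name summariseRecommendations and the shape of the returned rows are hypothetical.

```javascript
// Hypothetical helper: turn a parsed similarItems.json response into display-ready rows.
function summariseRecommendations(json) {
  const response = json && json.RecommendationResponse;
  // SEED_NOT_FOUND and NO_SUGGESTIONS_FOUND both leave nothing to display.
  if (!response || response.status !== 'OK') {
    return [];
  }
  return (response.recommendations || []).map((rec) => ({
    url: rec.itemID,      // the indexed URL; may need transforming for display
    title: rec.title,
    source: rec.source,   // e.g. CO_CLICKS
    format: rec.format,   // MIME type
    // `date` is milliseconds since the epoch, which is exactly what Date expects.
    lastModified: typeof rec.date === 'number' ? new Date(rec.date).toISOString() : null,
  }));
}
```

With the sample response above, the single row has lastModified set to 2013-09-23T14:00:00.000Z, i.e. midnight on 24 September 2013 in a UTC+10 time zone.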

Item IDs

  • If the item ID returned in the JSON response is a URL then it will be the URL as indexed by Funnelback.

  • This may be different to the display URL, which may be a transformed version of the indexed URL so that a user can load it correctly in their web browser.

  • For example, a database data source may have record URLs which need to be transformed by a filter plugin during the filtering phase.

  • This means that a caller processing the JSON response from the recommender system may need to apply a similar transformation so that it can display working URLs to end users.
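The transformation needed depends entirely on how the data source and its filters are configured, so the following is only a hypothetical sketch. It assumes indexed record URLs share a known prefix that maps onto a public page, and rewrites them using a list of prefix rules; the rule format, the example URLs and the names DISPLAY_URL_RULES and toDisplayUrl are invented for illustration.

```javascript
// Hypothetical prefix-rewrite rules: indexed URL prefix -> display URL prefix.
const DISPLAY_URL_RULES = [
  { indexedPrefix: 'http://records.example.internal/', displayPrefix: 'https://example.com/records/' },
];

// Return the URL a browser should load for a recommended item ID.
// Item IDs that match no rule are assumed to be displayable as-is.
function toDisplayUrl(itemID, rules = DISPLAY_URL_RULES) {
  for (const { indexedPrefix, displayPrefix } of rules) {
    if (itemID.startsWith(indexedPrefix)) {
      return displayPrefix + itemID.slice(indexedPrefix.length);
    }
  }
  return itemID;
}
```

Keeping the rules as data rather than code makes it easier to mirror whatever rewriting the filter plugin performs at indexing time.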

Logging

Recommender update log messages will be written to:

$SEARCH_HOME/data/<SEARCH-PACKAGE-ID>/log/offline/update.log

Caveats

Example: Using jQuery to add recommendations to a web page

The example jQuery function below fetches recommendations for the current URL from Funnelback’s recommender API and injects them into an element with an ID of similar-items.

The current page URL (document.location.href) must exactly match the URL that was indexed by Funnelback for any matching recommendations to be returned.

Before you start, it is a good idea to request the similarItems.json URL directly to verify that recommendations are being returned.

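jQuery(document).ready(function () {
    jQuery.ajax({
        type: 'GET',
        url: 'http://<FUNNELBACK-SERVER>/s/recommender/similarItems.json',
        // jQuery URL-encodes these values, which matters because seedItem is itself a URL
        data: {
            seedItem: document.location.href,
            collection: '<SEARCH-PACKAGE-ID>',
            maxRecommendations: '<NUM-RECOMMENDATIONS>',
            source: '<MODE>'
        },
        dataType: 'jsonp', // adds a callback parameter so the cross-domain request works
        success: function (data) {
            var response = data.RecommendationResponse;
            if (response.status !== 'OK') {
                return; // SEED_NOT_FOUND or NO_SUGGESTIONS_FOUND: nothing to show
            }
            jQuery.each(response.recommendations, function (index, element) {
                // Build the markup with attr()/text() so titles are escaped rather than parsed as HTML
                var link = jQuery('<a>').attr('href', element.itemID).text(element.title);
                jQuery('#similar-items').append(
                    jQuery('<div>').append(link).append(document.createTextNode(' (' + element.format + ')')));
            });
        }
    });
});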