Funnelback can recommend a list of items that are believed to be related to a given "seed" item. For example, it might recommend URLs that were clicked on in the same search sessions as the seed URL, or products that were frequently purchased in the same session as a given product.

Recommendations are not enabled by default. To enable them, please see the recommender collection.cfg setting.

Data Sources

The main source of information for the Recommender System is Funnelback's query and click logs, which record which queries users submitted and which results they clicked on. These logs are processed automatically as part of the collection update process, and no special configuration is needed to set the Recommender System up. By default, only query and click records from the last 60 days are taken into account when processing the logs.

The results of this processing are a number of data sources which the system takes into account, in decreasing order of preference:

  1. CO_CLICKS: items that co-occur as clicks in the same (time-limited) search session as the original seed URL.
  2. RELATED_CLICKS: items that were clicked on for the top N queries associated with the seed URL.
  3. RELATED_RESULTS: the most frequently occurring search results that are returned for the top N queries associated with the seed URL.
  4. EXPLORE_RESULTS: results from running an explore query. These suggestions will be based on how similar their textual content is to the seed URL.

A further source, DEFAULT, has no data of its own. It is used when a request to the system does not give an explicit "source" setting:

  • DEFAULT: Query all the sources listed above and combine the results.

If recommendations have come from more than one data source we combine the lists, using the following comparison "chain":

    compare(x.getSource(), y.getSource())
    .compare(y.getFrequency(), x.getFrequency())
    .compare(x.getRank(), y.getRank())

  • The data sources above are listed in order of preference, so items from the CO_CLICKS source will always win out over other sources and be listed first.
  • If two items have the same source then their frequency of occurrence (across all sources) will be compared next. This means that the more often an item appears the higher it will rise in the list.
  • Finally, if two items have the same source and frequency, we will compare their rank in the source they came from.
  • There should be no items with duplicate IDs (e.g. the same URL) in the final sorted list.
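
The ordering rules above can be sketched in Python. This is a minimal illustration rather than Funnelback's implementation: the `combine` function, the shape of its input (a mapping from source name to item IDs in rank order) and the choice to count frequency as the number of sources an item appears in are all assumptions made for the example.

```python
from collections import Counter

# Source preference, most preferred first (order taken from the list above).
SOURCE_ORDER = ["CO_CLICKS", "RELATED_CLICKS", "RELATED_RESULTS", "EXPLORE_RESULTS"]


def combine(per_source):
    """Merge per-source lists of item IDs into one de-duplicated, sorted list.

    per_source maps a source name to item IDs in rank order (best first).
    Returns (item_id, source, frequency, rank) tuples.
    """
    # Assumed definition of frequency: the number of sources an item appears in.
    frequency = Counter(
        item_id for items in per_source.values() for item_id in set(items)
    )

    best = {}
    for source in SOURCE_ORDER:
        for rank, item_id in enumerate(per_source.get(source, []), start=1):
            # Keep only the first occurrence, i.e. the most preferred source.
            best.setdefault(item_id, (item_id, source, frequency[item_id], rank))

    # Source preference, then frequency (descending), then rank (ascending).
    return sorted(
        best.values(),
        key=lambda r: (SOURCE_ORDER.index(r[1]), -r[2], r[3]),
    )


merged = combine({
    "CO_CLICKS": ["a", "b"],
    "RELATED_CLICKS": ["b", "c"],
    "EXPLORE_RESULTS": ["d", "c"],
})
print([item_id for item_id, _, _, _ in merged])  # ['b', 'a', 'c', 'd']
```

Here "b" is listed before "a" even though "a" has the better rank, because both come from CO_CLICKS and "b" appears in two sources.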

Other sources of data outside those processed as standard might include:

  1. Social media "likes" (e.g. from the Facebook API, Twitter mentions etc.)
  2. Purchase data (e.g. from client's e-commerce database system)
  3. Data from web analytics software (e.g. Google Analytics)
  4. Web server access logs

To support these other sources their data would need to be exported and converted into a 'pseudo' click log format, for processing by Funnelback. For example, the fact that someone purchased a product would be recorded as a "click" on the product URL in the generated click log.
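
As a rough sketch of such a conversion, the Python below maps purchase rows to click-like records. Everything in it is hypothetical: the `PseudoClick` fields are placeholders and not Funnelback's click log format, and the purchase row keys (`purchased_at`, `order_id`, `product_id`) stand in for whatever the e-commerce export actually provides. A real conversion script would write the records out in the actual click log format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PseudoClick:
    # Hypothetical fields; replace with the columns of the real click log format.
    timestamp: datetime
    session_id: str
    url: str


def purchases_to_clicks(purchases, product_url):
    """Turn purchase rows into one pseudo click per purchased product.

    purchases: iterable of dicts with 'purchased_at' (epoch seconds),
    'order_id' and 'product_id' keys (an assumed export shape).
    product_url: function mapping a product ID to its indexed URL.
    """
    return [
        PseudoClick(
            timestamp=datetime.fromtimestamp(row["purchased_at"], tz=timezone.utc),
            # Products in the same order share a session, so they can co-occur.
            session_id=str(row["order_id"]),
            url=product_url(row["product_id"]),
        )
        for row in purchases
    ]
```

Using the order ID as the session identifier is one possible design choice: it makes products bought together look like clicks in the same session.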


The diagram below shows the architecture of the Recommender System:


On the bottom left-hand side we see the standard Funnelback click logs being processed by the Recommender, which then produces a "suggestions" database. The section above shows implementation-specific purchase data and social media data being exported and converted into "pseudo" click logs for use by the Recommender System. As noted in the diagram, "integration is required" for this, i.e. some kind of conversion script or program will have to be written.

Once the suggestions database is available for a particular collection the Recommender "end-point" can respond to RESTful HTTP requests from callers, returning a JSON response.


The following example URL is one that a caller might request to get recommendations for the given seed item and collection:

The parameters for the request are:

  • seedItem: item (URL) to get suggestions for.
  • collection: Funnelback collection that the URL is expected to be in.
  • maxRecommendations: maximum number of recommendations to return (optional; fewer than this may be available). If not specified, the system will attempt to return as many recommendations as it can.
  • scope: comma separated list of scopes to match (optional).
  • source: source of recommendations, one of default|co_clicks|result_clicks|related_results|explore_results (optional). Here 'default' specifies that all sources will be queried and the results blended.

Note that if you are trying to support cross-domain requests from within browsers then you will need to make use of JSONP. For example, if you are using jQuery then setting dataType to "jsonp" will cause an extra "callback" parameter to be added to the request URL, naming the callback function to be invoked.
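
A caller might assemble such a request along the following lines. This is a sketch under stated assumptions: `ENDPOINT` is a placeholder and not the real end-point path, and the parameter names (`seedItem`, `collection`, `maxRecommendations`, `scope`, `source`) are inferred from the fields reflected back in the JSON response, so check them against your Funnelback version.

```python
from urllib.parse import urlencode

# Placeholder: substitute the Recommender end-point of your Funnelback server.
ENDPOINT = "https://search.example.com/recommender-endpoint"


def recommendation_url(seed_item, collection, max_recommendations=None,
                       scope=None, source=None):
    """Build a request URL, leaving out optional parameters that are unset."""
    params = {"seedItem": seed_item, "collection": collection}
    if max_recommendations is not None:
        params["maxRecommendations"] = max_recommendations
    if scope:
        params["scope"] = ",".join(scope)  # comma separated list of scopes
    if source:
        params["source"] = source
    # urlencode percent-encodes the seed URL so it is safe as a query value.
    return ENDPOINT + "?" + urlencode(params)
```

For example, `recommendation_url("https://www.example.com/a", "sample", max_recommendations=5)` yields a URL whose query string carries the encoded seed URL, the collection and the limit.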

JSON Response

A sample JSON response is shown below:

    {
        "RecommendationResponse": {
            "status": "OK",
            "seedItem": "",
            "collection": "sample",
            "scope": "",
            "maxRecommendations": 10,
            "sourceCollection": "sample",
            "source": "DEFAULT",
            "timeTaken": 37,
            "recommendations": [
                {
                    "itemID": "",
                    "source": "CO_CLICKS",
                    "title": "Graduate Jobs Application Process",
                    "date": 1379944800000,
                    "qieScore": 4.679,
                    "metaData": {
                        "f": "text/html",
                        "d": "2013-04-10",
                        "t": "Graduate Jobs Application Process",
                        "b": "",
                        "s": "Careers, Jobs, Graduates",
                        "c": "This document provides information on how to apply for graduate-level jobs at",
                        "a": "Information Management Services (Phone: 9265 2876)",
                        "l": "en"
                    },
                    "description": "This document provides information on how to apply for graduate-level jobs at",
                    "format": "text/html",
                    "frequency": 1
                }
            ]
        }
    }

In addition to all of the input parameters being reflected back at the start of the JSON response, details on the meaning of the other fields are as follows:

  • status: status of response. This will be one of: OK, SEED_NOT_FOUND (seed item is not known about in this collection) or NO_SUGGESTIONS_FOUND.
  • maxRecommendations: actual number of recommendations returned in this response (at most the requested maxRecommendations).
  • sourceCollection: the source collection that the recommendations come from. This will usually be the same as the requested collection, but may be different if the requested collection was a meta collection and the data source (click logs) was present in a component collection.
  • source: the requested data source of the recommendations (same as the 'source' parameter in the request).
  • timeTaken: the amount of time taken to generate this response, in milliseconds.

Details on the fields in the individual recommendations are as follows:


  • itemID: The ID of this recommended item (usually a URL, but could be a unique product ID etc.).
  • source: Source of this individual item - see section on 'Data Sources' above.
  • title: Title of the item (e.g. HTML title of a web page).
  • date: The last modified date of the item, expressed as a Unix timestamp (milliseconds since the epoch).
  • qieScore: The QIE score for this item.
  • metaData: Values for individual metadata classes.
  • description: Description as extracted from document metadata.
  • format: The format (MIME type) of this item.
  • frequency: The frequency of occurrence of this item across all queried data sources.

In this example we are only showing details on one recommendation; in practice there will usually be more than one item in the list.
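
To illustrate how a caller might consume a response, the sketch below checks the status and pulls out the ID and title of each recommendation. It assumes the response body is a JSON object wrapping a "RecommendationResponse" key, as the sample suggests; the helper name `extract_recommendations` is ours.

```python
import json


def extract_recommendations(body):
    """Return (itemID, title) pairs from a response body, or [] if none."""
    response = json.loads(body)["RecommendationResponse"]
    if response["status"] != "OK":
        # SEED_NOT_FOUND or NO_SUGGESTIONS_FOUND: nothing to show.
        return []
    return [
        (rec["itemID"], rec.get("title", ""))
        for rec in response.get("recommendations", [])
    ]
```

Returning an empty list for the non-OK statuses lets the calling page simply hide its recommendations panel.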

Item IDs

  • If the Item ID returned in the JSON response is a URL then it will be the URL as indexed by Funnelback.
  • This may be different to the display URL, which may be a transformed version of the indexed URL so that a user can load it correctly in their web browser.
  • For example, a database collection may have record URLs which need to be transformed by a Groovy filter during the filtering phase.
  • This means that any caller that processes the JSON response from the Recommender System may need to apply a similar transformation so that it can display working URLs to end users.
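
A caller-side transformation could be as simple as the prefix rewrite sketched below. The function and both prefixes are hypothetical; the real mapping has to mirror whatever the collection's own filter does to its URLs.

```python
def to_display_url(indexed_url, indexed_prefix, display_prefix):
    """Rewrite an indexed URL into a display URL by swapping its prefix.

    Both prefixes are hypothetical; use the mapping that the collection's
    own filter applies. URLs without the prefix are returned unchanged.
    """
    if indexed_url.startswith(indexed_prefix):
        return display_prefix + indexed_url[len(indexed_prefix):]
    return indexed_url
```

For instance, with an indexed prefix of "db://records/" and a display prefix of "https://www.example.com/record?id=", the item ID "db://records/42" would be shown to users as "https://www.example.com/record?id=42".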


Recommender update log messages will be written to:



The recommendation system does not currently operate on Funnelback Push collections.

See also