Documentation

Query reports

Introduction

The Funnelback query reports system allows you to view different reports generated from query and click logs.

Updating query reports

Reports must be updated at least once before they can be viewed, and should then be updated regularly to include new data. Normally this should be done outside business hours. The scheduling page lets you schedule these updates.

In addition, the "Update Analytics" link in the Control Panel opens the following page, where you can select some or all collections and have their reports updated manually.

Update-query-reports.png

Please be aware that updating reports may take some time for collections with a high query volume.

Query reporting interface

Reports_nav_pane.png

At the top of every query reports page, a navigation pane is displayed. This pane includes four elements: information on the context of the current report, a "report selection" dropdown, a "timeframe" selection dropdown and an "export this report" dropdown.

The context of the current report is displayed in a horizontal bar. It shows the collection and the time period that the report covers, and the time that the reports were last updated.

The report selection dropdown is used to select the report to be displayed. Selecting a report within this dropdown will display the selected report immediately.

The timeframe selection dropdown is used to select a timeframe to display reports for. Valid timeframes are: yesterday, this week, last week, this month, last month, this quarter, last quarter, this year, last year, custom and by period. Selecting a timeframe and clicking "Go" will update the currently displayed report to show data for the selected timeframe.

"Custom" may be selected to enter a custom start and end date, as in the screenshot below. Please be aware that the default security settings for Internet Explorer on Windows Server systems may prevent the date selection from operating correctly.

Reports_nav-custom.png

"By period" can be used to easily switch between different time periods and compare reports. When selecting "By period" a new period dropdown will appear for selecting the period. Once the period has been chosen and validated by clicking Go, navigation links will appear to allow easy navigation between previous and next periods.

The "export this report" dropdown exports the currently displayed report in the selected format. The exported report will contain exactly the data currently displayed on the screen. Supported formats include PDF, CSV, XML, JSON and tag cloud.

Reports

Overall summary

This report displays an overview of all existing reports, showing the first five rows of each report. It lets you take a quick glance at the existing data and open any specific report by clicking on the "More..." link at the bottom of that report.

The default mode for this report is to display a summary of the data for the last 90 days.

Top queries

The top queries report displays queries along with in-depth information for each query, including: related queries (e.g. "home loan" may have the related query "mortgage"), the shape of the query over the past 10 days (showing if it is increasing or decreasing in popularity) and user locations (most popular locations for performing that query — for example the query "big day out" may be popular in Melbourne and Sydney but not in Darwin). The top queries report also displays the most popular result click for the given query.

NB: The titles for the URLs in the "Top Click" column are taken from a "url_titles.log" file which is created by the webcrawler in the same log directory as other crawler log files.

The count figure for the top queries report is calculated from the linked query term only, and is not accumulated from the unlinked, related queries shown in the same row.

Usage summary (Chart)

The usage summary report displays usage data for the search service — a timeplot charts the volume of queries, result clicks and best bet clicks for the time period specified. Search service usage typically falls over weekends and public holidays.

Monthly usage summary

The monthly usage summary is similar to the usage summary chart, but groups the volume of queries, result clicks and best bet clicks by month.

Top result clicks

The top result clicks report displays the most popular results that users have clicked on. By clicking on the "queries..." link for each click you can then see which queries were associated with that clicked URL.

Top best bet clicks

The top best bet clicks report displays the most popular best bets that users have clicked on.

Top faceted navigation clicks

The top faceted navigation clicks report displays the most popular faceted navigation categories that users have clicked on.

Top contextual navigation clicks

The top contextual navigation clicks report displays the most popular contextual navigation suggestions that users have clicked on.

Top zero result queries

The top zero result queries report displays information on the most popular queries which had zero fully matching results found. If a particular query is very popular on this list then you should investigate options for returning more relevant results for that query, possibly through query transformations, best bets or crawling more data.

Top searchers by IP address / top searchers by city / top searchers by country

The top searchers reports display information on the most frequent users of the search service, by IP address, city or country.

Queries per hour

The queries per hour report displays information on how many queries were performed for the given hour over the time period specified. For example, you can use the queries per hour report to show that 4,321 queries were performed between 10am and 11am in a given month.
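
The grouping behind this report can be pictured with a minimal Python sketch. It assumes the query timestamps have already been parsed from the query logs into datetime values; the log format itself is not described here, and the function name and sample data are hypothetical rather than part of Funnelback.

```python
from collections import Counter
from datetime import datetime

def queries_per_hour(timestamps):
    """Count queries in each hour-of-day bucket (0-23) across the whole period."""
    counts = Counter(ts.hour for ts in timestamps)
    # Include hours with no queries so every bucket appears in the result.
    return {hour: counts.get(hour, 0) for hour in range(24)}

# Hypothetical sample: three queries between 10am and 11am on different days,
# and one query between 2pm and 3pm.
sample = [
    datetime(2009, 3, 2, 10, 5),
    datetime(2009, 3, 2, 10, 47),
    datetime(2009, 3, 9, 10, 30),
    datetime(2009, 3, 9, 14, 1),
]
per_hour = queries_per_hour(sample)
```

In this sketch, queries from different days fall into the same bucket when they share an hour of the day, which matches the "between 10am and 11am in a given month" reading of the report.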

Eliminating Noise

Some search services receive significant numbers of automated or spam queries which you may not wish to see in your query reports. Such queries can be eliminated from consideration through the collection's reporting-blacklist.cfg file.

Large Date Ranges

In some circumstances the reporting system may align entered dates to “month boundaries” in order to improve reporting performance. This will occur when the number of days in the specified date range is above a certain configured limit (see the max day resolution daterange setting). The earliest date in the set daterange will be moved back until it is at the start of a month, and the latest date will be moved forward until it is at the end of the month.

For example, given the default maximum 30 days in a daterange:

  • Given an entered custom daterange of 29/Jan/2009-15/Feb/2009, the entered daterange will not be changed — it is smaller than the maximum.
  • Given an entered custom daterange of 7/Mar/2009-31/Oct/2009, the entered daterange will be changed to 1/Mar/2009-31/Oct/2009 — the daterange is larger than the maximum allowed and the earliest date is not on a month boundary.
  • Given an entered custom daterange of 25/Mar/2009-2/Oct/2009, the entered daterange will be changed to 1/Mar/2009-31/Oct/2009 — the daterange is larger than the maximum allowed (note that the entered dates are moved to the start and end of a month respectively, not to the nearest start or end of a month).
  • Given an entered custom daterange of 1/Jan/2006-31/Dec/2009, the entered daterange will not be changed — it is already aligned to “month boundaries”.
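
The alignment rule above can be sketched in Python as follows. This is an illustration only, not Funnelback's implementation: the function name and the max_days parameter are hypothetical, and counting both the start and end dates as part of the range is an assumption.

```python
import calendar
from datetime import date

def align_to_month_boundaries(start, end, max_days=30):
    """Return (start, end), widened to whole months if the range exceeds max_days."""
    # Assumption: both endpoints count towards the length of the range.
    days_in_range = (end - start).days + 1
    if days_in_range <= max_days:
        return start, end
    # Move the earliest date back to the first day of its month.
    aligned_start = start.replace(day=1)
    # Move the latest date forward to the last day of its month.
    last_day = calendar.monthrange(end.year, end.month)[1]
    aligned_end = end.replace(day=last_day)
    return aligned_start, aligned_end

# The third example from the list: 25/Mar/2009-2/Oct/2009.
aligned = align_to_month_boundaries(date(2009, 3, 25), date(2009, 10, 2))
```

A range that is already aligned passes through the same steps and comes out unchanged, which is why the last example in the list is not modified.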

Totals in the "clicks for query" and "queries for click" reports

Sometimes a "clicks for query" or "queries for click" report may show a total count of clicks or queries which is less than that shown on other reports. For example, when viewing the top queries report, the query "bananas" may have a total of 100 clicks for the time period specified. Clicking through to the "top clicks for query: bananas" report may show 3 top clicks:

  1. http://fruits.example.com/all_about_bananas.html with a total of 60 clicks
  2. http://fruits.example.com/bananas_nutrition_info.html with a total of 20 clicks
  3. http://fruits.example.com/potassium.html with a total of 10 clicks.

Summed together, the 3 clicks shown have a total of 90 clicks, 10 fewer than the number of clicks reported for the "bananas" query.

This is due to the max facts per dimension combination setting. In order to improve scalability and performance, the query reporting system ignores data items (facts) that are outside a certain frequency threshold — for example, only the most popular 500 queries per day are stored by default. This also means that, by default, only the most popular 500 clicks for any query or queries for any click are stored. The "bananas" query above may have also received clicks on other results, but these clicks were not popular enough to pass the max facts per dimension combination threshold.
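
The effect of such a threshold can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, the threshold is set to 3 instead of the default 500 so that the numbers match the "bananas" example, and the ten extra URLs are invented sample data.

```python
from collections import Counter

def stored_clicks(clicks_by_url, max_facts):
    """Keep only the max_facts most frequently clicked URLs for one query."""
    return dict(Counter(clicks_by_url).most_common(max_facts))

# Click counts for the "bananas" query: 100 clicks in total.
bananas = {
    "http://fruits.example.com/all_about_bananas.html": 60,
    "http://fruits.example.com/bananas_nutrition_info.html": 20,
    "http://fruits.example.com/potassium.html": 10,
}
# Ten rarely clicked results with one click each (hypothetical URLs).
for i in range(10):
    bananas[f"http://fruits.example.com/other_{i}.html"] = 1

total_clicks = sum(bananas.values())         # what the top queries report counts
kept = stored_clicks(bananas, max_facts=3)   # what survives the threshold
kept_total = sum(kept.values())              # what the drill-down report can sum
```

Here total_clicks is 100 while kept_total is 90, because the ten single-click results fall outside the threshold and are never stored.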

Query Reports Hardware Requirements

The table below gives minimum hardware requirements for processing various query log volumes.

Number of queries             Minimum memory    Minimum hard disk space
>= 20 million over 3 years    2.5GB             10GB
10 million                    1.5GB             8GB
5 million                     1GB               6GB
<= 1 million                  500MB             4GB

When updating query reports for a collection with a large number of queries, the max heap size collection setting should be increased.

Reporting Configuration Options
