Pattern Analyser reports

Introduction

The pattern analyser reporting system provides information about queries whose volume has increased sharply over a short space of time, i.e. a "spike" in activity for that query.

Pattern Analyser Dashboard

The pattern analyser dashboard provides details of the most prominent detected queries for a given time period. The pattern analyser dashboard page is available from the analytics tab on the administration home page.

Spike-dashboard-link.png

The pattern analyser dashboard presents a list of the most significantly increased queries detected by Funnelback Analytics during the selected time period. In the example below, we have chosen to view entries for a single day (via the upper navigation pane), and selected Monday the 23rd of November with the Next / Previous day links.

For this example collection, pattern analyser flagged only one query on Monday the 23rd of November.

Spike-dashboard.png

The query column displays the query which was detected (in this example "employees"), followed by a list of the other queries identified as most strongly related to this query. Note that the list of related queries can be manually manipulated with the related.cfg file.

The shape column displays a sparkline showing the rate of occurrences of this query over the five days before and after the detection of the query spike. Clicking on this sparkline shows the full query time-plot as described below.

The confidence column indicates how significant pattern analyser considers this query trend to be relative to the historical information available.

The peak column identifies the date (within the 10 days around the query's detection) on which most instances of the detected query were received.

The increase column provides a percentage measure of the query volume increase for the past seven days vs the preceding seven days (or 24 hours in the case of queries detected for a single hour).
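The comparison behind the increase column can be illustrated with a short sketch. This is only one plausible reading of the description above, not the formula pattern analyser itself uses: the function name percent_increase and the input shape (14 daily counts, oldest first) are assumptions made for this example.

```python
def percent_increase(daily_counts):
    """Percentage change of the last 7 days' total over the preceding 7 days' total.

    daily_counts: 14 daily query counts, oldest first (assumed input shape).
    Returns None when the earlier week has no queries, since the
    percentage is undefined in that case.
    """
    if len(daily_counts) != 14:
        raise ValueError("expected 14 daily counts")
    previous = sum(daily_counts[:7])   # the preceding seven days
    recent = sum(daily_counts[7:])     # the past seven days
    if previous == 0:
        return None
    return (recent - previous) / previous * 100.0


counts = [2, 3, 2, 4, 3, 2, 4, 5, 6, 8, 20, 35, 30, 16]
print(percent_increase(counts))  # 500.0
```

In this made-up data the query was received 20 times in the earlier week and 120 times in the later week, which is an increase of 500%.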

The user locations column lists the locations from which the query most frequently arrived, based on the requesting IP address.

The export options to the right of the dashboard allow the reported data to be exported as PDF or CSV for printing or further processing. The tab links above the dashboard allow viewing of the queries detected with the highest confidence on any given day in the selected period.

Query Volume Chart

Clicking on the sparkline presented in the dashboard above opens a chart of the volume of the detected query and its two most strongly related queries over the selected time period (or the two weeks either side of the detected outlier for short time periods).

The chart is interactive and will display the date and the individual query values for each day as the mouse moves over the chart (as shown above the chart in the screenshot below).

Timeplot.png

As with the dashboard, navigation through time can be performed with the upper navigation pane, and the chart can be exported in CSV, XML and PDF formats.

The chart also provides two other facilities for exploring the query volume data:

  • A rolling average can be used to smooth out noise in the data by changing the value in the bottom left of the chart to the number of days which should be averaged.
  • The chart can be zoomed into a specific area by dragging a selection over part of the chart. Once zoomed in, the zoom can be reset by double clicking within the chart.
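The effect of the rolling average can be sketched in a few lines. This is a generic trailing moving average written for illustration, not necessarily the calculation the chart performs; the function name rolling_average and the handling of the first few days (averaging over however many days are available) are assumptions.

```python
def rolling_average(values, window):
    """Trailing moving average over `window` days.

    For the first few points, fewer than `window` days exist, so the
    average is taken over the days available so far (an assumption;
    the chart may treat the start of the series differently).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


daily = [2, 4, 3, 30, 5, 4]          # one noisy day among low counts
print(rolling_average(daily, 3))
```

A larger window gives a smoother line but also flattens genuine spikes, so a window of a few days is usually a reasonable starting point when exploring noisy data.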

Please note that in a standard Funnelback installation, pattern analysis is performed more frequently than the process which updates the time plot charts. This means that after a query is detected, it may take up to 24 hours for the time plot chart to display the queries which created the spike.

Email Alerting

Funnelback can be configured to send alert emails every time a query is detected by pattern analyser to allow real-time action to be taken as required.

To configure email alerts for pattern analyser, click the "Edit Analytics Email Settings" link in the collection's Analyse tab.

Spike-email-link.png

  • The sender email address should contain a single email address and will be used as the From address for Analytics emails.
  • The email addresses field can contain a comma separated list of email addresses.
  • You can enable the emailing of pattern analyser alerts by specifying that they should be emailed out "when detected".
  • This form also allows you to specify how often a PDF summary of the main query reports (top queries) should be emailed out.

Spike-email-settings.png

Please note that Funnelback must be configured with a valid SMTP server during installation for email to be sent successfully. SMTP settings can be adjusted in install_dir/conf/global.cfg if required.

Updating Pattern Analyser Reports

Pattern analyser reports are automatically updated every hour by a scheduled task, and do not require any manual updating or configuration. However, pattern analyser reports will only be generated for collections for which query reports have been updated.

Eliminating Noise

Some search services receive significant numbers of automated or spam queries which may be detected, but are not of interest. Such queries can be eliminated from consideration through the collection's reporting-blacklist.cfg file.

Excluding Collections

The analytics.outlier.exclude_collection setting can be set to true to disable pattern analyser entirely for a collection.

Query Log Naming and Rotation

When operating Funnelback in a multi-server configuration, some care must be taken to ensure query logs are available to the pattern analyser system. For performance reasons pattern analyser requires that archived log files be identified with a date stamp in the file name (for example queries.log.20090902.gz), as these date stamps are used to restrict the logs required for pattern analysis.

Standard practice within a multiple server set-up would be to transfer all query log files to a server responsible for analytics, retaining the date stamp in the file name and adding a hostname to ensure log names are unique.
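The role of the date stamp can be illustrated with a small sketch that picks out the archived logs falling within a date range. It is not Funnelback's own code: the helper names log_date and logs_in_range are made up for this example, and the hostname-bearing file name (queries.log.search01.20090902.gz) is a hypothetical naming scheme, since the exact position of the hostname is not specified above.

```python
import re
from datetime import date, datetime

# An 8-digit date stamp delimited by dots, as in queries.log.20090902.gz
DATE_STAMP = re.compile(r"\.(\d{8})(?:\.|$)")


def log_date(filename):
    """Return the date encoded in an archived log's name, or None if absent."""
    match = DATE_STAMP.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a valid date


def logs_in_range(filenames, start, end):
    """Keep only the logs whose date stamp lies between start and end inclusive."""
    selected = []
    for name in filenames:
        stamp = log_date(name)
        if stamp is not None and start <= stamp <= end:
            selected.append(name)
    return selected


archive = [
    "queries.log.20090902.gz",
    "queries.log.search01.20090903.gz",  # hypothetical name with a hostname added
    "queries.log.20091015.gz",
    "queries.log",                       # no date stamp, so it cannot be selected
]
print(logs_in_range(archive, date(2009, 9, 1), date(2009, 9, 30)))
```

A log without a date stamp cannot be matched to a period from its name alone, which is why retaining the stamp when transferring logs between servers matters.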

Please also note that pattern analyser may fail to detect some queries when processing historical data for collections which have been updated less frequently than once per month. This issue can be rectified by manually splitting any query log files spanning more than a day into individual logs with date stamps.
