make_report.pl

This feature is not available in the Squiz DXP.

make_report.pl processes a collection’s data files, producing data reports on their contents.

$ make_report.pl <--collection "collection config"> [--log] [--plain] [--datadir "data directory"] [--output "output directory"] [--hosts "host list file"]

Arguments

  • --collection "collection config" must be specified, and must be a filesystem path to an existing, readable and valid collection configuration file.

  • --log may also be specified, and indicates that the script should write to a log file.

  • --plain may also be specified, and indicates that the script should output plain HTML instead of Funnelback look and feel HTML.

  • --datadir "data directory" may also be specified, and gives the directory to provide reports for.

  • --output "output directory" may also be specified, and gives the directory to write output to.

  • --hosts "host list file" may also be specified, and gives the location of a file on disk that organises sites (hosts) into groups and subgroups.
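As an illustration of how the arguments combine, the following minimal Python sketch assembles an argument list for the script. The helper name build_make_report_command and the example configuration path are assumptions made for this example only; the flags are the ones listed above.

```python
from typing import List, Optional


def build_make_report_command(
    collection_config: str,
    log: bool = False,
    plain: bool = False,
    datadir: Optional[str] = None,
    output: Optional[str] = None,
    hosts: Optional[str] = None,
) -> List[str]:
    """Assemble an argument vector for make_report.pl (hypothetical helper)."""
    # --collection is the only required argument.
    command = ["make_report.pl", "--collection", collection_config]
    # Boolean switches take no value.
    if log:
        command.append("--log")
    if plain:
        command.append("--plain")
    # Value-taking options are only added when a value was supplied.
    for flag, value in (("--datadir", datadir), ("--output", output), ("--hosts", hosts)):
        if value is not None:
            command.extend([flag, value])
    return command


if __name__ == "__main__":
    # The configuration path below is an illustrative assumption.
    print(build_make_report_command(
        "/opt/funnelback/conf/example/collection.cfg",
        plain=True,
        output="/tmp/report",
    ))
```

The resulting list could be passed to subprocess.run on a machine where make_report.pl is on the PATH.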

Function

make_report.pl runs over a data directory, recording statistics on the directory's contents, and outputs reports to HTML files.

By default, the directory that make_report.pl runs over is the collection data folder ($SEARCH_HOME/data/$COLLECTION_NAME/offline/data). A different directory can be specified with the "--datadir" option.

make_report.pl will place output in $SEARCH_HOME/admin/data_report/<collection> by default, or in the directory specified by "--output".
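The default locations described above can be sketched as follows. This is a model of the documented behaviour written for illustration, not the script's own code; the function name resolve_report_dirs is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_report_dirs(
    search_home: str,
    collection: str,
    datadir: Optional[str] = None,
    output: Optional[str] = None,
) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (data directory, output directory) following the documented defaults."""
    home = PurePosixPath(search_home)
    # --datadir overrides $SEARCH_HOME/data/<collection>/offline/data.
    data = PurePosixPath(datadir) if datadir else home / "data" / collection / "offline" / "data"
    # --output overrides $SEARCH_HOME/admin/data_report/<collection>.
    out = PurePosixPath(output) if output else home / "admin" / "data_report" / collection
    return data, out


if __name__ == "__main__":
    # "example" is a placeholder collection name.
    data_dir, output_dir = resolve_report_dirs("/opt/funnelback", "example")
    print(data_dir)
    print(output_dir)
```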

If "--log" is specified, the script will write a log called crawl_data_report.log to the log directory beside the specified data directory. For example, if the data directory is /opt/funnelback/data/<collection>/offline/data/, the log file will be /opt/funnelback/data/<collection>/offline/log/crawl_data_report.log, and if the data directory is /tmp/my_own_gathered_stuff/, the log file will be /tmp/log/crawl_data_report.log.
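A minimal sketch of the log location rule, assuming that "beside" means a sibling directory named log under the data directory's parent (as in the /tmp example). The function name log_file_for is hypothetical.

```python
from pathlib import PurePosixPath


def log_file_for(datadir: str) -> PurePosixPath:
    """Return the assumed log file path for a given data directory."""
    # The log directory is taken to be a sibling of the data directory.
    return PurePosixPath(datadir).parent / "log" / "crawl_data_report.log"


if __name__ == "__main__":
    print(log_file_for("/tmp/my_own_gathered_stuff/"))
```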

The reports produced will be plain HTML if "--plain" is specified. When this script is run by the update process, the files will include various substitutable strings, including: @ADMIN_HOME@, @ADMIN_BASE@ and @REPORT_BASE@. This is so that the search dashboard can read these files from disk and substitute in links to the search dashboard homepage, CSS files, images, etc.
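To illustrate what substituting these strings might look like, here is a short Python sketch that replaces @NAME@ markers in a report file's contents. It is not the search dashboard's implementation; the function name and the replacement values are assumptions chosen for the example.

```python
import re
from typing import Dict


def substitute_placeholders(html: str, values: Dict[str, str]) -> str:
    """Replace @NAME@ markers with supplied values, leaving unknown markers untouched."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Fall back to the original marker text when no value is known.
        return values.get(name, match.group(0))

    return re.sub(r"@([A-Z_]+)@", replace, html)


if __name__ == "__main__":
    # The URLs below are made-up values for illustration.
    page = '<link rel="stylesheet" href="@ADMIN_BASE@/style.css"><a href="@ADMIN_HOME@">Home</a>'
    print(substitute_placeholders(page, {"ADMIN_BASE": "/admin", "ADMIN_HOME": "/admin/index.html"}))
```

Leaving unknown markers in place keeps the sketch safe to run over files that contain strings it was not given values for.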

A "hosts list" may be specified. If none is specified, a default of $SEARCH_HOME/conf/<collection>/sites-by-portfolio.csv is assumed. The hosts list does not have to exist and has negligible impact on the reports. If present, the list should be of the format:

site,group,subgroup

For example:

http://forums.funnelback.com,businesses,funnelback
http://www.funnelback.com,businesses,funnelback
http://www.csiro.au,government,australia
http://www.microsoft.com,businesses,microsoft
http://www.australia.gov.au,government,australia
http://www.health.gov.au,government,australia

Should the host list exist, various aggregate statistics will be produced. For example, statistics will not just be reported for individual sites, but for groups of sites.
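The grouping idea can be sketched in Python as follows. The per-site document counts and both function names are assumptions for illustration; this page does not state which statistics make_report.pl aggregates or how it does so.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Tuple


def load_host_groups(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each site in a site,group,subgroup list to its (group, subgroup) pair."""
    groups = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # Skip blank or malformed lines.
        site, group, subgroup = (field.strip() for field in row)
        groups[site] = (group, subgroup)
    return groups


def aggregate_by_group(per_site: Dict[str, int], groups: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Sum a per-site statistic into per-group totals; sites not in the list are ignored."""
    totals = defaultdict(int)
    for site, count in per_site.items():
        if site in groups:
            group, _subgroup = groups[site]
            totals[group] += count
    return dict(totals)


if __name__ == "__main__":
    hosts = (
        "http://www.funnelback.com,businesses,funnelback\n"
        "http://www.australia.gov.au,government,australia\n"
        "http://www.health.gov.au,government,australia\n"
    )
    # Hypothetical document counts per site.
    counts = {
        "http://www.funnelback.com": 3,
        "http://www.australia.gov.au": 10,
        "http://www.health.gov.au": 5,
    }
    print(aggregate_by_group(counts, load_host_groups(hosts)))
```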

See also