make_report.pl
This feature is not available in the Squiz DXP.
make_report.pl processes a collection’s data files, producing data reports on their contents.
$ make_report.pl <--collection "collection config"> [--log] [--plain] [--datadir "data directory"] [--output "output directory"] [--hosts "host list file"]
Arguments
- The collection configuration file (--collection "collection config") must be specified, and must be a filesystem path to an existing, readable and valid collection configuration file.
- --log may also be specified, and indicates that the script should write to a log file.
- --plain may also be specified, and indicates that the script should output plain HTML instead of Funnelback look and feel HTML.
- --datadir "data directory" may also be specified, and gives the directory to provide reports for.
- --output "output directory" may also be specified, and gives the directory to write output to.
- --hosts "host list file" may also be specified, and gives the location of a file on disk that groups sites / hosts into groups and subgroups.
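As a minimal sketch of how the arguments above fit together, the following Python helper assembles an argument list that could be passed to a process launcher. The function name `build_make_report_command` and the example paths are hypothetical and only illustrate the documented options; nothing here is part of Funnelback itself.

```python
from typing import List, Optional


def build_make_report_command(
    collection_config: str,
    log: bool = False,
    plain: bool = False,
    datadir: Optional[str] = None,
    output: Optional[str] = None,
    hosts: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for make_report.pl (hypothetical helper)."""
    # --collection is the only mandatory argument.
    command = ["make_report.pl", "--collection", collection_config]
    # --log and --plain are simple switches without a value.
    if log:
        command.append("--log")
    if plain:
        command.append("--plain")
    # The remaining options each take a single path value.
    if datadir is not None:
        command += ["--datadir", datadir]
    if output is not None:
        command += ["--output", output]
    if hosts is not None:
        command += ["--hosts", hosts]
    return command
```

The resulting list could, for instance, be handed to `subprocess.run`, assuming make_report.pl is on the PATH of the user running it.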
Function
make_report.pl runs over a data directory, recording statistics on the directory's contents, and outputs reports to HTML files.
The directory that make_report.pl runs over is the collection data folder ($SEARCH_HOME/data/$COLLECTION_NAME/offline/data), or the directory specified by the "--datadir" option.
make_report.pl will place output in $SEARCH_HOME/admin/data_report/<collection> by default, or in the directory specified by "--output".
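The default locations described above can be sketched as a small Python function. This is an illustration of the documented defaults only, not the script's actual code; the name `resolve_report_paths` and the example values for the search home and collection name are assumptions.

```python
import posixpath
from typing import Optional, Tuple


def resolve_report_paths(
    search_home: str,
    collection: str,
    datadir: Optional[str] = None,
    output: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (data directory, output directory), applying the documented defaults."""
    # Default data directory: $SEARCH_HOME/data/$COLLECTION_NAME/offline/data
    default_data = posixpath.join(search_home, "data", collection, "offline", "data")
    # Default output directory: $SEARCH_HOME/admin/data_report/<collection>
    default_output = posixpath.join(search_home, "admin", "data_report", collection)
    # An explicit --datadir or --output value takes precedence over the default.
    return (datadir or default_data, output or default_output)
```

For example, with a search home of /opt/funnelback and a collection called example (both assumed values), the function returns /opt/funnelback/data/example/offline/data and /opt/funnelback/admin/data_report/example.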
If "--log" is specified, the script will write a log called crawl_data_report.log to the log directory beside the specified data directory. For example, if the data directory is /opt/funnelback/data/<collection>/offline/data/, the log file will be /opt/funnelback/data/<collection>/offline/log/crawl_data_report.log, and if the data directory is /tmp/my_own_gathered_stuff/, the log file will be /tmp/log/crawl_data_report.log.
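The log location rule can be sketched in Python as follows. The sketch assumes that "beside the data directory" means a directory named log that shares the data directory's parent, as in the /tmp example; the function name `log_file_for` is hypothetical.

```python
import posixpath


def log_file_for(data_dir: str) -> str:
    """Return the assumed log file path for a given data directory."""
    # Drop any trailing slash so dirname() yields the parent directory.
    parent = posixpath.dirname(data_dir.rstrip("/"))
    # The log directory is assumed to sit beside the data directory.
    return posixpath.join(parent, "log", "crawl_data_report.log")
```

Under that assumption, `log_file_for("/tmp/my_own_gathered_stuff/")` gives /tmp/log/crawl_data_report.log.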
The reports produced will be plain HTML if "--plain" is specified. When this script is run by the update process, the files will include various substitutable strings, including: @ADMIN_HOME@, @ADMIN_BASE@ and @REPORT_BASE@. This is so that the search dashboard can read these files from disk and substitute in links to the search dashboard homepage, CSS files, images, etc.
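To illustrate the idea of substitutable strings, here is a minimal Python sketch of how a consumer of the report files might replace such placeholders. It is not the search dashboard's implementation; the function name `substitute_placeholders` and the replacement values used in the example are assumptions made for illustration.

```python
from typing import Dict


def substitute_placeholders(html: str, values: Dict[str, str]) -> str:
    """Replace @NAME@ placeholders in report HTML with the supplied values."""
    for name, value in values.items():
        # Placeholders are written as the name wrapped in @ signs.
        html = html.replace("@" + name + "@", value)
    return html
```

For example, `substitute_placeholders('<a href="@ADMIN_HOME@">Home</a>', {"ADMIN_HOME": "/search/admin/"})` replaces the placeholder with the assumed dashboard URL /search/admin/.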
A "hosts list" may be specified. If none is specified, a default of $SEARCH_HOME/conf/<collection>/sites-by-portfolio.csv is assumed. The hosts list does not have to exist, and its absence has negligible impact on the reports. If present, the list should be of the format:
site,group,subgroup
For example:
http://forums.funnelback.com,businesses,funnelback
http://www.funnelback.com,businesses,funnelback
http://www.csiro.au,governmental,australia
http://www.microsoft.com,businesses,microsoft
http://www.australia.gov.au,government,australia
http://www.health.gov.au,government,australia
Should the host list exist, various aggregate statistics will be produced. For example, statistics will be reported not just for individual sites, but also for groups of sites.
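The site,group,subgroup format lends itself to simple grouping. The following Python sketch shows one way such a file could be read into groups and subgroups; it is only an illustration of the format, not how make_report.pl parses the file, and the name `group_sites` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

SAMPLE_HOSTS = """\
http://forums.funnelback.com,businesses,funnelback
http://www.funnelback.com,businesses,funnelback
http://www.microsoft.com,businesses,microsoft
http://www.health.gov.au,government,australia
"""


def group_sites(hosts_csv: str) -> Dict[str, Dict[str, List[str]]]:
    """Map group -> subgroup -> list of sites from site,group,subgroup lines."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(io.StringIO(hosts_csv)):
        # Skip blank or malformed lines rather than failing (an assumption).
        if len(row) != 3:
            continue
        site, group, subgroup = (field.strip() for field in row)
        groups[group][subgroup].append(site)
    # Convert to plain dictionaries for easier inspection.
    return {group: dict(subgroups) for group, subgroups in groups.items()}
```

Calling `group_sites(SAMPLE_HOSTS)` places both funnelback sites under the businesses group and funnelback subgroup, which is the kind of grouping that aggregate statistics could then be computed over.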