Logs reference
Background
This article describes most of the log files produced by Funnelback. The information is correct for Funnelback 15.20.
The following abbreviations are used in this document:
- <CN> - Collection ID
- <PN> - Profile ID
- <V> - Collection view (live or offline)
All log locations are relative to $SEARCH_HOME (Linux) or %SEARCH_HOME% (Windows).
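The Python sketch below shows how a location template maps to a real path under SEARCH_HOME. The helper name resolve_log_path is hypothetical and is not part of Funnelback. It assumes SEARCH_HOME is set in the environment (or passed in), and it only substitutes the <CN>, <PN> and <V> placeholders defined above.

```python
import os
from pathlib import Path


def resolve_log_path(template, cn=None, pn=None, view=None, search_home=None):
    """Expand a location such as 'data/<CN>/<V>/log/crawl.log' under SEARCH_HOME.

    Hypothetical helper for illustration only; it is not a Funnelback API.
    """
    # Fall back to the SEARCH_HOME environment variable if no home is given.
    home = search_home or os.environ.get("SEARCH_HOME")
    if not home:
        raise ValueError("SEARCH_HOME is not set")
    replacements = {"<CN>": cn, "<PN>": pn, "<V>": view}
    for placeholder, value in replacements.items():
        if placeholder in template:
            if value is None:
                raise ValueError(f"{placeholder} is required for {template!r}")
            template = template.replace(placeholder, value)
    # Path() joins with the platform's own separator, so this suits Linux and Windows.
    return Path(home, *template.split("/"))
```

For example, resolve_log_path("data/<CN>/<V>/log/crawl.log", cn="intranet", view="live") points at the live crawler log of a collection whose ID is intranet (an example ID).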
Collection logs - all collection types
User interface logs
modernui.Admin.log
- Location: data/<CN>/log
- Purpose: Linux only. Contains messages logged by the admin search endpoints, e.g. template errors.
modernui.Public.log
- Location: data/<CN>/log
- Purpose: Linux only. Contains messages logged by the public search endpoints, e.g. template errors.
modernui.<CN>.Admin.log
- Location: web/logs
- Purpose: Windows only. Contains messages logged by the admin search endpoints, e.g. template errors.
modernui.<CN>.Public.log
- Location: web/logs
- Purpose: Windows only. Contains messages logged by the public search endpoints, e.g. template errors.
Collection update logs
<CN>.lock
- Location: data/<CN>/log
- Purpose: Used to prevent multiple updates from running simultaneously (by taking an OS lock on the file). These files are generally empty and so contain no useful information.
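The general technique is an exclusive, non-blocking lock on a file. The first process to obtain the lock proceeds, and any other process fails immediately instead of starting a second update. The sketch below shows this technique using POSIX flock() from Python's fcntl module, so it is Linux/Unix only. It is an illustration, not Funnelback's implementation, which may lock the file differently. try_acquire_update_lock is a hypothetical name.

```python
import fcntl


def try_acquire_update_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if already locked.

    Illustrative only: uses POSIX flock(), not Funnelback's own locking code.
    """
    handle = open(lock_path, "a")  # 'a' creates the file if missing and never truncates it
    try:
        # LOCK_NB makes the call fail at once instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Keep the file open for the whole update; closing it releases the lock.
    return handle
```

The lock file itself stays empty, because the lock lives in the operating system rather than in the file's contents. This matches the observation that <CN>.lock files carry no useful information.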
<CN>.pre-update.log
- Location: data/<CN>/log
- Purpose: Logs the Java command that was run for the update.
knowledge_graph.<PN>.log
- Location: data/<CN>/log
- Purpose: Logs messages from the knowledge graph update for a specific profile.
update-<CN>.log
- Location: data/<CN>/log
- Purpose: Logs the output from update.pl, i.e. messages from the collection update process.
update-<CN>.previous.log
- Location: data/<CN>/log
- Purpose: Messages from the previous collection update.
pattern_analyser.log
- Location: data/<CN>/log
- Purpose: Logs the output from outliers-log-processing.pl, i.e. messages from the pattern analyser reports build.
update_reports_launch.log
- Location: data/<CN>/log
- Purpose: Logs messages for the analytics reports build.
update_reports.log
- Location: data/<CN>/log
- Purpose: Logs the output from reports-load-queries.pl, i.e. messages from the analytics reports build.
update_reports_previous.log
- Location: data/<CN>/log
- Purpose: Messages from the previous reports build.
update.log
- Location: data/<CN>/<V>/log
- Purpose: Top-level log for the update pipeline.
Indexer logs
Step-AnnieAPrimaryCollection.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from annie-a during the annotation index build.
Step-BuildAutoCompletion.<PN>.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from build_autoc for a profile's auto-completion index. An auto-completion index is built for each profile.
Step-BuildAutoCompletion.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from build_autoc during the collection's auto-completion index build.
Step-BuildCollapsingSignatures.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from padre-cc during the collapsing index build.
Step-BuildSpelling.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from build_spelling_index during the spelling index build. A spelling index is built for the collection and for each profile.
Step-Index.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from padre-iw during the collection's index build.
Step-QueryIndependentEvidenceCollectionLevel.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from padre-qi during the query independent evidence index build.
Step-SetGscopes.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the output from padre-gs when collection-level gscopes are applied.
Step-FacetBasedGscopes.log
- Location: data/<CN>/<V>/log
- Purpose: Logs output from setting up query facets.
Step-MoveTmpIndexIntoPlace.log
- Location: data/<CN>/<V>/log
- Purpose: Logs output from the index move step.
Collection update logs - web and matrix collections
Gather (web crawler) logs
binaries.log
- Location: data/<CN>/<V>/log
- Purpose: Lists the binary files stored (e.g. PDF, DOC). Appended to on restart from a checkpoint.
copied_urls.log
- Location: data/<CN>/<V>/log
- Purpose: Output if crawler.incremental_logging=true. Lists all URLs whose content was copied from the previous crawl because they had not changed and so were not downloaded again.
crawler.central.log
- Location: data/<CN>/<V>/log
- Purpose: Contains filter messages.
crawler_logs_checkpoint_sizes.dat
- Location: data/<CN>/<V>/log
- Purpose: Records the sizes of the crawler logs at the time a checkpoint occurred. This allows the logs to be truncated back to those sizes if the crawler is restarted from a checkpoint, avoiding inconsistency.
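The truncation idea can be sketched as follows. The on-disk format of crawler_logs_checkpoint_sizes.dat is not described here, so this hypothetical Python sketch keeps the sizes in a dictionary keyed by log path. It illustrates the technique and is not Funnelback's code.

```python
import os


def record_log_sizes(log_paths):
    """Snapshot the current size in bytes of each existing crawler log at checkpoint time."""
    return {path: os.path.getsize(path) for path in log_paths if os.path.exists(path)}


def truncate_logs_to_checkpoint(sizes):
    """On restart, cut each log back to the size it had at the checkpoint.

    Lines appended after the checkpoint describe work that will be redone,
    so keeping them would record the same URLs twice.
    """
    for path, size in sizes.items():
        if os.path.exists(path) and os.path.getsize(path) > size:
            with open(path, "r+b") as handle:
                handle.truncate(size)
```

Recording byte sizes rather than line counts keeps the restart cheap, because truncation is a single system call per file.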
crawl.log
- Location: data/<CN>/<V>/log
- Purpose: Top-level log for the web crawler.
crawl.log.<N>
- Location: data/<CN>/<V>/log
- Purpose: Logs individual messages for each running crawler thread, with one log per thread.
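Because each thread has its own crawl.log.<N>, it can help to process these files in thread order. A plain string sort puts crawl.log.10 before crawl.log.2, so this Python sketch parses N as a number. thread_logs is a hypothetical helper, and the sketch assumes N is a plain integer, as the file name pattern suggests.

```python
import re
from pathlib import Path


def thread_logs(log_dir):
    """Return the crawl.log.<N> files in log_dir ordered by thread number N.

    crawl.log itself (no numeric suffix) is deliberately excluded.
    """
    pattern = re.compile(r"crawl\.log\.(\d+)$")
    found = []
    for path in Path(log_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]
```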
domains.log
- Location: data/<CN>/<V>/log
- Purpose: Frontier and stored document counts for each domain encountered during the crawl.
frontier_dump.log
- Location: data/<CN>/<V>/log
- Purpose: Contains a dump of the crawl frontier (the set of known, uncrawled URLs).
headers.log
- Location: data/<CN>/<V>/log
- Purpose: Captures the HTTP headers recorded during the crawl. Output if crawler.header_logging=true.
manifest.txt
- Location: data/<CN>/<V>/log
- Purpose: Records the order in which bundles were created.
monitor.log
- Location: data/<CN>/<V>/log
- Purpose: Records various crawl statistics.
new_urls.log
- Location: data/<CN>/<V>/log
- Purpose: Lists new URLs, where a new URL is one that was not stored in the previous crawl. Output if crawler.incremental_logging=true.
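This definition of a new URL is a set difference between two crawls' stored URLs. The hypothetical helper new_urls below computes it. It assumes you have already extracted one URL per entry from the previous and current stored.log files, because the exact line format of stored.log is not described here.

```python
def new_urls(previous_stored, current_stored):
    """Return, sorted, the URLs stored in this crawl but not in the previous one.

    Both arguments are iterables of URL strings; blank entries are ignored.
    """
    previous = {url.strip() for url in previous_stored if url.strip()}
    current = {url.strip() for url in current_stored if url.strip()}
    return sorted(current - previous)
```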
redirects.txt
- Location: data/<CN>/<V>/log
- Purpose: Captures redirects. Appended to on restart from a checkpoint.
servers.log
- Location: data/<CN>/<V>/log
- Purpose: Frontier and stored document counts for each server encountered during the crawl.
stored.log
- Location: data/<CN>/<V>/log
- Purpose: All URLs stored during a crawl, in chronological order (also used for refresh updates). Appended to on restart from a checkpoint.
store-messages.log
- Location: data/<CN>/<V>/log
- Purpose: Records URLs stored into a WARC/Mirror store, as well as edit distance calculation logs and certain MirrorStore error and warning states.
url_errors.log
- Location: data/<CN>/<V>/log
- Purpose: Records errors encountered while processing URLs.
url_no_content.log
- Location: data/<CN>/<V>/log
- Purpose: Lists URLs that were stored despite having no content. This occurs when crawler.store_empty_content_urls=true. Documents with no content are usually the result of a filter returning an empty document.
url_titles.log
- Location: data/<CN>/<V>/log
- Purpose: Lists URLs and their titles.
BroadMIMETypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: The file types by MIME report displays statistics on document types as reported by the web server. A significant difference between the document types reported here and those reported by the types by suffix report may indicate a web server serving documents with an incorrect content type.
BroadWebServerTypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: Records web server types in broad categories. Anything after a / or ( is truncated, so, for example, Apache is recorded without its version.
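The truncation rule can be expressed in a few lines. This Python sketch applies the rule as stated above: it cuts the value at the first / or ( and trims whitespace. It is not Funnelback's code, and broad_server_type is a hypothetical name.

```python
import re


def broad_server_type(server_header):
    """Reduce a Server header value to its broad category.

    Everything from the first '/' or '(' onwards is dropped, then
    surrounding whitespace is trimmed.
    """
    return re.split(r"[/(]", server_header, maxsplit=1)[0].strip()
```

For example, "Apache/2.4.41 (Ubuntu)" and "Apache/2.2.15" both collapse to "Apache". This is why the broad report groups all versions together, while WebServerTypeStatistic.stat keeps them apart.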
CrawlSizeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: Records the total number of URLs stored by a web crawl.
FileSizeByDocumentTypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: The file sizes by document type report displays statistics on file sizes found, divided by content type.
FileSizeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: The file sizes report displays statistics on content sizes found.
MIMETypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: The file types by MIME report displays statistics on document types as reported by the web server. A significant difference between the document types reported here and those reported by the types by suffix report may indicate a web server serving documents with an incorrect content type.
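The same comparison can be made for an individual URL. This hypothetical Python sketch guesses a type from the URL path's suffix using the standard mimetypes module and flags URLs whose Content-Type disagrees. It assumes the served Content-Type value is available to you, for example from headers.log when crawler.header_logging=true. It only approximates what the reports measure.

```python
import mimetypes
from urllib.parse import urlsplit

# A MimeTypes instance holds only Python's built-in mappings, so results do not
# depend on the local /etc/mime.types file.
_MIME_DB = mimetypes.MimeTypes()


def suffix_mime_mismatch(url, content_type):
    """True when the URL's suffix implies a different MIME type than the server sent."""
    guessed, _encoding = _MIME_DB.guess_type(urlsplit(url).path)
    if guessed is None:
        return False  # no recognisable suffix, so there is nothing to compare
    # Drop parameters such as '; charset=UTF-8' before comparing.
    served = content_type.split(";", 1)[0].strip().lower()
    return guessed != served
```

For example, a URL ending in .pdf that is served as text/html is flagged. This often means an error page was returned in place of the document.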
ReferencedFileTypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: Records the file extensions of URLs seen in HTML href and src attributes, even if those URLs would not be crawled.
SuffixTypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: The file types by suffix report displays statistics on document types as identified by their file suffix.
URLlengthStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: Records the number of URLs of different lengths seen during a crawl.
WebServerTypeStatistic.stat
- Location: data/<CN>/<V>/log
- Purpose: Records statistics on the web server types (including versions) seen during the crawl.
Collection update logs - filecopy collections
Gather logs
gather_executable.log
- Location: data/<CN>/<V>/log
- Purpose: Wrapper log for the gather process.
filecopier.log
- Location: data/<CN>/<V>/log
- Purpose: Logs messages from filecopier collection updates.
stored.log
- Location: data/<CN>/<V>/log
- Purpose: Lists the documents stored by the filecopier.
monitor.log
- Location: data/<CN>/<V>/log
- Purpose: Records various statistics about the filecopier update.
url_errors.log
- Location: data/<CN>/<V>/log
- Purpose: Records errors encountered while processing URLs.
Collection update logs - trimpush collections
Gather logs
trim.log
- Location: data/<CN>/<V>/log
- Purpose: Logs output from the TRIM gatherer.
trim-details.log
- Location: data/<CN>/<V>/log
- Purpose: Logs output from the TRIM gatherer.
trim-combine-attachments.log
- Location: data/<CN>/<V>/log
- Purpose: Logs messages from the combine attachments process.
filter.log
- Location: data/<CN>/<V>/log
- Purpose: Logs filter messages from the TRIM update (filtering via the Funnelback daemon filter service).
monitor.log
- Location: data/<CN>/<V>/log
- Purpose: Provides various statistics about the TRIM gather process.
error.log
- Location: data/<CN>/<V>/log
- Purpose: Logs the TRIM IDs of documents that recorded a failure.
Application logs
Installation logs
funnelback-install.ilog
- Location: log
- Purpose: Output from the Funnelback installation process.
post-install-<V>.log
- Location: log
- Purpose: Output from the post-installer script.
post_install.sh-<>.log
- Location: log
- Purpose: Output from the post_install.sh script.
upgrade-<V>.log
- Location: log
- Purpose: Output from the post_install.pl script.
Collection management
create.log
- Location: log
- Purpose: Logs the output of creating a collection from the administration interface.
delete.log
- Location: log
- Purpose: Logs the output of deleting a collection from the administration interface.
public-ui.warnings
- Location: log
- Purpose: Contains administration user interface warning messages. When this file exists, a banner is shown in the administration interface.
scheduled_tasks_backup.txt
- Location: log
- Purpose: Records the last change made to scheduled tasks on Unix-based operating systems (essentially a copy of the cron jobs' before-and-after states).
update-launch.log
- Location: log
- Purpose: Logs any errors that occur when starting an update.
update_reports_errors.log
- Location: log
- Purpose: Contains errors from the main report update process.
Service logs
daemon.log
- Location: log
- Purpose: Contains messages recorded by the Funnelback daemon.
daemon-wrapper.log
- Location: log
- Purpose: Contains messages from the YAJSW wrapper for the Funnelback daemon service.
jetty.log
- Location: log
- Purpose: Contains messages recorded by the Jetty web server.
jetty-launch.log
- Location: log
- Purpose: Contains messages from the YAJSW wrapper for the Jetty service.
graph-wrapper.log
- Location: log
- Purpose: Contains messages from the YAJSW wrapper for the Funnelback graph service.
neo4j/debug.log
- Location: log
- Purpose: Contains log messages from Neo4j.
redis.log
- Location: log
- Purpose: Contains messages recorded by the Redis service.
redis-wrapper.log
- Location: log
- Purpose: Contains messages from the YAJSW wrapper for the Redis service.
mail.log
- Location: log
- Purpose: Logs emails sent out by Funnelback.
Web server logs
Request logs
<YYYY_MM_DD>.error.log
- Location: web/logs
- Purpose: Contains errors from the Jetty web server.
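To find the error log for a given day, build the file name from the date. This sketch assumes the <YYYY_MM_DD> placeholder means a zero-padded year, month and day separated by underscores, as the name suggests. jetty_error_log_name is a hypothetical helper.

```python
from datetime import date


def jetty_error_log_name(day):
    """Build the <YYYY_MM_DD>.error.log file name for the given date."""
    return day.strftime("%Y_%m_%d") + ".error.log"
```

For example, jetty_error_log_name(date.today()) gives the name of today's file under web/logs.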
access.admin.log
- Location: web/logs
- Purpose: Jetty access logs for the administration context. Logs are rotated and compressed.
access.public.log
- Location: web/logs
- Purpose: Jetty access logs for the public context. Logs are rotated and compressed.
Audit logs
audit-admin-api.log
- Location: web/logs
- Purpose: Log of users who access the admin API (/admin-api), along with the response code they receive.
audit.classic-admin.log
- Location: web/logs
- Purpose: Log of users who access classic-admin (/search/admin), along with the response code they receive.
audit.classic-admin.log.<N>.gz
- Location: web/logs
- Purpose: Older copies of the above log that have been rotated and compressed.
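To review audit history across rotations, read the compressed files together with the current log. This Python sketch (read_audit_log is a hypothetical name) makes two assumptions that this article does not confirm. First, a higher <N> means an older file, which is a common rotation convention. Second, the logs are UTF-8 text.

```python
import gzip
import re
from pathlib import Path


def read_audit_log(log_dir, base="audit.classic-admin.log"):
    """Yield lines from the rotated <base>.<N>.gz files, oldest first, then the live log.

    Assumes a higher N means an older file.
    """
    pattern = re.compile(re.escape(base) + r"\.(\d+)\.gz$")
    rotated = []
    for path in Path(log_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            rotated.append((int(match.group(1)), path))
    # Largest N first, so the output runs from oldest to newest.
    for _, path in sorted(rotated, reverse=True):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield from handle
    live = Path(log_dir) / base
    if live.exists():
        with open(live, encoding="utf-8") as handle:
            yield from handle
```

Because this is a generator, large histories are streamed line by line rather than loaded into memory at once.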
admin-api.log
- Location: web/logs
- Purpose: Log from the admin-api web application (/admin-api).
classic-admin.log
- Location: web/logs
- Purpose: Log from the classic-admin web application, which takes requests to Jetty and passes them to the Perl CGIs in classic-admin.
authentication.classic-admin.log
- Location: web/logs
- Purpose: Record of successful and failed authentication attempts against classic-admin (/search/admin), including the remote IP address and X-Forwarded-For headers.
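When reviewing these records, note that X-Forwarded-For is a comma-separated chain of the form client, proxy1, proxy2. The sketch below picks the left-most entry as a best guess at the original client and falls back to the remote IP address when the header is absent. originating_client is a hypothetical helper, and the exact layout of these fields in the log is not described here. Clients can forge this header, so only trust it when proxies you control added it.

```python
def originating_client(remote_ip, x_forwarded_for):
    """Best guess at the original client IP for an authentication attempt.

    The left-most X-Forwarded-For entry is the original client only if every
    proxy in the chain is trusted; otherwise treat it as unverified.
    """
    if x_forwarded_for:
        first = x_forwarded_for.split(",")[0].strip()
        if first:
            return first
    return remote_ip
```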
authentication.modernui.log
- Location: web/logs
- Purpose: Record of successful and failed authentication attempts against modernui (/s), including the remote IP address and X-Forwarded-For headers.
authentication.mediator-endpoint.log
- Location: web/logs
- Purpose: Record of successful and failed authentication attempts against the mediator (/search/admin/mediator), including the remote IP address and X-Forwarded-For headers.
authentication.push.log
- Location: web/logs
- Purpose: Record of successful and failed authentication attempts against the Push API (/push-api), including the remote IP address and X-Forwarded-For headers.
mediator-endpoint-http.log
- Location: web/logs
- Purpose: Log from the mediator web application, which contains a number of APIs that predate admin-api.
push.log
- Location: web/logs
- Purpose: Log of the Push API.
User interface logs
modernui.Admin.log
- Location: web/logs
- Purpose: Log from the modernui web application (/s/) deployed on the admin Jetty server (port 8443 by default).
modernui.Public.log
- Location: web/logs
- Purpose: Log from the modernui web application (/s/) deployed on the public/search Jetty server (ports 80 and 443 by default).
Knowledge graph logs
kg/spring-boot.log
- Location: web/logs
- Purpose: Logs messages from the knowledge graph startup.
kg/cortex-api.log
- Location: web/logs
- Purpose: Logs messages from the knowledge graph API.
kg/cortex-neo4j.log
- Location: web/logs
- Purpose: Contains the Neo4j queries executed by Funnelback.