Logs reference

Background

This article describes most of the log files produced by Funnelback. The information is correct for Funnelback 15.20.

The following abbreviations are used in this document:

  • <CN> - Collection id

  • <PN> - Profile id

  • <V> - Collection view (live or offline)

All log locations are relative to $SEARCH_HOME (Linux) or %SEARCH_HOME% (Windows).
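As an illustration of how the abbreviations combine with the install root, the following minimal Python sketch builds the full path to a log. The helper name `log_path`, the install path `/opt/funnelback` and the collection id `example-web` are hypothetical and chosen only for the example.

```python
import os
from pathlib import Path
from typing import Optional


def log_path(relative_location: str, filename: str,
             search_home: Optional[str] = None) -> Path:
    """Join a 'Location' value from this article onto the install root."""
    # SEARCH_HOME is read from the environment; the fallback is an assumed path.
    root = Path(search_home or os.environ.get("SEARCH_HOME", "/opt/funnelback"))
    return root / relative_location / filename


# Substitute a hypothetical collection id and the live view into a location.
location = "data/<CN>/<V>/log".replace("<CN>", "example-web").replace("<V>", "live")
print(log_path(location, "crawl.log", search_home="/opt/funnelback"))
```

Passing `search_home` explicitly is convenient when inspecting logs copied from another server.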

Collection logs - all collection types

User interface logs

modernui.Admin.log

  • Location: data/<CN>/log

  • Purpose: Linux only. Contains messages logged by the admin search endpoints, e.g. template error logging.

modernui.Public.log

  • Location: data/<CN>/log

  • Purpose: Linux only. Contains messages logged by the public search endpoints, e.g. template error logging.

modernui.<CN>.Admin.log

  • Location: web/logs

  • Purpose: Windows only. Contains messages logged by the admin search endpoints, e.g. template error logging.

modernui.<CN>.Public.log

  • Location: web/logs

  • Purpose: Windows only. Contains messages logged by the public search endpoints, e.g. template error logging.

Collection update logs

<CN>.lock

  • Location: data/<CN>/log

  • Purpose: These files are used to prevent multiple updates running simultaneously (by taking an OS lock on the file). They are generally empty and so contain no useful information.
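The general technique of using an empty file as a lock can be sketched as follows. This is a generic illustration of advisory file locking on Linux using Python's `fcntl.flock`, not Funnelback's own implementation; the function name `try_lock` and the file name `example-web.lock` are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "a")  # creates the (empty) lock file if it is missing
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_file = os.path.join(tempfile.mkdtemp(), "example-web.lock")
first = try_lock(lock_file)    # the first "update" takes the lock
second = try_lock(lock_file)   # a concurrent "update" is refused
print(first is not None, second is None)
first.close()                  # closing the file releases the lock
```

The lock lives on the open file rather than in its contents, which is why such a file stays empty.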

<CN>.pre-update.log

  • Location: data/<CN>/log

  • Purpose: Logs the Java command that was run for the update.

knowledge_graph.<PN>.log

  • Location: data/<CN>/log

  • Purpose: Logs messages from the update of knowledge graph for a specific profile.

update-<CN>.log

  • Location: data/<CN>/log

  • Purpose: Logs the output from update.pl. Logs messages for the collection update process.

update-<CN>.previous.log

  • Location: data/<CN>/log

  • Purpose: Messages from the previous collection update.

pattern_analyser.log

  • Location: data/<CN>/log

  • Purpose: Logs the output from outliers-log-processing.pl. Logs messages for the pattern analyser reports build.

update_reports_launch.log

  • Location: data/<CN>/log

  • Purpose: Logs messages for the analytics report build.

update_reports.log

  • Location: data/<CN>/log

  • Purpose: Logs the output from reports-load-queries.pl. Logs messages for the analytics reports build.

update_reports_previous.log

  • Location: data/<CN>/log

  • Purpose: Messages from the previous reports build.

update.log

  • Location: data/<CN>/<V>/log

  • Purpose: Top level log for the update pipeline.

Indexer logs

Step-AnnieAPrimaryCollection.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from annie-a. Logging for the build of the annotation index.

Step-BuildAutoCompletion.<PN>.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from build_autoc. Logging for each auto-completion index. An auto-completion index is built for each profile.

Step-BuildAutoCompletion.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from build_autoc. Logging for the collection auto-completion index build.

Step-BuildCollapsingSignatures.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from padre-cc. Logging for the collapsing index build process.

Step-BuildSpelling.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from build_spelling_index. Logging for the spelling index build process. A spelling index is built for the collection and for each profile.

Step-Index.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from padre-iw. Logging for the collection’s index build process.

Step-QueryIndependentEvidenceCollectionLevel.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from padre-qi. Logging for the query independent evidence index build process.

Step-SetGscopes.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs the output from padre-gs. Logging for application of collection level gscopes.

Step-FacetBasedGscopes.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output for setting up query facets.

Step-MoveTmpIndexIntoPlace.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output for the index move step.

Collection update logs - web and matrix collections

Gather (web crawler) logs

binaries.log

  • Location: data/<CN>/<V>/log

  • Purpose: List of binary files stored (e.g. PDF, DOC). Appended to on restart from checkpoint.

copied_urls.log

  • Location: data/<CN>/<V>/log

  • Purpose: Lists all URLs whose content was copied from the previous crawl because they had not changed and so were not downloaded again. Only output if crawler.incremental_logging=true.

crawler.central.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains filter messages.

crawler_logs_checkpoint_sizes.dat

  • Location: data/<CN>/<V>/log

  • Purpose: Records the sizes of the crawler logs at the time a checkpoint occurred. If the crawler is restarted from a checkpoint, the logs can be truncated back to those sizes to avoid inconsistency.
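The truncate-on-restart idea can be sketched in Python as below. The on-disk format of crawler_logs_checkpoint_sizes.dat is not described here, so the sketch assumes the recorded sizes are already available as a dictionary of file name to byte count; `truncate_to_checkpoint` is a hypothetical helper, not part of Funnelback.

```python
import os
import tempfile


def truncate_to_checkpoint(log_dir, checkpoint_sizes):
    """Cut each log back to the byte size recorded at the checkpoint."""
    for name, size in checkpoint_sizes.items():
        path = os.path.join(log_dir, name)
        # Only shrink files that grew after the checkpoint was taken.
        if os.path.exists(path) and os.path.getsize(path) > size:
            os.truncate(path, size)


log_dir = tempfile.mkdtemp()
stored = os.path.join(log_dir, "stored.log")
with open(stored, "w") as f:
    f.write("url-1\nurl-2\n")                     # state at the checkpoint
sizes = {"stored.log": os.path.getsize(stored)}   # assumed in-memory form
with open(stored, "a") as f:
    f.write("url-3\n")                            # written after the checkpoint
truncate_to_checkpoint(log_dir, sizes)
print(open(stored).read())
```

After truncation the log again matches the state the crawler will resume from, so entries written after the checkpoint are not duplicated.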

crawl.log

  • Location: data/<CN>/<V>/log

  • Purpose: Top level log for the web crawler.

crawl.log.<N>

  • Location: data/<CN>/<V>/log

  • Purpose: Logs individual messages for each running crawler thread. One log per thread.

domains.log

  • Location: data/<CN>/<V>/log

  • Purpose: Frontier and stored document counts for each domain encountered during the crawl.

frontier_dump.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains a dump of the crawl frontier (set of known, uncrawled URLs).

headers.log

  • Location: data/<CN>/<V>/log

  • Purpose: Captures HTTP headers recorded during the crawl. Only output if crawler.header_logging=true.

manifest.txt

  • Location: data/<CN>/<V>/log

  • Purpose: Records the order in which bundles were created

monitor.log

  • Location: data/<CN>/<V>/log

  • Purpose: Records various crawl statistics.

new_urls.log

  • Location: data/<CN>/<V>/log

  • Purpose: Lists new URLs, where a new URL is one that was not stored in the previous crawl. Only output if crawler.incremental_logging=true.
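The definition of a new URL amounts to a set difference between two crawls. A minimal Python sketch, assuming the stored URLs of each crawl are available as lists (the function name `new_urls` and the sample URLs are hypothetical):

```python
def new_urls(previous_stored, current_stored):
    """Return URLs stored in the current crawl that were absent from the previous one."""
    seen_before = set(previous_stored)
    # A list comprehension preserves crawl order, unlike a plain set difference.
    return [url for url in current_stored if url not in seen_before]


previous = ["http://example.com/", "http://example.com/about"]
current = ["http://example.com/", "http://example.com/news", "http://example.com/about"]
print(new_urls(previous, current))
```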

redirects.txt

  • Location: data/<CN>/<V>/log

  • Purpose: Captures redirects. Appended to on restart from checkpoint.

servers.log

  • Location: data/<CN>/<V>/log

  • Purpose: Frontier and stored document counts for each server encountered during the crawl.

stored.log

  • Location: data/<CN>/<V>/log

  • Purpose: All URLs stored during a crawl (also used for refresh updates), in chronological order. Appended to on restart from checkpoint.

store-messages.log

  • Location: data/<CN>/<V>/log

  • Purpose: Records URLs stored into a WARC/Mirror store as well as edit distance calculation logs and certain error/warning states of MirrorStore.

url_errors.log

  • Location: data/<CN>/<V>/log

  • Purpose: Records errors while processing URLs.

url_no_content.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains URLs which are stored despite having no content. Documents with no content are usually the result of a filter returning an empty document. This occurs when crawler.store_empty_content_urls=true.

url_titles.log

  • Location: data/<CN>/<V>/log

  • Purpose: Lists URLs and their titles.

BroadMIMETypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: The file types by MIME report displays statistics on document types as reported by the web server. A significant difference between the document types reported here and the document types reported by the types by suffix report may indicate a webserver serving documents with an incorrect content type.

BroadWebServerTypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: Records web server types in general categories. Anything after a / or ( is truncated, so e.g. Apache is captured without its version.
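The truncation rule can be illustrated with a short Python sketch. It assumes the input is the value of an HTTP Server header; the function name `broad_server_type` and the sample values are hypothetical, and the sketch does not reproduce the format of the .stat file.

```python
import re
from collections import Counter


def broad_server_type(server_header):
    """Keep only the text before the first '/' or '('."""
    return re.split(r"[/(]", server_header, maxsplit=1)[0].strip()


headers = ["Apache/2.4.41 (Ubuntu)", "Apache/2.2.15", "nginx/1.18.0", "Microsoft-IIS/10.0"]
print(Counter(broad_server_type(h) for h in headers))
```

Grouping by the truncated value merges different versions of the same server into a single category.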

CrawlSizeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: Records the total URLs stored by a web crawl

FileSizeByDocumentTypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: The file sizes by document type report displays statistics on file sizes found, divided by content type.

FileSizeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: The file sizes report displays statistics on content sizes found.

MIMETypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: The file types by MIME report displays statistics on document types as reported by the web server. A significant difference between the document types reported here and the document types reported by the types by suffix report may indicate a webserver serving documents with an incorrect content type.
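The comparison between the reported type and the suffix can be sketched in Python. This is an illustration of the idea only, not how the reports are built; it uses the standard `mimetypes` module to guess a type from the URL suffix, and the function name `mismatched_types` and the sample data are hypothetical.

```python
import mimetypes


def mismatched_types(observations):
    """Yield (url, reported, guessed) where the server's type disagrees with the suffix."""
    for url, reported in observations:
        guessed, _ = mimetypes.guess_type(url)
        # Skip URLs whose suffix gives no hint about the type.
        if guessed is not None and guessed != reported:
            yield url, reported, guessed


observations = [
    ("http://example.com/report.pdf", "text/html"),   # suspicious: PDF served as HTML
    ("http://example.com/index.html", "text/html"),
]
for url, reported, guessed in mismatched_types(observations):
    print(url, "reported as", reported, "but suffix suggests", guessed)
```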

ReferencedFileTypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: Records the file extensions of URLs seen in href/src HTML attributes even if those URLs would not be crawled

SuffixTypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: The file types by suffix report displays statistics on document types as identified by checking of the suffix.

URLlengthStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: Records the number of URLs of each length seen during a crawl.

WebServerTypeStatistic.stat

  • Location: data/<CN>/<V>/log

  • Purpose: Records statistics on the web server types (including versions etc.) seen during the crawl.

Collection update logs - database collections

Gather logs

dbgather.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs messages from the database gatherer.

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

Collection update logs - directory collections

Gather logs

directory_gather.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs messages from the directory gatherer.

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

Collection update logs - filecopy collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

filecopier.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs messages from filecopier collection updates.

stored.log

  • Location: data/<CN>/<V>/log

  • Purpose: Lists the documents stored by the filecopier

monitor.log

  • Location: data/<CN>/<V>/log

  • Purpose: Records various statistics about the filecopier update.

url_errors.log

  • Location: data/<CN>/<V>/log

  • Purpose: Records errors while processing URLs.

Collection update logs - custom collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output from the custom gatherer

Collection update logs - trimpush collections

Gather logs

trim.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output from the trim gatherer

trim-details.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output from the trim gatherer

trim-combine-attachments.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs messages from the combine attachments process

filter.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs filter messages from the trim update (filtering via the Funnelback daemon filter service)

monitor.log

  • Location: data/<CN>/<V>/log

  • Purpose: Provides various statistics about the trim gather process

error.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs TRIM IDs of documents that recorded a failure

Collection update logs - slackpush collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Logs output from the slackpush gatherer

Collection update logs - facebook collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

social_media.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains detail for the Facebook gather process

Collection update logs - flickr collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

social_media.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains detail for the Flickr gather process

Collection update logs - twitter collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

social_media.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains detail for the Twitter gather process

Collection update logs - youtube collections

Gather logs

gather_executable.log

  • Location: data/<CN>/<V>/log

  • Purpose: Wrapper log for the gather process

social_media.log

  • Location: data/<CN>/<V>/log

  • Purpose: Contains detail for the YouTube gather process

Application logs

Installation logs

funnelback-install.ilog

  • Location: log

  • Purpose: Output from the Funnelback installation process.

post-install-<V>.log

  • Location: log

  • Purpose: Output from the post installer script.

post_install.sh-<>.log

  • Location: log

  • Purpose: Output from the post_install.sh script

upgrade-<V>.log

  • Location: log

  • Purpose: Output from the post_install.pl script

Collection management

create.log

  • Location: log

  • Purpose: Logs the output of creating a collection from the administration interface

delete.log

  • Location: log

  • Purpose: Logs the output of deleting a collection from the administration interface

public-ui.warnings

  • Location: log

  • Purpose: Contains administration user interface warning messages. When this log file exists a banner is presented in the administration interface.

scheduled_tasks_backup.txt

  • Location: log

  • Purpose: Records the last change made to scheduled tasks on Unix-based operating systems (a copy of the cron job’s before-and-after states).

update-launch.log

  • Location: log

  • Purpose: Logs any errors that may occur when starting an update.

update_reports_errors.log

  • Location: log

  • Purpose: Contains errors from the main report update process.

Service logs

daemon.log

  • Location: log

  • Purpose: Contains messages recorded by the Funnelback daemon.

daemon-wrapper.log

  • Location: log

  • Purpose: Contains messages from the YAJSW wrapper for the Funnelback daemon service.

jetty.log

  • Location: log

  • Purpose: Contains messages recorded by the Jetty web server.

jetty-launch.log

  • Location: log

  • Purpose: Contains messages from the YAJSW wrapper for the Jetty service.

graph-wrapper.log

  • Location: log

  • Purpose: Contains messages from the YAJSW wrapper for the Funnelback graph service.

neo4j/debug.log

  • Location: log

  • Purpose: Contains log messages from Neo4j.

redis.log

  • Location: log

  • Purpose: Contains messages recorded by the Redis service.

redis-wrapper.log

  • Location: log

  • Purpose: Contains messages from the YAJSW wrapper for the Redis service.

mail.log

  • Location: log

  • Purpose: Logs emails sent out by Funnelback.

Web server logs

Request logs

<YYYY_MM_DD>.error.log

  • Location: web/logs

  • Purpose: Contains errors from the Jetty web server

access.admin.log

  • Location: web/logs

  • Purpose: Jetty access logs for the administration context. Logs are rotated and compressed.

access.public.log

  • Location: web/logs

  • Purpose: Jetty access logs for the public context. Logs are rotated and compressed.

Audit logs

audit-admin-api.log

  • Location: web/logs

  • Purpose: Log of users who access the admin API (/admin-api) along with the response code they receive.

audit.classic-admin.log

  • Location: web/logs

  • Purpose: Log of users who access the classic-admin (/search/admin) along with the response code they receive.

audit.classic-admin.log.<N>.gz

  • Location: web/logs

  • Purpose: Older copies of audit.classic-admin.log that have been rotated and compressed.

admin-api.log

  • Location: web/logs

  • Purpose: Log from the admin-api web application (/admin-api)

classic-admin.log

  • Location: web/logs

  • Purpose: Log from the classic-admin web application (which takes requests to Jetty and passes them to the Perl CGIs in classic-admin)

authentication.classic-admin.log

  • Location: web/logs

  • Purpose: Record of successful and failed authentication attempts against the classic-admin (/search/admin). Includes the remote IP address and X-Forwarded-For headers.

authentication.modernui.log

  • Location: web/logs

  • Purpose: Record of successful and failed authentication attempts against the modernui (/s). Includes the remote IP address and X-Forwarded-For headers.

authentication.mediator-endpoint.log

  • Location: web/logs

  • Purpose: Record of successful and failed authentication attempts against the mediator (/search/admin/mediator). Includes the remote IP address and X-Forwarded-For headers.

authentication.push.log

  • Location: web/logs

  • Purpose: Record of successful and failed authentication attempts against the push API (/push-api). Includes the remote IP address and X-Forwarded-For headers.

mediator-endpoint-http.log

  • Location: web/logs

  • Purpose: Log from the mediator web application (which contains a number of APIs from before admin-api existed)

push.log

  • Location: web/logs

  • Purpose: Log of the push API.

User interface logs

modernui.Admin.log

  • Location: web/logs

  • Purpose: Log from the modernui web application (/s/) deployed on the admin Jetty server (port 8443 by default)

modernui.Public.log

  • Location: web/logs

  • Purpose: Log from the modernui web application (/s/) deployed on the public/search Jetty server (ports 80 and 443 by default)

Knowledge graph logs

kg/spring-boot.log

  • Location: web/logs

  • Purpose: Logs messages from the startup of knowledge graph.

kg/cortex-api.log

  • Location: web/logs

  • Purpose: Logs messages from the knowledge graph API.

kg/cortex-neo4j.log

  • Location: web/logs

  • Purpose: Contains the Neo4j queries executed by Funnelback.