Funnelback 15.16.0 release notes

Released: 7 August 2018

Supported until: 7 August 2019 (Short Term Support Version)

15.16.0 - New features

  • Funnelback licenses are now assigned per-collection rather than per-server, allowing multiple licenses to be used on a single server.

  • Long-running tasks such as collection updates and search analytics processing are now submitted to a task queue, which can be customized to delay new tasks when the Funnelback server is under heavy load.

  • Introduced support for searching Slack messages via the new Slackpush collection type.

  • Introduced dedicated collection types for Facebook, Flickr, Twitter and YouTube, removing the need to create custom collections for these types.

  • Added bulk CSV import/export of best bets, allowing for offline editing in a spreadsheet.

15.16.0 - Selected improvements and bug fixes

  • Added management screens for crawler site profiles.

  • Added metadata selection dropdown options within faceted navigation configuration.

  • Added facet selection dropdown options within curator configuration.

  • Introduced 'listMetadata' in the search result data model, which provides pre-separated values for each metadata class based on the defined separator characters.

  • Added ability to access requestHeaders via the searchQuestion data model.

  • Improved performance of push API when using the multi-part endpoints.

  • Introduced daemon.max_heap_size, jetty.max_heap_size and jetty.max_metaspace_size global.cfg options to persist memory adjustments between upgrades.

  • URL facets now work better when the URL contains non-indexable characters or when the URL path contains repeated path names (e.g. /foo/foo/foo). Some case sensitivity issues have also been fixed.

  • Introduced the crawler.send-http-basic-credentials-without-challenge setting (on by default) to match the old crawler behavior of sending HTTP basic credentials without an initial 401 challenge.

  • Jetty has been upgraded to 9.4.11.v20180605 and the multi-part parser has been changed to an RFC 7578 compliant parser, which is stricter than the previous multi-part parser. The new parser is also faster, which is especially useful for the push API.

  • Jetty now uses the Conscrypt SSL library, which results in Jetty using more secure and faster SSL ciphers. Java clients of Funnelback's APIs should switch to Conscrypt to take advantage of the faster encryption; otherwise the client will likely be slower than it was. The push API client, which uses funnelback-api-client-core.jar, can be upgraded to use Conscrypt by getting a copy of $SEARCH_HOME/lib/java/all/funnelback-api-client-core.jar.

  • Multiple changes have been made to the Push API to improve its performance.

  • Improved support for binary file filtering (Apache Tika upgraded to 1.18).

  • Upgraded embedded version of Java runtime, which now includes the Java Cryptography Extension. Previous versions required manual installation in some SAML use cases.

  • Analytics updates now support multiple collections updating at the same time.

  • The analytics update pre_reporting_command and post_reporting_command are now run with the collection reporting lock held, which means that another analytics job cannot run while they are running.

  • Fixed reading of the server.cpu_count global.cfg option. Some places were using the key cpu_count, which resulted in the default value of auto being used.

  • Removed a complexity check that prevented contextual navigation from running in some cases.

  • Added ability to re-apply gscopes on local collections.
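
The listMetadata item above describes metadata values that arrive already split on the defined separator characters. The Python sketch below illustrates that idea only; it is not Funnelback's implementation. The function name split_metadata, the sample class name and the '|' and ';' separators are assumptions made for this example.

```python
import re


def split_metadata(meta_data, separators="|"):
    """Return {class_name: [values]} by splitting each raw value on any separator character.

    Hypothetical helper for illustration; not part of the Funnelback data model.
    """
    if not separators:
        # No separators defined: each class keeps its single raw value.
        return {name: [value] for name, value in meta_data.items()}
    # Build a character class that matches any one of the separator characters.
    pattern = "[" + re.escape(separators) + "]"
    result = {}
    for name, value in meta_data.items():
        parts = [part.strip() for part in re.split(pattern, value)]
        # Drop empty fragments produced by doubled or trailing separators.
        result[name] = [part for part in parts if part]
    return result


if __name__ == "__main__":
    print(split_metadata({"author": "Ann|Bob; Cy"}, separators="|;"))
```

Returning a list per class means template code can iterate over the values directly rather than splitting strings itself.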

15.16.0 - Configuration Upgrade Steps

The following changes will be automatically performed on all configurations during the upgrade process. Configurations migrated from older versions after the upgrade will need to have update-configs.pl manually run to apply these changes.

  • Users with access to the old cp.license.key permissions will be granted the new sec.license.view-usage, sec.license.can-edit-other-users-licenses, sec.license.install and sec.license.delete permissions.

  • Users with access to the relevant files in the file manager are now granted the following new permissions: sec.spelling, sec.url-kill-list, sec.reporting-exclusion, sec.server-alias and sec.site-profile.

15.16.0 - Upgrade Issues

  • Search request IP addresses are now pseudonymized by default. See ui.modern.pseudonymise_client_ips to disable this if needed.

  • As Funnelback now supports multiple licenses per installation, some APIs are no longer possible and have been removed.

    • GET /admin-api/license/v1/usage API has been removed and replaced with GET /admin-api/license/v2/document-usage-per-license, which returns usage for all licenses the user has permission to use as well as all licenses that are used in collections the user has access to. This new API, like the old, respects sec.license.view-usage.

    • GET /admin-api/license/v1/details API has been removed and replaced with GET /admin-api/manage-licenses/v1/licenses, which returns all details for all licenses the user has permission to use.

  • The default timeout for contextual navigation has been reduced from 5 seconds to 1 second. Collections relying on the old default may need to set the timeout value.

  • Added support for Facebook Graph API version 3.1 by upgrading the RestFB library from version 1.42.0 to 2.8.0.
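
Client scripts that call the removed license endpoints listed above need to be pointed at the replacement paths. The Python sketch below shows one way to rewrite such URLs. The two path pairs are taken from these notes; the helper name migrate_license_url and the example host are hypothetical. The sketch only rewrites the URL and makes no assumption about the shape of the new responses, which callers should check separately.

```python
from urllib.parse import urlsplit, urlunsplit

# Removed path -> replacement path, as listed in the upgrade issues above.
REPLACEMENTS = {
    "/admin-api/license/v1/usage": "/admin-api/license/v2/document-usage-per-license",
    "/admin-api/license/v1/details": "/admin-api/manage-licenses/v1/licenses",
}


def migrate_license_url(url):
    """Return url with a removed license API path swapped for its replacement.

    Hypothetical helper; URLs that do not use a removed path are returned unchanged.
    """
    parts = urlsplit(url)
    new_path = REPLACEMENTS.get(parts.path.rstrip("/"))
    if new_path is None:
        return url
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # search.example.com is a placeholder host.
    print(migrate_license_url("https://search.example.com/admin-api/license/v1/usage"))
```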

15.16.0 - Errata

  • Facebook APIs are currently undergoing major reviews and changes which are affecting the ability of newly created Facebook application IDs to access the posts and events commonly presented in search results by Funnelback. Further updates or guidance will be issued to address these issues when Facebook makes it possible for Funnelback to do so.

Patches

Type Release version Description

3 Bug fixes

Upgrades log4j2 to version 2.16 to fix the security vulnerability where log4j2 JNDI features do not protect against attacker-controlled LDAP and other JNDI related endpoints.

3 Bug fixes

Removes the screens for file-manager rule editing which could create security issues

3 Bug fixes

Fixes an issue where support packages could contain unintended files

3 Bug fixes

Fixes an issue where the running Funnelback jetty web server could retain permissions via supplemental groups after startup

3 Bug fixes

Limits an administration CGI script to redirect only within the Funnelback administration interface as intended

3 Bug fixes

Removes the unused administration debug.cgi script which reflected input parameters without proper escaping

3 Bug fixes

Improves the performance of the Accessibility Auditor interface by requesting less data.

3 Bug fixes

Fixes an issue where some of the text on the Accessibility Auditor dashboard was showing out of date information.

3 Bug fixes

Fixes an issue where the Accessibility Auditor dashboard would not generate the thumbnail screenshots for each domain.

3 Bug fixes

Improves the query response time when sorting.

3 Bug fixes

Fixes an issue where large (>2GB) index.dt files would cause padre-gs to fail when setting gscopes.

3 Bug fixes

Fixes an issue where jetty would terminate on invalid 'index.autoc' (query completion) files.

3 Bug fixes

Fixes an issue that prevents scheduled tasks from appearing in the Administration interface on Windows Server 2016.

3 Bug fixes

Fixes an issue where recording Accessibility Auditor details would fail during the swap views step if the server is in read-only mode.

3 Bug fixes

Fixes an issue where swap-views.pl did not clear the redis state before running the pipeline.

3 Bug fixes

Improves the Accessibility Auditor historical data storage. The data is stored in less space and is significantly faster to store and retrieve. The Accessibility Auditor historical data APIs are also improved to use less memory, reducing the chance of 'OutOfMemoryError' exceptions being thrown. The Accessibility Auditor historical data will be automatically moved to the new storage format when Jetty is restarted (one collection at a time) or on the first Accessibility Auditor historical data API request.

3 Bug fixes

The default timeout for 'push.scheduler.delay-between-meta-dependencies-runs' has been increased to '1200' (20 minutes). This has been increased to reduce the frequency at which Accessibility Auditor historical data is recorded. This option will need to be overridden if meta collections containing push collections need a smaller delay in updating the spelling index and auto completion.

3 Bug fixes

Fixes a bug where queries may not return when instant updates include URLs that contain ampersands.

3 Bug fixes

Prevents creation of objects within Freemarker template files to ensure that template editors cannot cause external code to be executed.

3 Bug fixes

Fixes a bug where 'FineTune' may crash when 'query_processor_options' is longer than '1000' bytes.

3 Bug fixes

Push slaves will now actively pull down merged/vacuumed generations, rather than waiting for commits to trigger this. This can help solve problems where the slaves do not reduce the number of generations or where re-indexes are not pulled down by the slaves.

3 Bug fixes

Fixes security issues where:

  • The default form-not-found template reflected the given form id without proper escaping.

  • The default configuration of URL previewing could be used to expose local log file content.

Please ensure any custom form-not-found.ftl templates in collections are updated to perform correct escaping if they were derived from the previously vulnerable form-not-found.default.ftl.

Please ensure that any customised value for the global default_url_renderer.permitted_url_pattern setting in global.cfg prevents access to file:// URLs.

3 Bug fixes

Fixes a bug where the crawler would not correctly decode links in HTML, XML and plain text documents.

3 Bug fixes

Improves the performance of the directory gatherer.

3 Bug fixes

Fixes support for sort mode '3' in query completion, allowing 'alpha' to be respected.

3 Bug fixes

The parent_group Facebook events field has been removed since it requires escalated permissions. On some Facebook collections, this caused crawling of events to fail.

3 Bug fixes

Provides additional metadata for Twitter records specifying whether a tweet is a reply and what it is a reply to. This is made available in the XML under 'isReply', 'inReplyToScreenName', 'inReplyToStatusId', 'inReplyToUserId' and 'inReplyToUrl'.

3 Bug fixes

Adds an option to padre-iw to allow control of how the lock string should be modified. The option is "-lock_string_mod_mode=[legacy raw]". By default it is set to "legacy" and that keeps the current behaviour of modifying the lockstring. The "raw" option results in padre-iw not altering the lock string; leaving in white space, new lines and commas.

3 Bug fixes

Adds an experimental DLS plugin called "secBoolExpr" which is able to evaluate lockstrings that are boolean expressions. For example, a lockstring where the user must have either A, or both B and C, can be represented as A,(B AND C) or A | (B&C). This lockstring is evaluated as a boolean expression in which the keys held by the user are the terms that are true. For example, if the user had "B" and "C" then they would be able to view that document. To enable the plugin, set "security.earlybinding.locks-keys-matcher.name=secBoolExpr" in collection.cfg and set "-lock_string_mod_mode=raw" on the indexer.

3 Bug fixes

The default Facebook crawler requested information that required the page to be created by the same user that provided the application access token. The best_page and parent_page page fields are no longer available and the crawler will now function without escalated permissions.

3 Bug fixes

Fixes an issue with SAML authentication where users who did not have access to the funnelback_documentation collection were unable to log out.

3 Bug fixes

Fixes an issue with SAML authentication where some browsers would fail to redirect users completely after login.

3 Bug fixes

Updates error messaging in the task queue when SAML authentication is enabled.

3 Bug fixes

Updates product documentation references to unreleased features.

3 Bug fixes

Stops the same sitemap.xml file from being processed multiple times on different threads.
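
The secBoolExpr patch listed above treats a lockstring as a boolean expression in which each key held by the user is true. The Python sketch below is one way to evaluate the two example forms given in that patch description, A,(B AND C) and A | (B&C). It is an illustration, not the plugin's code. Assumptions made here: "," and "|" mean OR, "&" and "AND" mean AND, AND binds tighter than OR, the word AND is never itself a key, and the function name user_can_view is hypothetical.

```python
import re


def user_can_view(lock_string, user_keys):
    """Evaluate a boolean lockstring against the set of keys the user holds."""
    # Operators and parentheses are single tokens; anything else is a key name.
    tokens = re.findall(r"[(),|&]|[^\s(),|&]+", lock_string)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() in (",", "|"):
            take()
            rhs = parse_and()  # always parse the right side so its tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() in ("&", "AND"):
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or token in (")", ",", "|", "&", "AND"):
            raise ValueError("expected a key or '('")
        return token in user_keys

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    return result


if __name__ == "__main__":
    print(user_can_view("A,(B AND C)", {"B", "C"}))
```

A user holding only B is refused by both example lockstrings, while a user holding A, or both B and C, is allowed.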