Funnelback 15.12.0 release notes

Released: 10 November 2017

15.12.0 - New features

  • Overhauled faceted navigation, greatly simplifying the data-model for rendering, and simplifying implementation of many previously complex scenarios.

    • Adds support for checkbox facets with results counts.

    • Adds support for tabbed presentation, radio button facets and mixed facet category sources.

    • Introduces a new configuration interface including a preview page and troubleshooting tips.

    • Adds support for facets based on queries, numeric ranges, and collections.

    • Substantially improved performance of metadata-based facet queries.

  • Expanded the range of checks performed by Accessibility Auditor, and improved the reporting interface.

    • A new reporting overview provides summaries of changes by WCAG levels.

    • Auditing of a single document now breaks down issues by WCAG principles.

    • Introduces additional reporting summaries on each WCAG technique and success criterion.

    • Enables filtering and CSV exporting of the Accessibility Auditor reports.

  • Introduced the option of X.509 client certificate authentication for the search interface.

  • Introduced experimental support for SAML authentication in search and administration interfaces.

  • Introduced support for gathering content from HPE Content Manager 9.1.

  • Added an all results endpoint for streaming back all results of a query, even if the set is very large.

  • A non-expiring application token is now supported by most Funnelback APIs. See API Token Authentication.

  • Introduced a new auto-completion plugin.

  • Introduced an advanced update option to reapply any reconfigured gscopes to a collection’s live view.

15.12.0 - Selected improvements and bug fixes

  • Improved compression of indexes when push collection replication is used.

  • Increased the number/total-size of unique terms a search index may contain, and improved handling of very frequent terms.

  • Improved efficiency of classic administration interface with large numbers of collections.

  • Allowed for a number of query-time settings to be set within a service (profile.cfg) rather than only at the collection level.

  • Introduced ui.modern.padre_response_size_limit_bytes limit on padre response size to avoid large queries consuming all query processor memory.

  • Improved jetty request logging to limit access log size, and compress on rotation.

  • Fixed handling of multiple metadata items in external metadata when facet_item_sepchars is used.

  • Worked around web servers returning gzipped content even when it was not requested.

  • Improved efficiency/reliability of groovy script change detection.

  • Improved push collection snapshot APIs and marking of incomplete/failed snapshots.

  • Improved fidelity of queries reported in analytics by eliminating unnecessary query simplifications.

  • Imposed a limit on JVM metaspace usage to ensure it is collected regularly.

  • Fixed form interaction to remove expired cookies and use defaults for form 'action' and 'method' parameters.

  • Increased the default values for max download size and max parsing size of file to 10MB.

  • Fixed an issue where the Modern UI cached view would not process all documents filtered to XML or left as XML (detected by Content-Type) as XML with the expected XSL transformation.

  • Custom collections, Database collections and Directory collections now support filtering. To enable filtering in existing custom collections, a raw bytes store must be used; see Custom Collections - Cache copies don’t work.

  • Fixed a bug where the modern UI would return all profile configurations in the data model, rather than just the active one.

  • Gscopes are no longer referenced by bit number, instead they are given names like metadata classes. Where bit numbers are still used, they will now be interpreted as the gscope name (e.g. a gscope named '5').

  • Introduced support for auto-expansion of the number of available gscope bits.

15.12.0 - Upgrade Issues

  • Faceted navigation has been improved to support built-in sorting, checkbox facets, tab facets and an easier-to-use facet data model. To take advantage of this you may need to upgrade your facets. To do so, please follow this guide.

  • The faceted_navigation.date.sort_mode option is deprecated and will only work with legacy date facets. Built-in facet sorting can now be used instead.

  • Funnelback now includes an improved auto-completion system called Concierge. To upgrade existing collections please follow the upgrading to concierge guide.

  • Funnelback’s data-api has been removed. Admin-API provides equivalent calls, however please note that endpoints and JSON envelopes have changed.

    • /admin-api/collection-info/v1/collections/collection_name/url/data replaces /data-api/v1.0/urls/info

    • /admin-api/accessibility-auditor/v1/ replaces /data-api/v1.0/wcag/

    • /admin-api/predictive-segmentation/v1/ replaces /data-api/v1.0/predictive-segmentation

  • The deprecated features text-miner, classic wca-reporter, and classic analytics have been removed from the default installation.

  • Configuration file headers (containing the full path of the file) have been removed.

  • Jetty access log filenames have changed. Any systems that read these logs based on their old filenames may need to be updated.

  • Jetty access logs are now rotated daily or when they reach 512MB in size and kept for either 90 days or until the total size reaches 1.5GB.

  • URL fill facet behaviour has been updated so that facet values are returned for every parent folder of the currently selected folder (in addition to the child folders of the current path). This is consistent with other facet types, where the selected categories are present in the data model. For example, when drilling down to folder1/folder2/folder3/, facet values will now be returned for folder1, folder2 and folder3, all with their selected flag set to true. The default <@s.Category> macro has been updated accordingly and should result in no actual change in display. Custom FreeMarker facet macros will need to be updated to take the new facet values into account.

  • Contextual Navigation has been updated so that no "site" cluster will be returned if there’s only 1 site in it. As a result, response.resultPacket.contextualNavigation.categories may be empty if there are no "topics", no "types", and the "site" cluster contains only 1 site. This change is compatible with the default FreeMarker tags for Contextual Navigation, so no change is required when using them. Custom FreeMarker Contextual Navigation tags may need to be updated accordingly.

  • The behaviour of search facet parameter facetScope has changed such that values set there no longer override parameters set on the URI e.g. facetScope=x%3Dfoo&x=bar results in x being set to both foo and bar rather than just foo.

  • The version of groovy included has been upgraded from 2.3.7 to 2.4.12.

  • Funnelback now bundles Bootstrap version 3.3.7. Update the path to resources in FTL forms from ${GlobalResourcesPrefix}thirdparty/bootstrap-3.0.0/ to ${GlobalResourcesPrefix}thirdparty/bootstrap-3.3.7/. Note that v3.0.0 will be removed from Funnelback in a future release.

  • The version of java included with Funnelback has been updated to 8u141. Please note that some insecure SSL certificates will no longer be accepted by the new version.

  • The query processor option num_ranks has changed behaviour when set to 0. When set to zero it no longer skips query processing and instead will behave the same as positive values for num_ranks except no results will be displayed.

  • The type of gscopesSet within a Result returned by the modern UI has changed from a Set of Integer to a Set of String to reflect that gscopes are accessed by name rather than by bit number.

  • Collection.cfg option gscopes.other_bit_number has been renamed to gscopes.other_gscope; Funnelback remains compatible with the old key.

  • Database and Directory collections no longer use the XML store set by store.xml.class and instead store XML documents in a raw bytes store set by store.raw-bytes.class. Records are no longer stored using the primary key; they are stored using the same URI that is set in the <funnelback_url> element, so these collections no longer require xml.cfg to map the document URL. Existing Database and Directory collections must be updated before cache copies will work.

  • Raw bytes store com.funnelback.common.io.store.bytes.FlatFileStore has been fixed so that some URLs no longer cause issues with the store. As part of this fix, the store is no longer compatible with previous versions. Any collections using this store should have a full update run and, if possible, switch to using com.funnelback.common.io.store.bytes.WarcFileStore.

  • To upgrade auto-completion to use the new Concierge auto-completion plugin, please follow this guide.
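
The facetScope change described in the list above can be illustrated with a small model. This is a minimal Python sketch, not Funnelback's implementation: it assumes facetScope carries a URL-encoded query string whose parameters are merged with those set on the URI, and the function name merge_facet_scope is hypothetical.

```python
from urllib.parse import parse_qs


def merge_facet_scope(query_string):
    """Model the 15.12 behaviour (an illustration only): values from
    facetScope are added to, rather than replacing, parameters of the
    same name that are set directly on the URI."""
    params = parse_qs(query_string)
    # parse_qs has already percent-decoded facetScope, so each value is
    # itself a small query string such as "x=foo".
    for scope in params.pop("facetScope", []):
        for name, values in parse_qs(scope).items():
            params[name] = values + params.get(name, [])
    return params


print(merge_facet_scope("facetScope=x%3Dfoo&x=bar"))
# → {'x': ['foo', 'bar']}
```

With the example from the release note, x ends up holding both foo and bar rather than a single value.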

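The URL fill facet change above means a custom macro should expect one selected value per folder on the current path. The following Python sketch models that expectation only; the function name and the (label, selected) tuple shape are assumptions for illustration and do not reflect the actual Funnelback data model classes.

```python
def url_fill_selected_values(selected_path):
    """Return one (label, selected) pair per folder on the selected path,
    from the top-level folder down to the selected folder itself."""
    # Ignore empty segments produced by leading or trailing slashes.
    folders = [part for part in selected_path.split("/") if part]
    return [(folder, True) for folder in folders]


print(url_fill_selected_values("folder1/folder2/folder3/"))
# → [('folder1', True), ('folder2', True), ('folder3', True)]
```

Child folders of the current path would still be returned as well; they are omitted here to keep the sketch focused on the newly added parent values.
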
Patches

Type Release version Description

3 Bug fixes

Upgrades log4j2 to version 2.16 to fix the security vulnerability where log4j2 JNDI features do not protect against attacker-controlled LDAP and other JNDI related endpoints.

3 Bug fixes

Fixes an issue where sessions are not terminated on logout events triggered by perl pages.

3 Bug fixes

Removes the screens for file-manager rule editing which could create security issues

3 Bug fixes

Fixes an issue where support packages could contain unintended files

3 Bug fixes

Fixes an issue where the running Funnelback jetty web server could retain permissions via supplemental groups after startup

3 Bug fixes

Limits an administration CGI script to redirect only within the Funnelback administration interface as intended

3 Bug fixes

Removes the unused administration debug.cgi script which reflected input parameters without proper escaping

3 Bug fixes

Prevent XSS AngularJS sandbox bypassing injection in Freemarker templates escaped using output formats by inserting zero-width whitespace between consecutive open-curly-brackets.

3 Bug fixes

Prevent XSS AngularJS sandbox bypassing injection in Freemarker templates by inserting zero-width whitespace between consecutive open-curly-brackets.

3 Bug fixes

Prevent XSS AngularJS sandbox bypassing injection in Freemarker templates by inserting zero-width whitespace between consecutive open-curly-brackets.

3 Bug fixes

Improve the performance of the Accessibility Auditor interface by requesting less data.

3 Bug fixes

Fixes an issue where some of the text on the Accessibility Auditor dashboard was showing out of date information.

3 Bug fixes

Improves the query response time when sorting.

3 Bug fixes

Fixes an issue where large (>2GB) index.dt files would cause padre-gs to fail when setting gscopes.

3 Bug fixes

Improves the Accessibility Auditor historical data storage. The data takes up less space and is significantly faster to store and retrieve. The Accessibility Auditor historical data APIs have also been improved to use less memory, reducing the chance of 'OutOfMemoryError' exceptions being thrown. The Accessibility Auditor historical data will be automatically moved to the new storage format when Jetty is restarted (one collection at a time) or on the first Accessibility Auditor historical data API request.

3 Bug fixes

The default timeout for 'push.scheduler.delay-between-meta-dependencies-runs' has been increased to '1200' (20 minutes). This has been increased to reduce the frequency at which Accessibility Auditor historical data is recorded. This option will need to be overridden if meta collections containing push collections need a smaller delay in updating the spelling index and auto completion.

3 Bug fixes

Prevents creation of objects within Freemarker template files to ensure that template editors can not cause external code to be executed.

3 Bug fixes

Fixes a bug where 'FineTune' may crash when 'query_processor_options' is longer than '1000' bytes.

3 Bug fixes

Push slaves will now actively pull down merge/vacuumed generations, rather than waiting for commits to trigger this. This can help solve problems where the slaves will not reduce the number of generations or re-indexes are not pulled down by the slaves.

3 Bug fixes

Fixes security issues where:

  • The default form-not-found template reflected the given form id without proper escaping.

  • The default configuration of URL previewing could be used to expose local log file content.

Please ensure any custom form-not-found.ftl templates in collections are updated to perform correct escaping if they were derived from the previously vulnerable form-not-found.default.ftl.

Please ensure that any customised value for the global default_url_renderer.permitted_url_pattern setting in global.cfg prevents access to file:// URLs.

3 Bug fixes

Improves the performance of the directory gatherer.

3 Bug fixes

Fixes support for sort mode '3' in query completion, allowing 'alpha' to be respected.

3 Bug fixes

parent_group Facebook events field has been removed since it requires escalated permissions. On some Facebook collections, this caused crawling of events to fail.

3 Bug fixes

Provides additional metadata for twitter records specifying whether a tweet is a reply and what it is a reply to. This is made available in the XML under 'isReply', 'inReplyToScreenName', 'inReplyToStatusId', 'inReplyToUserId' and 'inReplyToUrl'.

3 Bug fixes

Upgrades the version of our internal libraries to account for recent breaking changes in the Facebook Graph API. This will fix issues that caused Facebook collections to fail to update on certain user accounts, when crawling more than 200 posts in an hour, and when crawling events posted by a page. To update existing Facebook collections that may be failing, the changes noted in deployment instructions below will need to be made on each groovy script. best_page & parent_page Facebook page fields have been removed since they require escalated permissions.

3 Bug fixes

Fixes an issue where the web crawler parser would time out when parsing large (10MB+) HTML pages.

3 Bug fixes

Updates the search sessions click history to no longer record all metadata into the DB. Search sessions will only record the metadata classes listed in profile.cfg option 'ui.modern.session.search_history.metadata'. By default this is empty, but can be set with a comma separated list of wanted metadata classes for example:

ui.modern.session.search_history.metadata=a,b,c

3 Bug fixes

Fixes a bug where the ratio of full to incremental updates was not being applied and only a full update was triggered.

3 Bug fixes

Fixes a bug for scheduled updates where the 'schedule.incremental_crawl_ratio' parameter was not being respected.

3 Bug fixes

Fixes potential issues introduced by 15.12.0.12 and subsequent patches caused by an incorrect file being included in the patch.

3 Bug fixes

Fixes a bug in Accessibility Auditor which caused the document audit view to fail when a document contained escaped or unicode characters in its class names.

3 Bug fixes

Fixes a potential indexer crash introduced in 15.12.0.14, and some additional cases where multiple dots could be shown in summaries.

3 Bug fixes

Fixes query biased summaries so that they no longer show multiple dots when the original content contains non-breaking spaces as the only value within "p" tags.

3 Bug fixes

Increases the maximum query length to 1MB and maximum query nodes to 16384 on Linux only.

3 Bug fixes

Fixes a bug where analytics would skip query logs when the query was run with a gscope that was not all numbers.

3 Bug fixes

Fixes a bug where query processing would not complete if the query contained an isolated colon in it.

3 Bug fixes

Fixes a bug where query processing would not complete if the query contained "%" in it when search sessions are enabled.

3 Bug fixes

Fixes a bug in the "JSONToXML" filter which would produce odd XML when a JSON key was set to "content" e.g. {"content": {…​}}.

3 Bug fixes

Fixes a bug where the Accessibility Auditor overview would fail to display correctly when a certain combination of updates were run in a meta collection.

3 Bug fixes

Cleans up the display of the Accessibility Auditor pages when a site has no failures or all of its failures have been acknowledged.

3 Bug fixes

Fixes a bug where the Admin API was passing the comment to the publish hook as multiple arguments where it should have been passing the comment as a single argument.

3 Bug fixes

Upgrades the twitter library to add support for the longer, 280 character tweets. For this to be used, the ConfigurationBuilder object needs to be updated to call "setTweetModeExtended(true)". With the default twitter groovy gather script, this can be done by adding "cb.setTweetModeExtended(true);" immediately after the creation of the new ConfigurationBuilder.

3 Bug fixes

Fixes a "gscope opstack underflow" error when named gscopes from facets and a gscope1 parameter are combined.

In particular, this could occur when using the automatically generated URL scope gscopes in a facet, and then clicking the 'more' link on a contextual navigation list. Named gscopes are now combined correctly to avoid failing in this case, and the redundant gscope1 parameter in contextual navigation links has been removed.

3 Bug fixes

Fixes an issue which caused the @fb.ExtraSearch Freemarker macro to not return any results.

3 Bug fixes

Prevents Pattern Analyser from failing when reporting-blacklist.cfg queries contain quotes.

3 Bug fixes

Pattern analyser will overwrite rather than append to its log.

3 Bug fixes

Changes the Modern UI sessions so that they no longer use J2EE sessions and always use the cookie that was set by ui.modern.session.set_userid_cookie. That option has been removed, and a cookie is now always set, but only when sessions are enabled. This reduces disk and CPU load.

3 Bug fixes

Facets are now created for zero count gscopes, to support backwards compatibility with some existing implementations.

3 Bug fixes

Fixed an issue where the user editing interface for a user with no permitted collections would be presented with all collections selected, rather than none.

3 Bug fixes

Fixes a bug where the classic administration dashboard would not be accessible to non locally authenticated users (e.g. ldap) that had a large user .ini file.

3 Bug fixes

Fixes the metamap.cfg documentation page to display the code blocks correctly.

3 Bug fixes

Changes the click tracking endpoint to no longer depend on the referrer. This does result in the click logs no longer containing the referrer URL.

3 Bug fixes

Adds ARIA14 to the Accessibility Auditor and relaxes the requirement for what is considered descriptive text.

3 Bug fixes

Fixes an issue where analytics might fail to update.

3 Bug fixes

Allows groovy servlet filters to abort processing in preFilterResponse by returning null.

3 Bug fixes

Fixes passing Success Criteria being displayed in the Accessibility Auditor when auditing a URL.

3 Bug fixes

Adds better support for the gScopesCount map when used with Integer keys rather than the expected String type keys. 15.12 changed the type of this map to use String keys rather than Integer keys.

3 Bug fixes

Removes selectUrl and unselectUrl from the faceted navigation data model as they are not required; toggleUrl or the current URL can be used instead.
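
The gscope changes in this release (gscopes referenced by name, and the gScopesCount map keyed by String rather than Integer) mean that code holding legacy integer keys may need to convert them. The following Python sketch models this with a plain dictionary; the function name is hypothetical and the dictionary merely stands in for the real data model map.

```python
def normalise_gscope_counts(counts):
    """Return a copy of a gscope count map with every key as a string,
    so that a legacy bit number such as 5 becomes the gscope name '5'."""
    return {str(key): value for key, value in counts.items()}


legacy = {5: 12, 7: 3}
current = normalise_gscope_counts(legacy)
print(current["5"])
# → 12
```

This mirrors the rule stated earlier in these notes that a bit number, where still used, is interpreted as a gscope with that name.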