Funnelback 11.0.0 release notes

Released: 26 August 2011

New Features

  • Redeveloped query processing layer for more efficient query processing and improved search presentation customization.

  • New Push collection type for feeding non-web content into a Funnelback index from a remote system over time, without the scalability limitations of instant updates.

  • New Directory collection type for searching Active Directory and LDAP repositories.

  • Administrator search tuning system allowing search ranking factors to be optimized for specific collections.

  • Content optimization system which provides detailed guidelines for content authors on how to improve a specific result’s ranking.

  • Preview and publish system for developing search form files without affecting production search presentation.

  • Ability to blend result sets for multiple queries from spelling suggestions, synonyms and other sources into a single result list.

  • Assorted web crawling improvements including support for revisiting infrequently changing content less often.

Upgrade Issues

  • Result summaries are no longer highlighted by default, so that form authors have complete control over query highlighting. To restore summary highlighting, use the <s:boldicize /> tag on your existing forms.

  • When upgrading TRIM collections from version 10, a full update of the collection is required to update the URLs of records to support the new instant update functionality.

  • The <s:boldicize /> and <s:italicize /> tags now use the <strong> and <em> HTML tags instead of <b> and <i>. If your CSS stylesheet targets the <b> or <i> tags, you’ll need to update it.

  • Using the crawler form interaction system no longer disables cookie support by default. If a collection uses the form interaction system and can’t crawl password-protected sites successfully after the upgrade, explicitly disable cookie support by setting crawler.accept_cookies=false.

  • The default treatment of nepotistic links has been changed to limit their effect. This will reduce indexing time, and should have a positive effect on the ranking in most web collections, particularly large ones covering multiple domains. This change can be reverted by setting the -nep_action indexer option value to zero.

  • The isolated mode filter has been renamed IsolatedFilterProvider (previously IsolatedPublishorFilterProvider) and can now use any filter class.

    • It will use the Tika filter provider by default, so you’ll need to update your collection configurations if you want to continue using the Davisor filters in isolated mode.

  • The <s:Truncate> tag no longer supports the stripMiddle attribute.

  • The default behaviour for the web crawler is now to skip revisiting a proportion of infrequently changing pages during each crawl. This behaviour can be configured through the crawler revisit policy.

  • Data reports are now specific to web collections and are no longer available for other collection types.

Selected improvements and bugfixes

  • Increased permitted number of meta collection components.

  • Added ability to analyse URLs remaining in a web crawl frontier.

  • Support for gathering multiple Exchange mailboxes through the EntropySoft connector in a single collection.

  • Added ability for web crawler to read cookies from a file on startup.

  • Improved crawler form interaction cookie handling.

  • Improved handling of non-UTF-8 web content.

  • Improved query highlighting in results, especially with UTF-8 characters.

  • Corrected handling of UTF-8 form files.

  • Support for collection profiles when tuning search quality.

  • Added ability to index HTTP header and Facebook Open Graph protocol metadata.

  • Fixed incorrect addition of collection name to C metadata by default.

  • Reworked query completion JavaScript to avoid conflicts with other JavaScript libraries.

  • Support for multiple facets per tag in FreeMarker templates.

  • Added distance from origin to XML output when searching geospatial data.

  • Reduced warning messages from result transforms on missing metadata.

  • Added support for resolving relative links within the IncludeURL form tag.

  • Better handling of special characters in indexer options.

  • Added spelling whitelist file for words which should be provided as spelling suggestions.

  • Changed boldicize tag to use HTML strong tags rather than bold tags.

  • Changed query processing ordering to apply spelling suggestions after synonym expansion.

  • Introduced ability to execute custom code during query processing.

  • Eliminated log files produced by inactive crawler threads.

  • Fixed incorrect permission settings on init.d scripts.

  • Improved layout and display of the Funnelback administration interface.

  • Fixed handling of column names with special characters during database gathering.

  • Added setup documentation for IIS 7.5.

  • Automated installation of 64-bit versions of search indexing and query processing components.

  • Improved crawler tolerance for timeouts on seed pages.

  • Improved index 'warm up' scripts.

  • Fixed sorting of results when early binding security is used.

  • Added headers to CSV exports from the analytics dashboard.

  • Added support for instant updates on TRIM collections.

  • Improved JavaScript link extraction logic to avoid some invalid link cases.

  • Improved ordering of collections in Funnelback’s administration interface.

  • Added tools for managing WARC archive files.

  • Fixed collection configuration cache clearing under mod_perl.