Funnelback 11.0.0 release notes
Released: 26 August 2011
New Features
-
Redeveloped the query processing layer, making query processing more efficient and search presentation easier to customize.
-
New Push collection type for feeding non-web content into a Funnelback index from a remote system over time, without the scalability limitations of instant updates.
-
New Directory collection type for searching Active Directory and LDAP repositories.
-
Administrator search tuning system allowing search ranking factors to be optimized for specific collections.
-
Content optimization system which provides detailed guidelines for content authors on how to improve a specific result’s ranking.
-
Preview and publish system for developing search form files without affecting production search presentation.
-
Ability to blend result sets for multiple queries from spelling suggestions, synonyms and other sources into a single result list.
-
Assorted web crawling improvements including support for revisiting infrequently changing content less often.
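As a rough illustration of what blending result sets from several queries into one list involves, the following Python sketch merges ranked lists from different query sources. The source names ("original", "spelling"), the per-source weights, and the "weighted score, keep the best score per URL" strategy are all assumptions for illustration. They are not Funnelback's blending algorithm or its configuration options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    score: float


def blend(result_sets: dict[str, list[Result]], weights: dict[str, float]) -> list[Result]:
    """Merge several ranked result lists into a single list.

    Hypothetical strategy: each result's score is multiplied by the weight
    of the query source it came from (e.g. the original query, a spelling
    suggestion or a synonym expansion). When the same URL appears in more
    than one source, only its highest weighted score is kept.
    """
    best: dict[str, float] = {}
    for source, results in result_sets.items():
        weight = weights.get(source, 1.0)  # unknown sources get a neutral weight
        for r in results:
            weighted = r.score * weight
            if weighted > best.get(r.url, float("-inf")):
                best[r.url] = weighted
    # Highest weighted score first; ties broken by URL so the order is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [Result(url, score) for url, score in ranked]


# Usage: results from a spelling suggestion are down-weighted by half.
blended = blend(
    {
        "original": [Result("a.html", 0.9), Result("b.html", 0.5)],
        "spelling": [Result("c.html", 0.8), Result("a.html", 0.7)],
    },
    weights={"original": 1.0, "spelling": 0.5},
)
print([r.url for r in blended])  # ['a.html', 'b.html', 'c.html']
```

Down-weighting alternative queries keeps results for the query the user actually typed near the top, while still surfacing documents that only match a suggested spelling or synonym.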
Upgrade Issues
-
Result summaries are no longer highlighted by default, so that form authors have complete control over query highlighting. To restore summary highlighting on existing forms, add the <s:boldicize /> tag to them.
-
When upgrading TRIM collections from version 10, a full update of the collection is required to update record URLs to support the new instant update functionality.
-
The <s:boldicize /> and <s:italicize /> tags now output <strong> and <em> HTML tags instead of the previous <b> and <i> tags. If your CSS stylesheets target the <b> or <i> elements produced by these tags, you’ll need to update them.
-
Using the crawler form interaction system no longer disables cookie support by default. If a collection uses the form interaction system and can no longer crawl password-protected sites successfully after the upgrade, explicitly disable cookie support by setting crawler.accept_cookies=false.
-
The default treatment of nepotistic links has been changed to limit their effect. This reduces indexing time and should improve ranking in most web collections, particularly large ones covering multiple domains. To revert this change, set the -nep_action indexer option to zero.
-
The isolated mode filter has been renamed IsolatedFilterProvider (previously IsolatedPublishorFilterProvider) and can now use any filter class. It uses the Tika filter provider by default, so you’ll need to update your collection configurations if you want to continue using the Davisor filters in isolated mode.
-
The <s:Truncate> tag no longer supports the stripMiddle attribute.
-
The default behaviour for the web crawler is now to skip revisiting a proportion of infrequently changing pages during each crawl. This behaviour can be configured through the crawler revisit policy.
-
Data reports are now specific to web collections and are no longer available for other collection types.
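The new crawler default of skipping a proportion of infrequently changing pages on each crawl can be pictured with a simple rule. The Python sketch below is hypothetical. The PageHistory record, the stable_threshold and revisit_every parameters, and the "revisit stable pages on every Nth crawl" rule are illustrative assumptions. They are not Funnelback's crawler revisit policy or its configuration settings.

```python
import zlib
from dataclasses import dataclass


@dataclass
class PageHistory:
    url: str
    unchanged_crawls: int  # consecutive crawls in which the content did not change


def should_revisit(page: PageHistory, crawl_number: int,
                   stable_threshold: int = 3, revisit_every: int = 4) -> bool:
    """Hypothetical revisit rule (illustration only).

    Pages that changed recently (fewer than `stable_threshold` unchanged
    crawls) are revisited on every crawl. Pages considered infrequently
    changing are revisited only on every `revisit_every`-th crawl, so a
    proportion of them is skipped each time. A stable per-URL offset
    (CRC32 of the URL) spreads those pages across crawls instead of
    revisiting all of them in the same crawl.
    """
    if page.unchanged_crawls < stable_threshold:
        return True
    offset = zlib.crc32(page.url.encode("utf-8"))
    return (crawl_number + offset) % revisit_every == 0


# Usage: over 8 consecutive crawls, a stable page is fetched twice,
# while a frequently changing page is fetched every time.
stable = PageHistory("https://example.com/about", unchanged_crawls=10)
fresh = PageHistory("https://example.com/news", unchanged_crawls=0)
visits = [should_revisit(stable, n) for n in range(8)]
print(sum(visits))  # 2
print(all(should_revisit(fresh, n) for n in range(8)))  # True
```

A deterministic hash such as CRC32 is used here rather than Python's built-in hash(), because string hashing is randomized per process. The built-in would reshuffle which pages are skipped each time the crawler restarts.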
Selected improvements and bugfixes
-
Increased permitted number of meta collection components.
-
Added ability to analyse URLs remaining in a web crawl frontier.
-
Support for gathering multiple Exchange mailboxes through the EntropySoft connector in a single collection.
-
Added ability for web crawler to read cookies from a file on startup.
-
Improved crawler form interaction cookie handling.
-
Improved handling of non-UTF-8 web content.
-
Improved query highlighting in results, especially with UTF-8 characters.
-
Corrected handling of UTF-8 form files.
-
Support for collection profiles when tuning search quality.
-
Added ability to index HTTP header and Facebook Open Graph protocol metadata.
-
Fixed incorrect addition of collection name to C metadata by default.
-
Reworked query completion JavaScript to avoid conflicts with other JavaScript libraries.
-
Support for multiple facets per tag in FreeMarker templates.
-
Added distance from origin to XML output when searching geospatial data.
-
Reduced warning messages from result transforms on missing metadata.
-
Added support for resolving relative links within the IncludeURL form tag.
-
Better handling of special characters in indexer options.
-
Added spelling whitelist file for words which should be provided as spelling suggestions.
-
Changed boldicize tag to use HTML strong tags rather than bold tags.
-
Changed query processing ordering to apply spelling suggestions after synonym expansion.
-
Introduced ability to execute custom code during query processing.
-
Eliminated log files produced by inactive crawler threads.
-
Fixed incorrect permission settings on init.d scripts.
-
Improved layout and display of the Funnelback administration interface.
-
Fixed handling of column names with special characters during database gathering.
-
Added setup documentation for IIS 7.5.
-
Automated installation of 64-bit versions of search indexing and query processing components.
-
Improved crawler tolerance for timeouts on seed pages.
-
Improved index 'warm up' scripts.
-
Fixed sorting of results when early binding security is used.
-
Added headers to CSV exports from the analytics dashboard.
-
Added support for instant updates on TRIM collections.
-
Improved JavaScript link extraction logic to avoid some invalid link cases.
-
Improved ordering of collections in Funnelback’s administration interface.
-
Added tools for managing WARC archive files.
-
Fixed collection configuration cache clearing under mod_perl.