Funnelback 8.0.0 release notes

Released: 30th May 2008

Upgrade issues

  • Database collections have changed in layout, and now require an additional 'primary key' parameter. Please see the version 8 database collection upgrade guide for details.

  • Perl 5.8.8 is strongly recommended for all platforms:

    • Some features do not work out of the box under Perl 5.10 and Solaris.

    • Perl 5.8.5 and earlier have a bug in HTML::Entities, which may lead to incorrect encoding of apostrophes in the Funnelback system.

  • Queries are now logged in their expanded form, not their pre-expansion form.

New features

  • Document gathering from Microsoft SharePoint and Lotus Domino

  • Faceted navigation

  • User tagging of results

  • User feedback on results

  • Basic Chinese / Japanese / Korean / Thai (CJKT) support

  • Feeds API

  • Crawling of content behind web forms

  • Automatically generated "support package"

Improvements

  • Allow pre/post commands to use collection.cfg parameters

  • Broken link detection script for featured pages

  • Capability for fetching resources at query time for multiple collection types (databases, filecopy, TRIM)

  • Context-sensitive help links open in new pages, not the current page

  • Display real-time collection update status on the administration dashboard home page

  • Import and export of featured pages and query expansions

  • Instant updates support filecopy collections

  • Instant update support for more collection types

  • Java is bundled with Funnelback

  • Logs for a collection go in a collection-specific log directory, not the "system logs" directory

  • Log text on the "view file" page is more readable

  • Numerous improvements to form parsing (fixes for nested tags, res* tags that contain curly braces, etc.)

  • Option to remove all data during uninstall

  • Reporting uses much less memory

  • Reports are viewable while they are generating, and a reporting error will no longer leave the reports unusable

  • Significantly improved database search, with "workflow" interface, incremental gathering and compressed storage

  • Support for extracting links from JavaScript-generated web pages

  • Updates for all collection types may now be halted (the halt may not occur until the end of the current update phase for some collection types)

  • When upgrading an installation, the license key is preserved

Selected bug fixes

  • Add support for filtering .dot (MS Word Template) files

  • Administration dashboard should include crawler.reject_files in its processing of the "file types to crawl" checkboxes

  • Allow collection parameter editing security model (parameter whitelists) to be applied on a per collection basis

  • Allow / ignore whitespace in various collection parameters

  • Ampersands in query* parameters are not parsed correctly

  • cache.cgi displays "XML parsing error" for pages in funnelback_documentation

  • cache.cgi does not perform security checks

  • cache.cgi links do not get properly URL encoded parameters

  • cache.cgi should strip meta refresh from its displayed contents to avoid sending users to incorrect locations

  • Cached XLS files don’t display correctly in IE6

  • Can enter empty featured page and query expansion

  • Can’t map the same XPath to multiple metadata classes

  • Change crawler to use MIME type rather than URL suffix when storing binary files

  • Check Windows password is valid in installer

  • .ckpt index files should be removed by default

  • click.cgi links do not properly URL encode arguments

  • Clicking on filecopy results displays text in the error log

  • Click tracking not working by default

  • collection.cfg settings not being updated to point at new locations on an upgrade

  • Collection parameter whitelist not greying out fields

  • Collection summary rows should show successful update (green tick) after a successful index upgrade

  • Command line administration / Unix scheduling / Apache integration will not work if the Perl binary is not at /usr/bin/perl

  • Command line updates fail if not started from the bin directory

  • crawler_binaries parameter not being updated properly on an upgrade

  • Creating local collections with an unfindable source directory displays a confusing error message

  • _disabled__see_start_urls_file parameter being displayed in update log

  • Documentation CSS is indexed in the funnelback_documentation collection

  • Enable data reports for web collections on an upgrade

  • Filters not picking up title metadata from some Word docs

  • Fluster crashes when a query contains "(" or ")"

  • Fluster links have redundant CGI parameters

  • funnelback_documentation collection shouldn’t be deletable from admin interface

  • Funnelback installer should complain if empty input is given for some fields

  • htpasswd_modify is not fixed in an upgrade

  • Improved handling of URL case sensitivity in the crawler

  • Incorrect handling of numeric entities in crawled URLs

  • Investigate fallback for external filters

  • Investigate how to make query expansion work with Fluster

  • java_libraries contains a duplicated path after upgrade

  • Local collection URL prefixes don’t work as expected

  • Long logs are difficult to scroll

  • new-collection.pl does not create start.urls file

  • Old Jetty HTTPS server not shut down during upgrade

  • PADRE displays result counts in minresults mode

  • PADRE failing to parse XML with empty elements

  • PADRE date sorts don’t work for documents in the 16th / 17th century

  • PADRE produces invalid XML for some documents that contain ampersands in their title

  • PADRE segfault under rare combinations of gscopes and metadata searches

  • Parsing of meta parameters is broken

  • PDF not extracted correctly but output file with binary content was created

  • PDF results include shell error output

  • Permission errors under IIS

  • Remove trailing space in spelling suggestions

  • Reporting date routines do not handle leap years

  • Report links do not work under IIS

  • rss.cgi crashes when xsltproc is not found

  • RTF files filtered in TRIM collections do not have meaningful titles

  • Schedule updates page on Windows incorrectly handles invalid input

  • Security violation displayed when empty filename is submitted for upload

  • Start URL parameter in instant update add doesn’t check for a protocol

  • The "results can’t be displayed because this collection has never updated" page looks awful

  • Various .cgi files do not have execute permission

  • Very rare hang caused by schtasks when upgrading from Funnelback 6.0.x to Funnelback 7.0.x

  • Viewing data reports forces the user displayed on the header to "admin"

  • Visual bugs when viewing administration under IIS

  • When editing a collection, changes are lost when navigating between tabs

  • Word expansion does not work with query_* parameters