Funnelback 8.0.0

Release notes for Funnelback 8.0.0

Released: 30th May 2008

Upgrade issues

  • Database collections have changed in layout, and now require an additional 'primary key' parameter. Please see the version 8 database collection upgrade guide for details.
  • Perl 5.8.8 is strongly recommended for all platforms:
    • Some features do not work out of the box under Perl 5.10 and Solaris.
    • Perl 5.8.5 and earlier have a bug in HTML::Entities, which may lead to incorrect encoding of apostrophes in the Funnelback system.
  • Queries are now logged in their expanded form, not their pre-expansion form.

New features

  • Document gathering from Microsoft SharePoint and Lotus Domino
  • Faceted navigation
  • User tagging of results
  • User feedback on results
  • Basic Chinese / Japanese / Korean / Thai (CJKT) support
  • Feeds API
  • Crawling of content behind web forms
  • Automatically generated "support package"


  • Allow pre/post commands to use collection.cfg parameters
  • Broken link detection script for featured pages
  • Capability for fetching resources at query time for multiple collection types (databases, filecopy, TRIM)
  • Context-sensitive help links open in new pages, not the current page
  • Display real-time collection update status on the admin UI home page
  • Import and export of featured pages and query expansions
  • Instant updates support filecopy collections
  • Instant update support for more collection types
  • Java is bundled with Funnelback
  • Logs for a collection go in a collection-specific log directory, not the "system logs" directory
  • Log text on the "view file" page is more readable
  • Numerous improvements to form parsing (fixes for nested tags, res* tags that contain curly braces, etc.)
  • Option to remove all data during uninstall
  • Reporting uses much less memory
  • Reports are viewable while they are generating, and a reporting error will no longer leave the reports unusable
  • Significantly improved database search, with "workflow" interface, incremental gathering and compressed storage
  • Support for extracting links from JavaScript-generated web pages
  • Updates for all collection types may now be halted (the halt may not occur until the end of the current update phase for some collection types)
  • When upgrading an installation, the license key is preserved

Selected bug fixes

  • Add support for filtering .dot (MS Word Template) files
  • Admin UI should include crawler.reject_files in its processing of the "file types to crawl" checkboxes
  • Allow collection parameter editing security model (parameter whitelists) to be applied on a per collection basis
  • Allow / ignore whitespace in various collection parameters
  • Ampersands in query* parameters are not parsed correctly
  • cache.cgi displays "XML parsing error" for pages in funnelback_documentation
  • cache.cgi does not perform security checks
  • cache.cgi links do not get properly URL encoded parameters
  • cache.cgi should strip meta refresh from its displayed contents to avoid sending users to incorrect locations
  • Cached XLS files don't display correctly in IE6
  • Can enter empty featured page and query expansion
  • Can't map the same XPath to multiple metadata classes
  • Change crawler to use MIME type rather than URL suffix when storing binary files
  • Check Windows password is valid in installer
  • .ckpt index files should be removed by default
  • click.cgi links do not properly URL encode arguments
  • Clicking on filecopy results displays text in the error log
  • Click tracking not working by default
  • collection.cfg settings not being updated to point at new locations on an upgrade
  • Collection parameter whitelist not greying out fields
  • Collection summary rows should show successful update (green tick) after a successful index upgrade
  • Command line administration / Unix scheduling / Apache integration will not work if the Perl binary is not at /usr/bin/perl
  • Command line updates fail if not started from the bin directory
  • crawler_binaries parameter not being updated properly on an upgrade
  • Creating local collections with an unfindable source directory displays a confusing error message
  • _disabled__see_start_urls_file parameter being displayed in update log
  • Documentation CSS is indexed in the funnelback_documentation collection
  • Enable data reports for web collections on an upgrade
  • Filters not picking up title metadata from some Word docs
  • Fluster crashes when a query contains "(" or ")"
  • Fluster links have redundant CGI parameters
  • funnelback_documentation collection shouldn't be deletable from admin interface
  • Funnelback installer should complain if empty input is given for some fields
  • htpasswd_modify is not fixed in an upgrade
  • Improved handling of URL case sensitivity in the crawler
  • Incorrect handling of numeric entities in crawled URLs
  • Investigate fallback for external filters
  • Investigate how to make query expansion work with Fluster
  • java_libraries contain duplicated path after upgrade
  • Local collection url prefixes don't work as expected
  • Long logs are difficult to scroll
  • does not create start.urls file
  • Old Jetty HTTPS server not shut down during upgrade
  • PADRE displays result counts in minresults mode
  • PADRE failing to parse XML with empty elements
  • PADRE date sorts don't work for documents in the 16th / 17th century
  • PADRE produces invalid XML for some documents that contain ampersands in their title
  • PADRE segfault under rare combinations of gscopes and metadata searches
  • Parsing of meta parameters is broken
  • PDF not extracted correctly but output file with binary content was created
  • PDF results include shell error output
  • Permission errors under IIS
  • Remove trailing space in spelling suggestions
  • Reporting date routines do not handle leap years
  • Report links do not work under IIS
  • rss.cgi crashes when xsltproc is not found
  • RTF files filtered in TRIM collections do not have meaningful titles
  • Schedule updates page on Windows incorrectly handles invalid input
  • Security violation displayed when empty filename is submitted for upload
  • Start URL parameter in instant update add doesn't check for a protocol
  • The "results can't be displayed because this collection has never updated" page looks awful
  • Various .cgi files do not have execute permission
  • Very rare hang caused by schtasks when upgrading from Funnelback 6.0.x to Funnelback 7.0.x
  • Viewing data reports forces the user displayed on the header to "admin"
  • Visual bugs when viewing administration under IIS
  • When editing a collection, changes are lost when navigating between tabs
  • Word expansion does not work with query_* parameters


