Funnelback 7.0.0 release notes

Released: 26th September, 2007

Upgrade issues

  • Previous versions of Funnelback enforced the creation of a search user. This user may now be selected during installation.

  • Funnelback’s installation no longer attempts to configure local web servers (Apache and IIS) to serve Funnelback. An embedded web server is installed, and instructions for configuring Apache and IIS are included in the Funnelback installing and upgrading guide.

Key new features and improvements

  • Intuitive graphical installation

  • A bundled web server for easy setup

  • Fluster result clustering

  • Update scheduling interface for Windows

  • Date-based reporting

  • Instant updates

  • Word 2007 text extraction

  • Form security options for advanced configurations

  • Improved TRIM support

  • Improved filecopy support

  • Improved document text extraction

  • Improved administration user interface

  • Improved default search interface

New features

  • Anchors.cgi analysis tool

  • Options to control server duplicate detection

  • New files created from the administration dashboard can now be based on a template

  • Webcrawler support for specifying preferred site names

  • Ability to crawl links within HTML comments

  • TRIM live link serving script

  • Use regular expressions to exclude parts of a page from indexing

  • Multiple start URLs

  • Support for specifying a cookie to use during a web crawl

  • Integrated crawl data statistics reports

Improvements

  • Meta-collection dictionary checking

  • Integration of search term highlighting with padre thesaurus expansion

  • Quiet mode for xml-splitter.pl

  • Selection of IIS web site to configure for Funnelback

  • The 'clive' parameter used for dynamic meta collections now uses collection names

  • Links in file-copy collections can be prefixed to make them valid

  • Better report formatting

  • Protection against padre-fl killing all documents

  • Cache.cgi checks the offline view if the required file is not in the live view

  • Stopped swizzling .vec files

  • Display padre help on STDOUT rather than STDERR

  • Remove window_size and table_size collection.cfg options

  • Add timestamps to url_errors.log file in the Webcrawler

  • Click tracking editable from the administration dashboard

  • Support for searching the locally installed Funnelback documentation

  • Change default setting for crawler.check_case_sensitivity to false

  • Update collection automatically after collection creation

  • Check all/uncheck all option for manually building the report database

  • Add a robots.txt file to prevent crawling of the administration dashboard

  • Crawl XML pages by default

  • Avoid overloading servers that host many virtual hosts

Selected bug fixes

  • Installation fixed on Windows machines with no hostname

  • Null query didn’t respect kill bits

  • Drop in crawler throughput on multi-site collection

  • Max files stored was confusing when restarting from a checkpoint

  • Spell.cgi only checked query

  • Scheduler validation case problem resolved

  • Padre did not assume UTF-8 input

  • Spell.cgi’s pos values were incorrect when multiple terms had the same spelling suggestion

  • UTF-8 characters could be mangled in queries

  • Featured pages with '#' weren’t encoded properly

  • In file-copy collections, filtering failed when a file contained leading spaces

  • Featured pages were incorrectly counted as rank #1 for click tracking

  • The HTTP password field in the administration dashboard was stored incorrectly

  • The administration dashboard didn’t allow a max_link_distance value of 0

  • Illegal division by zero in reports-load-queries-log.pl caused report load to fail

  • Apostrophes in collection (internal) names did not work

  • Some TRIM document extraction errors were not being reported

  • Non-admin user actions failed under IIS

  • TRIM password field was a plain text field

  • File permissions on Windows installs were not set correctly

  • TRIM logs weren’t flushed regularly

  • Windows detection code in Perl caught other OSes (Cygwin, Darwin)

  • Xml.cgi didn’t output spelling suggestions on Windows because the call to spell.cgi didn’t work

  • Deleting a collection did not delete its report data from the reports data directory

  • Query_phrase parameter was broken

  • Uppercase metadata classes were incorrectly displayed in forms

  • .vmbx extraction in TRIM skipped document contents

  • The administration dashboard showed the copy form section even when the user was not able to use it

  • The administration dashboard stylesheets weren’t displayed correctly under IE 7

  • Padre output invalid XML when the summary buffer overflowed

  • Empty description metadata produced an empty summary field

  • Cache.cgi displayed poorly on newer browsers

  • Search.cgi fixed for POST requests

  • File manager confirmation links used GET requests rather than POST

  • Titles didn’t appear for Microsoft Word documents that had titles

  • BASE HREF as returned by cache.cgi was invalid in XHTML documents

  • Indexing failed if external-metadata.cfg did not end in a newline

  • File copy collections now have their own type

  • Collection names returned by padre may have been incorrect

  • Removed the basic collection view from the administration dashboard

  • Query Expansion fixed on Windows

  • Use of crawler.remove_parameters no longer harms ranking quality

  • IUSR_$computername is given read access to C:\WINDOWS\Temp for rss.cgi

  • Padre failed to chsize() on Windows with an index size > 2 GB

  • Xml.cgi didn’t operate correctly with meta_ parameters

  • Improved include text for editing config files

  • Spaces in the URL prefix broke the generated live URL.

  • Left truncation operator (e.g. *ate) didn’t produce any search results

  • Padre produced invalid XML when a custom stop words list is used

  • Using a custom stop words list with no newline character at the end caused a segfault

  • Blocked users from creating a collection with no include pattern via the administration dashboard

  • Scope parameters are now passed with all report URLs

  • Query logs are now HTML-encoded when displayed, to guard against malicious queries

  • Create-collection.cgi allows protocol-less start URLs for a web collection

  • The Funnelback public UI produced invalid HTML in some cases

  • Update logs were not always written on Windows

  • Query_expansion.cfg was required to end in a newline

  • Word_expansion.cfg did not show up in the files section

  • Padre failed with out of memory errors when told to index an empty or missing directory

  • Collection display name was not HTML safe

  • Cache.cgi was displaying strangely with query terms that use the '#' operator

  • File manager rules for "Create Folders" did not check for invalid internal folder names

  • Query/word expansion did not work with certain query_processor_option orderings

  • A phrase search for "gibbet maker" did not find the text "gibbet-maker"

  • Show-file.cgi now HTML escapes the files it shows

  • Featured pages containing HTML now show up correctly in the reporting UI

  • Deleting a collection on Windows didn’t delete its scheduled updates

  • Delete-rules.cgi and delete-folder.cgi now require confirmation through a form post

  • Search.cgi didn’t HTML escape ampersands in featured page URLs

  • Formatting error in query processor options documentation

  • Padre-iw did not recognize the combined form of ISO 8601 date/times

  • configure_iis_for_funnelback.pl now detects when IIS is not installed

  • Invalid URLs were displayed for local collections in "Documents per Site" report

  • Padre QIE didn’t work on Solaris (matching documents were not upweighted)

  • Padre geospatial queries didn’t work on Solaris