P@noptic 5.5.0 release notes
Released: 17 Sep 2004
Prior to v6.0 Funnelback was known as P@noptic Search.
Note that versions 5.2, 5.3 and 5.4 were not externally released.
Crawler
-
Incremental crawling: If your web servers provide Content-Length information in response to HTTP HEAD requests, the time and network traffic required to update a crawled collection can be dramatically reduced. The administration interface now supports both full and incremental updates and allows n-incremental, 1-full scheduling patterns.
-
Increased crawler efficiency: More efficient memory use, less frequent checkpointing and more streamlined internal operations allow more parallelism and faster crawling.
-
Duplicate detection within the crawler: This can dramatically reduce the number of documents which are downloaded only to be subsequently deleted.
-
Improved crawler handling of problem web sites: Leads to even less missed content.
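As a rough illustration of the incremental crawling idea above, the following Python sketch shows one plausible way a crawler could use the Content-Length value from an HTTP HEAD response to decide whether a page needs to be downloaded again, and how an n-incremental, 1-full schedule could be expressed. The function names, the stored-length bookkeeping and the run numbering are assumptions made for illustration; the notes do not describe how the Panoptic crawler actually implements this.

```python
from typing import Mapping, Optional


def needs_refetch(stored_length: Optional[int], head_headers: Mapping[str, str]) -> bool:
    """Decide whether a page should be downloaded again during an incremental update.

    stored_length is the Content-Length recorded at the previous crawl
    (hypothetical bookkeeping); head_headers are the response headers of an
    HTTP HEAD request for the same URL.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in head_headers.items()}
    raw = normalised.get("content-length")
    if stored_length is None or raw is None:
        # Nothing to compare against: fall back to a full download.
        return True
    try:
        return int(raw) != stored_length
    except ValueError:
        # A malformed header gives no usable information.
        return True


def update_type(run_number: int, n_incremental: int) -> str:
    """n-incremental, 1-full pattern: every (n + 1)-th run is a full update."""
    return "full" if run_number % (n_incremental + 1) == 0 else "incremental"
```

With n_incremental set to 2, runs 0, 3, 6 and so on are full updates and the runs in between are incremental. A length comparison alone cannot detect a page whose content changed without changing size, which is one reason a periodic full update remains useful.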
Database/XML indexing
-
Support for Ingres and Sybase databases.
-
Enhanced efficiency in indexing large databases.
-
More faithful "metadata" summaries. Database fields or XML documents containing markup (e.g. HTML bold tags) are protected during indexing and can be faithfully rendered in titles and metadata summary elements.
Indexer/query processor
-
Ability to supply a customized stop word list via the -STOP query processor option. Note, however, that stop word elimination is not particularly important in Panoptic.
-
Live URLs in web collection search results now always correspond to the URL as supplied by the original web server.
-
Preliminary support for UTF-8 character encoding, but only for European languages at this stage. Please see the query processor option -utf8 and the indexer option -utf8input.
-
The length of the original document is now reported, rather than length after text extraction and storage on disk.
-
Substantially improved handling of dates in XML documents.
-
Better support for large collections of documents.
-
Query term highlighting in metadata summaries, except for URLs.
-
By default, uncanonicalized queries are logged rather than queries in which the words have been reordered.
-
Allow sites to avoid indexing repeated navigational elements by inserting HTML comments. If one of the strings *stop_indexing*, beginnoindex or noindex appears at the beginning of an HTML comment (with or without preceding white space), indexing will be suppressed until the next HTML comment beginning with either *start_indexing* or endnoindex. Note that anchor text of outgoing links will still be propagated to the link targets for indexing.
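The comment-based suppression rule described in the last item above can be sketched in Python as follows. This is only an illustration of the stated rule, not the indexer's code: the marker strings are copied verbatim from the note (including the asterisks, which may or may not be literal), the function name is hypothetical, and the propagation of anchor text to link targets is not modelled.

```python
import re

# Marker strings copied verbatim from the release note; adjust them if the
# asterisks turn out not to be part of the literal markers.
STOP_MARKERS = ("*stop_indexing*", "beginnoindex", "noindex")
START_MARKERS = ("*start_indexing*", "endnoindex")

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def indexable_text(html: str) -> str:
    """Return the parts of html that lie outside suppressed regions."""
    kept = []
    suppressed = False
    position = 0
    for match in COMMENT_RE.finditer(html):
        if not suppressed:
            # Keep the text between the previous comment and this one.
            kept.append(html[position:match.start()])
        body = match.group(1).lstrip()  # preceding white space is allowed
        if body.startswith(START_MARKERS):
            suppressed = False
        elif body.startswith(STOP_MARKERS):
            suppressed = True
        position = match.end()
    if not suppressed:
        kept.append(html[position:])
    return "".join(kept)
```

For example, a page could wrap its navigation bar in a comment pair such as `<!-- noindex -->` and `<!-- endnoindex -->`, and only the text outside that pair would be returned by the sketch.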
Panoptic 5.5.5
Upgrading to Panoptic 5.5.5
-
Administrators running Panoptic on Windows may wish to modify their collection update tasks in the Windows Task Scheduler to use update-win.pl instead of update.pl. This will enable the main collection update log to be created at SEARCH_HOME\log\update-<collection>.log during scheduled updates. Also note the upgrade issues included in earlier releases if upgrading from a version of Panoptic earlier than 5.5.4.
Panoptic 5.5.5 changes
Bug fixes
-
Fixed an intermittent crash in the Excel filter (xlhtml).
-
Fixed a bug in the processing of anchortext in cases where the link target contains whitespace.
-
Fixed an overflow in the expiry date of SSL certificates.
-
Fixed the permissions on the generated SSL certificates.
-
Enable the Panoptic Web administration virtual host to be included from Apache’s conf.d directory.
-
Added quotes around the certificate paths in the Apache configuration.
-
Fixed quoting and escaping issues in limit.pl (Windows)
-
Mandatory exclusion operators (-) in metadata search form elements were being misinterpreted due to recent HTML encoding changes in PADRE.
-
Fixed a bug in PADRE’s parsing of the index.bldinfo file.
-
Fixed a bug in scanning documents containing <title>hello</title> …</title>.
-
Fixed a bug in search.cgi that prevented it from being run under suexec.
-
Fixed various warnings and errors in schedule.cgi.
-
Fixed a potential buffer overflow when reading the licence key.
-
Added a missing argument to the call to setup-search-location.pl.
-
Default result titles in PADRE’s XML are now enclosed in CDATA sections.
Enhancements
-
Improved extraction of metadata from Word files.
-
New script, update-win.pl, to enable update logging under Windows.
-
New feature, One Shot, which sends the user directly to the URL of the first search result when the oneshot CGI parameter is specified.
-
Upgraded the JRE to 1.5.0.04.
-
Removed the redundant xxxtextify text from filtered documents.
-
Upgraded the pdftotext filter on Solaris to version 3.00.
-
New form substitution tags, resifcollection and resifnotcollection, to enable collection-specific results presentation for meta-collections.
-
Update logs now include a warning if the status email fails to send.
-
The PDF text filter now outputs the subject metadata as description metadata and subject metadata.
-
The crawler will now follow hyperlinks that are not enclosed in quotes.
-
Upgraded Apache for Win32 to 2.0.54.
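The One Shot behaviour listed among the enhancements above can be pictured with a small Python sketch. It assumes that the mere presence of the oneshot CGI parameter triggers the redirect and that the normal results page is shown when there are no results; neither detail is spelled out in the notes, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence


def one_shot_target(cgi_params: Mapping[str, str], result_urls: Sequence[str]) -> Optional[str]:
    """Return the URL to redirect to, or None to show the normal results page."""
    # Assumption: presence of the parameter is enough, whatever its value.
    if "oneshot" in cgi_params and result_urls:
        return result_urls[0]
    return None
```

A search wrapper could call such a helper after running the query and issue an HTTP redirect whenever it returns a URL.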
Panoptic 5.5.4
Upgrading to Panoptic 5.5.4
-
No issues when upgrading from 5.5.1 or later. Refer to the version 5.5.1 upgrade issues for information about upgrading from earlier versions.
Panoptic 5.5.4 changes
Bug fixes
-
The external metadata system was skipping default pages.
-
The resifnot{} tag was not working across multiple lines.
-
Fixed an anchortext matching problem on Windows.
-
Don’t override DC.Title metadata with empty HTML <title> elements.
-
Fixed an error in the <s:Truncate> help page.
New features
-
Added support for radio buttons in search interfaces.
-
xml-splitter.pl can now accept regular expression input instead of just a literal XPath.
-
New official Debian and SuSE versions of Panoptic.
-
Better support for ISO8601 date scanning.
-
Added the new "Powered by Panoptic" logo to the default forms.
-
Added support for handling anchortext pointing to HTML redirects.
Panoptic 5.5.3
Upgrading to Panoptic 5.5.3
-
No issues when upgrading from 5.5.1 or later. Refer to the version 5.5.1 upgrade issues for information about upgrading from earlier versions.
Panoptic 5.5.3 changes
Bug fixes
-
Fixed a problem related to indexing broken symlinks and long filenames.
-
Update status emails now include log files from the live view when the update is performed in the live view.
-
click.cgi redirects and exits if the referring URL is undefined.
-
Fixed uninitialised variable warning in Utils.pl.
-
Fixed a bug in CMWeb collections that caused the 'filter' option to be reset to 'false' when edited via the Web admin interface.
-
An empty scope parameter before the query parameter no longer removes the query parameter.
-
Fixed the indexer crash when phantom documents are used in conjunction with the MDSF file.
-
Fixed the multiple match-point overflow error messages.
-
Numerous fixes to refine.cgi.
-
Reinstated the "next page" link.
-
XML files are correctly displayed in the cached view.
-
Fixed crawler execution problem that occurred intermittently when the update was started by cron.
-
Fixed a bug in whitespace queries.
-
Fixed a bug in '0' value queries.
-
The Solaris installer now creates the SSL certificates before attempting to configure Apache.
-
The Solaris installer no longer adds the "LoadModule Suexec" directive (in case it’s already compiled in).
Enhancements
-
Better use of CSS in the default forms files.
-
Collection size information is written to size.log.
-
click.cgi (click-through logger) is now installed in the Web area.
-
Add support for logging result rank in click.cgi.
-
The inline thesaurus matches on whole words only.
-
Installation continues if Apache fails to restart.
-
Added new form tag operator: <s:Compare>.
-
Added new form tier bar customization tags: <s:TierBarFeaturedPages>, <s:TierBarFullyMatching> and <s:TierBarPartiallyMatching>.
-
Added support for the showform CGI parameter, which forces display of the initial search form by not executing the query processor.
-
Added support for the fp_tiers CGI parameter to enable/disable featured pages tiers.
-
Added support for 'separator' and 'label' attributes for <s:PrevNext>.
-
Fixed a bug in processing xxx_orplus queries.
-
The Apache configuration is now written to a temporary file if the SEARCH_SERVICE environment variable has been set to BUREAU in the existing Apache config.
-
Added map tags around the <s:PrevNext> tags for better accessibility.
-
Added support for collection option index to enable/disable indexing.
-
Added calendar to standard crawler exclude patterns.
-
Enable meta collections to be updated for the purpose of query log management.
-
Added new version of PDF introductory guide.
-
Added support for featifnot for enhanced display of featured pages.
-
Better support for dealing with date operators in refine.cgi.
-
Added robots noindex metadata to search forms.
-
Fixed display of featured pages when a description is not provided.
-
Inline thesaurus: Display all suggestions and sort lexicographically.
-
Integrated the new SecureCGI.pm library to enhance protection against cross-site scripting attacks.
-
Implemented a minimum term frequency of 2 for words to be added to the spelling dictionary.
-
The document format select list in the advanced form is now of type "scoped and", to remove result tiers that do not conform to the selected format.
-
Metadata query parameters can be combined with the "scoped and" (*_sand), "not" (*_not) and "and" (*_and) modifiers.
-
Multiple scope CGI parameters are joined to enable scopes to be selectable via check boxes.
-
The scope parameter is now displayed on a separate line on both forms.
-
The scope parameter is recorded as a hidden parameter in the advanced form to enable multiple scoped queries.
-
The current scope is displayed on the advanced form.
-
Added a work-around for a RedHat EL3 bug in the spelling dictionary builder.
-
Updated version of cpio.exe for use with the installer.
-
All calls to Perl scripts are prefixed with the full path to the Perl interpreter to avoid the Windows file association bug.
-
Encode special characters in Windows filenames.
-
PerlIS is no longer used by default with IIS (Perl.exe is far more stable).
-
Check that Windows user has administrator privileges.
-
Check that Windows user is using Perl 5.8.
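The minimum term frequency rule for the spelling dictionary, listed among the enhancements above, can be illustrated with a short Python sketch. It assumes that frequency is counted across all documents in a collection and uses a deliberately simple tokeniser; both are illustrative assumptions, and the function name is hypothetical rather than part of Panoptic.

```python
import re
from collections import Counter
from typing import Iterable, Set

MIN_TERM_FREQUENCY = 2  # threshold stated in the release note


def spelling_dictionary(documents: Iterable[str],
                        min_frequency: int = MIN_TERM_FREQUENCY) -> Set[str]:
    """Collect words that occur at least min_frequency times overall."""
    counts: Counter = Counter()
    for text in documents:
        # Simplistic tokenisation: runs of ASCII letters, lower-cased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return {word for word, count in counts.items() if count >= min_frequency}
```

The point of such a threshold is that a word seen only once, which is often a typing error, is not offered as a spelling suggestion.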
Panoptic 5.5.1
Upgrading to Panoptic 5.5.1
-
To enhance the customizability of the results pages, an HTML <br> element was taken out of the search wrapper (search.cgi) and put into the default interface form files. The results page formatting of existing collections that use the Featured Pages mechanism would benefit from having this <br> element inserted into the form files immediately after the featured pages link (see the new simple.form.dist file as a reference).
-
Use of the 'search-apache' group has been replaced with group 'search'. This change is made automatically during the upgrade to 5.5.x and doesn’t require any action from the administrator.
-
The query processor now outputs result document sizes as bytes instead of kilobytes. The document sizes also now represent the pre-filtered size. The default search wrapper, search.cgi, converts these sizes back to kilobytes but custom wrappers will need to handle this.
-
Titles and metadata summary fields in the PADRE XML output are now protected by CDATA sections. The search wrapper (search.cgi) has taken over the role of stripping the CDATA markup and encoding the encapsulated text into HTML.
This doesn’t require any changes for users of the standard search.cgi wrapper. Administrators using a custom search wrapper will need to make these changes to their own wrapper or enable backwards compatibility mode by specifying -nocdata as an indexer and query processor option.
For a full explanation of this change see: http://www.panopticsearch.com/AdminHelp/summaries.html
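For administrators maintaining a custom search wrapper, the two changes above (CDATA-protected fields and sizes reported in bytes) might be handled along the lines of the following Python sketch. It is not taken from search.cgi: the function names are hypothetical, the XML snippet in the usage note is invented for illustration, and the choices of 1024 bytes per kilobyte and rounding up are assumptions.

```python
import html
import re

CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def cdata_to_html(field: str) -> str:
    """Strip CDATA markup and HTML-encode the text it wrapped."""
    return CDATA_RE.sub(lambda m: html.escape(m.group(1)), field)


def bytes_to_kilobytes(size_in_bytes: int) -> int:
    """Convert a size in bytes to whole kilobytes, rounding up.

    Assumes 1 KB = 1024 bytes; rounding up keeps small non-empty
    documents from being displayed as 0 KB.
    """
    return (size_in_bytes + 1023) // 1024
```

For instance, a title field containing `<![CDATA[Fish & <b>chips</b>]]>` would come out as `Fish &amp; &lt;b&gt;chips&lt;/b&gt;`, so the original text is displayed literally in the results page.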
Panoptic 5.5.1 changes
Bug fixes
-
Featured pages are now displayed when there are zero results.
-
Correct detection of cron running on Debian for status.cgi.
-
Set the spelling_main_dictionary parameter to 'english' on Red Hat 9 installations.
-
Fixed the bug resulting from having an accented character as the last character in a summary.
-
PADRE now responds to -v before failing due to a licence check.
-
PADRE looks for the licence key in C:\Panoptic\search on Windows when SEARCH_HOME hasn’t been set.
-
The tmp_qresults file is unlinked after the query cache is loaded.
-
Stemming no longer deletes single character query terms.
-
Fixed crawler halting problem on some systems.
-
Fixed index spell parsing problems (due to accented characters).
-
Index-spell.pl now sends its STDERR to a log file.
Enhancements
-
Incremental crawling.
-
Exact text representation of metadata summaries fields.
-
Increased crawler efficiency: More efficient memory use, less frequent check-pointing and more streamlined internal operations allow more parallelism and faster crawling.
-
Improved crawler handling of problem web sites (including Lotus Domino based sites).
-
Enable parsing of standard date formats in the XML scanner.
-
More efficient database extraction.
-
Support for Lotus Domino, Ingres and Sybase databases.
-
Ability to supply a customized stop word list via the -STOP query processor option.
-
Live URLs in web collection search results now always correspond to the URL as supplied by the original web server.
-
Preliminary support for UTF-8 character encoding, but only for European languages at this stage. Please see the query processor option -utf8 and the indexer option -utf8input.
-
The length of the original document is now reported, rather than length after text extraction and storage on disk.
-
Better support for large collections of documents.
-
Query term highlighting in metadata summaries.
-
Support for crawling annodexed media files.
-
Support for integration with the iPlanet Web server.
-
PADRE scans up to 220 characters into each document for <html> instead of only 120 to determine document type.
-
Improve PADRE’s ability to determine the fully qualified hostname for licence checking purposes.
-
Set the appropriate threading library based on kernel version in crawl.pl.
-
Move the DB2XML log file to the log directory under the collection root (so that it can be viewed by the admin interface) and rename it to database.log.
-
Make the max_heap_size argument apply to the DB2XML call.
-
Parameterised the Perl locale in crawl.pl.
-
All textify.conf files now refer to SEARCH_HOME as $SEARCH_HOME on Solaris and Windows.
-
Removed the query stats, query report and status scripts from the search area.
-
Now use whoami for root user detection.
-
Limit the number of Blat retries on Windows to 3.
-
Replace the 'search-apache' group with 'search'.
-
Check for the existence of files before chgrpping (the chgrp command is no longer silent with the -f option).
-
Add the aspell, pstotext, wvWare, catdoc, xlhtml and ppthtml source packages to the Solaris version.
-
Allow the Web user and group to be different.
-
Added a new line to the htpasswd file when not using Apache (for iPlanet Integration).
-
Improved the featured pages formatting.
-
Strip CDATA sections from input XML.
-
Protect summary titles and metadata fields with CDATA sections in PADRE’s output XML.
-
PADRE returns file sizes in bytes. These are then converted to kilobytes by search.cgi.
-
No longer display collection listing in search.cgi if environment variable SEARCH_SERVICE is set to BUREAU.
-
Upgrade Apache to 2.0.50 on Win32.
-
Upgrade to JRE 1.4.2_05 on Win32, Red Hat and Debian.
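The document type detection change listed above (scanning 220 characters instead of 120 for <html>) can be pictured with this Python sketch. It is a guess at the general shape of such a check rather than PADRE's logic: the function name is hypothetical, and matching case-insensitively on the prefix "<html" (so that tags carrying attributes also match) is an assumption.

```python
HTML_SCAN_LIMIT = 220  # characters, per the release note (previously 120)


def looks_like_html(document: str, scan_limit: int = HTML_SCAN_LIMIT) -> bool:
    """Report whether an opening html tag starts within the first scan_limit characters."""
    # Only the leading window is examined; a tag that begins later is missed.
    return "<html" in document[:scan_limit].lower()
```

A page that starts with a long DOCTYPE declaration or leading comments is the kind of document the larger window is likely to help with, since its html tag can begin after the first 120 characters.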