Optimising your Web Site for Search

Introduction

These guidelines will help you build a site that is highly searchable. A searchable site gives a better search experience in Funnelback (and in any other search product), plus greater visibility in global search engines such as Yahoo, Google or MSN. This translates to efficiency gains for employees and easier access to information for customers and stakeholders.

Searchability

The following list gives suggestions on improving your site's searchability:

  1. Avoid excessive reliance on dynamically generated web pages: Spiders work by following links. With dynamically generated content, they can potentially miss important pages or clutter up indexes with rubbish. When you do generate pages dynamically, give each page a single, short, human-readable URL.
  2. Avoid excessive use of frames: Funnelback indexes the frame and its component pages separately. When a particular search result is returned it may appear without the context which would have been provided by the frame.
  3. Split large documents into smaller documents: If some of your documents are very long, consider publishing them as separate chapters or sections. Imagine that your organisation has an administrative procedures manual (APM) which is 3,000 pages long, and an HR employee enters the search query "long service leave". A PDF file of the whole APM would be a poor answer to the query, even though it contains the best answer, because the HR employee would then need to search through the very large document for what they actually wanted. A far better answer would be a single HTML file containing "Section 13.4.5: Long Service Provisions".
  4. Exclude unsuitable material: Configure your collections (or use robots.txt files) to prevent the crawler from accessing material which isn't suitable for searching. You may wish to exclude mirror sites and directories of non-textual data. Excess material increases disk space usage and slows down crawling, indexing and query processing. Restricting the index to relevant material may also improve the quality of results.
  5. Exclude uninformative pages or page sections: Prevent individual pages, or sections of pages, that do not contain useful information from being indexed. This might include navigational elements, headers and footers. See Controlling indexable content in PADRE for details.
  6. Exclude portions of a page (such as navigation content): The query-biased result summaries on some sites can suffer in quality because the summaries include sentences extracted from the site navigation text instead of the main document content. A solution to this problem is to insert directives into the web pages to indicate that certain sections should not be indexed. See noindex expression (collection.cfg). Note that anchor text is always indexed as part of the target document, so ranking quality is not affected.
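
As an illustration of guideline 4, the following minimal sketch checks which URLs a robots.txt file would keep a crawler away from. It uses Python's standard urllib.robotparser module rather than anything Funnelback-specific, and the /mirror/ and /images/ directories, the www.example.com host and the is_crawlable helper are hypothetical names chosen for this example only. How a particular crawler interprets robots.txt may differ in detail, so treat this as a quick sanity check, not a statement of Funnelback's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of a mirror site
# and a directory of non-textual data (both paths are assumptions).
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirror/
Disallow: /images/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder host.
    for url in (
        "https://www.example.com/apm/section-13-4-5.html",
        "https://www.example.com/mirror/index.html",
        "https://www.example.com/images/logo.png",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "excluded"
        print(f"{status}: {url}")
```

Running a check like this before a crawl can confirm that the exclusions cover mirrors and non-textual directories without accidentally hiding content that should be searchable.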
