Best practices - 1.1 general

General

This section details general best practices to follow when implementing Funnelback.

Avoid customisation

Customisation beyond the result templates should be avoided where possible, as it can make upgrades harder and can have unintended side effects on the correct functioning of the search.

Where possible avoid solutions that:

  • require custom filters, especially if there is a dependency on the structure of the markup of the content.

  • require manipulation of the query or response data via hook scripts.

  • require customisation of the faceted navigation or query completion behaviour.

Avoiding customisation has the following benefits:

  • Implementation time is reduced, so using standard functionality is cheaper for the customer.

  • Upgrades are simplified as standard functionality will be upgraded automatically by Funnelback when an upgrade occurs. This reduces the cost of upgrading to a newer version of Funnelback.

  • Bugs within standard functionality can be reported to the developers. Bugs in custom functionality require custom development and are unsupported.

Exclude unnecessary content

Use whatever processes are available to prevent unnecessary content from ever entering the search index.

This includes:

  • For web collections, using robots.txt, robots meta tags and Funnelback noindex tags.

  • Using the gatherer’s include/exclude mechanisms.

  • Gathering content as a user that only has access to the relevant documents.
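As an illustration of how include/exclude rules narrow what gets gathered, the following is a minimal Groovy sketch. It is not Funnelback's implementation: the pattern values, the URLs and the shouldGather closure are hypothetical, and plain substring matching is an assumption made only for this example.

```groovy
// Hypothetical patterns; plain substring matching is an assumption of this sketch.
def includePatterns = ['example.org/docs/']
def excludePatterns = ['/archive/', '/print/', '.zip']

// A URL is kept only if it matches at least one include pattern
// and does not match any exclude pattern.
def shouldGather = { String url ->
    includePatterns.any { url.contains(it) } && !excludePatterns.any { url.contains(it) }
}

// Hypothetical candidate URLs discovered during a crawl.
def candidates = [
    'http://example.org/docs/guide.html',
    'http://example.org/docs/archive/2009.html',
    'http://example.org/blog/post.html'
]

// Only the first URL survives: the second is excluded, the third is never included.
def gathered = candidates.findAll(shouldGather)
```

Content rejected at this stage never reaches the index, so it needs no later clean-up in filters, hook scripts or templates.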

Data cleansing

Where possible clean any data at the source. This includes:

  • Adding noindex directives, for example around content pulled in by header/footer includes

  • Adding/extracting metadata

  • Transformation of database fields

Should a filter or a hook script be used?

Data cleansing efforts should be applied as close to the source as possible. The order of priority for cleaning should be:

  • Source: Can you arrange for the data to be as close as possible to the expected format? Can you gather only what is needed (include / exclude patterns, noindex tags)?

  • Custom filter (Groovy): changes the content before it is indexed.

  • Hook scripts (Groovy): changes the content in the data model as it is returned from the index.

  • Server-side template (FreeMarker): changes the content as it is read from the data model for display to the end user.

  • Client-side scripting (JavaScript): changes the content within the user’s browser.

Cleaning the data closer to the source has a number of benefits:

  • It is easier to understand what is going on, because each step away from the source adds another place where cleaning can occur. For example, having JavaScript code correct something in the data for display requires an implementer to inspect the JavaScript, then the FreeMarker template, then the hook scripts, then the filters and finally the data to understand what the JavaScript is doing. It is also confusing for content owners, as the cleaning is often not visible to them: what is in their source content is not reflected in what they see in the search results.

  • The clean-up has a wider effect. For example, cleaning code in the FreeMarker template does not affect the JSON and XML output, and index-related functionality such as sorting will not reflect the changes. Cleaning done in a hook script will not affect the cached copy of the document. Cleaning at the source affects everything (including other search engines that might index the content).

  • Preventing unwanted or uncleaned data from entering the index will improve ranking quality as there is less noise in the index.
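The point about sorting can be seen in a small, self-contained Groovy sketch. It does not use any Funnelback API: the titles, the "Site: " prefix and the clean closure are hypothetical, and a plain list sort stands in for index-level sorting.

```groovy
// Hypothetical titles as stored in the index; two carry a prefix
// that should ideally have been removed at the source.
def indexedTitles = ['Site: Apple growing', 'Banana storage', 'Site: Cherry picking']

// Display-time clean-up, the kind of fix a template might apply.
def clean = { String title -> title.replaceFirst(/^Site: /, '') }

// Late cleaning: sorting uses the uncleaned values, the prefix is stripped afterwards.
def sortedThenCleaned = indexedTitles.sort(false).collect(clean)

// Early cleaning: the values are cleaned first, so sorting sees the cleaned titles.
def cleanedThenSorted = indexedTitles.collect(clean).sort(false)

println sortedThenCleaned   // [Banana storage, Apple growing, Cherry picking]
println cleanedThenSorted   // [Apple growing, Banana storage, Cherry picking]
```

With late cleaning the displayed titles appear out of alphabetical order, because the order was decided on values the user never sees.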

Use custom configuration settings

Arbitrary settings can be stored inside the collection configuration file collection.cfg. Settings that Funnelback doesn't recognise are ignored by Funnelback itself and left as-is. These arbitrary settings can then be accessed from various parts of the system such as templates, hook scripts and filters.

This is especially useful for storing configuration values rather than hard-coding them in the templates or filters. Common uses include URLs of third-party resources, API keys for accessing a service, and settings used by a reusable script or template.

api.key=12345678
logo.url=http://example.org/logo.png

Accessing custom configuration settings

Search template (FreeMarker)

<img src="${question.collection.configuration.value("logo.url")}" />

Public UI hook script (Groovy)

def apiKey = transaction.question.collection.configuration.value("api.key")

Filter (Groovy)

Accessing from a document filter:

def apiKey = context.getConfigValue("api.key").orElse("")

Accessing from a jsoup filter:

// Get all configuration keys with a common prefix (myFilter.) into a list
def configKeys = context.getConfigKeysWithPrefix("myFilter.")

// Get the value of the api.key configuration setting
def apiKey = context.getConfigSetting("api.key")

Workflow script (Groovy)

// Add the relevant import at the top
import com.funnelback.common.config.*

// Then:
def config = new NoOptionsConfig(new File("/path/to/search-home"), "collection-name")
def apiKey = config.value("api.key")
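
Because a custom setting may be missing or blank in a given collection, it can help to route every lookup through one small helper that supplies a default. The following is a minimal sketch, not part of Funnelback: settingOrDefault and the myFilter.timeout key are hypothetical, and a java.util.Properties object stands in for collection.cfg so the example runs on its own.

```groovy
// Hypothetical helper: wraps any lookup closure and falls back to a default
// when the setting is missing or blank.
def settingOrDefault(Closure lookup, String key, String fallback) {
    def value = lookup(key)
    return (value != null && value.toString().trim()) ? value.toString().trim() : fallback
}

// Stand-in for collection.cfg so the sketch is self-contained.
def cfg = new Properties()
cfg.load(new StringReader('''
api.key=12345678
logo.url=http://example.org/logo.png
'''))

// The lookup closure is the only part that changes between contexts.
def lookup = { String key -> cfg.getProperty(key) }

def apiKey = settingOrDefault(lookup, 'api.key', '')
def timeout = settingOrDefault(lookup, 'myFilter.timeout', '30')  // not set, so the default applies
```

In a hook script, the lookup closure could instead delegate to the accessor shown above, for example { key -> transaction.question.collection.configuration.value(key) }, leaving the helper itself unchanged.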