Using Funnelback to generate structured auto-completion

Background

Structured auto-completion is a great way of enhancing the utility of the auto-completion provided by a search. It allows for rich suggestions containing images and metadata to be displayed, and allows the trigger and action items to be defined for the suggestion.

However, these suggestions are produced from a CSV data source, and that data has to be populated from somewhere. The CSV can be edited manually via the administration interface, or generated by an external system such as a database.

This article shows how metadata within the Funnelback index can be used to generate the CSV data and from this the auto-completion index.

This article contains a number of advanced concepts and should only be attempted by an experienced Funnelback implementer. The implementer also requires back end access to the Funnelback server in order to complete all the steps.

Before you begin

Structured auto-completion uses CSV data to define the suggestions produced by Funnelback.

Each suggestion has a number of properties. Understanding what each of these is, and how it affects the generated suggestions, is key to producing effective auto-completions.

Each suggestion requires the following to be defined:

  • suggestion: this is the data used to produce the auto-completion suggestion that is presented to the user. The response can be plain text, a JSON data structure (recommended) or a small chunk of HTML. When using JSON data, a template can be used to define how the data is presented.

  • trigger: this is the word or phrase that will be used to select if a suggestion is relevant to the current partially typed query. The partially typed query is compared to the trigger using a left-anchored substring match. For the trigger to fire the partially typed query needs to fully match the first characters of the trigger.

  • action: this defines what happens when a user clicks on a suggestion. Possible actions are: run a specified query, redirect to a URL, or run some JavaScript.
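
The left-anchored match described for the trigger can be illustrated with a few lines of Python. This is a sketch only, not Funnelback's implementation: the function name trigger_fires is hypothetical, and the case-insensitive comparison is an assumption.

```python
def trigger_fires(partial_query: str, trigger: str) -> bool:
    """Return True when the partially typed query matches the start of the trigger.

    Assumption: the comparison ignores case and any leading whitespace in the query.
    """
    typed = partial_query.lstrip().lower()
    if not typed:
        # Nothing typed yet, so no suggestion should fire.
        return False
    return trigger.lower().startswith(typed)


print(trigger_fires("joh", "john smith"))  # True: "joh" is a prefix of the trigger
print(trigger_fires("smi", "john smith"))  # False: the match is anchored to the left
```

The second call shows why the staff example later in this article defines both a firstname lastname and a lastname firstname trigger: a left-anchored match never fires on a word in the middle of a trigger.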

There are some additional fields within the CSV that also need to be defined - these control factors such as the weighting applied to the suggestion, the type of suggestion and action and categorisation of the suggestions so that suggestions can be grouped into topics.
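
To make these fields concrete, the following Python sketch writes one suggestion as a CSV line. The eight-column order used here (trigger, weight, suggestion, suggestion type, category, category type, action, action type) and the single-letter type codes are assumptions that should be checked against the Auto-completion CSV documentation. The sample record and URL are invented for illustration.

```python
import csv
import io
import json


def suggestion_row(trigger, weight, data, action, action_type):
    """Build one CSV row using the assumed column order described above."""
    return [
        trigger,           # trigger: compared with the partially typed query
        str(weight),       # weight: ranks competing suggestions
        json.dumps(data),  # suggestion: JSON payload rendered by a template
        "J",               # suggestion type: assumed code for JSON
        "",                # category: left empty in this sketch
        "",                # category type
        action,            # action: a URL or a query
        action_type,       # action type: assumed codes, e.g. "U" (URL) or "Q" (query)
    ]


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(suggestion_row(
    "john smith",
    100,
    {"name": "John Smith", "department": "art"},
    "https://example.com/staff/john-smith",
    "U",
))
print(buffer.getvalue(), end="")
```

Using the csv module rather than string concatenation means the double quotes inside the JSON payload are escaped correctly.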

Further information: Auto-completion CSV

Files used in this tutorial

Before you start, download the helper files from GitHub. This bundle includes the FTL template, workflow script and hook script referenced below.

Limitations

  • The template currently only supports generation of CSV with URLs sourced from the live url of the document.

  • The normalisation code is currently disabled due to an incompatibility in Funnelback 15.10.

Creating the auto-completion CSV

Step 1. Create a profile for auto-completion generation

Switch to the collection that contains the data that will be used to populate the auto-completion suggestions.

Create a profile within this collection for the auto-completion. If you are generating a single CSV then name it something like auto-completion, otherwise assign the profile a unique name. The name you choose will be required when you configure the triggers for the auto-completion. This profile will have a template that produces search results in the auto-completion CSV format. The profile should also be optimised so that only the desired metadata fields are returned and most functionality is disabled.

Step 2. Optimise the profile and define metadata fields

Configure the display options for the auto-completion profile so that the metadata needed to generate the auto-completion CSV is returned. This is done by adding options to the padre_opts.cfg for the auto-completion profile.

The metadata fields to expose to auto-completion are defined using the SF parameter.

Features that are not being used should be disabled. Something like the following is a good start:

-SM=meta -SF=[list of fields to return for auto-completion] -log=false -vsimple=true -bb=false -countgbits=63 -spelling=off -show_qsyntax_tree=off -qsup=off -rmcf=[disabled] -num_ranks=10
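
Because the same option string is reused later with a larger num_ranks, it can help to generate it from a single list of fields. The Python sketch below is one way to do that. The helper name padre_opts_line and the example field names are hypothetical, the comma-separated form of the SF list is an assumption, and the remaining options are copied as written above (including the -rmcf placeholder).

```python
def padre_opts_line(fields, num_ranks=10):
    """Assemble a padre_opts.cfg line from the metadata fields the template needs."""
    options = [
        "-SM=meta",
        "-SF=[" + ",".join(fields) + "]",  # only the fields used by the template
        "-log=false",
        "-vsimple=true",
        "-bb=false",
        "-countgbits=63",
        "-spelling=off",
        "-show_qsyntax_tree=off",
        "-qsup=off",
        "-rmcf=[disabled]",                # placeholder reproduced from the article
        f"-num_ranks={num_ranks}",
    ]
    return " ".join(options)


# Small num_ranks while testing the template; raise it once the CSV looks right.
print(padre_opts_line(["firstname", "lastname", "department"]))
```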

Further information: Query processing optimisation

Step 3. Add a template to generate auto-completion

Install the auto-completion.ftl (and auto-completion-master.ftl) template into the profile. (The template is available in the GitHub bundle).

Step 4. Configure triggers and actions for auto-completion

A set of triggers must be defined for each auto-completion profile. The triggers are defined in the collection.cfg for the collection that includes the profile that was created above.

Add a collection.cfg line in the following format:

auto-completion.<PROFILENAME>.triggers=COMMA SEPARATED LIST OF TRIGGERS

The triggers are made up of metadata fields. Each trigger can be constructed from several fields concatenated together, and multiple triggers can be defined, delimited with commas. Ensure that you set default values for any metadata fields.

e.g. For a profile called 'staff' define 3 triggers based on firstname lastname, lastname firstname and department metadata:

auto-completion.staff.triggers=s.result.metaData["firstname"]! s.result.metaData["lastname"]!,s.result.metaData["lastname"]! s.result.metaData["firstname"]!,s.result.metaData["department"]!

This means that a user with the following metadata:

  • firstname: John

  • lastname: Smith

  • department: art

will generate auto-completion suggestions with three triggers:

john smith
smith john
art
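
The expansion above can be mimicked outside Funnelback to check a trigger definition before indexing. In this Python sketch each trigger is written as a list of metadata field names rather than as the template expressions used in collection.cfg. The function name, the empty-string default for missing fields and the lower-casing are assumptions made for illustration.

```python
def build_triggers(metadata, trigger_fields):
    """Return one lower-cased trigger per definition, skipping empty results."""
    triggers = []
    for fields in trigger_fields:
        # A missing field defaults to an empty string, mirroring the trailing "!"
        # default in the template expressions.
        words = [metadata.get(name, "") for name in fields]
        trigger = " ".join(word for word in words if word).lower()
        if trigger:
            triggers.append(trigger)
    return triggers


staff_triggers = [["firstname", "lastname"], ["lastname", "firstname"], ["department"]]
record = {"firstname": "John", "lastname": "Smith", "department": "art"}
print(build_triggers(record, staff_triggers))
# ['john smith', 'smith john', 'art']
```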

If the profile doesn’t have URLs, or you want the triggers to run a query instead, then configure a query action for the profile by adding a collection.cfg line in the following format:

auto-completion.<PROFILENAME>.action-mode=Q

e.g.

auto-completion.staff.action-mode=Q

This will result in a query being run (for whatever the defined trigger term is) when a suggestion is clicked.

Step 5. Add a post process hook script

A post process hook script is used by the auto-completion template to clean the triggers.

The hook script performs two tasks:

  • stop word removal: any stop words that appear within a multi-word trigger will be removed.

    e.g. a multi-word trigger The meaning of life will cause 3 CSV lines to be generated: the meaning of life, meaning of life and life.

  • normalisation: Create normalised versions of any trigger words that contain diacritic (accented) characters. This allows auto-completion triggers to match the non-accented version of the trigger words. (Note: this is currently disabled in the code)

e.g. a trigger of André Rieu would result in both accented and non-accented CSV lines being produced:

andré rieu
andre rieu
rieu andré
rieu andre
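
Both clean-up tasks can be sketched in Python to show the intended effect on a single trigger. This is not the Groovy hook script from the bundle: the function names and the stop word list are hypothetical, and the rule of keeping the full trigger plus every suffix that starts with a non-stop word is inferred from the stop word example above. The two functions are independent of each other.

```python
import unicodedata

STOP_WORDS = {"the", "of", "a", "an", "and"}  # hypothetical list for illustration


def suffix_triggers(trigger):
    """Return the full trigger plus each suffix that starts with a non-stop word."""
    words = trigger.lower().split()
    results = [" ".join(words)] if words else []
    for index in range(1, len(words)):
        if words[index] not in STOP_WORDS:
            results.append(" ".join(words[index:]))
    return results


def normalised_variants(trigger):
    """Return the trigger and, if it differs, a copy with diacritics removed."""
    lowered = unicodedata.normalize("NFC", trigger.lower())
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    stripped = unicodedata.normalize("NFC", without_marks)
    return [lowered] if stripped == lowered else [lowered, stripped]


print(suffix_triggers("The meaning of life"))
# ['the meaning of life', 'meaning of life', 'life']
print(normalised_variants("André Rieu"))
# ['andré rieu', 'andre rieu']
```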

The hook script included in the GitHub bundle should be added to the collection. If there is an existing hook_post_process.groovy then copy the code from the GitHub bundle and append it to the collection’s existing hook script.

Step 6. Test CSV generation

Test the auto-completion.ftl template to ensure that correct CSV is generated. Run a query for !showall using the auto-completion profile and the auto-completion template, and examine the output by viewing the page source in the browser. This shows the output for the first 10 results (the num_ranks value set in Step 2). The page source should be valid CSV in the query completion CSV file format.
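
A quick structural check on the saved page source can catch template problems before the index is built. The Python sketch below parses the text as CSV and reports rows with an unexpected number of fields. The expected count of eight columns is an assumption to confirm against the query completion CSV file format, and find_bad_rows is a hypothetical helper.

```python
import csv
import io

EXPECTED_COLUMNS = 8  # assumption: confirm against the CSV format documentation


def find_bad_rows(csv_text, expected=EXPECTED_COLUMNS):
    """Return (row number, field count) for every non-empty row of the wrong width."""
    bad = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if row and len(row) != expected:
            bad.append((row_number, len(row)))
    return bad


# Invented sample: the second row is missing two fields.
sample = (
    'john smith,100,"{""name"": ""John Smith""}",J,,,https://example.com/john,U\n'
    "art,100,Art department,T,,\n"
)
print(find_bad_rows(sample))
# [(2, 6)]
```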

Step 7. Increase num_ranks

Update padre_opts.cfg to increase num_ranks to an appropriate value (1000 in the example below) so that enough results are returned for the auto-completion. Remember that the generated CSV file will not be paginated, so the auto-completions returned are effectively the first page of search results.

Set the following options in padre_opts.cfg, then publish the file:

-SM=meta -SF=[list of fields to return for auto-completion] -log=false -num_ranks=1000 -vsimple=true -bb=false -countgbits=63 -spelling=off -show_qsyntax_tree=off -qsup=off -rmcf=[disabled]

Large values of num_ranks will result in long response times, and too large a value could cause the web server to run out of memory. Optimising the options set in padre_opts.cfg can also help reduce the load and memory requirements of running a large query. Turn off any features that are unused for this profile, and minimise the amount of metadata returned by limiting the SF values to only those used in the template.

Step 8. Add a post-index workflow

A post index step that performs the following must be added:

  • Runs a Funnelback query to generate and save the auto-completion CSV

  • Runs build_autoc to construct the auto-completion index from the CSV data

The Unix shell script included in the GitHub bundle can be used to generate the auto-completion.

If your index is particularly large, this approach might need to be modified, as it may not be possible to return the entire index in a single request. In that case the CSV would need to be generated as several files, using the start_rank and num_ranks parameters to return pages of results, which would then be concatenated after downloading.
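
The paging approach can be sketched as follows. The Python below only computes the request parameters for each page. The 1-based start_rank, the use of the collection, profile and form parameters to select the profile and template, and the helper name are assumptions. Fetching and concatenating the pages is left to the workflow script.

```python
from urllib.parse import urlencode


def page_queries(total_results, page_size, collection, profile, form):
    """Return one query string per page needed to cover total_results."""
    queries = []
    # Assumption: start_rank is 1-based, so pages start at 1, 1 + page_size, ...
    for start_rank in range(1, total_results + 1, page_size):
        queries.append(urlencode({
            "collection": collection,
            "profile": profile,
            "form": form,
            "query": "!showall",
            "start_rank": start_rank,
            "num_ranks": page_size,
        }))
    return queries


# Hypothetical values: 2500 documents fetched 1000 at a time gives three requests.
for query in page_queries(2500, 1000, "staff", "auto-completion", "auto-completion"):
    print(query)
```

Keeping the page size equal to the num_ranks that the server handles comfortably avoids the memory problems described in Step 7.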

Add the post_index.sh that’s included in the GitHub bundle to the collection’s post_index_command.

e.g.

post_index_command=$SEARCH_HOME/conf/$COLLECTION_NAME/@workflow/post_index.sh

If a post_index.sh already exists in the @workflow folder, rename the new file when you add it and ensure that both scripts are run by the post index command.

Step 9. Configure the auto-completion JavaScript

Note: These instructions are designed for use with the Funnelback Concierge plugin and assume that Concierge is already configured on the collection. See the Concierge documentation for details on how to configure this.

Step 10. Configure a dataset

Add a dataset for each auto-completion profile that was generated.

e.g. for the staff example above you might use something like:

datasets: {
	staff: {
		// This should be set to the collection on which the auto-completion was generated
		collection: 'staff',
		// This should be set to the profile name used above
		profile: 'staff',
		// Other options
		show: 3
	}
}