HTML search results integration guide

Background

Funnelback search results are often served from an HTML endpoint. This endpoint can produce a full page with headers/footers/CSS/JS, or it can produce a section of HTML with only the search results that can be integrated with another platform.

This guide shows how to embed Funnelback HTML search results into an existing website/CMS/platform. The Funnelback higher education stencil is used as an example, but the concepts may be applied to any implementation of Funnelback.

Choosing integration method

There are three ways of integrating a website with Funnelback search results, each with its own advantages and disadvantages. Two of the methods involve integration with HTML search results that are templated using Freemarker. This guide outlines how to integrate using these two methods.

Additional advantages of embedding HTML in a CMS over returning the full page from Funnelback are:

  • Headers and footers: It can be challenging to perfectly implement headers and footers from a site when Funnelback serves the full page, especially if there are conflicting CSS frameworks. Challenges can also include Javascript conflicts on menu interactions and sticky menus for mobile devices.

  • Ongoing maintenance/redesign: If the website design is changed in the future, a change also has to be made and tested on the results page template in the Funnelback server.

  • SSL certificate: A dedicated search domain (e.g. search.client.com) is required when Funnelback serves the full page, and that domain requires its own SSL certificate and renewals. This may have a cost and requires ongoing maintenance to keep the certificate up to date on the Funnelback server.

How to follow this guide

This guide should be followed conceptually by applying the explanations and example code to the framework/CMS/platform that the implementation will be completed in. Throughout this guide, sections will call out where assumptions or specific techniques have been used for the purpose of the guide that should be generalized for other implementations.

The sample code in this guide uses JavaScript; however, these concepts are not language-specific and should be tailored to each individual situation.

URLs used in this guide

Throughout this guide, example URLs are used to demonstrate concepts and specify whether a certain action is taking place on the Funnelback server or on the framework/CMS/platform of the client implementation. These URLs are fictional and do not actually exist.

  • www.client-university.edu: The website of Client University (the client framework/CMS/platform)

  • client.funnelback.com: The Funnelback server where the search indexes are hosted for Client University

In this guide, 'Client University' is the client and 'Funnelback' or 'the Funnelback server' is the vendor. Below is a representation of the requests:

HTTP Request Flow
  1. The original query is sent from the user to the client framework/CMS/platform

  2. The query is passed on to the Funnelback server

  3. Funnelback returns HTML containing the search results to the client framework/CMS/platform

  4. The search results are embedded into the complete page and returned to the user

Setting up the Funnelback configuration

The core concept in configuring Funnelback to return HTML for embedding within a page, instead of a full HTML document, is to modify the HTML that Funnelback returns so that it is contained within a <div> element (a <main> element may be preferable semantically). The HTML endpoint in Funnelback should not return <html>, <head>, or <body> elements.

This can be accomplished by editing the Freemarker templates on the Funnelback server so that they produce just the desired section of HTML instead of a full HTML document.

The ui.modern.search_link option in the profile configuration must be configured to match where the search results page will exist on the client framework/CMS/platform so that the links produced by the Funnelback server are correct.

In this guide, the search results page exists at https://www.client-university.edu/search, so the profile configuration setting would be:

ui.modern.search_link=/search

Configure the integration URL

The ui.integration_url configuration setting in the collection configuration must also be configured. This allows the Funnelback server to be aware of where the search engine results page exists on the client framework/CMS/platform for features such as the Insights Dashboard preview search.

ui.integration_url=https://www.client-university.edu/search?collection={collection}&query={query}&profile={profile}

Environment variables

Wherever possible, environment variables should be used instead of hard-coded values. Environment variables can be used for server addresses, ports, etc. and encourage separating configuration from code.

Usage of environment variables

The URL of the Funnelback server is environment-specific: a development environment should request results from a Funnelback development environment, and a production environment should request results from a Funnelback production environment.

The process to set environment variables depends on the platform/CMS/language/server architecture used and is different for each case.
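
As a minimal sketch, assuming a Node.js server, the Funnelback domain could be read from an environment variable once at startup and fail fast when it is missing. The helper name getFunnelbackDomain and the development host name are hypothetical; FUNNELBACK_SEARCH_DOMAIN is the variable name used in the sample code later in this guide.

```javascript
// Hypothetical helper: read the Funnelback domain from the environment.
// 'env' defaults to process.env so the function can be exercised with a plain object.
function getFunnelbackDomain(env = process.env) {
    const domain = (env.FUNNELBACK_SEARCH_DOMAIN || '').trim()
    if (!domain) {
        // Failing early is easier to diagnose than a malformed URL at request time
        throw new Error('FUNNELBACK_SEARCH_DOMAIN is not set for this environment')
    }
    return domain
}

// Example: a development environment pointing at a fictional development server
console.log(getFunnelbackDomain({ FUNNELBACK_SEARCH_DOMAIN: 'dev-client.funnelback.com' }))
```

Passing the environment in as a parameter keeps the helper independent of how each platform sets its variables.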

Creating the search route

This section describes the route handler for the "search" route that would exist at https://www.client-university.edu/search.

Add the 'search' route

The search route is responsible for relaying the user’s query to the Funnelback server and returning the full web page with the search results back to the user.

In this section, the basic functionality of the search route is implemented:

  1. Parse the query parameters from the user’s request

  2. Send the query to the Funnelback domain

  3. Receive the search results from the Funnelback domain

  4. Embed the search results with the rest of the page template and return it to the user

Send the query to the Funnelback domain and receive search results

The code added below constructs the Funnelback server URL to request results from with an environment variable (configured later) and the query parameters provided by the user. It then requests the search results and receives the response from the Funnelback server.

searchRoute.js
// The 'request' variable is the HTTP request sent by the user
// The 'response' variable is the HTTP response to reply to the user
/* ... imports ... */
const querystring = require('querystring')
const fetch = require('isomorphic-unfetch')
/* ... end imports ... */

/* ... inside the route handler ... */
const { query } = request // (1)

const params = querystring.stringify(query) // (2)

const endpoint = `https://${process.env.FUNNELBACK_SEARCH_DOMAIN}/s/search.html?${params}` // (3)

const funnelbackResponse = await fetch(endpoint) // (4)
const searchResults = await funnelbackResponse.text()

response.render('layout', { searchResults }) // (5)
/* ... end of the route handler ... */
1 Extract the query parameters from the request
2 Convert the query parameters to a string to pass along to the Funnelback server
3 Construct the URL of the Funnelback server to query
4 Send the request to Funnelback and receive the response
5 Pass the search results into the main layout (which includes the header and footer)
The HTML returned by the Funnelback server is already properly escaped, so it should not be escaped again before returning to the user to avoid text such as &amp; showing in the output.

Ensure that the querystring is sent to Funnelback exactly as it was sent by the user’s browser.

PHP-based servers and CMSs, particularly ones that use the $_GET superglobal, may silently replace dots (.) with underscores (_) in query string parameter names. For example, a parameter for a Document Type facet, f.Type=pdf, may be converted to f_Type=pdf, which Funnelback ignores because it is not in the expected format.
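
One way to avoid any rewriting of parameter names is to forward the query string without parsing it at all. The sketch below assumes the framework exposes the original request URL (for example request.url in Node's http module or request.originalUrl in Express); the helper names are hypothetical.

```javascript
// Hypothetical helper: return the query string exactly as the browser sent it,
// without parsing and re-serializing it.
function rawQueryString(requestUrl) {
    const start = requestUrl.indexOf('?')
    return start === -1 ? '' : requestUrl.slice(start + 1)
}

// Hypothetical helper: build the Funnelback URL from the untouched query string.
function funnelbackUrl(domain, requestUrl) {
    return `https://${domain}/s/search.html?${rawQueryString(requestUrl)}`
}

// The facet parameter 'f.Type' keeps its dot because it is never parsed
console.log(funnelbackUrl('client.funnelback.com', '/search?query=programs&f.Type=pdf'))
```

The same idea applies in other languages: read the raw query string from the server environment rather than from a parsed parameter map.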

Use an environment variable for the Funnelback search domain

For the fictional 'Client University' production site (www.client-university.edu), the search domain could be a production Funnelback server.

A development/staging/QA site should set the same environment variable to point to the corresponding development/staging/QA Funnelback server.

Test the basic search result page

At this point, the basic search results page can be tested to ensure that queries return the expected results. The next sections discuss error handling and additional features.

If the CMS is based on ASP.NET Web Forms there are additional considerations to work through.

The search form produced by Funnelback is a standard HTML <form> with a method of GET to the current page. It uses HTML <input> elements for the parameters, which the browser converts to URL querystring parameters.

ASP.NET Web Forms CMSs surround the whole page with a POST <form>, which causes the browser not to render the GET <form> used by Funnelback (forms cannot be nested in HTML). Extra development work is required to pass the correct URL querystring parameters to searches.
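
One possible approach, sketched here under the assumption that client-side JavaScript can read the values of the search inputs, is to build the GET URL manually and navigate to it instead of relying on a nested form. The function name and the field list are hypothetical and would need to match the inputs in the actual templates.

```javascript
// Hypothetical helper: build the URL that a GET <form> would have produced.
// 'fields' is a list of [name, value] pairs read from the search inputs.
function buildSearchUrl(searchPath, fields) {
    const params = new URLSearchParams()
    for (const [name, value] of fields) {
        params.append(name, value)
    }
    return `${searchPath}?${params.toString()}`
}

// In the browser this could be wired to the search button, for example:
// window.location.assign(buildSearchUrl('/search', [['query', input.value]]))
console.log(buildSearchUrl('/search', [['query', 'open day'], ['f.Type', 'pdf']]))
```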

Error handling and logging

General error handling

The status code of the HTTP response from the Funnelback server must be checked to determine whether the request succeeded or failed. In the event of an error, a friendly informational message should be displayed to the user (along the lines of "something went wrong") rather than passing on the status code itself or garbled error text. For example, if the Funnelback server replies with a 500 Internal Server Error, the framework/CMS/platform should not send a 500 page to the end user.

Any HTTP responses from the Funnelback server in the 4xx (Client Error) range or 5xx (Server Error) range should be considered an error response to handle.

It is not enough to check whether there is content in the HTML returned from the Funnelback server, as there are cases where HTML is returned but does not contain search results (for example, a 404 handler in the Funnelback server that returns a 404 page).
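
The status check described above could look something like the following sketch. The helper name and the wording of the friendly message are hypothetical; the only rule taken from this guide is that any 4xx or 5xx status is treated as an error.

```javascript
// Friendly message shown to the user instead of the raw upstream error.
const FRIENDLY_ERROR_HTML = '<p>Something went wrong with the search. Please try again later.</p>'

// Hypothetical helper: decide what to embed in the page based on the status
// and body of the Funnelback response. Any 4xx or 5xx status is an error.
function resultsOrFriendlyError(status, body) {
    const isError = status >= 400 && status <= 599
    return {
        ok: !isError,
        html: isError ? FRIENDLY_ERROR_HTML : body
    }
}

// A 500 from Funnelback becomes a friendly message rather than a 500 page
console.log(resultsOrFriendlyError(500, '<h1>Internal Server Error</h1>'))
```

In the route handler, the helper would be called with funnelbackResponse.status and the text of the response before rendering the layout.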

Notable errors Funnelback could return

Below are examples of possible HTTP errors that could be returned by the Funnelback server, along with potential causes. Besides the errors listed below, all 4xx and 5xx errors should be handled as described above.

400 Bad Request

  • The query sent to the Funnelback server may be malformed.

  • The query sent to the Funnelback server does not contain required parameters such as collection.

401/403 (Unauthorized/Forbidden)

  • If the search endpoint should be publicly available, check that the framework/CMS/platform is requesting the public (search) URL on the Funnelback server and not the protected (admin) URL.

  • If the search endpoint has special configuration in the Funnelback server for access control, such as IP restrictions, ensure that the framework/CMS/platform IP address is allowed in the configuration.

404 Not Found

  • The results endpoint may have been renamed on the Funnelback server causing the collection URL parameter to not match any known endpoint.

  • The value of the environment variables has changed to an incorrect value for that environment.

  • The URL to query Funnelback with is being incorrectly modified in some way.

500 Internal Server Error

  • Freemarker templates in Funnelback could have syntax errors or attempt to access variables that are null.

  • Incorrect collection configuration could produce unexpected side-effects leading to an error.

  • All 500 errors are logged in the Funnelback server logs for further analysis.

502/503/504 (Bad Gateway/Service Unavailable/Gateway Timeout)

  • The Funnelback server may be down or unreachable; contact the appropriate team for Funnelback support.

Timeout / no response / other error

  • Check if there is a firewall or other blocker to traffic to/from the framework/CMS/platform and the Funnelback server.

  • Check if the library used to make the HTTP request is using a supported version of TLS.

  • The Funnelback server may be down or unreachable; contact the appropriate team for Funnelback support.

Logging

Any error response from Funnelback should be logged within the framework/CMS/platform using the appropriate logging process to help with troubleshooting the issue during the support process.
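
A structured log entry tends to be easier to search during troubleshooting than free text. The sketch below is only an illustration: the field names are hypothetical, and console.error stands in for whatever logging process the framework/CMS/platform provides.

```javascript
// Hypothetical helper: build a structured log entry for a failed Funnelback request.
function funnelbackErrorLogEntry(endpoint, status, now = new Date()) {
    return {
        timestamp: now.toISOString(),
        level: 'error',
        message: 'Funnelback search request failed',
        endpoint, // the full URL that was requested, useful for reproducing the issue
        status    // the HTTP status code returned by the Funnelback server
    }
}

// console.error stands in for the platform's own logger
console.error(JSON.stringify(
    funnelbackErrorLogEntry('https://client.funnelback.com/s/search.html?query=programs', 500)
))
```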

Passing the end user IP address for logging

The default configuration of Funnelback analytics and search endpoints is to log the IP address of the query. An IP database with geolocations is included in the Funnelback server to show in the Funnelback Insights Dashboard where queries originate.

This functionality can also be extended with the Curator feature to provide promoted results or customized content based on location of the person who is searching.

When the queries are passed from the user through a CMS/server/platform before being passed to Funnelback, the IP address of that CMS/server/platform would be recorded, which is not useful for analytics. This process describes how to ensure that Funnelback records the IP address of the original person who is searching.

Passing the original IP address

The X-Forwarded-For HTTP header is used to pass the original IP address of the end user.

searchRoute.js
// The 'request' variable is the HTTP request sent by the user
// The 'response' variable is the HTTP response to reply to the user
/* ... inside the route handler ... */
const { query } = request

const params = querystring.stringify(query)
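// 'request.socket' replaces the deprecated 'request.connection' in current Node versions
const { socket: { remoteAddress } } = request // (1)

const endpoint = `https://${process.env.FUNNELBACK_SEARCH_DOMAIN}/s/search.html?${params}`

// Headers to send to Funnelback; later sections add to this object
const headers = {
    'X-Forwarded-For': remoteAddress // (2)
}

const funnelbackResponse = await fetch(endpoint, { headers })
const searchResults = await funnelbackResponse.text()

response.render('layout', { searchResults })
/* ... end of the route handler ... */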
1 Extract the IP address from the request
2 Simple example of sending the original user IP address as a X-Forwarded-For HTTP header

Configuring Funnelback for IP logging

By default, Funnelback will look at the last IP address in the X-Forwarded-For header to log in the analytics. The original user IP address should be the first IP address in the X-Forwarded-For header.

The logging.ignored_x_forwarded_for_ranges configuration option should be used to ignore known IP addresses so that the original user IP address is logged in Funnelback analytics.

The "update the X-Forwarded-For header value" plugin can be used to remove the first or last value of the X-Forwarded-For header, or all values but the first.

Caveats about this method of passing the IP address

This guide shows a much-simplified version of retrieving the end user IP address and passing it to Funnelback in the X-Forwarded-For header.

In reality, the architecture of every server will differ from platform to platform. If the server exists behind a reverse proxy, the original user IP address was probably appended to the X-Forwarded-For header by the reverse proxy and should be extracted from there instead to pass on to Funnelback. The Node solution would likely involve using a package (such as forwarded-for) to manage these details.

Use the recommended method of accessing the original user IP address based on the framework/CMS/platform, or contact system administrators for more information on the system architecture.
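
To illustrate the reverse proxy case, the sketch below picks the left-most entry of an incoming X-Forwarded-For header and falls back to the socket address. It assumes the reverse proxy is trusted and appends to the header rather than replacing it; the helper name is hypothetical, and a maintained package or the framework's own mechanism is likely a better choice in production.

```javascript
// Hypothetical helper: pick the original client IP address.
// Assumes the reverse proxy in front of this server is trusted and appends to
// X-Forwarded-For, so the left-most entry is the original client.
function originalClientIp(forwardedForHeader, remoteAddress) {
    if (forwardedForHeader) {
        const first = forwardedForHeader.split(',')[0].trim()
        if (first) {
            return first
        }
    }
    // No usable header: the direct connection is the best information available
    return remoteAddress
}

console.log(originalClientIp('203.0.113.7, 10.0.0.5', '10.0.0.5'))
```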

Managing sessions (history and cart)

The Sessions feature (Search/Click history and Results Cart) is enabled through the use of an HTTP cookie. These cookies are only valid on the domain to which they are assigned, and can only be assigned by a server on that domain.

Funnelback configuration

Sessions are enabled by editing the profile configuration for ui.modern.session.

The overview image for embedding search results is repeated below.

HTTP Request Flow

Reviewing this image now in context of the sessions cookie:

  1. The query from the end user to the client framework/CMS/platform will not contain the sessions cookie if they are a first time visitor on that browser, or it may contain the cookie if they are a repeat visitor.

  2. If the user’s browser sent the sessions cookie, this cookie should be sent onwards to the Funnelback server

  3. If no sessions cookie was sent to the Funnelback server, one will be generated and returned in the set-cookie header. Otherwise, the sessions cookie that was sent will also be returned in the set-cookie header.

  4. The client framework/CMS/platform sets the set-cookie header for the response to the end user with the correct value and domain

The name of the sessions cookie is user-id. If this were to change in the future, it could be worthwhile substituting an environment variable instead of a hard-coded value.

For steps (1) and (2) above, the implementation requires getting the sessions cookie from the user’s request and passing it on to the Funnelback server. The base case is that the user has not visited the search before and does not have the cookie set.

// The 'request' variable is the HTTP request sent by the user
/* ... inside route handler before receiving the results from Funnelback ... */
if (request.cookies && request.cookies['user-id']) {
    headers.cookie = `user-id=${request.cookies['user-id']}`
}
/* ... rest of route handler ... */

The headers variable was created earlier in the [Passing the End User IP Address for Logging] section. This section adds the sessions cookie to that headers variable, if applicable.

In step (3) above, the Funnelback server sends the sessions cookie in a set-cookie header. Typically it is the browser that parses a set-cookie header, not a server, so there may not be a built-in way of parsing this header depending on the framework/CMS/platform. In this guide, a Node package called 'set-cookie-parser' is used to parse the set-cookie header.

After the search results response has been received from the Funnelback server, parse out the value of the sessions cookie.

/* ... imports ... */
const setCookieParser = require('set-cookie-parser')
/* ... end imports ... */

/* ... inside the route handler after receiving the results from Funnelback ... */
const setCookies = setCookieParser.parse(funnelbackResponse.headers.get('set-cookie'), {
    map: true
}) // (1)
const userIdCookie = setCookies['user-id'] // (2)
/* ... rest of route handler ... */
1 Uses a simple parsing library to parse the 'set-cookie' header into a map
2 Gets the value of the user-id cookie from the map
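
On platforms where no cookie-parsing package is available, a single set-cookie value can be parsed by hand. The sketch below is a swapped-in alternative to the package used in this guide, not a replacement for it: the helper name is hypothetical, the cookie value is made up, and only a well-formed name=value pair plus the Max-Age and Path attributes are handled.

```javascript
// Hypothetical helper: parse ONE set-cookie header value into an object.
// Assumes a well-formed 'name=value' pair followed by ';'-separated attributes.
function parseSetCookie(headerValue) {
    const [pair, ...attributes] = headerValue.split(';').map(part => part.trim())
    const separator = pair.indexOf('=')
    const cookie = {
        name: pair.slice(0, separator),
        value: pair.slice(separator + 1)
    }
    for (const attribute of attributes) {
        const [key, attributeValue = ''] = attribute.split('=')
        if (key.toLowerCase() === 'max-age') {
            cookie.maxAge = Number(attributeValue) // Max-Age is expressed in seconds
        } else if (key.toLowerCase() === 'path') {
            cookie.path = attributeValue
        }
    }
    return cookie
}

console.log(parseSetCookie('user-id=abc-123; Max-Age=31536000; Path=/; HttpOnly'))
```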

If the sessions cookie was parsed from the set-cookie header of the Funnelback response, put that value in a set-cookie header to send to the end user.

If the sessions cookie is not sent by the Funnelback server, this should be handled and not cause an error. For example, the sessions may have been disabled on the Funnelback server.

/* ... inside the route handler after the previous step ... */
if (userIdCookie) {
    response.cookie(userIdCookie.name, userIdCookie.value, { // (1)
        maxAge: userIdCookie.maxAge * 1000, // (2)
        domain: process.env.COOKIE_DOMAIN, // (3)
    })
} else {
    // Funnelback did not send a 'set-cookie' value for 'user-id'
    // This may imply an error, or it could just mean the configuration was turned off
}
/* ... rest of route handler ... */
1 Create a 'set-cookie' header for the end user with the value received from Funnelback
2 Double-check whether the cookie max age defaults to seconds or milliseconds and convert appropriately
3 An environment variable can be used for the cookie domain to use the same code between a development and production environment
If the sessions cookie sent by the Funnelback server is different than the cookie sent by the user’s browser, the Funnelback server value should be used. The cookie on the user’s browser may be expired or invalid, so always trust the value returned by the Funnelback server.

The process described above manages the sessions cookie in response to searches, and will ensure that the user’s search history is available to them. The sessions cookie is also used in two other areas:

  1. Click tracking

  2. Cart (Saved Items)

The client framework/CMS/platform can only set cookies for its own domain; a server cannot set cookies on a user’s browser for a different domain. In this example, the cookie can be set by Client University for domains ending in client-university.edu.

The click tracking URL exists on the Funnelback server at client.funnelback.com/s/redirect. As this is a different domain, the sessions cookie in the user’s browser will not be sent to that redirect/click tracking URL when the user clicks on a result.

Similarly, the Cart (Saved Items) API endpoint exists on the Funnelback server at client.funnelback.com/s/cart.json. Cookies that are valid on client-university.edu will not be sent to the Cart endpoint as it is on a different domain.

This problem can be solved in two ways:

  1. Create a dedicated search subdomain within the client domain

  2. Proxy the other requests through to Funnelback

Option 1: create a dedicated search domain

This problem can be solved by creating a new domain on the client side and using a Domain Name System (DNS) CNAME record. For example, a new domain search.client-university.edu could be created with the following CNAME record.

search.client-university.edu    CNAME    client.funnelback.com

Now, all requests sent to search.client-university.edu, a domain owned by the client, are sent to the Funnelback server. If the domain of the sessions cookie is .client-university.edu, that cookie will also be sent by the user’s browser to search.client-university.edu.

Option 2: proxy the cart requests

The requests that control behavior in the Cart (Saved Items) functionality are generated from client-side Javascript.

The Cart (Saved Items) API endpoint exists on the Funnelback domain, so if the request is made by the end user’s browser while on the client’s domain, the cookie will not be sent by the browser. Instead, the request can be sent to an endpoint on the client framework/CMS/server, which forwards that request onto the Funnelback server, similar to how the search route was set up (see earlier sections).

Each of the GET, POST, PUT, and DELETE requests should be forwarded onto the Funnelback server with the same URL parameters and body data. The response from Funnelback should also be sent back to the user’s browser.
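
The forwarding step could be sketched as a function that describes the request to send to Funnelback, leaving the actual HTTP call to whichever client the platform uses. The helper name, the shape of the 'incoming' object, and the example body are hypothetical; the /s/cart.json path and the user-id cookie name come from this guide.

```javascript
// Hypothetical helper: describe the request to forward to the Funnelback Cart endpoint.
// 'incoming' mirrors the parts of the browser request that need to be relayed.
function buildCartProxyRequest(domain, incoming) {
    const query = incoming.rawQuery ? `?${incoming.rawQuery}` : ''
    const headers = {}
    if (incoming.userId) {
        headers.cookie = `user-id=${incoming.userId}` // forward the sessions cookie
    }
    if (incoming.contentType) {
        headers['content-type'] = incoming.contentType
    }
    const options = { method: incoming.method, headers }
    // fetch does not allow a body on GET requests; other methods relay it unchanged
    if (incoming.method !== 'GET' && incoming.body !== undefined) {
        options.body = incoming.body
    }
    return { url: `https://${domain}/s/cart.json${query}`, options }
}

const exampleCartRequest = buildCartProxyRequest('client.funnelback.com', {
    method: 'POST',
    rawQuery: 'collection=client~client-university-search',
    userId: 'abc-123',
    contentType: 'application/json',
    body: '{"example":"placeholder body"}'
})
console.log(exampleCartRequest.url)
```

The returned url and options can be passed straight to fetch(url, options), and the status and body of the Funnelback response are then relayed back to the browser.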

If the official Funnelback sessions plugins are used, be sure to configure the apiBase option; otherwise, follow the relevant instructions to find where to configure the base URL of the Cart (Saved Items) API endpoint.

Option 2 (continued): proxy the click redirect requests

The same concept applies for click redirects: if the redirect URL exists on the Funnelback domain, the sessions cookie will not be carried by the user’s browser to that domain. A redirect URL can be set up on the client framework/CMS/server which captures the user’s sessions cookie and forwards it to the Funnelback server redirect URL.

The redirect URL on the Funnelback server returns a simple HTTP response with a 302 status and a Location header containing the URL that the browser should redirect to. This response can be forwarded to the end user with no modification except for the set-cookie header, as described in earlier sections.
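
A sketch of this relay is shown below. The helper names are hypothetical, and the fallback to a 502 status for unexpected responses is an assumption rather than documented behavior. Note the redirect: 'manual' option, which tells fetch not to follow the 302 itself so that the Location header can be passed on to the browser.

```javascript
// Hypothetical helper: describe the request to the Funnelback click redirect URL.
// 'redirect: manual' stops fetch from following the 302 itself.
function buildClickProxyRequest(domain, rawQuery, userId) {
    const headers = {}
    if (userId) {
        headers.cookie = `user-id=${userId}` // forward the sessions cookie
    }
    return {
        url: `https://${domain}/s/redirect?${rawQuery}`,
        options: { method: 'GET', headers, redirect: 'manual' }
    }
}

// Hypothetical helper: turn the Funnelback response into what is relayed to the browser.
function relayRedirect(status, location) {
    if (status >= 300 && status < 400 && location) {
        return { status, location }
    }
    // Anything else is unexpected for a click redirect; treat it as an upstream error
    return { status: 502, location: null }
}

console.log(relayRedirect(302, 'https://www.client-university.edu/programs'))
```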

If this proxy is implemented, the click link should be configured with the ui.modern.click_link profile setting.

This process is only necessary for the personal Click History feature (i.e. "my clicks"). Overall click analytics will still work without this proxy.

Controlling the URL parameters

The Funnelback server search results endpoint can be thought of as an API with required and optional parameters.

The required parameters to pass to Funnelback are:

  • collection

  • query

Optional parameters may include, among many others:

  • profile

  • form

  • facets (the value varies per facet)

The parameters that are sent to the Funnelback server search results endpoint can be controlled separately from what is shown in the user’s browser. This is optional and is not required for the integration to work.

Sanitizing the URL parameters is not necessary; these URL parameters can be passed to the Funnelback server without modification. The Funnelback server sanitizes those parameters when it produces the search results.
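
As an illustration of controlling the parameters separately from the browser URL, the sketch below renames a short browser parameter and adds a fixed server-side value before building the query string for Funnelback. The helper name, the 'q' alias, and the example profile value are hypothetical and not part of any Funnelback configuration described in this guide.

```javascript
const querystring = require('querystring')

// Hypothetical helper: translate the parameters shown in the browser into the
// parameters sent to Funnelback. 'renames' maps browser names to Funnelback names;
// 'fixed' holds server-side values that always override what the user sent.
function toFunnelbackParams(browserQuery, renames, fixed) {
    const mapped = {}
    for (const [name, value] of Object.entries(browserQuery)) {
        const hasRename = Object.prototype.hasOwnProperty.call(renames, name)
        mapped[hasRename ? renames[name] : name] = value
    }
    return querystring.stringify({ ...mapped, ...fixed })
}

console.log(toFunnelbackParams(
    { q: 'programs', 'f.Type': 'pdf' },
    { q: 'query' },
    { profile: '_default' }
))
```

Any renaming of this kind also has to be reflected in the links and forms that the Funnelback templates produce, otherwise the browser URL and the forwarded URL drift apart.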

Hide the collection parameter

It may be desirable to hide the collection=<COLLECTION-ID> parameter from the URL in the user’s browser to shorten the link or otherwise hide the name of the collection.

# For example:
https://client-university.edu/search?query=programs
# Instead of:
https://client-university.edu/search?query=programs&collection=client~client-university-search

The code added below adds the collection parameter which is defined as an environment variable.

// The 'request' variable is the HTTP request sent by the user
/* ... inside the route handler ... */
const { query } = request

// Add the collection before stringifying; a property cannot be set on the resulting string
const params = querystring.stringify({
    ...query,
    collection: process.env.FUNNELBACK_COLLECTION // (1)
})
/* ... rest of the route handler ... */
1 Sends the collection parameter from an environment variable to Funnelback
Additional changes to the Freemarker templates may be required on the Funnelback server to remove the collection in other page elements such as facet links and the search form.