Implementer training - push indexes

Push indexes

We have previously looked at standard Funnelback data sources (such as web data sources) that follow a linear update cycle of gather, filter, index, swap to produce a searchable index.

Push indexes are quite different, handling indexing only - updates to a push index are made using an API.

This means that a separate process or set of processes is required to handle the gathering and filtering of the content.

Push indexes also differ from standard data sources in the way indexes are stored and managed - push indexes don’t have live and offline versions. Push indexes also update whenever changes are committed - this means you don’t necessarily have to wait for an update to complete before you can start searching.

When working with a push index, it is critical that the code interacting with the push API handles error conditions (for example, by catching any errors and queuing the affected items to be pushed again later).
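The catch-and-queue approach described above could be sketched as follows. This is a minimal illustration, not part of the Funnelback API: the `RetryingPusher` class and the `push` callable it wraps are hypothetical names, and `push` stands in for whatever function actually sends a document to the push API.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Document = Tuple[str, str]  # (key, content)


class RetryingPusher:
    """Push documents, queueing any that fail so they can be retried later.

    Hypothetical helper: `push` is any callable that sends one document
    to the push API and raises an exception on failure.
    """

    def __init__(self, push: Callable[[str, str], None]) -> None:
        self._push = push
        self.pending: Deque[Document] = deque()

    def submit(self, key: str, content: str) -> bool:
        """Try to push one document; queue it and return False on failure."""
        try:
            self._push(key, content)
            return True
        except Exception:  # a real client would catch narrower error types
            self.pending.append((key, content))
            return False

    def retry_pending(self) -> int:
        """Retry each queued document once; return how many succeeded."""
        succeeded = 0
        # Only iterate over the items queued at the start of this call.
        # Items that fail again are re-queued by submit() for a later retry.
        for _ in range(len(self.pending)):
            key, content = self.pending.popleft()
            if self.submit(key, content):
                succeeded += 1
        return succeeded
```

A scheduler or background task could call `retry_pending()` periodically. Persisting the queue to disk, so that items survive a restart, is left out of this sketch.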

The search dashboard also includes an interface that allows for interaction with the API via web forms, allowing data to be interactively added or removed from the indexes.

Tutorial: Create a push index

This exercise uses the API-UI available in the administration interface to interactively run API calls. Interaction with a push index is usually performed programmatically by contacting the push-api REST endpoint and passing in the appropriate parameters.

  1. Log in to the search dashboard where you are doing your training.

    See: Training - search dashboard access information if you’re not sure how to access the training. Ignore this step if you’re treating this as a non-interactive tutorial.
  2. Create a new data source called books in the library search package.

  3. Enter the following information into the creation form then click create:

    • Data source type: push2

    • Name: books

  4. View the available Push API calls by selecting the API-UI menu item, then selecting the Push API tab.

  5. Add a document to the push index. Expand the push-api-content heading then select the PUT documents item.

  6. Enter the following into the parameters section of the PUT documents form:

    • collection: default~ds-books

    • key: http://books.example.com/books/test-document.html

    • content: <html><head><title>Books Test document</title></head><body><h1>Test document 1</h1><p>This is some sample text in a HTML document.</p></body></html>

    • Parameter content type: text/html

    • Content-type: text/html

  7. Add the document by clicking the execute button. The screen will update with the submitted call and response.

  8. Run a query for test document against the book finder results page and verify that your document is returned in the search results.

  9. Changes to the push index will be visible as soon as a commit completes. Push indexes will auto-commit based on configured settings, but a commit can also be manually triggered by calling the POST commit API call. This is also available under the push-api-content heading.
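The same PUT documents and POST commit calls can be made programmatically. The sketch below only builds the HTTP requests, using Python's standard library, and does not send them. The base URL and the path layout (`/collections/{collection}/documents` and `/collections/{collection}/state/commit`) are assumptions made for illustration, and authentication is omitted. Check the exact endpoints and required headers against the API-UI for your own installation.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# ASSUMPTION: placeholder host and path prefix; confirm against your API-UI.
BASE_URL = "https://search.example.com/push-api/v2"


def build_put_document(collection: str, key: str, content: str,
                       content_type: str = "text/html") -> Request:
    """Build (but do not send) a request that adds one document."""
    # ASSUMPTION: the document key is passed as a query parameter.
    url = (f"{BASE_URL}/collections/{quote(collection, safe='~')}/documents?"
           + urlencode({"key": key}))
    return Request(url, data=content.encode("utf-8"), method="PUT",
                   headers={"Content-Type": content_type})


def build_commit(collection: str) -> Request:
    """Build (but do not send) a request that triggers a manual commit."""
    # ASSUMPTION: the commit path is illustrative only.
    url = f"{BASE_URL}/collections/{quote(collection, safe='~')}/state/commit"
    return Request(url, method="POST")


put_request = build_put_document(
    "default~ds-books",
    "http://books.example.com/books/test-document.html",
    "<html><head><title>Books Test document</title></head>"
    "<body><h1>Test document 1</h1></body></html>",
)
commit_request = build_commit("default~ds-books")
```

Each request could then be sent with `urllib.request.urlopen(...)`, ideally wrapped in error handling so that failed pushes can be retried.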

Splitting documents that are submitted to a push index

Push indexes don’t support the standard XML document splitting provided by Funnelback (via the XML indexing screen).

The split html and xml document plugin can be used with push indexes to provide this functionality. In addition to splitting XML, the plugin also allows you to split an HTML document based on a CSS selector.

The filter provided by this plugin can be chained with the JSONToXML filter to enable you to submit and split a JSON document.

If individual records are removed from a feed that is then resubmitted, you will also need to delete the corresponding items from the push index, because each item is indexed under the URL assigned to it when the document was split.
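One way to work out which items need deleting is to compare the URLs pushed previously with those produced by the new feed. The sketch below assumes the calling code has recorded the URL assigned to each split record on every submission. The `stale_keys` function and the example URLs are hypothetical.

```python
from typing import Iterable, List


def stale_keys(previously_pushed: Iterable[str],
               current_feed: Iterable[str]) -> List[str]:
    """Return URLs that were pushed before but are absent from the new feed."""
    return sorted(set(previously_pushed) - set(current_feed))


# Made-up URLs standing in for the URLs assigned to split records.
before = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
after = ["https://example.com/a", "https://example.com/c"]

to_delete = stale_keys(before, after)
```

Each URL in `to_delete` would then be removed from the push index with a separate delete call.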

Tutorial: Split JSON array content that is submitted to a push index

This tutorial shows you how to split a JSON array that is submitted to a push index and index the individual items in the array as separate result items.

The example converts the submitted JSON to XML, then splits the XML (using the HTML and XML split plugin) by chaining two filters.
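To make the two-stage chain concrete, the sketch below imitates it with Python's standard library. It is an illustration of the idea only, not Funnelback's implementation. The XML shape (a `json` root with one `array` element per record) is an assumption inferred from the `/json/array` split path used later in this tutorial, and the function names and sample records are made up.

```python
import json
import xml.etree.ElementTree as ET
from typing import List


def json_array_to_xml(text: str) -> ET.Element:
    """Stage 1: convert a JSON array of flat objects to XML.

    ASSUMPTION: produces <json><array>...</array>...</json>, one <array>
    element per record, mirroring the /json/array split path.
    """
    root = ET.Element("json")
    for record in json.loads(text):
        item = ET.SubElement(root, "array")
        for field, value in record.items():
            ET.SubElement(item, field).text = str(value)
    return root


def split_xml(root: ET.Element, child_tag: str) -> List[str]:
    """Stage 2: serialise each matching child element as its own document."""
    return [ET.tostring(child, encoding="unicode")
            for child in root.findall(child_tag)]


sample = json.dumps([
    {"author": "Author One", "title": "First Book"},
    {"author": "Author Two", "title": "Second Book"},
])

# Chain the two stages: the output of the first is the input of the second.
documents = split_xml(json_array_to_xml(sample), "array")
```

Each entry in `documents` corresponds to one record that would be indexed as a separate result item.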

  1. Download the Books JSON data from: https://docs.squiz.net/training-resources/books.json and save this locally.

  2. Log in to the search dashboard where you are doing your training.

    See: Training - search dashboard access information if you’re not sure how to access the training. Ignore this step if you’re treating this as a non-interactive tutorial.
  3. Create a search package named: Push split json example

  4. Create a results page within this search package named: 100 best books

  5. Edit the results page configuration and add the following to the query_processor_options option: -SF=[author,country,imageLink,language,link,pages,title,date], then click the save and publish button to make the configuration live.

  6. Create a push data source named top 100 books feed

  7. Enable the split html and xml plugin on the push data source. This makes the split filter available to use in the push data source.

  8. Configure the push data source for document splitting by adding the following to the data source configuration:

    • plugin.split-html-xml-filter.config.defaultXMLSplit=/json/array

  9. Configure the push data source metadata mappings. This needs to be done before you push in the content, otherwise you will need to re-index the content of the push data source by running a vacuum re-index via the push API. Clear the existing mappings, then add the following:

    Class name   Source        Type   Search behaviour
    author       //author      text   searchable as content
    country      //country     text   searchable as content
    imageLink    //imageLink   text   searchable as content
    language     //language    text   searchable as content
    link         //link        text   searchable as content
    pages        //pages       text   searchable as content
    title        //title       text   searchable as content
    date         //year        text   searchable as content

  10. View the available Push API calls by selecting the API-UI menu item, then selecting the Push API tab.

  11. Add the JSON to the push index. Expand the push-api-content heading then select the PUT documents item.

  12. Enter the following into the parameters section of the PUT documents form:

    • collection: default~ds-100-best-books

    • key: https://raw.githubusercontent.com/benoitvallon/100-best-books/master/books.json

    • content: Open the books.json file in a text editor and paste the contents of the file into this field.

    • Parameter content type: application/json

    • Content-type: application/json

    • filters: JSONToXML:com.funnelback.plugin.splitxmlhtmlfilter.SplitHtmlXmlFilterStringFilter

      This tells the push data source to run the submitted content through the JSONToXML filter, then pass the output of that filter through the split html and xml filter.

  13. Observe that the response body returned by the API shows that 100 documents were added to the push index.

  14. In a new browser tab, run a search against the 100 best books results page for !showall and observe that 100 results are returned.

  15. Run a search for chinua and observe the author metadata for the returned result.

  16. Return to the API screen and edit the JSON you submitted (in the content field) and update the author of the first record to Chinua Achebe XX then resubmit the API call.

  17. Rerun the search for chinua and observe that the author metadata has been updated.

  18. Rerun the search for !showall and observe that there are still 100 results in the index.

Updating the content works because the same feed was resubmitted to the API under the same key, so the individual JSON records (after splitting) were assigned the same URLs as in the initial submission and replaced the existing items. The URLs assigned during splitting depend on the item number in the JSON and on the key field submitted.
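This behaviour can be modelled with a small in-memory sketch. It is a simplified stand-in rather than how the push index is implemented: the index is a plain dictionary, and each record's identity is represented as a `(key, item number)` pair instead of the actual URL format the plugin assigns, which is not shown in this tutorial.

```python
from typing import Dict, List, Tuple

# Hypothetical model: identity of a split record is (submitted key, position).
Index = Dict[Tuple[str, int], dict]


def submit(index: Index, key: str, records: List[dict]) -> None:
    """Store each record under (key, position); same identity overwrites."""
    for position, record in enumerate(records):
        index[(key, position)] = record


index: Index = {}
feed_key = "https://example.com/books.json"  # made-up key for illustration
books = [{"author": "Author One"}, {"author": "Author Two"}]

submit(index, feed_key, books)          # initial submission
books[0] = {"author": "Author One XX"}  # edit the first record
submit(index, feed_key, books)          # resubmission overwrites in place

count_after_update = len(index)
updated_author = index[(feed_key, 0)]["author"]

submit(index, feed_key, books[:1])      # feed now has one record fewer
count_after_removal = len(index)        # the old second item is still stored
```

In this model the item count stays the same after an update, and a record dropped from the feed remains in the index until it is deleted explicitly. Because identity depends on position, removing or reordering records in the feed would also change which stored item each remaining record overwrites.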