Adding, updating and deleting content in a push index

The content within a push index is maintained via the push API.

Always check the push API response to confirm that no errors occurred. The code that calls the push API is responsible for managing errors, including retrying or queuing requests and notifying users.
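As an illustration of the retry responsibility described above, the following Python sketch wraps a request in a simple retry loop with exponential backoff. It is a minimal sketch under stated assumptions, not part of the push API: the helper name call_with_retries is hypothetical, and it assumes that success is reported as a 2xx HTTP status, that 5xx statuses are worth retrying, and that other statuses are not.

```python
import time
from typing import Callable


def call_with_retries(
    send: Callable[[], int],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a 2xx status or the attempts run out.

    send is any function that performs one push API request and returns
    the HTTP status code (assumption: errors are visible in the status).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status = send()
        if 200 <= status < 300:
            return status
        out_of_attempts = attempt == attempts - 1
        # Assumption: only server-side (5xx) failures are worth retrying.
        if status < 500 or out_of_attempts:
            raise RuntimeError(f"push request failed with HTTP status {status}")
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A caller would typically catch the RuntimeError, queue the document for a later attempt, and notify the user. The response body may carry further error details and should be inspected as well.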

Add or update content in a push index

The push API provides two API calls that can be used to add or update content in a push index:

  • PUT /v2/collections/{collection}/documents: This call uses a PUT request to submit a document to the push index. Metadata can be supplied via HTTP headers that are sent with the request. The request must provide a key, which is a URI or URL that identifies the document. The document will be stored using the key - it will be added if the key does not exist in your index, or it will be updated if the key exists.

  • POST /v2/collections/{collection}/documents/content-and-metadata: This call uses a POST request to submit a document to the push index. The submission must be made as multipart form data containing the document content and a JSON object that specifies the document metadata.
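The PUT call above can be sketched in Python as follows. This is a hedged example that only builds the request object and does not send it. The base URL is hypothetical, and passing the key as a query parameter named key is an assumption; check the push API reference for your installation before relying on either.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL; replace with the push API address of your server.
PUSH_API_BASE = "https://search.example.com/push-api"


def build_put_request(collection: str, key: str, content: bytes, content_type: str) -> Request:
    """Build (but do not send) a PUT request that adds or updates one document."""
    # Assumption: the document key is supplied as a `key` query parameter.
    query = urlencode({"key": key})
    url = f"{PUSH_API_BASE}/v2/collections/{quote(collection, safe='')}/documents?{query}"
    return Request(url, data=content, method="PUT", headers={"Content-Type": content_type})


request = build_put_request(
    "example-push", "http://example.com/page.html", b"<html></html>", "text/html"
)
```

Sending the request (for example with urllib.request.urlopen) requires authentication, which is omitted here.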

When adding XML content to a push data source, ensure that the API call specifies the correct XML MIME type and that the submitted XML content includes the <?xml> declaration line. Otherwise the document will be indexed as plain text content.

Push index URI/URL keys

The push API expects all keys to be valid URIs, and expects that all keys have a scheme name such as http, ftp, or local. The push API will also canonicalize keys. The canonicalized key that was used by the push API when storing the document is displayed in the response.

In general, the canonicalization:

  • adds a '/' to the end of a URI that does not have a path. e.g. http://example.com would become http://example.com/

  • removes fragments from the URI. e.g. http://example.com/#id would become http://example.com/

  • flattens paths e.g. http://example.com/path/../ would become http://example.com/
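The three rules above can be approximated in Python to predict the key under which a document is likely to be stored. This is only a sketch of the listed rules, and the function name canonicalize_key is hypothetical. It does not reproduce the push API's actual canonicalization, so the canonicalized key returned in the response remains authoritative.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def canonicalize_key(key: str) -> str:
    """Approximate the canonicalization rules listed above."""
    parts = urlsplit(key)
    if not parts.scheme:
        raise ValueError(f"key has no scheme: {key!r}")
    path = parts.path
    if not path:
        # Rule 1: a URI without a path gets a trailing '/'.
        path = "/"
    else:
        # Rule 3: flatten '.' and '..' segments, keeping a trailing '/'.
        had_trailing_slash = path.endswith("/")
        path = posixpath.normpath(path)
        if had_trailing_slash and not path.endswith("/"):
            path += "/"
    # Rule 2: drop the fragment by passing an empty string for it.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(canonicalize_key("http://example.com"))            # http://example.com/
print(canonicalize_key("http://example.com/#id"))        # http://example.com/
print(canonicalize_key("http://example.com/path/../"))   # http://example.com/
```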

Supplying metadata

Extra metadata may be added to a document when it is submitted to the push API. For the PUT /v2/collections/{collection}/documents call, HTTP headers that start with X-Funnelback-Push-Metadata- are used to supply additional metadata.

The POST /v2/collections/{collection}/documents/content-and-metadata call uses a multipart request that supplies the metadata as a JSON object.

Metadata that is extracted from the content (such as HTML <meta> tags, or metadata mapped from XML fields) can be passed in the content as usual, and will be extracted by the indexer based on the mappings in the same manner as for any other data source.

Whatever method you use, you will still need to define a metadata mapping to access the metadata from the indexer and query processor.

As HTTP headers are case-insensitive in the HTTP specification but not the Java Servlet specification, metadata keys may be converted to lower-case in some environments. If case is important the POST /v2/collections/{collection}/documents/content-and-metadata call should be used.

This call should also be used when a metadata value can’t be passed in an HTTP header, such as a value that includes non-ASCII characters or line breaks.
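The choice between the two calls can be sketched as a small Python check that builds the metadata headers for the PUT call and refuses values that should go through the content-and-metadata call instead. The helper names are hypothetical, and the sketch assumes that the metadata name is appended directly after the X-Funnelback-Push-Metadata- prefix.

```python
from typing import Dict, Mapping

HEADER_PREFIX = "X-Funnelback-Push-Metadata-"


def header_safe(name: str, value: str) -> bool:
    """Return True if the pair can be sent as an HTTP header without loss."""
    return (
        value.isascii()              # non-ASCII characters need the POST call
        and "\n" not in value        # so do line breaks
        and "\r" not in value
        and name == name.lower()     # mixed-case names may be lower-cased
    )


def metadata_headers(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Build PUT headers, or raise if the POST call should be used instead."""
    unsafe = [name for name, value in metadata.items() if not header_safe(name, value)]
    if unsafe:
        raise ValueError(f"use the content-and-metadata call for: {unsafe}")
    # Assumption: the metadata name follows the prefix directly.
    return {HEADER_PREFIX + name: value for name, value in metadata.items()}


headers = metadata_headers({"author": "Jane Doe"})
```

Remember that a metadata mapping is still required before these values can be used by the indexer and query processor.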

A GET request for a document will return the metadata that is set using the HTTP headers, in the metadata part of the returned JSON. The metadata part of the returned JSON will not contain metadata that the indexer has extracted from the document or added with external_metadata.

Special metadata

A special header, X-Funnelback-Push-Received-Time, is set automatically by the push API and contains the submission time. The value is a 19-character UTC datetime in the form: yyyyMMddHHmmss.SSSZ. This field can be mapped in the same way as other metadata.
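If the received time needs to be handled in client code, it can be parsed along the following lines. This Python sketch assumes that the final character of the 19-character value is a literal zone marker, and it treats the timestamp as UTC as described above. The sample value is invented for illustration.

```python
from datetime import datetime, timezone


def parse_received_time(value: str) -> datetime:
    """Parse a 19-character yyyyMMddHHmmss.SSSZ value into an aware datetime."""
    if len(value) != 19:
        raise ValueError(f"expected 19 characters, got {len(value)}: {value!r}")
    # Parse date, time and milliseconds; the last character is assumed to be
    # a zone marker and is not interpreted beyond treating the value as UTC.
    parsed = datetime.strptime(value[:18], "%Y%m%d%H%M%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


received = parse_received_time("20240131093015.123Z")  # invented sample value
```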

Applying filters

If you need to apply any filtering to the document, the filters must be set in the filters parameter that is submitted along with the request. The parameter should be set to the chain of filters you wish to run. This includes any filters that are required by enabling a plugin on your push data source.

The push API does not apply any filters by default, and filter.classes set in the push data source configuration are ignored.
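Because no filters run unless they are requested, each submission needs to carry its own filter chain. The Python sketch below builds a query string that includes the filters parameter. It assumes that key and filters are sent as query parameters and that the chain is a comma-separated list. The filter class names are placeholders, not real filters.

```python
from typing import Sequence
from urllib.parse import urlencode


def push_query(key: str, filters: Sequence[str] = ()) -> str:
    """Build the query string for a document submission."""
    params = {"key": key}  # assumption: the key is a query parameter
    if filters:
        # Assumption: the filter chain is written as a comma-separated list.
        params["filters"] = ",".join(filters)
    return urlencode(params)


# Placeholder filter names for illustration only.
query = push_query(
    "http://example.com/",
    ["com.example.FirstFilter", "com.example.SecondFilter"],
)
```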

Delete content from a push index

The push API provides two API calls that can be used to delete content from a push index:

  • DELETE /v2/collections/{collection}/documents can be used to delete a specific document from a push index. You need to supply the key for the document and also the ID of the push data source where this document is indexed.

  • DELETE /v2/collections/{collection} can be used to delete all documents from a push index. This call empties your index and also removes any associated click data and redirects.
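The two delete calls can be sketched in Python as follows. The example only builds the request objects and does not send them. The base URL is hypothetical, and the sketch assumes that the push data source ID is the {collection} path segment and that the document key is passed as a key query parameter. The two cases are kept in separate functions so that a missing key can never turn a single-document delete into a delete of the whole index.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL; replace with the push API address of your server.
PUSH_API_BASE = "https://search.example.com/push-api"


def build_delete_document_request(collection: str, key: str) -> Request:
    """Build a request that deletes one document, identified by its key."""
    if not key:
        raise ValueError("a document key is required")
    # Assumption: the key is supplied as a `key` query parameter.
    query = urlencode({"key": key})
    url = f"{PUSH_API_BASE}/v2/collections/{quote(collection, safe='')}/documents?{query}"
    return Request(url, method="DELETE")


def build_delete_all_request(collection: str) -> Request:
    """Build a request that empties the index, including click data and redirects."""
    url = f"{PUSH_API_BASE}/v2/collections/{quote(collection, safe='')}"
    return Request(url, method="DELETE")
```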