Re-indexing and merging push indexes

Getting started

Push indexes are made up of many smaller indexes (known as generations), to which new and updated content is added. This means that content changes can be made and indexed very quickly, resulting in your content appearing in your search results in near-real time.

The disadvantage of this architecture is that there is a higher maintenance load on the index, and additional merge operations need to be run to consolidate the index so that it can serve results quickly.

Another disadvantage is that changes to the configuration that controls how the index is built are not applied to existing content straight away. This includes changes to metadata mappings, indexer options, gscopes and query independent evidence (QIE). If you don’t trigger a re-index, your changes are reflected only in content that is added or updated, and do not apply to the rest of the index until the system consolidates (merges) the generations. To see the changes applied quickly, trigger a re-indexing operation.

Merge and re-index options

Push indexes are merged and re-indexed by running the following API call:

  • POST /v1/collections/{collection}/vacuum

Although merging happens automatically, you may wish to manually run a vacuum operation to force a merge and apply index configuration changes across the whole index.
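As a minimal sketch, a manual vacuum request could be assembled as shown here. The base URL is a placeholder, and the name of the query parameter that carries the vacuum type (vacuum-type) is an assumption that this page does not confirm, so check both against your own push API reference. The three vacuum type values are the ones described in the sections that follow. The function only builds the request; it does not send it, and authentication is not covered.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: placeholder address; substitute your own push API base URL.
BASE_URL = "https://push.example.com"

# Vacuum types named on this page.
VACUUM_TYPES = {"MERGE", "RE_INDEX", "RE_APPLY_INDEX_EXTRAS"}


def build_vacuum_request(collection: str, vacuum_type: str) -> Request:
    """Build (but do not send) a POST request for the vacuum endpoint."""
    if vacuum_type not in VACUUM_TYPES:
        raise ValueError(f"unknown vacuum type: {vacuum_type}")
    path = f"/v1/collections/{quote(collection, safe='')}/vacuum"
    # Assumption: the vacuum type is passed as a 'vacuum-type' query parameter.
    query = urlencode({"vacuum-type": vacuum_type})
    return Request(f"{BASE_URL}{path}?{query}", method="POST")
```

To send the request, pass the returned object to urllib.request.urlopen once any credentials your server requires have been added.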

Merge indexes

Merging indexes serves two main purposes:

  • It optimizes the index so that it runs more efficiently, by defragmenting the index and removing content that has been flagged as deleted. Deleted content remains inside the index until it is merged, but does not appear in the search results.

  • It merges all the index generations into a single generation, freeing up capacity for further updates to be made to the push index. There is a limit on the number of index generations that a push data source can service, and the system will automatically merge index generations in order to prevent this limit being reached.
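The two purposes above can be illustrated with a toy model. This is a conceptual sketch only, not how the product stores its index: the class, its methods and the generation limit of 4 are all hypothetical and chosen for illustration.

```python
class ToyPushIndex:
    """Conceptual model only: small generations plus delete flags."""

    def __init__(self, max_generations=4):
        # Hypothetical limit; the real limit is not stated on this page.
        self.max_generations = max_generations
        self.generations = []   # each generation maps key -> content
        self.deleted = set()    # keys flagged as deleted

    def add(self, key, content):
        # Every add or update lands in a new, small generation.
        self.deleted.discard(key)
        self.generations.append({key: content})
        if len(self.generations) >= self.max_generations:
            self.merge()  # automatic merge before the limit is exceeded

    def delete(self, key):
        # Flag only: the stored content stays until the next merge.
        self.deleted.add(key)

    def search(self, term):
        latest = {}
        for generation in self.generations:
            latest.update(generation)  # newer generations win
        return sorted(
            key for key, content in latest.items()
            if key not in self.deleted and term in content
        )

    def merge(self):
        # Consolidate into one generation and drop flagged content.
        merged = {}
        for generation in self.generations:
            merged.update(generation)
        for key in self.deleted:
            merged.pop(key, None)
        self.generations = [merged]
        self.deleted.clear()
```

In this model a deleted document disappears from search results immediately, but its content is only physically removed when merge runs, either explicitly or because the generation limit was reached.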

Calling POST /v1/collections/{collection}/vacuum with the vacuum type set to MERGE performs a merge operation on the index, and also re-applies index extras (see below).

Re-indexing

Re-indexing is used to apply changes to the index configuration to existing content within the search index.

This will commonly be required if you make a change such as:

  • Updating your metadata mappings or configuration

  • Some modifications to faceted navigation configuration (only those that require a re-index)

  • Changes to indexer options, which control how the index is built

  • Changes to gscopes, query independent evidence or result collapsing configuration

Calling POST /v1/collections/{collection}/vacuum with the vacuum type set to RE_INDEX performs a merge and re-index operation on the index, and also re-applies index extras (see below).

This mode will completely rebuild your push index.

Re-applying index extras

Certain indexer configuration (gscopes and query independent evidence, or QIE) does not require a full index rebuild. Changes to it can be applied with a quick operation that re-applies only the gscopes and QIE configuration to the existing index.

Calling POST /v1/collections/{collection}/vacuum with the vacuum type set to RE_APPLY_INDEX_EXTRAS performs this quick operation, which applies only gscope and QIE changes to your index. The same operation also runs as part of the other vacuum types, but those can take much longer to complete.
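The choice between the three vacuum types can be summarised as a small helper that picks the cheapest mode covering a set of configuration changes. This is a sketch based on one reading of this page: the change labels are hypothetical names, result collapsing is assumed to need a full re-index because it is not listed among the index extras, and faceted navigation is left out because only some of its changes require a re-index.

```python
# Hypothetical labels for the kinds of change described on this page.
NEEDS_RE_INDEX = {"metadata_mappings", "indexer_options", "result_collapsing"}
NEEDS_INDEX_EXTRAS = {"gscopes", "query_independent_evidence"}


def choose_vacuum_type(changes):
    """Return the cheapest vacuum type that applies all the given changes."""
    changes = set(changes)
    unknown = changes - NEEDS_RE_INDEX - NEEDS_INDEX_EXTRAS
    if unknown:
        raise ValueError(f"unrecognised change kinds: {sorted(unknown)}")
    if changes & NEEDS_RE_INDEX:
        # A re-index also re-applies index extras, so it covers everything.
        return "RE_INDEX"
    if changes & NEEDS_INDEX_EXTRAS:
        return "RE_APPLY_INDEX_EXTRAS"
    # No configuration change: a plain merge just consolidates generations.
    return "MERGE"
```

For example, choose_vacuum_type(["gscopes"]) selects the quick mode, while adding "metadata_mappings" to the same list selects a full re-index.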