All results endpoint

The all-results endpoint allows you to fetch and stream back all results of a query as JSON or CSV data in a single call.

This endpoint is not intended as a source for search results to present to end users, but as a way to export search results, or as a source of data for other systems which may need to perform further processing on a large set of results. For such systems, this endpoint avoids the need to paginate through the result set or use large num_ranks values which may exhaust the memory available to the server.

Because of this purpose, the all-results response contains only the result records and no additional information.

The all-results endpoint differs from the standard search HTML, JSON and XML endpoints in a number of ways:

  • By default, all matching results are returned.

  • Only the list of results is returned from the data model. The response does not include any of the question object or other parts of the response (such as result summary information or other features like faceted navigation).

  • You can choose which fields of the result model are returned. For example, you can request only the liveUrl and title.

  • You can request either JSON or CSV (RFC 4180).

  • The default query processor options have been altered to speed up processing. Any query processor options that you wish to change must be specified as request parameters when you call the all-results endpoint. The query_processor_options from your results page configuration are not used by the all-results endpoint.

Location

The all-results JSON endpoint is:

/s/all-results.json

and the all-results CSV endpoint is:

/s/all-results.csv

Using the all-results endpoint

As with the standard search JSON endpoint, you must set the collection request parameter and either the query or s (system query) request parameter. For example, to get all results from the sp~example collection, use:

/s/all-results.csv?collection=sp~example&query=!FunDoesNotExist:PadreNull

or

/s/all-results.csv?collection=sp~example&s=!FunDoesNotExist:PadreNull

The query used here is a special query that returns all documents. It works by requesting every document that does not have the term PadreNull in the reserved, never-defined metadata class FunDoesNotExist.

By default, this returns a CSV file containing every URL in the collection, for example:

URL
https://example.com/
https://example.com/example
https://example.com/examples
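
The request above can be sketched in Python as follows. This is a minimal illustration, not an official client: the host name `search.example.com` and the helper names `all_results_url` and `parse_urls` are assumptions, and the CSV text is the sample shown above rather than a live response.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical host name; replace it with your own search server.
BASE_URL = "https://search.example.com"


def all_results_url(collection, query, fmt="csv"):
    """Build an all-results URL with the required collection and query parameters."""
    params = urlencode({"collection": collection, "query": query})
    return f"{BASE_URL}/s/all-results.{fmt}?{params}"


def parse_urls(csv_text):
    """Return the values of the URL column from a default all-results CSV response."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["URL"] for row in reader]


url = all_results_url("sp~example", "!FunDoesNotExist:PadreNull")

# Sample response text copied from the example above (not fetched from a server).
sample = "URL\r\nhttps://example.com/\r\nhttps://example.com/example\r\n"

print(url)
print(parse_urls(sample))
```

Note that `urlencode` percent-encodes the `!` and `:` characters of the special query, so the query does not need to be escaped by hand.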

Request parameters

This endpoint accepts most parameters that can be set on /s/search.json. The endpoint also supports the following options:

fields

This option selects which fields of the Result model are returned, as a comma-separated list of XPath-style expressions. The default is &fields=liveUrl. To get the liveUrl, title and author metadata, set &fields=liveUrl,title,listMetadata/author. To view everything that is available, set fields=/. See the JXPath documentation for the XPath syntax.

To include any metadata, the SM and SF parameters (e.g. &SM=both&SF=[author]) must be set in the URL so that the metadata values (e.g. author) are returned by the query processor and included in the result model. SM and SF settings from the query_processor_options collection configuration setting are not applied by the all-results endpoint.

fieldnames

This option sets the column names used for the fields. The default is &fieldnames=URL, which is why the liveUrl column is named URL in the CSV example above. To name the liveUrl, title and author metadata fields from the previous example URL, Title and Author, set &fieldnames=URL,Title,Author. Note that the order of this parameter must match that of the fields parameter.
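
Because fields and fieldnames must stay in the same order, it can help to build both from a single list of pairs. The following Python sketch does that; the function name `export_params` and its arguments are hypothetical, and it assumes simple names that contain no commas or double quotes.

```python
from urllib.parse import urlencode


def export_params(collection, query, columns, sm=None, sf=None):
    """Build a query string for an all-results request.

    columns is a list of (field expression, column name) pairs. Building
    fields and fieldnames from the same list keeps their order aligned.
    """
    params = {
        "collection": collection,
        "query": query,
        "fields": ",".join(path for path, _ in columns),
        "fieldnames": ",".join(name for _, name in columns),
    }
    # SM and SF must be passed explicitly when metadata fields are requested.
    if sm is not None:
        params["SM"] = sm
    if sf is not None:
        params["SF"] = sf
    return urlencode(params)


columns = [
    ("liveUrl", "URL"),
    ("title", "Title"),
    ("listMetadata/author", "Author"),
]
print(export_params("sp~example", "!FunDoesNotExist:PadreNull", columns,
                    sm="both", sf="[author]"))
```

The resulting string is percent-encoded, so commas, slashes and square brackets appear as `%2C`, `%2F`, `%5B` and `%5D` in the printed output.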

optimisations

It is possible to turn off the default optimizations by setting optimisations=false. This is likely to lead to poor performance of the API. It is better to override individual query processor options by setting them in the request URL. For example, to turn summaries back on, set &SBL=250.

num_ranks

This sets the number of results to return. By default, it is set to the highest possible value so that all results are returned. It may be set on the request URL to limit the number of results returned. For example, to fetch only one hundred results, set &num_ranks=100.

start_rank

This sets the rank of the first result to return. By default, it is set to 1. It may be set to other values on the request URL. For example, to skip the first one hundred results and start at the 101st result, set &start_rank=101.
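
If a consuming system prefers to fetch a known number of results in fixed-size chunks, num_ranks and start_rank can be combined. The Python sketch below computes the parameter pairs for each chunk; `rank_windows` is a hypothetical helper, and it assumes the total number of results is already known.

```python
def rank_windows(total, page_size):
    """Yield (start_rank, num_ranks) pairs that cover `total` results.

    start_rank is 1-based, matching the default value of 1.
    """
    start = 1
    while start <= total:
        # The final window may be smaller than page_size.
        yield start, min(page_size, total - start + 1)
        start += page_size


for start_rank, num_ranks in rank_windows(250, 100):
    print(f"&start_rank={start_rank}&num_ranks={num_ranks}")
```

For 250 results in chunks of 100, this produces windows starting at ranks 1, 101 and 201.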

fileName

Sets the name of the file that the browser should save the response to. This causes the Content-Disposition header to be set to attachment; filename="name", where name is the value of this URL parameter. For example, all-results.csv?collection=funnelback_documentation&query=%21padrenull&fileName=user_data.csv will return a file with the name user_data.csv.

header

(all-results.csv only) This option controls whether the header row is returned. When set to false, the header line that gives the name of each field is not returned. By default, this is set to true.
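
When the header row is disabled, the consumer has to supply the column names itself. A minimal Python sketch, assuming the caller knows the names it requested; `parse_rows` and the sample text are illustrative only.

```python
import csv
import io


def parse_rows(csv_text, fieldnames, has_header=True):
    """Parse all-results CSV text into a list of dictionaries.

    fieldnames are the column names the caller requested, in order.
    Set has_header=False for responses requested with header=false.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]  # Drop the header row and use the supplied names.
    return [dict(zip(fieldnames, row)) for row in rows]


# Made-up sample text standing in for a response requested with header=false.
sample = "https://example.com/,Example home\r\n"
print(parse_rows(sample, ["URL", "Title"], has_header=False))
```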

Comma-separated parameter format

The fields and fieldnames parameters are comma-separated lists that follow the RFC 4180 escaping rules. For example, to name a field "foo,bar" (including the double quotes), the value would be """foo,bar""".
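
RFC 4180 escaping is easy to get wrong by hand, so one option is to let a CSV library produce the parameter value. The sketch below uses Python's standard csv module, whose default dialect quotes values that contain commas or double quotes and doubles any embedded double quotes; `csv_param` is a hypothetical helper name.

```python
import csv
import io


def csv_param(values):
    """Join values into one RFC 4180 style comma-separated string."""
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    # The writer appends a line terminator, which a URL parameter does not need.
    return buf.getvalue().rstrip("\r\n")


print(csv_param(["URL", "Title", "Author"]))  # URL,Title,Author
print(csv_param(["URL", "foo,bar"]))          # URL,"foo,bar"
print(csv_param(['"foo,bar"']))               # """foo,bar"""
```

The returned string still needs to be URL-encoded before it is placed in the request URL.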

The values written to the CSV fields are the result of calling toString() on the object that is referenced by fields. It is recommended that fields reference only primitive types (int, float), their object wrapper classes (Integer, Float), or String and Date objects. Other objects may not have consistent or reliable toString() behavior. If you need data from a more complex object, such as a Map, use a Groovy script to convert the object into a String in the desired format. You can place this String into the customData map of the result.

Custom response data

This feature is not available in the Squiz DXP.
Equivalent functionality is available using plugins.

Unlike the standard HTML search endpoint (search.html), the all-results endpoint does not use FreeMarker template files to customize the returned response. To customize the data that is returned, a hook script can be used to add data to the customData map of the Result model. Values from this map can be fetched in a similar way to how metadata is accessed. The search transaction question type is set to SEARCH_GET_ALL_RESULTS, which allows hook scripts to run only when this endpoint is called. For example, the Groovy script may contain:

if (com.funnelback.publicui.search.model.transaction.SearchQuestion.SearchQuestionType.SEARCH_GET_ALL_RESULTS.equals(transaction.question.questionType)) {
    // Code to run when the all-results endpoint is hit.
}

Note that the endpoint may call the Groovy script multiple times, each time with only a subset of the results that will be returned.

Default optimizations

The default optimizations applied to the all-results endpoint set the following query processor options:

Disable presentation features

  • SBL=1: set minimum summary buffer size

  • sort=: disable result sorting

  • SF=: remove all summary fields

  • SM=off: disable result summary

Turn off various features

  • bb=off: disable legacy best bets

  • collapsing=off: disable results collapsing

  • contextual_navigation=false and cnto=0.001: disable contextual navigation

  • explain=false: disable ranking explain mode

  • QL=0: disable quick links

  • qsup=off: disable query blending

  • stem=0: disable word stemming

Turn off various result count and range calculations

  • countgbits=: disable gscope counts

  • countIndexedTerms=: disable indexed term counts

  • count_dates=: disable date counts

  • countUniqueByGroup=: disable unique metadata counts

  • count_urls=: disable URL counts

  • docs_per_collection=false: disable docs per collection counts

  • rmcf=: disable metadata counts

  • rmrf=: disable metadata range counts

  • geospatial_ranges=false: disable geospatial distance calculations

  • sum=, sumByGroup=: disable numeric metadata sum calculations

Disable ranking features

  • cool=off: disable cooler ranking engine

  • kmod=0: use normal scoring for special fields

  • promote_urls=: disable curator URL promotion

  • sco=1: disable document scoring

  • SSS=0, same_collection_suppression=0, same_meta_suppression=0, neardup=1, title_dup_factor=1: disable same site suppression

  • service_volume=: unset service volume settings

  • daat_timeout=0: set minimum daat timeout