All results endpoint
Introduction
This API fetches all results of a query and streams them back to the caller in a single call. It is similar to /s/search.json, with the following key differences:
- Only the list of Results is returned from the data model.
- By default, all matching results are returned.
- You can pick which fields of the Result model should be sent back, e.g. you can choose to receive only the liveUrl and title.
- You can request either JSON or CSV (RFC 4180).
- Some options have been altered to speed up processing, for example options that perform counting over the result set, as well as ranking options.
Generally, this endpoint is not intended as a source of search results to present to end users, but rather as a data source for other systems that need to perform further processing on a large set of results. For such systems, this endpoint avoids the need to paginate through the result set or to use large num_ranks values, which may exhaust the memory available to the server.
Location
The JSON endpoint is:
/s/all-results.json
and the CSV endpoint is:
/s/all-results.csv
These endpoints are available under the same locations as other modern UI search endpoints such as /s/search.json.
Getting started
As with /s/search.json, you need to set the query and collection URL parameters. For example, to get all results from the foo collection, you could request:
/s/all-results.csv?collection=foo&query=!FunDoesNotExist:PadreNull
The query used here is a special one that results in all documents being returned. It works by requesting every document that does not have the term PadreNull in the reserved, never-defined metadata class FunDoesNotExist.
By default, this returns a CSV file listing every URL in the collection, for example:
URL
https://example.com/
https://example.com/example
https://example.com/examples
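As a rough client-side sketch, the Python below builds the request shown above and reads a default CSV response into a list of URLs. The host search.example.com and the function names all_results_url and parse_urls are hypothetical; the code assumes the default fields=liveUrl and fieldnames=URL settings, so the first CSV row is the URL header.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical host; substitute the server that serves your /s/ endpoints.
BASE = "https://search.example.com/s/all-results.csv"


def all_results_url(collection):
    """Build the 'return everything' request for a collection."""
    params = {"collection": collection, "query": "!FunDoesNotExist:PadreNull"}
    # urlencode percent-encodes "!" and ":" in the query value.
    return BASE + "?" + urlencode(params)


def parse_urls(csv_text):
    """Return the URL column of a CSV response that includes a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    column = header.index("URL")  # "URL" is the default fieldnames value
    return [row[column] for row in data]


sample = "URL\r\nhttps://example.com/\r\nhttps://example.com/example\r\n"
print(all_results_url("foo"))
print(parse_urls(sample))
```

Against a live server, the response text might be obtained with urllib.request.urlopen(all_results_url("foo")).read().decode("utf-8"); decoding as UTF-8 is an assumption about the response encoding.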
Parameters
This endpoint accepts most parameters that can be set on /s/search.json. The endpoint also supports the following options:
- fields: This option lets you pick, using a comma-separated list of XPath-style expressions, which fields of the Result model are returned. By default it is set to &fields=liveUrl. To get the liveUrl, title and author metadata, set it to &fields=liveUrl,title,listMetadata/author. To view everything that is available, set fields=/. See the JXPath documentation for the syntax of the XPaths.
  Note that to include any metadata, the SM and SF parameters (e.g. &SM=both&SF=[author]) must be set within the URL so that the metadata values (e.g. author) are returned by the query processor and included in the result model. SM and SF settings from the query_processor_options collection config setting are not applied by the all results endpoint.
- fieldnames: This option lets you set the names of the fields. In the CSV example above, liveUrl is named URL; this comes from the default value &fieldnames=URL. To rename the fields from the previous example (liveUrl, title and author metadata) to URL, Title and Author, set it to &fieldnames=URL,Title,Author. Note that the order of this parameter must match that of the fields parameter.
- optimisations: The default optimisations can be turned off by setting optimisations=false. This is likely to lead to poor performance of the API; instead, it is better to override individual query processor options by setting them in the request URL. For example, to turn summaries back on, set &SBL=250.
- num_ranks: This sets the number of results to return. By default it is set to the highest possible value so that all results are returned. It may be set on the request URL to limit the number of results returned; for example, to fetch only one hundred results, set &num_ranks=100.
- start_rank: This sets the offset of the first result to return. By default it is set to 1. It may be set to other values on the request URL; for example, to skip the first one hundred results and start at the 101st, set start_rank=101.
- fileName: Sets the name of the file the browser should save the response of this API to. It causes the Content-Disposition header to be set to attachment; filename="name", where name is the value of this URL parameter. For example, all-results.csv?collection=funnelback_documentation&query=%21padrenull&fileName=user_data.csv will return a file named user_data.csv.
- header: For formats that have a header, this option can be used to disable returning the header. Currently only CSV supports this option; other formats may be supported in the future. For CSV, when this is set to false, the header line that lists the name of each field is not returned. By default it is set to true.
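The parameters above can be combined into a single request. The following Python sketch assembles such a URL with the standard library's urlencode, which percent-encodes the commas, slashes and brackets in the values. The host and the build_request helper are hypothetical, and joining names with a bare comma is only safe for names that contain no commas or double quotes.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the server that serves your /s/ endpoints.
BASE = "https://search.example.com/s/all-results.csv"


def build_request(collection, query, fields, fieldnames, summary_fields=None,
                  num_ranks=None, start_rank=None, file_name=None,
                  header=True):
    """Build an all-results URL (hypothetical helper, not a Funnelback API)."""
    # fieldnames is matched to fields by position, so the lengths must agree.
    if len(fields) != len(fieldnames):
        raise ValueError("fieldnames must give one name per entry in fields")
    params = [
        ("collection", collection),
        ("query", query),
        # A bare comma join is only safe when no value contains a comma or
        # a double quote; otherwise RFC 4180 escaping is needed.
        ("fields", ",".join(fields)),
        ("fieldnames", ",".join(fieldnames)),
    ]
    if summary_fields is not None:
        # Metadata is only returned when SM and SF are set on the URL itself.
        params += [("SM", "both"), ("SF", summary_fields)]
    if num_ranks is not None:
        params.append(("num_ranks", str(num_ranks)))
    if start_rank is not None:
        params.append(("start_rank", str(start_rank)))
    if file_name is not None:
        params.append(("fileName", file_name))
    if not header:
        params.append(("header", "false"))
    return BASE + "?" + urlencode(params)


url = build_request(
    "foo", "!FunDoesNotExist:PadreNull",
    fields=["liveUrl", "title", "listMetadata/author"],
    fieldnames=["URL", "Title", "Author"],
    summary_fields="[author]",
    num_ranks=100, start_rank=101,
    file_name="user_data.csv", header=False,
)
print(url)
```

Omitted optional arguments are simply left off the URL, so the endpoint's own defaults (all results, header included) apply.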
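Comma separated parameter format
The parameters fields and fieldnames are comma-separated lists that follow the RFC 4180 standard for escaping. For example, to name a field "foo,bar" (with the double quotes as part of the name), the value would be written as """foo,bar""". A name that contains a comma but no double quotes, such as foo,bar, would be written as "foo,bar".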
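The escaping rule can be reproduced with Python's standard csv module, for instance when generating fields or fieldnames values programmatically. This is a sketch: encode_param_list is a hypothetical helper name, and its result still needs URL encoding (for example with urllib.parse.urlencode) before it is placed in a request.

```python
import csv
import io


def encode_param_list(values):
    """Join values into one RFC 4180-style comma separated string."""
    buf = io.StringIO()
    # The csv module's default dialect quotes a value only when it contains
    # the delimiter, a double quote or a line break, and doubles any
    # embedded double quotes, which matches RFC 4180 escaping.
    csv.writer(buf).writerow(values)
    # writerow ends the record with "\r\n"; drop it for use as a URL value.
    return buf.getvalue().rstrip("\r\n")


print(encode_param_list(["URL", '"foo,bar"']))  # URL,"""foo,bar"""
print(encode_param_list(["URL", "foo,bar"]))    # URL,"foo,bar"
```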
The values written to the CSV fields are the result of calling toString() on the object referenced by fields. It is recommended that fields reference only primitives (int, float) or their object wrapper classes (Integer, Float), as well as String and Date objects. Other objects may not have consistent or reliable toString behavior. If you need data from a more complex object, such as a Map, use a Groovy script to convert the object into a String in the desired format. You can place this String into the customData map of the result.
Custom data
Unlike /s/search.html, this endpoint does not use FreeMarker form files to customise the returned response. To customise the data that is returned, a hook script can be used to add data to the customData map of the Result model; values from this map can then be fetched in a similar way to metadata. The search transaction question type is set to SEARCH_GET_ALL_RESULTS, which allows hook scripts to run only when this endpoint is called. For example, the Groovy script may contain:
if (com.funnelback.publicui.search.model.transaction.SearchQuestion.SearchQuestionType.SEARCH_GET_ALL_RESULTS.equals(transaction.question.questionType)) {
// Code to run when the all-results endpoint is hit.
}
Note that the endpoint may call the Groovy script multiple times, each time with only a subset of the results that will be returned.