All results endpoint
The all-results endpoint allows you to fetch and stream back all results of a query as JSON or CSV data in a single call.
This endpoint is not intended as a source of search results to present to end users. It is a way to export search results, or to supply data to other systems that need to perform further processing on a large set of results. For such systems, this endpoint avoids the need to paginate through the result set or to use large num_ranks values, which may exhaust the memory available to the server.
As a consequence, the all-results response contains only the result records and no additional information.
- By default, all matching results are returned.
- Only the list of results is returned from the data model. The response does not include the question object or other parts of the response (such as result summary information or features like faceted navigation).
- You can pick which fields of the result model are sent back; for example, you can choose to receive only liveUrl and title.
- You can request either JSON or CSV (RFC 4180).
- The default query processor options have been altered to speed up processing. Any query processor options that you wish to change must be specified as request parameters when you call the all-results endpoint. The query_processor_options from your results page configuration are not used by the all-results endpoint.
Location
The all-results JSON endpoint is:
/s/all-results.json
and the all-results CSV endpoint is:
/s/all-results.csv
Using the all-results endpoint
Like the standard search JSON endpoint, the all-results endpoint requires a query or s (system query) request parameter and a collection request parameter. For example, to get all results from the sp~example collection we could set:
/s/all-results.csv?collection=sp~example&query=!FunDoesNotExist:PadreNull
or
/s/all-results.csv?collection=sp~example&s=!FunDoesNotExist:PadreNull
The query set here is a special query that results in all documents being returned. It works by requesting every document that does not have the term PadreNull in the reserved, never-defined metadata class FunDoesNotExist.
By default, this returns a CSV file listing every URL in the collection, for example:
URL
https://example.com/
https://example.com/example
https://example.com/examples
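The request and response described above can be sketched in Python. This is a minimal illustration, not an official client: the host name search.example.com and the helper names all_results_url and urls_from_csv are assumptions introduced here, and the sample response text simply mirrors the CSV shown above.

```python
import csv
import io
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host; replace with the address of your own search server.
BASE_URL = "https://search.example.com"

# The special query from the text above that matches every document.
MATCH_ALL_QUERY = "!FunDoesNotExist:PadreNull"


def all_results_url(collection, query=MATCH_ALL_QUERY, fmt="csv", base_url=BASE_URL):
    """Return an all-results URL for the given collection and query."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    # urlencode percent-encodes characters such as '!' and ':' in the query.
    params = urlencode({"collection": collection, "query": query})
    return f"{base_url}/s/all-results.{fmt}?{params}"


def urls_from_csv(text):
    """Read the default single-column CSV response into a list of URLs."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["URL"] for row in reader]


url = all_results_url("sp~example")
print(url)
# Decode the query string again to confirm what will be sent.
print(parse_qs(urlsplit(url).query))

# Sample response text in the default format (header row plus one URL per row).
sample = "URL\r\nhttps://example.com/\r\nhttps://example.com/example\r\n"
print(urls_from_csv(sample))
```

To send the query as a system query instead, the same sketch would use the s parameter in place of query.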
Request parameters
This endpoint accepts most parameters that can be set on /s/search.json. The endpoint also supports the following options:
fields
- This option lets you pick, using a comma-separated list of XPath-style expressions, which fields of the Result model are returned. By default this is set to &fields=liveUrl. To get the liveUrl, title and author metadata, set this to &fields=liveUrl,title,listMetadata/author. To view everything that is available, set fields=/. See the JXPath documentation for the syntax of the XPaths.
  To include any metadata, the SM and SF parameters (e.g. &SM=both&SF=[author]) must be set within the URL so that the metadata values (e.g. author) are returned by the query processor and included in the result model. SM and SF settings from the query_processor_options collection config setting are not applied by the all-results endpoint.
fieldnames
- This option lets you set the names of the fields. In the CSV example above, liveUrl is named URL because the default is &fieldnames=URL. To rename the fields from the previous example (liveUrl, title and author metadata) to URL, Title and Author, set this to &fieldnames=URL,Title,Author. Note that the order of the names in this parameter must match the order of the fields parameter.
optimisations
- It is possible to turn off the default optimizations by setting optimisations=false. This is likely to lead to poor performance of the API. Instead, it is better to override individual query processor options by setting them in the request URL. For example, to turn summaries back on, set &SBL=250.
num_ranks
- This sets the number of results to return. By default this is set to the highest possible value so that all results are returned. It may be set on the request URL to limit the number of results returned; for example, to fetch only one hundred results, set &num_ranks=100.
start_rank
- This sets the offset of the first result to return. By default this is set to 1. It may be set to other values on the request URL; for example, to skip the first one hundred results and start at the one hundred and first, set &start_rank=101.
fileName
- Sets the name of the file in which the browser should save the response. This causes the Content-Disposition header to be set to attachment; filename="name", where name is the value of this URL parameter. For example, all-results.csv?collection=funnelback_documentation&query=%21padrenull&fileName=user_data.csv returns a file named user_data.csv.
header
- (all-results.csv only) This option can be used to disable the header row. When set to false, the header line that lists the name of each field is not returned. By default this is set to true.
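As an illustration of how the options above combine, the following Python sketch assembles the request parameters for an export. The function name export_params and its arguments are hypothetical, introduced only for this example. Passing fields and names as pairs is one way to keep fields and fieldnames in the same order. The sketch joins values with plain commas, so it assumes that no field or name contains a comma or a double quote.

```python
from urllib.parse import urlencode


def export_params(collection, query, columns, summary_fields=None,
                  start_rank=None, num_ranks=None, header=True):
    """Build all-results request parameters (illustrative helper only).

    columns is a list of (field expression, column name) pairs, so the
    fields and fieldnames parameters always share the same order.
    """
    params = {
        "collection": collection,
        "query": query,
        "fields": ",".join(field for field, _ in columns),
        "fieldnames": ",".join(name for _, name in columns),
    }
    if summary_fields is not None:
        # Metadata is only present in the result model when SM and SF
        # are set on the request itself, e.g. SM=both and SF=[author].
        params["SM"] = "both"
        params["SF"] = summary_fields
    if start_rank is not None:
        params["start_rank"] = str(start_rank)
    if num_ranks is not None:
        params["num_ranks"] = str(num_ranks)
    if not header:
        # Only meaningful for all-results.csv.
        params["header"] = "false"
    return params


params = export_params(
    "sp~example",
    "!FunDoesNotExist:PadreNull",
    [("liveUrl", "URL"), ("title", "Title"), ("listMetadata/author", "Author")],
    summary_fields="[author]",
    start_rank=101,
    num_ranks=100,
)
print("/s/all-results.csv?" + urlencode(params))
```

With these values the request asks for results 101 to 200, returned with the columns URL, Title and Author.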
Comma-separated parameter format
The fields and fieldnames parameters are comma-separated lists that follow the RFC 4180 standard for escaping. For example, to name a field "foo,bar" the format would be """foo,bar""".
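Python's standard csv module applies the same RFC 4180 quoting rules, so it can be used to produce values for these parameters. The sketch below is one possible approach; the helper name rfc4180_list is an assumption made for this example. The returned value would still need to be URL-encoded before being placed in a request URL.

```python
import csv
import io


def rfc4180_list(values):
    """Join values into one RFC 4180 style comma-separated line."""
    buffer = io.StringIO()
    # The default 'excel' dialect quotes only fields that need it and
    # doubles any double quote that appears inside a quoted field.
    csv.writer(buffer).writerow(values)
    # Drop the line terminator that the writer appends to each row.
    return buffer.getvalue().rstrip("\r\n")


# A name that itself contains double quotes and a comma.
print(rfc4180_list(['"foo,bar"']))
# Plain names are left untouched.
print(rfc4180_list(["URL", "Title", "Author"]))
```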
The values written to the CSV fields are the result of calling toString() on the object referenced by fields. It is recommended that fields reference only primitives (int, float), their object wrapper classes (Integer, Float), or String and Date objects. Other objects may not have consistent or reliable toString() behavior. If you need data from a more complex object, such as a Map, use a Groovy script to convert the object into a String in the desired format. You can place this String into the customData map of the result.
Custom response data
This feature is not available in the Squiz DXP. Equivalent functionality is available using plugins.
Unlike the standard HTML search endpoint (search.html), the all-results endpoint does not use FreeMarker template files to customize the returned response. To customize the data that is returned, a hook script can be used to add data to the Result model's customData map; values from this map can be fetched in a similar way to how metadata is accessed. The search transaction question type is set to SEARCH_GET_ALL_RESULTS, which allows hook scripts to run only when this endpoint is called. For example, the Groovy script may have:
if (com.funnelback.publicui.search.model.transaction.SearchQuestion.SearchQuestionType.SEARCH_GET_ALL_RESULTS.equals(transaction.question.questionType)) {
    // Code to run when the all-results endpoint is hit.
}
Note that the endpoint may call the Groovy script multiple times, each time with only a subset of the results that will be returned.
Default optimizations
The default optimizations applied to the all-results endpoint set the following query processor options:
Disable presentation features
- SBL=1: set minimum summary buffer size
- sort=: disable result sorting
- SF=: remove all summary fields
- SM=off: disable result summary
Turn off various features
- bb=off: disable legacy best bets
- collapsing=off: disable results collapsing
- contextual_navigation=false and cnto=0.001: disable contextual navigation
- explain=false: disable ranking explain mode
- QL=0: disable quick links
- qsup=off: disable query blending
- stem=0: disable word stemming
Turn off various result count and range calculations
- countgbits=: disable gscope counts
- countIndexedTerms=: disable indexed term counts
- count_dates=: disable date counts
- countUniqueByGroup=: disable unique metadata counts
- count_urls=: disable URL counts
- docs_per_collection=false: disable docs per collection counts
- rmcf=: disable metadata counts
- rmrf=: disable metadata range counts
- geospatial_ranges=false: disable geospatial distance calculations
- sum=, sumByGroup=: disable numeric metadata sum calculations
Disable ranking features
- cool=off: disable cooler ranking engine
- kmod=0: use normal scoring for special fields
- promote_urls=: disable curator URL promotion
- sco=1: disable document scoring
- SSS=0, same_collection_suppression=0, same_meta_suppression=0, neardup=1, title_dup_factor=1: disable same site suppression
- service_volume=: unset service volume settings
- daat_timeout=0: set minimum daat timeout
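The relationship between these defaults and the request URL can be pictured as a simple merge in which request parameters take precedence. The Python sketch below is only a mental model under that assumption, not the server's implementation; the dictionary holds a handful of the defaults listed above, and the name effective_options is invented for this example.

```python
# A few of the defaults listed above; this is not the complete set.
DEFAULT_OPTIMISATIONS = {
    "SBL": "1",
    "SM": "off",
    "SF": "",
    "sort": "",
    "stem": "0",
}


def effective_options(request_params, defaults=DEFAULT_OPTIMISATIONS):
    """Model request parameters overriding the default optimizations."""
    merged = dict(defaults)        # copy so the defaults stay unchanged
    merged.update(request_params)  # values from the request URL win
    return merged


# Turning summaries back on with &SBL=250 changes only that option.
options = effective_options({"SBL": "250"})
print(options)
```

Overriding single options this way keeps the remaining optimizations in place, which is why it is preferred over setting optimisations=false.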