Importing
Datastore offers an importer to quickly make many changes to the documents in your data service.
The importer can create and update documents within a collection using the property values you provide. The properties do not need to be consistent from document to document within an import, so you are free to update only what you need.
Using the importer
The Datastore importer is available to import data into your data service.
The importer must be configured and primed with import data. It runs on the server, avoiding the network request time associated with standard data-modifying requests. It also eliminates the overhead of checking the blueprint-defined collection and document ACL rules, as the importer is expected to run only at the request of trusted data administrators.
When the importer runs, it can create or update documents depending on the document’s existence and how the import has been configured. Changes occur instantly as the importer progresses through the import data. Changes can only be undone by standard Datastore requests or by running another import configured with inverted import data.
The import feature is accessible by API at https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports and accepts the NDJSON file format, which maximizes flexibility in what data changes from document to document.
It is a unique endpoint available automatically and does not need to be configured in your blueprint.
The importer is not available in the Simulator.
Authorization
The https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports endpoint is restricted. Imports are presumed to be run only by trusted data administrators of the data service, which enables Datastore to skip checking ACL statements for every data import action.
To make authorized requests to this endpoint, you must supply a JWT signed with the secret key of the data service.
The JWT must have the audience value set to datastore.developers, and if an expiry value is set, the JWT must not be expired.
The following shows a payload for a JWT with an expiry value and audience suitable for import:
{
"exp": 1660638385,
"aud": "datastore.developers"
}
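As a rough illustration of the token requirements above, the following sketch builds a JWT using only the Python standard library. It assumes the data service expects an HS256 (HMAC-SHA256) signed token, which the text does not state; the function name make_import_token and the example secret are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_import_token(secret: str, ttl_seconds: int = 3600, now: Optional[int] = None) -> str:
    """Build a signed JWT carrying the audience required for imports.

    HS256 is an assumption; confirm which algorithm your data service expects.
    """
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"exp": issued + ttl_seconds, "aud": "datastore.developers"}
    # The signing input is base64url(header) + "." + base64url(payload).
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The resulting string could then be exported as the token variable used in the CURL examples on this page. Setting a short expiry limits how long a leaked token remains usable.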
Import strategies
An import strategy is the list of operations for which an import has been configured.
The operations available are create and update.
The create strategy will create a document only if it does not exist.
If the import data includes the documentid system property, the import checks whether the document exists before creating it.
If the document does exist, the import will skip the document unless the update import strategy is also set.
The update strategy will update the document only if the document exists.
With these two strategies, your import can be configured to do the following:
- Create documents only
- Update documents only
- Create or update documents
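The strategy rules above can be summarized as a small decision function. This is only a sketch of the described behavior, not the importer's actual implementation; the name plan_action is hypothetical, and treating a missing document under an update-only strategy as skipped is an assumption inferred from the text.

```python
from typing import Iterable


def plan_action(strategy: Iterable[str], document_exists: bool) -> str:
    """Return "create", "update", or "skip" for one line of import data.

    document_exists should be False when the line has no documentid,
    since existence is only checked when that property is present.
    """
    operations = set(strategy)
    unknown = operations - {"create", "update"}
    if unknown:
        raise ValueError(f"Unknown import operations: {sorted(unknown)}")
    if document_exists:
        # An existing document is only touched when update is enabled.
        return "update" if "update" in operations else "skip"
    # A missing document is only created when create is enabled (assumed skip otherwise).
    return "create" if "create" in operations else "skip"
```

For example, plan_action(["create"], True) returns "skip", matching the rule that a create-only import leaves existing documents untouched.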
Import status workflow
Each import has a status depending on what stage it is up to in its progress. The following list shows the statuses an import can have.
- Configuring: This is the initial status an import gets at creation.
- Started: This is the status to set to start an import. An import will have this status until there is room in the Datastore environment for it to start running.
- Running: This is the status an import will have once it is running within the Datastore environment.
- Paused: This is the status to set if you wish to pause your import temporarily. You might also see this status temporarily if the Datastore environment is being upgraded.
- Resumed: This is the status to set if you wish to resume your import. You might see this status temporarily if the Datastore environment is being upgraded. Once there is room in the Datastore environment, the import will return to having a running status.
- Canceled: This is the status to set if you wish to cancel your import. This will permanently stop it from running.
- Complete: This is the status the import will have after all data set for import has been processed.
- Failed: This is the status an import will have if it has critically failed several times.
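The list above distinguishes statuses you set yourself from statuses Datastore assigns. The sketch below captures that split and builds a JSON body for a status change. The lowercase spellings of paused, resumed, canceled, running, and failed are assumed by analogy with the configuring, started, and complete values shown in this page's examples, and status_patch_body is a hypothetical helper name.

```python
import json

# Statuses the text describes as ones "to set" (spellings partly assumed).
SETTABLE_STATUSES = ("started", "paused", "resumed", "canceled")
# Statuses the text describes as assigned by Datastore itself.
SYSTEM_STATUSES = ("configuring", "running", "complete", "failed")


def status_patch_body(status: str) -> str:
    """Return a JSON request body that asks for a status change."""
    if status in SYSTEM_STATUSES:
        raise ValueError(f"'{status}' is assigned by Datastore and cannot be requested")
    if status not in SETTABLE_STATUSES:
        raise ValueError(f"Unknown import status: '{status}'")
    return json.dumps({"status": status})
```

Validating the value locally avoids sending a request that the endpoint would likely reject.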
Import data format and blocks
Imports are configured with import data. That data takes the form of NDJSON files up to an individual file size of 20MB. This limit keeps the import API functioning quickly and allows the import to run predictably regardless of the total data size to be imported. A single import can be configured with multiple data blocks allowing for importing more than 20MB of data.
The NDJSON file format maximizes flexibility in what data changes from document to document and can be processed line by line quickly.
Each line equates to a document to process in the import.
Each line is a JSON object whose keys are the property names and whose values are the data to import.
If you wish to update existing documents or choose not to create them if they exist, then the documentid property must be included as one of the properties in the object.
The following is an example of two documents for import in an NDJSON data block:
{"documentid":"1","name":"John Doe","address":"1, The Street, Someplace, Somewhere", "phone":"1-555-234-5678"}
{"documentid":"2","name":"Jane Doe","address":"2, The Avenue, Anyplace, Anywhere", "phone":"1-555-876-5432"}
As an auditing feature, an import block cannot be modified or deleted once it is created. If you need to change the import data for an import, delete the import and create a new one.
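Because each block is limited to 20MB and an import can hold several blocks, larger data sets need to be split on line boundaries. The following sketch shows one way to do that; the name to_ndjson_blocks is hypothetical, and reading "20MB" as 20,000,000 bytes is a conservative assumption rather than a documented figure.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: "20MB" is read as decimal megabytes, the more conservative limit.
MAX_BLOCK_BYTES = 20 * 1000 * 1000


def to_ndjson_blocks(
    documents: Iterable[Dict[str, Any]], max_bytes: int = MAX_BLOCK_BYTES
) -> Iterator[str]:
    """Yield NDJSON strings, each at most max_bytes when UTF-8 encoded."""
    lines: List[str] = []
    size = 0
    for document in documents:
        # One compact JSON object per line, terminated by a newline.
        line = json.dumps(document, separators=(",", ":"), ensure_ascii=False) + "\n"
        line_size = len(line.encode("utf-8"))
        if line_size > max_bytes:
            raise ValueError("A single document exceeds the block size limit")
        if lines and size + line_size > max_bytes:
            # Close the current block before it would overflow.
            yield "".join(lines)
            lines, size = [], 0
        lines.append(line)
        size += line_size
    if lines:
        yield "".join(lines)
```

Each yielded string is a candidate body for one data block. Documents are never split across blocks, so every block remains valid NDJSON on its own.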
Listing imports
Listing imports is done by making a GET request to the imports endpoint.
Imports are listed from newest to oldest, with a maximum of 1000 imports per page.
By default, the first page is returned. To request a different page, include the page parameter and the page number in the query string.
This example uses CURL:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $token" \
--silent -fS "https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports?page=2"
The response to this request might look like this:
{
"data": [
{
"importid": 570,
"strategy": [
"create",
"update"
],
"collection": "/contacts",
"status": "configuring",
"createdDatetime": "2022-08-16 06:26:55.98514+00"
},
{
"importid": 569,
"strategy": [
"update"
],
"collection": "/assignments",
"status": "complete",
"createdDatetime": "2022-08-15 08:36:50.27154+00"
}
]
}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
GET | 200 OK | Returned when authorized to get imports. |
GET | 400 Bad Request | Returned when in the POST body the |
Creating an import
Create an import by making a POST request to the imports endpoint.
The strategy array and collection are required in the JSON request body.
Strategy is explored further in the Import strategies section.
The object key collection refers to the collection into which the data will be imported.
It must be prefixed with a forward slash and be a valid collection or subcollection within your data service.
The following is an example of an import configuration to send in the body of a create import request:
{
"strategy": [
"create",
"update"
],
"collection": "/contacts"
}
This example uses CURL to create an import with the above import configuration:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $token" \
--silent -fS -d "$importConfig" https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports
The response to this request might look like this:
{
"importid": 570,
"strategy": [
"create",
"update"
],
"collection": "/contacts",
"status": "configuring",
"percentComplete": 0,
"createdDatetime": "2022-08-16 06:26:55.98514+00",
"startedDatetime": null,
"ranDatetime": null,
"endedDatetime": null,
"failureCount": 0,
"createdDocuments": 0,
"updatedDocuments": 0,
"deletedDocuments": 0,
"skippedDocuments": 0,
"blockCount": 0
}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
POST | 200 OK | Returned when an import is successfully created. |
POST | 400 Bad Request | Returned when the |
POST | 400 Bad Request | Returned when the collection or subcollection does not exist within your data service. |
Getting an import
Get an import by making a GET request to the import’s endpoint, as identified by importid.
This endpoint can help monitor an individual import’s progress.
This example uses CURL to get the details of import with the ID 570:
curl -X GET \
-H "Authorization: Bearer $token" --silent \
-fS https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570
The response to this request might look like this:
{
"importid": 570,
"strategy": [
"update"
],
"collection": "/contacts",
"status": "configuring",
"percentComplete": 0,
"createdDatetime": "2022-08-16 06:26:55.98514+00",
"startedDatetime": null,
"ranDatetime": null,
"endedDatetime": null,
"failureCount": 0,
"createdDocuments": 0,
"updatedDocuments": 0,
"deletedDocuments": 0,
"skippedDocuments": 0,
"blockCount": 0
}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
GET | 200 OK | Returned when authorized to get the import. |
GET | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
GET | 404 Not Found | Returned when the import to get is not found. |
The 400 Bad Request response body for an invalid import ID might look like this:
{
"title": "Invalid import ID",
"status": "400",
"invalid-params": [
{
"name": "Import ID",
"reason": "Import ID must be a positive integer"
}
]
}
The 404 Not Found response body might look like this:
{
"title": "Not Found",
"status": "404"
}
Updating an import
Update an import by making a PATCH request to the import’s endpoint, as identified by importid.
Include only the properties of the import you wish to change in this request.
This endpoint allows updating an import’s configuration and lets you start, pause, resume, and cancel an import.
Read the Import status workflow section for more information about import statuses.
The following is an example of the body of a PATCH request to start an import:
{
"status": "started"
}
This example uses CURL to patch import with ID 570:
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $token" --silent \
-fS -d "$importConfig" https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570
The response to this request might look like this:
{
"importid": 570,
"strategy": [
"create",
"update"
],
"collection": "/contacts",
"status": "started",
"percentComplete": 0,
"createdDatetime": "2022-08-16 06:26:55.98514+00",
"startedDatetime": null,
"ranDatetime": null,
"endedDatetime": null,
"failureCount": 0,
"createdDocuments": 0,
"updatedDocuments": 0,
"deletedDocuments": 0,
"skippedDocuments": 0,
"blockCount": 0
}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
PATCH | 200 OK | Returned when the import is successfully updated. |
PATCH | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
PATCH | 400 Bad Request | Returned when the |
PATCH | 400 Bad Request | Returned when the collection or subcollection does not exist within your data service. |
PATCH | 404 Not Found | Returned when the import to update is not found. |
Deleting an import
Delete an import by making a DELETE request to the import’s endpoint, as identified by importid.
This example uses CURL to delete the import with ID 570:
curl -X DELETE \
-H "Authorization: Bearer $token" --silent \
-fS https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
DELETE | 204 No Content | Returned when the import is successfully deleted. |
DELETE | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
DELETE | 404 Not Found | Returned when the import to delete is not found. |
Listing an import’s blocks
List an import’s blocks by making a GET request to the import’s blocks endpoint.
Import blocks are listed in ascending blockid order, starting from 1, with a maximum of 1000 blocks listed per page.
By default, the first page is returned. To request a different page, include the page parameter and the page number in the query string.
Blocks are numbered naturally for ease of use.
This example uses CURL:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $token" --silent \
-fS "https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570/blocks?page=2"
The response to this request might look like this:
{
"data": [
{
"blockid": 1001
},
{
"blockid": 1002
},
{
"blockid": 1003
}
]
}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
GET | 200 OK | Returned when authorized to get the import’s blocks. |
GET | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
GET | 404 Not Found | Returned when the import whose blocks are requested is not found. |
Adding an import block
Add an import block by making a POST request to the import’s blocks endpoint. The body of this request is NDJSON data.
Read the Import data format and blocks section for more information.
The content type is not application/json but rather application/x-ndjson.
This example uses CURL to add a data block to the import with ID 570:
curl -X POST \
-H "Content-Type: application/x-ndjson" \
-H "Authorization: Bearer $token" --silent \
-fS -d "$importBlock" https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570/blocks
The response to this request might look like this:
{
"blockid": 1
}
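For readers scripting this step outside the shell, the request above can be prepared with Python's standard library as sketched below. The helper name build_add_block_request is hypothetical; the function only constructs the request and does not send it, so it can be inspected before use.

```python
import urllib.request


def build_add_block_request(
    service_url: str, import_id: int, ndjson: str, token: str
) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that adds a data block."""
    if import_id <= 0:
        raise ValueError("Import ID must be a positive integer")
    url = f"{service_url.rstrip('/')}/__resources/imports/{import_id}/blocks"
    return urllib.request.Request(
        url,
        data=ndjson.encode("utf-8"),  # the NDJSON lines form the raw request body
        method="POST",
        headers={
            # Blocks use the NDJSON content type rather than application/json.
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; a response like the one shown above would then be read from the returned handle.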
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
POST | 200 OK | Returned when an import data block is successfully created. |
POST | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
POST | 400 Bad Request | Returned when the content type of the request is not application/x-ndjson. |
POST | 400 Bad Request | Returned when the block size is larger than 20MB. |
POST | 404 Not Found | Returned when the import to add the block to is not found. |
The 400 Bad Request response body for an invalid import ID might look like this:
{
"title": "Invalid import ID",
"status": "400",
"invalid-params": [
{
"name": "Import ID",
"reason": "Import ID must be a positive integer"
}
]
}
The 400 Bad Request response body for an invalid content type might look like this:
{
"title": "Invalid content-type",
"status": "400",
"invalid-params": [
{
"name": "Content-type",
"reason": "Expected content-type: application/x-ndjson"
}
]
}
The 400 Bad Request response body for an oversized block might look like this:
{
"title": "Import block too large",
"status": "400",
"invalid-params": [
{
"name": "Import block",
"reason": "Import block size must not exceed 20MB"
}
]
}
The 404 Not Found response body might look like this:
{
"title": "Not Found",
"status": "404"
}
Getting an import data block
Get an import’s data block by making a GET request to the block’s endpoint, as identified by importid and blockid.
This endpoint can help check what data was provided to the import.
The response content type of this request is application/x-ndjson.
This example uses CURL to get the block data of block 1 in import with ID 570:
curl -X GET \
-H "Authorization: Bearer $token" --silent \
-fS https://YOUR-DATA-SERVICE.datastore.squiz.cloud/__resources/imports/570/blocks/1
The response to this request might look like this:
{"documentid":"1","name":"John Doe","address":"1, The Street, Someplace, Somewhere", "phone":"1-555-234-5678"}
{"documentid":"2","name":"Jane Doe","address":"2, The Avenue, Anyplace, Anywhere", "phone":"1-555-876-5432"}
Response codes
Import requests return the following HTTP response codes:
HTTP Method | HTTP code | Notes |
---|---|---|
GET | 200 OK | Returned when authorized to get the import block. |
GET | 400 Bad Request | Returned when a non-positive integer is provided for the import ID. |
GET | 404 Not Found | Returned when the import to get is not found. |
GET | 404 Not Found | Returned when the import block to get is not found. |