Data organization within collections and documents

One of Datastore’s key features is the ability to create your own structures (defined by the API specifications and data models that make up your blueprints) to store data, along with the methods used to query data in these structures.

Datastore stores your data as properties in objects called documents. Groups of related documents are organized into parent objects called collections.

These collections and their documents are analogous to directories and files (respectively) on a file system.

Datastore implements two standards for blueprints:

  • OpenAPI - an initiative to standardize how RESTful APIs are described. This standard is used to create your API specification, which defines the API endpoints and the HTTP methods they support. API specifications are typically written in YAML format.

  • JSON Schema - a vocabulary for annotating and validating JSON documents. This standard is used to define the sets of properties (or data models) that each endpoint accepts in requests (to access each property’s data values) and returns in responses. Refer to the JSON Schema web site for a brief summary of this syntax.

How collections and documents are organized

From an organizational perspective:

  • A document must always be placed inside a collection.

  • The root level can contain only collections, within which multiple documents can be created and accessed. (Documents cannot be created at the root level.)

  • A document may contain a collection, which is known as a sub-collection since it is not a collection at the root level.

  • Multiple collections can be created at the root level.

For example, a collection called comments could be used to store comments submitted through an application, while each comment could have a sub-collection called replies used to store the replies to each comment.

This creates a storage structure like the following:

Example storage structure for comments and their respective replies
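
The rules above can be sketched as a small in-memory model. This is only an illustration of the nesting, not Datastore's API: the class names, the document IDs (comment-1, reply-1) and the text property are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document holds properties and, optionally, named sub-collections."""
    properties: dict = field(default_factory=dict)
    collections: dict = field(default_factory=dict)  # name -> Collection


@dataclass
class Collection:
    """A collection holds documents keyed by their ID."""
    documents: dict = field(default_factory=dict)  # document ID -> Document


# The root level holds collections only, never documents.
root = {"comments": Collection()}

# Hypothetical ID and property name, for illustration only.
comment = Document(properties={"text": "First comment"})
root["comments"].documents["comment-1"] = comment

# A sub-collection lives inside a document rather than at the root.
comment.collections["replies"] = Collection()
comment.collections["replies"].documents["reply-1"] = Document(
    properties={"text": "First reply"}
)
```

Each comment document owns its own replies sub-collection, so replies to different comments never share a collection.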

This alternating pattern of collection > document > collection (and so on) is used to construct the API endpoint URLs within Datastore, in the form /collection/document/collection/…, where either a collection name or a unique document identifier (ID) is used at each level of the URL for a given API call.
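
As a minimal sketch of this pattern, the helpers below join alternating collection names and document IDs into a path and report which kind of object a path ends on. The function names build_path and targets_collection are assumptions made for this example and are not part of Datastore.

```python
def build_path(*segments: str) -> str:
    """Join alternating collection and document segments into a URL path.

    Segments at even positions (0, 2, ...) are collection names and
    segments at odd positions are document IDs.
    """
    if not segments:
        raise ValueError("a path needs at least one collection name")
    for segment in segments:
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/" + "/".join(segments)


def targets_collection(path: str) -> bool:
    """Return True if the path ends on a collection, False if on a document."""
    segments = path.strip("/").split("/")
    # An odd number of segments means the last one is a collection name.
    return len(segments) % 2 == 1


comment_id = "0cad4697-8422-47c0-9d4b-5f16d7f5baf3"
print(build_path("comments"))                         # /comments
print(build_path("comments", comment_id, "replies"))
print(targets_collection("/comments"))                # True
print(targets_collection("/comments/" + comment_id))  # False
```

Because the levels strictly alternate, the position of a segment is enough to tell a collection name from a document ID.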

Within a URL, the collection (for comments and their replies) can retain its user-friendly name as this name only needs to be unique at a given level of the URL.

The documents (that is, the actual comments and replies themselves) use a unique ID that Datastore assigns to them. While this document name only needs to be unique within a given collection, Datastore allocates a version 4 universally unique identifier (UUID) for each document name. A UUID generated by Datastore is a 36-character string separated into 5 groups by hyphens, for example: 0cad4697-8422-47c0-9d4b-5f16d7f5baf3.
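
To see this format in practice, the following sketch generates a version 4 UUID with Python's standard uuid module and inspects its shape. It only illustrates the identifier format; Datastore allocates these IDs itself, and nothing here reflects how it does so internally.

```python
import uuid

# Generate a random (version 4) UUID, the same kind of ID described above.
document_id = str(uuid.uuid4())
groups = document_id.split("-")

print(document_id)
print(len(document_id))                # 36 (32 hex digits plus 4 hyphens)
print([len(g) for g in groups])        # [8, 4, 4, 4, 12]
print(uuid.UUID(document_id).version)  # 4

# The example ID from the text parses as a version 4 UUID as well.
example = uuid.UUID("0cad4697-8422-47c0-9d4b-5f16d7f5baf3")
print(example.version)                 # 4
```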

Following on from the comment and reply example above, the following URL paths can be generated:

Example of comment and reply endpoint URL paths (with truncated UUIDs)

Since a document name only needs to be unique within a given collection, your application can assign a unique identifier to a document instead of one allocated by Datastore. This allows you to create friendlier URLs with more memorable document identifiers.
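
The uniqueness rule can be sketched with plain dictionaries standing in for collections. The add_document helper, the welcome ID and the text property are hypothetical and do not correspond to any Datastore call. The helper accepts an application-assigned ID, rejects a duplicate within the same collection, and falls back to a version 4 UUID when no ID is supplied.

```python
import uuid
from typing import Optional


def add_document(collection: dict, properties: dict,
                 document_id: Optional[str] = None) -> str:
    """Store properties under document_id and return the ID that was used.

    If no ID is given, a version 4 UUID is generated as a fallback.
    """
    if document_id is None:
        document_id = str(uuid.uuid4())
    if document_id in collection:
        # IDs only need to be unique within this one collection.
        raise ValueError(f"document ID already in use: {document_id!r}")
    collection[document_id] = properties
    return document_id


comments: dict = {}
replies: dict = {}

# The same friendly ID can be used in two different collections.
add_document(comments, {"text": "Hello"}, "welcome")
add_document(replies, {"text": "Hi there"}, "welcome")

# Without an explicit ID, a UUID is generated instead.
generated_id = add_document(comments, {"text": "Another comment"})
```

With the path pattern described earlier, the first comment would then be addressable as /comments/welcome rather than by a UUID.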