Web archive (WARC)
This component is currently in the Beta phase of development. We encourage users to submit feedback, and we will prioritize fixes for any issues encountered.
This is a Premium connector. Only customers with a subscription to the web archiving application will be supplied with the required credentials by the Integrations team.
Credentials
- Name
-
The name of your credential
- URL
-
The URL of the SQS Queue for this customer’s tenant. This will be provided by the Integrations team when the tenant is deployed.
- Access key ID
-
The ID of the AWS access key for the tenant. This will be provided by the Integrations team when the tenant is deployed. Do not share this with anyone outside of the Integrations and Implementation teams.
- Secret access key
-
The secret access key for the tenant. This will be provided by the Integrations team when they deploy the tenant. Do not share this with anyone outside of the Integrations and Implementation teams.
- Region
-
The AWS region in which the tenant is deployed. This can be derived from the queue URL. For example:
ap-southeast-2
- Client name
-
The name of the client. This will be used across the tenant and the Integrations instance and should be agreed upon with the Integrations team before they deploy the tenant.
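The Region value can be read from the queue URL rather than typed by hand. The following is a minimal sketch, assuming the queue URL uses the standard `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>` host shape. The function name, the account ID, and the queue name are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def region_from_queue_url(queue_url: str) -> str:
    """Return the AWS region embedded in a standard SQS queue URL."""
    host = urlparse(queue_url).hostname or ""
    parts = host.split(".")
    # Assumed host shape: sqs.<region>.amazonaws.com
    if len(parts) == 4 and parts[0] == "sqs" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    raise ValueError(f"Cannot derive a region from queue URL: {queue_url!r}")


# Hypothetical queue URL; the real one is provided by the Integrations team.
example_url = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/example-queue"
print(region_from_queue_url(example_url))
```

URLs that do not match the assumed shape raise an error instead of returning a guess, so a mistyped URL is caught before the credential is saved.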
Actions
- Send message
-
The only Action available to the component. This sends a message to the SQS queue of the customer’s tenant, containing the data required to initialize the archiving process for a single page.
Input fields
An asterisk (*) denotes a required field.
- Callback flow ID*
-
The ID of flow #3 in your customer’s workspace. This is the flow to which each .warc file will be sent for Integrations to forward to the customer’s chosen cloud storage solution.
- URL to be archived*
-
The full customer-facing URL of the page that the archive process will visit to generate the .warc file.
- Last modified date of the page
-
This can be configured when setting up the Matrix trigger.
- Matrix asset ID of the page
-
This can be configured when setting up the Matrix trigger.
- Additional data
-
Include any additional data here as a JSON object. This data will be added to the .warc file. Examples of additional data are description or collection. See the JSON object below for an example:
{"lastmodby": "editor", "description": "employment", "collection": "Archive Test Website", "disableJS": true}
The additional data JSON object is configured by default to accept the disableJS parameter. If this is set to true, the archive process will disable JavaScript when it visits the page to archive. This can reduce the size of the generated .warc file if the page is JavaScript-heavy.
Avoid manually setting this value. Allow the flow to extract it from the Webhook data. Steps 4 and 7 of flow #3 will automatically detect a .warc file that is too large and attempt to re-trigger flow #2 with the disableJS value set to true.
The default value of false will be used if the field is not included, meaning JavaScript will be enabled for the archive.
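The handling of the Additional data field can be illustrated with a short sketch. It assumes only what is described above: the field holds a JSON object, and disableJS is treated as false when it is absent. The function name is hypothetical, and this is not the connector's actual implementation.

```python
import json


def parse_additional_data(raw: str = "") -> dict:
    """Parse the Additional data field and apply the documented disableJS default."""
    # An empty field is treated as an empty JSON object.
    data = json.loads(raw) if raw.strip() else {}
    if not isinstance(data, dict):
        raise ValueError("Additional data must be a JSON object")
    # Documented default: JavaScript stays enabled unless disableJS is set.
    data.setdefault("disableJS", False)
    return data


print(parse_additional_data('{"description": "employment", "collection": "Archive Test Website"}'))
```

Using `setdefault` leaves an explicit disableJS value untouched, which matches the recommendation to let the flow supply it from the Webhook data.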
Output
The action will output a JSON representation of the result of calling the SQS API to add a message to the queue. If successful, it will emit something similar to the following:
{
"ResponseMetadata": {
"RequestId": "351785b3-9d45-5639-8b26-ded1ed7ca8b6"
},
"MD5OfMessageBody": "f20eee3dbb643b7486c2da434467d7a6",
"MD5OfMessageAttributes": "7a35f1c8c293a704405f778be2dcbf19",
"MessageId": "6fbd9555-d8a4-41f8-9af0-f490ddb2c86e"
}
If unsuccessful, it will emit an error message.
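A later step in a flow may want to confirm that the message was queued before continuing. The sketch below assumes the action output arrives as a JSON string shaped like the success sample above and treats the presence of a MessageId as the success signal. The function name is hypothetical, and the exact shape of the error output is not documented here, so anything that is not a JSON object with a MessageId is treated as a failure.

```python
import json


def message_was_queued(action_output: str) -> bool:
    """Return True if the output looks like a successful SQS send result."""
    try:
        result = json.loads(action_output)
    except json.JSONDecodeError:
        # Assumption: a plain-text error message is not valid JSON.
        return False
    return isinstance(result, dict) and bool(result.get("MessageId"))


sample_output = """
{
  "ResponseMetadata": {"RequestId": "351785b3-9d45-5639-8b26-ded1ed7ca8b6"},
  "MD5OfMessageBody": "f20eee3dbb643b7486c2da434467d7a6",
  "MD5OfMessageAttributes": "7a35f1c8c293a704405f778be2dcbf19",
  "MessageId": "6fbd9555-d8a4-41f8-9af0-f490ddb2c86e"
}
"""
print(message_was_queued(sample_output))
```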
Running the sample data process for this action will send a real message to the SQS queue and thus trigger a page archive.