DESCRIPTION¶
Overview¶
This service is responsible for taking dataset version exports, cataloging them and transferring them to the DANS data vault. If the dataset version export is the first version of a dataset, an NBN persistent identifier is minted for the dataset and registered in the NBN database. For more information about the context of this service, see the Data Station architecture overview.
Interfaces¶
Provided¶
Inbox directories¶
- Protocol type: Shared filesystem
- Internal or external: internal
- Purpose: to receive dataset version exports from the Data Stations and other services
Admin console¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: application monitoring and management
Consumed¶
Data Vault Catalog¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: to maintain information about the datasets and their versions that are stored in the DANS data vault
NBN Database¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: to mint and register NBN persistent identifiers for datasets
Data Vault import inbox¶
- Protocol type: Shared filesystem
- Internal or external: internal
- Purpose: to import dataset version exports into the DANS data vault
Data Vault API¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: to issue commands to the DANS data vault and retrieve information from it
Processing¶
This service is best viewed as a processing pipeline for dataset version exports. A dataset version export is a zip file containing the metadata and files of a dataset version. It is created as a long term preservation copy of a dataset version. The processing pipeline consists of the following steps:
COLLECT¶
The service monitors a number of configured inbox directories for new dataset version exports. The COLLECT step is responsible for detecting new exports and creating a transfer items in the service's database.
EXTRACT-METADATA¶
The metadata of a dataset version export is extracted. An NBN persistent identifier is minted for the dataset if it is the first version of a dataset and registered in the NBN database. The metadata, including the NBN is registered in the Data Vault Catalog.
At this point, the dataset version export is ready to be transferred to the DANS data vault. It is moved the current import batch for the vault collection that this dataset version export must be added to. The import batch is a directory on the shared filesystem that is used to transfer dataset version exports to the DANS data vault. It has the following structure:
import-batch/
urn:nbn:nl:ui:13-4-abc/
1/
/unzipped contents of dataset-1-version-1.zip
2/
/unzipped contents of dataset-1-version-2.zip
urn:nbn:nl:ui:13-4-def/
1/
/unzipped contents of dataset-2-version-1.zip
Each directory in the batch is an object import directory, that is to say it targets a specific OCFL object in the DANS data vault. An OCFL object is the container for the files and metadata of a dataset. The name of the directory is the URN:NBN of the dataset. Each subdirectory of an object import directory is a version import directory. The name of the directory is an integer. The versions are imported in ascending order.
SEND-TO-DATA-VAULT¶
Each time a dataset version export is moved to the import batch, the service checks if the total size of the import batch exceeds a configured threshold. If it does, the service sends a command to the Data Vault API to start the import of the import batch (after first creating a new empty import batch for subsequent dataset version exports).
After confirmation from the Data Vault API that the import has finished successfully, the service checks the size of the current top layer in the DANS Data Vault. If it exceeds a configured threshold, the service sends a command to the Data Vault API to create a new top layer.
CONFIRM-ARCHIVED¶
When the request has been sent to archive the import batch, the service waits for the Data Vault API to confirm that the import batch has been archived. It then
updates the status of all the dataset version exports in the import batch to ARCHIVED
in the Vault Catalog.