dd-vault-ingest-flow
Service that processes deposits, converts them to RDA compliant bags, and sends them to the vault.
SYNOPSIS
dd-vault-ingest-flow { server | check }
DESCRIPTION
This service is part of the DANS Vault Service. It is responsible for processing deposits, converting them to RDA compliant bags and sending them to the DANS Data Vault. This service resembles dd-ingest-flow in that it processes deposits and that these deposits eventually end up in the Vault. However, unlike dd-ingest-flow, it does not create a Dataverse dataset version for each deposit and then rely on Dataverse to automatically export that dataset version to an RDA compliant bag. Instead, this service creates the RDA compliant bag directly from the deposit. That said, the resulting bag is still designed to closely resemble the Dataverse bag.
Ingest areas
An ingest area is a directory on local disk storage that is used by the service to receive deposits. It contains the following subdirectories:
- inbox - the directory under which all input deposits must be located
- outbox - a directory to which the processed deposits are moved (if successful to a subdirectory processed, otherwise to one of rejected or failed)
The service currently supports only one ingest area:
- auto-ingest - for continuous import of deposits offered through a deposit service, such as dd-sword2.
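The inbox/outbox layout described in this section can be sketched as follows. This is a minimal illustration in Python, not the service's own implementation; the helper names create_ingest_area and outbox_target, and the use of the subdirectory names as outcome labels, are assumptions made for the example.

```python
from pathlib import Path
import tempfile

# Outbox subdirectory names as listed in the description of an ingest area.
OUTBOX_SUBDIRS = ("processed", "rejected", "failed")


def create_ingest_area(root: Path) -> None:
    """Create the inbox and the three outbox subdirectories under root."""
    (root / "inbox").mkdir(parents=True, exist_ok=True)
    for name in OUTBOX_SUBDIRS:
        (root / "outbox" / name).mkdir(parents=True, exist_ok=True)


def outbox_target(root: Path, deposit_name: str, outcome: str) -> Path:
    """Return the path a deposit would be moved to for a given outcome.

    Hypothetical helper: the outcome must be one of OUTBOX_SUBDIRS.
    """
    if outcome not in OUTBOX_SUBDIRS:
        raise ValueError(f"unknown outcome: {outcome}")
    return root / "outbox" / outcome / deposit_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        area = Path(tmp) / "auto-ingest"
        create_ingest_area(area)
        # Show the target location relative to the ingest area.
        print(outbox_target(area, "deposit-1", "rejected").relative_to(area))
```

Keeping the outcome-to-directory mapping in one place makes it harder for a deposit to end up outside the three documented outbox subdirectories.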
Processing of a deposit
Order of deposit processing
A deposit directory represents one dataset version. The version history of a dataset is represented by a sequence of deposit directories. When enqueuing deposits, the program first orders them by the timestamp in the Created element of the contained bag's bag-info.txt file.
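The ordering rule can be illustrated with a short Python sketch. It is not the service's own code. It assumes that each deposit directory contains exactly one bag directory with a bag-info.txt file, and that the Created value is an ISO 8601 timestamp that datetime.fromisoformat can parse. The function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, List


def read_created(bag_info: Path) -> datetime:
    """Return the timestamp of the Created element in a bag-info.txt file."""
    for line in bag_info.read_text(encoding="utf-8").splitlines():
        # Split on the first colon only; the timestamp itself contains colons.
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Created":
            return datetime.fromisoformat(value.strip())
    raise ValueError(f"no Created element in {bag_info}")


def find_bag_info(deposit_dir: Path) -> Path:
    """Locate bag-info.txt, assuming exactly one bag directory per deposit."""
    matches = sorted(deposit_dir.glob("*/bag-info.txt"))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one bag in {deposit_dir}")
    return matches[0]


def order_deposits(deposit_dirs: Iterable[Path]) -> List[Path]:
    """Sort deposit directories by the Created timestamp, oldest first."""
    return sorted(deposit_dirs, key=lambda d: read_created(find_bag_info(d)))
```

For example, order_deposits(Path("inbox").iterdir()) would return the deposits of a hypothetical inbox directory oldest first. The sketch expects all timestamps to carry a UTC offset; Python cannot compare timestamps with and without an offset.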
Processing steps
The processing of a deposit consists of the following steps:
Basic scenario
1. Check that the deposit is a valid deposit directory.
2. Check that the bag in the deposit is a valid v1 DANS bag.
3. Generate an NBN persistent identifier for the dataset and use that for the dansNbn field in the vault metadata.
4. Create a new, zipped RDA compliant bag from the deposit.
5. Register the bag in the dd-vault-catalog with minimal metadata: bag ID, NBN, and swordToken.
6. Move the deposit to the outbox/processed directory and change its state to RECEIVED.
Update scenario
2a. Part of the validation is to check that the deposit is an update to an existing dataset, by checking that the Is-Version-Of field in the bag-info.txt file of the deposit matches the swordToken of a dataset in the vault catalog.
3a. Instead of generating a new NBN, the service queries the Vault Catalog for the NBN of the dataset that is being updated.
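The difference between the basic and the update scenario comes down to how the NBN is obtained. The sketch below shows that decision in Python under several assumptions: the catalog is modelled as a plain dictionary from swordToken to NBN (a stand-in for dd-vault-catalog, not its real API), mint_nbn is a hypothetical callable that produces a new identifier, and matching is done by simple string equality.

```python
from typing import Callable, Dict


def parse_bag_info(text: str) -> Dict[str, str]:
    """Parse 'Label: value' lines of a bag-info.txt into a dictionary.

    Simplification: continuation lines and repeated labels are not handled.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as URNs stay intact.
        label, sep, value = line.partition(":")
        if sep and label.strip():
            fields[label.strip()] = value.strip()
    return fields


def resolve_nbn(
    bag_info: Dict[str, str],
    catalog: Dict[str, str],
    mint_nbn: Callable[[], str],
) -> str:
    """Return the NBN to use for a deposit.

    catalog maps swordToken -> NBN (hypothetical stand-in for the vault catalog).
    """
    is_version_of = bag_info.get("Is-Version-Of")
    if is_version_of is None:
        # Basic scenario: first version of a dataset, so mint a new NBN.
        return mint_nbn()
    if is_version_of not in catalog:
        # Update scenario, but no matching dataset: validation fails.
        raise ValueError(f"no dataset with swordToken {is_version_of}")
    # Update scenario: reuse the NBN of the dataset being updated.
    return catalog[is_version_of]
```

Raising an error for an unknown Is-Version-Of value mirrors step 2a, where the match against the vault catalog is part of validation. How the service reports such a deposit is not specified here.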
ARGUMENTS
positional arguments:
  {server,check}         available commands

named arguments:
  -h, --help             show this help message and exit
  -v, --version          show the application version and exit
INSTALLATION AND CONFIGURATION
Currently, this project is built as an RPM package for RHEL7/CentOS7 and later. The RPM will install the binaries to /opt/dans.knaw.nl/dd-vault-ingest-flow and the configuration files to /etc/opt/dans.knaw.nl/dd-vault-ingest-flow.
For installation on systems that do not support RPM and/or systemd:
- Build the tarball (see next section).
- Extract it to some location on your system, for example /opt/dans.knaw.nl/dd-vault-ingest-flow.
- Start the service with the following command:
  /opt/dans.knaw.nl/dd-vault-ingest-flow/bin/dd-vault-ingest-flow server /opt/dans.knaw.nl/dd-vault-ingest-flow/cfg/config.yml
BUILDING FROM SOURCE
Prerequisites:
- Java 11 or higher
- Maven 3.3.3 or higher
- RPM
Steps:
git clone https://github.com/DANS-KNAW/dd-vault-ingest-flow.git
cd dd-vault-ingest-flow
mvn clean install
If the rpm executable is found at /usr/local/bin/rpm, the build profile that includes the RPM packaging will be activated. If rpm is available, but at a different path, then activate the profile by using Maven's -P switch: mvn -Prpm install.
Alternatively, to build the tarball execute:
mvn clean install assembly:single