DANS BagPack Profile v1.1.0
Introduction
Version
- Document version: 1.1.0
- Publication date: n/a
Status
The status of this document is DRAFT.
Changes
Changed from version 1.0.0 to 1.1.0
Changed requirement 1.1 to also allow "holey bags", in support of external large objects. This change is backwards compatible: bags that were valid under the previous version of this specification remain valid.
Scope
This document specifies what constitutes an acceptable DANS BagPack. This includes all the requirements for a bag to be successfully processed by the DANS Data Vault ingest workflow.
Overview and Conventions
Keywords
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
The key word "SHOULD" is also used to specify requirements that are impossible or impractical to check by the archival organization (i.e., DANS). The client should do its best to meet these requirements but not rely on their being validated by the archival organization.
Subdivisions
The requirements are subdivided into the following sections:
- RDA BagPack Related - requirements that refer back to the RDA BagPack specifications. If a bag only needs to comply with the RDA BagPack specifications, then it should be sufficient to only check this section.
- Extra Requirements for DANS BagPack - requirements that are specific to the DANS BagPack Profile, and which are in addition to the RDA BagPack requirements.
The sections are numbered and may have numbered subsections. The requirements themselves are stated as numbered rules. Rules may have parts that are labeled with letters: (a), (b), (c), etc. To uniquely identify a specific rule, use the notation
<section-nr>[.<subsection-nr>].<rule-nr> [(<letter>)]
Example: 2.3.4 (e) means part e of the fourth rule in subsection 3 of section 2.
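The notation above can be parsed mechanically, for example when a validator needs to report which rule a bag violates. The following is a minimal sketch, not part of the profile: the names `RULE_REF`, `RuleRef` and `parse_rule_ref` are hypothetical, and the pattern assumes lowercase single-letter parts.

```python
import re
from typing import NamedTuple, Optional

# <section-nr>[.<subsection-nr>].<rule-nr> [(<letter>)]
RULE_REF = re.compile(
    r"^(?P<section>\d+)"
    r"(?:\.(?P<subsection>\d+))?"   # optional subsection number
    r"\.(?P<rule>\d+)"
    r"\s*(?:\((?P<letter>[a-z])\))?$"  # optional lettered part, e.g. (e)
)


class RuleRef(NamedTuple):
    section: int
    subsection: Optional[int]
    rule: int
    letter: Optional[str]


def parse_rule_ref(text: str) -> RuleRef:
    """Split a rule reference such as '2.3.4 (e)' into its components."""
    match = RULE_REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a rule reference: {text!r}")
    subsection = match.group("subsection")
    return RuleRef(
        section=int(match.group("section")),
        subsection=int(subsection) if subsection else None,
        rule=int(match.group("rule")),
        letter=match.group("letter"),
    )
```

With this sketch, `parse_rule_ref("2.3.4 (e)")` yields section 2, subsection 3, rule 4, part `e`, while `parse_rule_ref("1.1")` yields section 1, rule 1, with no subsection or letter.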
XML namespaces
When this document refers to an XML element name, attribute name, or attribute value with a prefix (such as `schema:name`), it means the name in the namespace bound to that prefix.
The table below lists the mapping from prefix to namespace. In the actual document, the namespace may be bound to a different prefix, or be the default namespace.
| Prefix | Namespace URI | Namespace documentation |
|---|---|---|
| `schema` | http://schema.org/ | schema.org |
| `dvcore` | https://dataverse.org/schema/core# | Dataverse metadata elements |
| `vaultMd` | https://schemas.dans.knaw.nl/metadatablock/dansDataVaultMetadata# | DANS Data Vault Metadata block |
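Because a document may bind a namespace to a different prefix, comparisons are safer on fully expanded names than on prefixed ones. The sketch below illustrates the expansion using the table's bindings as defaults; the `NAMESPACES` dictionary and the `expand` function are hypothetical helpers, not something the profile defines.

```python
from typing import Mapping

# Default bindings, copied from the table above.
NAMESPACES = {
    "schema": "http://schema.org/",
    "dvcore": "https://dataverse.org/schema/core#",
    "vaultMd": "https://schemas.dans.knaw.nl/metadatablock/dansDataVaultMetadata#",
}


def expand(prefixed_name: str, bindings: Mapping[str, str] = NAMESPACES) -> str:
    """Expand a name such as 'schema:name' to its full URI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in bindings:
        raise KeyError(f"no namespace bound for: {prefixed_name!r}")
    return bindings[prefix] + local_name
```

A document that binds `http://schema.org/` to the prefix `s` can pass its own bindings, so that `expand("s:name", {"s": "http://schema.org/"})` and `expand("schema:name")` produce the same URI.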
Requirements
1. RDA BagPack Related
The following items are required by the RDA BagPack specifications:
1. One of the following MUST hold:
    - the DANS BagPack is a valid bag, according to BagIt v1.0 or BagIt v0.97;
    - the DANS BagPack is a holey bag (i.e., a bag with a `fetch.txt` file listing the missing files and their fetch URLs). The files to be fetched MUST be downloadable from the given URL or obtainable from a well-known location and have the checksums listed in the payload manifests. "Obtainable from a well-known location" means that the repository containing the bag documents how to map the fetch URL or a checksum for the file to the location where the file data is stored.
2. (a) A DANS BagPack MUST contain a file `metadata/datacite.xml`; (b) this file MUST be valid according to the DataCite schema version 4.0 or later, except for the requirement that there MUST be a DOI present: a DOI is not required for a DANS BagPack; (c) DataCite's recommended properties SHOULD be present.
3. Other files besides `datacite.xml` MAY be present in the `metadata` folder.
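For holey bags, one check that follows from rule 1.1 is that every file listed in `fetch.txt` also has a checksum in a payload manifest. The sketch below shows that check on the text of the two files. It assumes the BagIt line layouts (`url length filepath` in `fetch.txt`, with `-` for an unknown length, and `checksum filepath` in a manifest) and skips percent-decoding of paths; all function and type names are hypothetical, and downloading or checksum verification is left out.

```python
from typing import List, NamedTuple, Optional, Set


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when fetch.txt gives "-" (length unknown)
    path: str


def parse_fetch_txt(text: str) -> List[FetchEntry]:
    """Parse fetch.txt lines of the form 'url length filepath'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 2)  # the file path itself may contain spaces
        if len(parts) != 3:
            raise ValueError(f"malformed fetch.txt line: {line!r}")
        url, length, path = parts
        entries.append(
            FetchEntry(url, None if length == "-" else int(length), path.strip())
        )
    return entries


def manifest_paths(text: str) -> Set[str]:
    """Collect the file paths from 'checksum filepath' manifest lines."""
    paths = set()
    for line in text.splitlines():
        if line.strip():
            _checksum, path = line.split(None, 1)
            paths.add(path.strip())
    return paths


def fetch_paths_without_checksum(fetch_text: str, manifest_text: str) -> List[str]:
    """Return fetch.txt paths that the payload manifest does not list."""
    listed = manifest_paths(manifest_text)
    return sorted(e.path for e in parse_fetch_txt(fetch_text) if e.path not in listed)
```

An empty result means every file to be fetched has a checksum to verify against once it has been retrieved; it does not show that the URLs are reachable.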
2. Extra Requirements for DANS BagPack
The following items are required by the DANS BagPack Profile, in addition to the requirements of RDA BagPack:
1. The `bag-info.txt` file SHOULD contain an element `BagIt-Profile-Identifier` set to the identifier of the DANS BagPack BagIt Profile: https://doi.org/10.17026/e948-0r32.
2. (a) The bag MUST conform to the DANS BagPack BagIt Profile (even if the `BagIt-Profile-Identifier` element pointing to it is missing). (b) The bag SHOULD conform to any other BagIt profiles declared in the `BagIt-Profile-Identifier` element.
3. There MUST be a file called `metadata/pid-mapping.txt`: the structure of this file MUST be rows formatted as `<identifier> <referenced object>`, where `<identifier>` is a unique URI and `<referenced object>` is the path to the file relative to the root of the bag, and both are separated by one or more spaces. One of the lines MAY be a mapping from the dataset DOI to a folder directly under the `data` folder.
4. (a) There MUST be a `metadata/oai-ore.json` file which MUST be a valid JSON-LD 1.0 or higher document; (b) The object described in the document MUST have the attribute `vaultMd:dansBagId` whose value is a URN:UUID. (c) The `ore:AggregatedResource`s of the `ore:Aggregation` MUST have the following attributes: (i) `@id` whose value is a URI; (ii) `schema:name`; (iii) `dvcore:restricted`, with value true or false.
5. There MUST be a one-to-one mapping between the files in the `data` folder and the files described in the Aggregation contained in the `oai-ore.jsonld` file: (a) all identifiers found in 2.4(c)(i) MUST be present in the left column of `pid-mapping.txt`; (b) the set of paths pointing to files found in the right column of `pid-mapping.txt` MUST be equal to the set of paths of files present in the `data` folder (relative to the bag root).
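Rules 2.3 to 2.5 can be checked with simple set comparisons once the inputs have been collected. The sketch below is one possible reading, not a reference implementation. All function names are hypothetical. It assumes that the aggregated resources have already been extracted from the OAI-ORE document as dictionaries with the compact keys used in rule 2.4 (c), that `dvcore:restricted` is either a JSON boolean or the string `true` or `false`, and that the caller supplies the folder paths so that the optional dataset DOI row of rule 2.3 can be excluded.

```python
from typing import Dict, Iterable, List


def parse_pid_mapping(text: str) -> Dict[str, str]:
    """Parse pid-mapping.txt rows of the form '<identifier> <referenced object>'."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 1)  # identifier first; the path may contain spaces
        if len(parts) != 2:
            raise ValueError(f"malformed row: {line!r}")
        identifier, path = parts[0], parts[1].strip()
        if identifier in mapping:
            raise ValueError(f"identifier is not unique: {identifier}")
        mapping[identifier] = path
    return mapping


def missing_resource_attributes(resource: dict) -> List[str]:
    """Rule 2.4 (c): list the required attributes one resource lacks."""
    missing = [key for key in ("@id", "schema:name") if key not in resource]
    restricted = resource.get("dvcore:restricted")
    # Assumption: a JSON boolean or the strings "true"/"false" are acceptable.
    if not (isinstance(restricted, bool) or restricted in ("true", "false")):
        missing.append("dvcore:restricted")
    return missing


def check_rule_2_5(
    aggregated_ids: Iterable[str],
    pid_mapping: Dict[str, str],
    data_files: Iterable[str],
    folders: Iterable[str] = (),
) -> List[str]:
    """Return violations of rule 2.5; paths are relative to the bag root."""
    problems = []
    # (a) every @id from 2.4 (c)(i) must appear in the left column
    for identifier in sorted(set(aggregated_ids) - set(pid_mapping)):
        problems.append(f"2.5 (a): {identifier} is not in pid-mapping.txt")
    # (b) mapped file paths must equal the files under data; folder rows are skipped
    mapped_files = set(pid_mapping.values()) - set(folders)
    data = set(data_files)
    for path in sorted(mapped_files - data):
        problems.append(f"2.5 (b): {path} is mapped but is not a file under data")
    for path in sorted(data - mapped_files):
        problems.append(f"2.5 (b): {path} has no row in pid-mapping.txt")
    return problems
```

An empty list from `check_rule_2_5` indicates that the identifiers and paths line up. For a holey bag, the caller would need to decide whether `data_files` includes the files listed in `fetch.txt`; this sketch leaves that choice to the caller.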