dd-virus-scan¶
Service for scanning Dataverse datasets for viruses
Purpose¶
Service for scanning Dataverse datasets for viruses.
Interfaces¶
This service has the following interfaces:

Provided interfaces¶
Step invocation API¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: invoked by Dataverse when the PrePublishDataset workflow is executed
Admin console¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: application monitoring and management
Consumed interfaces¶
Dataverse API¶
- Protocol type: HTTP
- Internal or external: internal
- Purpose: to retrieve file data to scan and to resume the workflow
Processing¶
The service uses the clamd daemon for the actual scanning. The data files of the dataset are streamed through the clamd daemon one by one. If no virus
is found, the resume status will be "Success"; otherwise it will be "Failure", with the instantiated resultPostiveMessageTemplate as the message. See the comments in
config.yml for details.
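As an illustration of what streaming a file through clamd involves, the sketch below frames data for clamd's INSTREAM command: the command itself, then each chunk prefixed with its length as a 4-byte big-endian integer, then a zero-length chunk to end the stream. It is not taken from the service's own code. The function names, the host and port defaults, and the mapping to a resume status are assumptions made for this example.

```python
import socket
import struct
from typing import Iterable, Iterator, Optional


def instream_frames(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the byte frames of one clamd INSTREAM session (hypothetical helper)."""
    # The "z" prefix asks clamd for null-terminated commands and replies.
    yield b"zINSTREAM\0"
    for chunk in chunks:
        if chunk:  # an empty chunk would end the stream prematurely
            yield struct.pack("!I", len(chunk)) + chunk
    # A zero-length chunk tells clamd that the stream is complete.
    yield struct.pack("!I", 0)


def parse_reply(reply: bytes) -> Optional[str]:
    """Return the signature name if clamd reported a hit, None if the stream is clean."""
    text = reply.rstrip(b"\0\n").decode("utf-8", errors="replace")
    if text.endswith(" FOUND"):
        name = text[: -len(" FOUND")]
        if name.startswith("stream: "):
            name = name[len("stream: "):]
        return name
    if text.endswith("OK"):
        return None
    raise RuntimeError("unexpected clamd reply: " + text)


def scan_chunks(chunks: Iterable[bytes], host: str = "localhost", port: int = 3310) -> Optional[str]:
    """Stream chunks to a clamd TCP socket; host and port are assumed values."""
    with socket.create_connection((host, port)) as sock:
        for frame in instream_frames(chunks):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            data = sock.recv(4096)
            if not data:
                break
            reply += data
    return parse_reply(reply)


def resume_status(signatures: Iterable[Optional[str]]) -> str:
    """Map per-file scan results to the resume status described above."""
    return "Failure" if any(s is not None for s in signatures) else "Success"
```

Separating the framing from the socket handling makes it possible to check the framing without a running clamd instance. Only scan_chunks needs a live daemon.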
The service has the following thread pools:
- Workers for incoming HTTP requests from Dataverse. These will schedule scan tasks.
- Scan task workers. These loop over all the files in the targeted dataset and stream them through
clamd, gathering the results. Once the overall result has been determined, a resumption of the workflow is scheduled as a resume task.
- Resume task workers. These try to resume the workflow. The attempt is repeated up to a configured number of times because Dataverse has known synchronization issues in its workflow framework.
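The hand-off between the pools can be sketched as follows. This is a minimal illustration and not the service's implementation: the class name, pool sizes, and the scan_file and resume_workflow callables are hypothetical stand-ins, and max_tries and wait_seconds stand in for whatever the real configuration calls them.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


class ScanPipeline:
    """Two worker pools: one for scan tasks, one for resume tasks (illustrative only)."""

    def __init__(
        self,
        scan_file: Callable[[str], Optional[str]],  # returns a signature name, or None if clean
        resume_workflow: Callable[[str, str, List[str]], bool],  # returns True on success
        max_tries: int = 3,
        wait_seconds: float = 0.0,
    ) -> None:
        self.scan_file = scan_file
        self.resume_workflow = resume_workflow
        self.max_tries = max_tries
        self.wait_seconds = wait_seconds
        self.scan_pool = ThreadPoolExecutor(max_workers=2)
        self.resume_pool = ThreadPoolExecutor(max_workers=2)

    def submit(self, invocation_id: str, files: Iterable[str]) -> Future:
        """Called by an HTTP request worker; schedules the scan and returns at once."""
        return self.scan_pool.submit(self._scan_task, invocation_id, list(files))

    def _scan_task(self, invocation_id: str, files: List[str]) -> Future:
        # Scan every file, collect the infected ones, then schedule the resume task.
        infected = [name for name in files if self.scan_file(name) is not None]
        status = "Failure" if infected else "Success"
        return self.resume_pool.submit(self._resume_task, invocation_id, status, infected)

    def _resume_task(self, invocation_id: str, status: str, infected: List[str]) -> bool:
        # Retry a bounded number of times, since a resume attempt may fail transiently.
        for attempt in range(1, self.max_tries + 1):
            if self.resume_workflow(invocation_id, status, infected):
                return True
            if attempt < self.max_tries:
                time.sleep(self.wait_seconds)
        return False
```

Because submit returns as soon as the scan task is queued, the HTTP worker can respond to Dataverse without waiting for the scan. The future it returns resolves to the resume task's own future, which a caller may ignore or inspect.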