dd-virus-scan

Service for scanning Dataverse datasets for viruses

Purpose

Service for scanning the data files of Dataverse datasets for viruses.

Interfaces

This service has the following interfaces:

Provided interfaces

Step invocation API

  • Protocol type: HTTP
  • Internal or external: internal
  • Purpose: invoked by Dataverse when the PrePublishDataset workflow is executed

Admin console

  • Protocol type: HTTP
  • Internal or external: internal
  • Purpose: application monitoring and management

Consumed interfaces

Dataverse API

  • Protocol type: HTTP
  • Internal or external: internal
  • Purpose: to retrieve file data to scan and to resume the workflow

Processing

The service uses the clamd daemon for the actual scanning. The data files of the dataset will be streamed through the clamd daemon one by one. If no virus is found, the resume status will be "Success"; otherwise it will be "Failure", and the message will contain the instantiated resultPostiveMessageTemplate. See the comments in the config.yml for details.
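The streaming step can be illustrated with a minimal Python sketch of the clamd INSTREAM framing: a null-terminated command, then chunks that are each prefixed with a 4-byte big-endian length, closed by a zero-length chunk. This is an illustration under assumptions, not the service's actual code. The function names and the `{files}` placeholder in the template are hypothetical; the real placeholder syntax is described in the config.yml.

```python
import struct
from typing import Dict, Iterable, Iterator, Optional, Tuple


def instream_frames(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the byte frames that make up one clamd INSTREAM session."""
    yield b"zINSTREAM\0"  # null-terminated form of the INSTREAM command
    for chunk in chunks:
        if chunk:  # skip empty chunks: a zero length would end the stream early
            yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # zero-length chunk marks the end of the data


def parse_reply(reply: bytes) -> Optional[str]:
    """Return the reported signature name, or None if the stream was clean."""
    text = reply.rstrip(b"\0\n").decode("utf-8", errors="replace")
    if text.endswith(" FOUND"):
        # e.g. "stream: Some-Signature FOUND" -> "Some-Signature"
        return text[: -len(" FOUND")].split(": ", 1)[-1]
    if text.endswith(": OK"):
        return None
    raise ValueError(f"unexpected clamd reply: {text!r}")


def resume_status(
    results: Dict[str, Optional[str]],
    template: str = "Virus found in: {files}",  # hypothetical template format
) -> Tuple[str, str]:
    """Map per-file scan results to a resume status and message."""
    infected = sorted(name for name, sig in results.items() if sig is not None)
    if not infected:
        return "Success", ""
    return "Failure", template.format(files=", ".join(infected))
```

In a real client the frames would be written to a socket connected to clamd and the reply read back from the same connection; the sketch leaves the network part out so that the framing and the result mapping can be checked in isolation.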

The service has the following thread pools:

  • Workers for incoming HTTP requests from Dataverse. These will schedule scan tasks.
  • Scan task workers. These will loop over all the files in the targeted dataset and stream them through clamd, gathering the results. Once the overall result has been determined, a resumption of the workflow will be scheduled as a resume task.
  • Resume task workers. These will try to resume the workflow. The attempt will be repeated up to a configured number of times, because Dataverse has known synchronization issues in its workflow framework.
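The hand-off between the pools can be sketched as follows, assuming Python's standard `concurrent.futures` executors in place of whatever the service actually uses. The `Pipeline` class, its callbacks (`list_files`, `scan_file`, `resume`) and the parameters `max_tries` and `wait_seconds` are hypothetical names introduced only for this illustration. The HTTP request workers are represented by the caller of `handle_invocation`.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable


class Pipeline:
    """Illustrative two-stage pipeline: scan tasks schedule resume tasks."""

    def __init__(
        self,
        list_files: Callable[[str], Iterable[str]],  # dataset id -> file names
        scan_file: Callable[[str], bool],            # True if a virus was found
        resume: Callable[[str, str], bool],          # True if the resume succeeded
        max_tries: int = 3,
        wait_seconds: float = 0.0,
    ) -> None:
        self.list_files = list_files
        self.scan_file = scan_file
        self.resume = resume
        self.max_tries = max_tries
        self.wait_seconds = wait_seconds
        self.scan_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="scan")
        self.resume_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="resume")

    def handle_invocation(self, invocation_id: str, dataset_id: str) -> Future:
        """Called by a request worker: schedule the scan and return at once."""
        return self.scan_pool.submit(self._scan_task, invocation_id, dataset_id)

    def _scan_task(self, invocation_id: str, dataset_id: str) -> Future:
        # Scan every file, then derive the overall result for the dataset.
        infected = [f for f in self.list_files(dataset_id) if self.scan_file(f)]
        status = "Failure" if infected else "Success"
        return self.resume_pool.submit(self._resume_task, invocation_id, status)

    def _resume_task(self, invocation_id: str, status: str) -> bool:
        # Retry a bounded number of times; give up if all attempts fail.
        for _ in range(self.max_tries):
            if self.resume(invocation_id, status):
                return True
            time.sleep(self.wait_seconds)
        return False

    def close(self) -> None:
        self.scan_pool.shutdown(wait=True)
        self.resume_pool.shutdown(wait=True)
```

Keeping the resume attempts in their own pool means that a slow or repeatedly failing resume does not occupy a scan worker, which is one plausible reason for separating the two stages.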

References