DAIS - Digital Archive of the Serbian Academy of Sciences and Arts: Workflows

From TRAP-RCUB

Revision as of 20:26, 15 June 2022 by Trap (talk | contribs) (→‎Change management)

This public wiki is about the DAIS – Digital Archive of the Serbian Academy of Sciences and Arts


The three main workflows in DAIS are:

  • registration, where members of the Designated Community are assigned credentials and granted appropriate permissions;
  • submission, where members of the Designated Community submit data and the accompanying metadata to the repository, and repository managers validate or reject submissions; and
  • curation, where repository managers perform manual or semi-automated actions to enhance metadata and ensure the long-term preservation of data.

Registration

Registration is done by completing the registration form (please use an institutional email). Upon registration, a repository manager will assign appropriate permissions to eligible users, enabling them to deposit their work and access content that is not publicly available. In order to assign permissions to users, the repository manager has to define appropriate user groups and their privileges, and then assign users to the groups. Only Internal users and, in some cases, Associates may be granted the permission to deposit and access restricted content.

NB: Merely filling out the registration form does not grant users the right to deposit content or to access restricted content. External users should not submit registrations. If they need information about restricted content, they may use the feedback form.

Submission

Users can deposit new items by using a web-based submission form or by engaging directly with a repository manager (to perform the deposition on their behalf). Only registered users who are granted appropriate credentials can deposit data.

Depositors should meet a set of requirements during the submission step:

In order to help depositors in meeting the requirements, training and consultations are provided prior to data submission. This helps in ensuring data and metadata quality, resolving legal issues, and reducing costs linked to data ingest and curation. Keeping in mind that the interest in depositing research data has only recently emerged in the Designated Community and that Internal users produce various types of research data in various formats, it is often necessary to develop, assess, and test workflows. In such cases, additional support is provided to users: through a series of consultations, repository managers and users jointly define workflows and decide on the optimal formats, metadata, and access control.

In order to deposit content in the repository, one needs to log in and launch the submission procedure in accordance with user guidelines. The submission interface is divided into several steps. Each step has a set of mandatory fields. Depositors are not allowed to move to the next step unless all mandatory fields are filled in (see Metadata).
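The step-by-step check of mandatory fields can be illustrated with a minimal sketch. The step names and the field names below (other than the general Dublin Core style) are assumptions made for illustration and do not reproduce the actual DAIS submission form configuration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical steps and mandatory fields; the real form may differ.
STEPS: List[Tuple[str, List[str]]] = [
    ("describe", ["dc.title", "dc.contributor.author", "dc.date.issued"]),
    ("rights", ["dc.rights"]),
]


def missing_fields(metadata: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required fields that are absent or blank."""
    return [f for f in required if not str(metadata.get(f, "")).strip()]


def first_incomplete_step(metadata: Dict[str, str]) -> Optional[Tuple[str, List[str]]]:
    """Return the first step that blocks progress, with its missing fields."""
    for name, required in STEPS:
        missing = missing_fields(metadata, required)
        if missing:
            return name, missing
    return None  # every step is complete
```

Calling `first_incomplete_step` on a partially filled record returns the step at which the depositor would be stopped and the fields still to be completed; it returns `None` once all mandatory fields are present.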

Contributors are free to provide any additional information they consider relevant in an additional description field (dc.description.other). ORCID(s) (if available) are added during curation by repository managers.

The same submission interface is used for all submissions, and it currently meets user needs efficiently. If a need arises to include additional fields, the repository development team can easily adjust the submission form.

Rights and licences

Depositors must have the necessary rights to submit a resource in the repository. During the submission process, depositors will be required to define access rights and assign a licence to the resource. DAIS supports Creative Commons licences for Open Access content.

Depositors must be willing and able to grant the Serbian Academy of Sciences and Arts the non-exclusive rights to both preserve and make their work available through the repository by accepting a non-exclusive Distribution Licence.

Reviewing submissions

Once an item is submitted, it undergoes a review by a repository manager to ensure that the metadata are correct and sufficient, that files meet relevant technical requirements, and that the access rights and licence are appropriate. Deposits are not publicly visible before approval by a repository manager. During this step, repository managers add and correct metadata and establish links between different versions (if applicable). If necessary, they may also contact the depositor to request additional information or file conversion to a preferred format (see Preferred file formats). File conversion may also be performed by the repository manager, but this will be done in coordination with the depositor and the institution's management to avoid complications that may arise from insufficiently defined institutional data policies.

In case of publications, repository managers check the quality of deposited files and they may also seek to replace low-quality files with high-quality ones, if possible (e.g. if a contributor submits a scanned document though a born-digital version is available). If all the requirements are met, the repository manager will approve the item (publish it in the repository).

If the submission is inappropriate or is not in line with the Content policy, the repository manager may reject it. A submission may also be rejected if it fails to meet requirements in terms of metadata and data quality. Upon rejection, the depositor will receive an e-mail explaining the reasons for rejection and, if applicable, instructions on how to correct and resubmit the item (see Submission under review).

Once an item is approved and published, a set of automated actions is launched: a PID (Handle) is assigned, readable text from data files (for PDFs) is extracted into a TXT file and included in the search index, and a thumbnail for the landing page is generated (for PDFs and image files) (see DSpace documentation).

Handling sensitive data

DAIS currently does not contain sensitive data. However, a procedure for handling potentially sensitive data is defined, and the software platform and repository managers can support sophisticated access control and prevent sensitive data from becoming exposed or compromised.

Potentially sensitive data produced by the participating institutions may include:

  • audio and video recordings made during field research (ethnology and linguistics);
  • survey data (geography, ethnology, linguistics);
  • interview transcripts (ethnology and linguistics).

The following procedure will apply in case a user deposits data that contain sensitive information:

  • All Submission Information Packages (SIPs) are subject to validation by the repository manager. During the validation phase, SIPs are accessible only to repository managers. The repository manager will check both the metadata and the submitted file(s). In case sensitive information is observed, the SIP will be returned to the Producer (depositor) for revision and the responsible repository manager will contact the Producer, the project manager (if applicable) and the institution's management.
  • Once returned to the Producer, the SIP will be unlocked for editing, so that the files containing sensitive data can be removed from the SIP and replaced with appropriate files.
  • The repository manager will offer advice on handling the data. If necessary, expert advice will be sought to define the most appropriate data protection methods. Even in cases when the repository manager has sufficient skills to perform data protection measures independently, this will be done in collaboration with the Producer and the responsible institution(s) to avoid any complication that may arise from insufficiently defined institutional policies.
  • The Producer will be required to perform relevant actions towards protecting sensitive data (e.g. in case of audio and video recordings, an explicit informed consent would be required to make the data available in the repository; survey data and transcripts would be anonymized; expert support would be required if encryption is needed, etc.).
  • The cost of data protection actions will be borne by the participating institution and the process will be supervised by the project manager and the participating institution's management.
  • Once the defined data protection measures are applied, the Producer will add the appropriate data files to the SIP, update the metadata (if required) and resubmit the SIP, which will be subject to validation.
  • In case of failure to comply with data protection measures, the SIP containing sensitive data will be entirely removed.

Metadata import

New items may also be imported into the repository using the external service Ellena (MultiLoad module). MultiLoad supports metadata import via CrossRef and Dissem.in, as well as massive metadata import in the EndNote XML and RIS formats. Import must be approved by a repository manager and this action is performed in MultiLoad: each item is checked and metadata may be corrected and enriched before import. This feature is currently used only by repository managers, but may be enabled (with some limitations) for trusted users, if necessary.
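To make the RIS import concrete, the following is a minimal sketch of reading RIS records into a structure that a repository manager could review before import. It is not the Ellena/MultiLoad code, whose internals are not described here; the function names are hypothetical, and only the basic RIS layout (a two-character tag, two spaces, a hyphen, and a value, with `ER` closing each record) is assumed.

```python
import re
from typing import Dict, List

# One RIS line: two-character tag, two spaces, hyphen, optional space, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into a list of records mapping tag -> list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        match = TAG_LINE.match(raw.rstrip())
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records


def needs_review(record: Dict[str, List[str]]) -> bool:
    """Flag records lacking a title (TI or T1) or any author (AU)."""
    has_title = bool(record.get("TI") or record.get("T1"))
    return not (has_title and record.get("AU"))
```

Repeated tags such as `AU` are collected into lists, so multi-author records keep their author order. The `needs_review` check is only an example of the kind of per-item inspection that precedes approval.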

DAIS does not use the native DSpace batch item importer. It is not disabled but its use is discouraged because Ellena offers better functionalities.

Curation

In DAIS, each community has at least one community manager, who organizes collections, manages users, validates deposits within the community, and enriches metadata either manually or by relying on the external applications integrated with DAIS.

All deposits are subject to basic curation and most deposits are also subject to enhanced curation. Enhanced curation is set as the standard to be achieved, which means that the items that currently fail to meet high standards (due to poor metadata, low quality scans, no OCR performed, non-preferred formats, etc.) will be subject to additional curation at a later stage. Also, scanned text documents are gradually replaced with PDF/A compliant OCRed files. If necessary, curation may involve conversion to formats suitable for long-term preservation.

After a submission is approved, a set of automated actions is launched, during which readable text from data files (for PDFs) is extracted into a TXT file and included in the search index, and a thumbnail for the landing page is generated (for PDFs and image files). These actions do not cause any changes to the deposited data files. The repository currently does not perform OCR, nor does it generate front pages or insert machine-readable metadata in data files. Repository managers may perform OCR if participating institutions provide resources for this. In case of publications, repository managers would normally add a custom front page containing information necessary to identify the publication if this information is not contained in the submitted file. In most cases, this is done manually when reviewing the submission. Repository managers may also edit incomplete README files.

As for other data types, if additional data curation is required, repository managers will coordinate their actions with depositors and the institution's management and will seek expert support even if they have sufficient skills to perform curation actions. The reason for this is that institutional policies relating to research data are insufficiently developed.

A set of customized external tools has been developed to enable enhanced metadata curation:

  • Ellena performs metadata normalization, metadata import (in the EndNote XML and RIS formats), and massive corrections of metadata;
  • NomadLite uses text mining to retrieve funding information and APIs to find Web of Science and Scopus IDs; once checked and verified by repository managers, the retrieved information is automatically inserted into appropriate metadata fields;
  • ReportMaker discovers missing metadata by running predefined searches.

When necessary, automated maintenance procedures are set up to resolve some issues (e.g. file renaming to eliminate unsupported characters, thumbnail creation, etc.).
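As an illustration of file renaming to eliminate unsupported characters, here is a minimal sketch. The set of characters treated as supported (ASCII letters, digits, dot, underscore, hyphen) and the fallback name `file` are assumptions; the actual DAIS maintenance procedure is not documented here and may apply different rules.

```python
import re
import unicodedata


def safe_filename(name: str) -> str:
    """Return a file name restricted to ASCII letters, digits, '.', '_' and '-'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Decompose accented letters and drop anything that is not ASCII.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    )
    # Replace each run of unsupported characters with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_stem).strip("_.")
    cleaned = cleaned or "file"  # assumed fallback when nothing survives
    ext = re.sub(r"[^A-Za-z0-9]+", "", ext)
    return f"{cleaned}.{ext}" if ext else cleaned
```

For example, `safe_filename("Izveštaj 2021 (final).pdf")` yields `Izvestaj_2021_final.pdf`. Note that this approach simply drops characters without an ASCII decomposition (Cyrillic letters, or "đ"), so a production procedure would more likely transliterate them first.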

The following curation tasks are performed by repository managers on a regular basis:

  • normalization of authors' and contributors' names via Ellena by assigning ORCIDs (if available) or internal identifiers; the TRAP-RCUB development team has developed an alerting service that informs repository managers about newly registered ORCIDs for researchers from their institutions;
  • adding missing funding information retrieved by NomadLite;
  • adding Web of Science and Scopus identifiers retrieved by NomadLite.

Curation also involves the mapping of "shared" items (e.g. a book co-published by two participating institutions, or research outputs resulting from joint research conducted by multiple participating institutions) into all relevant collections, with the aim of increasing their discoverability.

Version control

Changes to deposited files by depositors or end users are not permitted. If necessary, an updated version may be deposited and the earlier version may be withdrawn from public view (see Withdrawing a published item). If multiple versions of the same content are available in the repository, there will be links between earlier and later versions and the most recent version will be clearly identified.

Correcting errors and updating the metadata

Once an item is approved and published in the repository, contributors do not have sufficient permissions to change the metadata and the content file(s). Only repository managers can do this. If an update or a correction is needed, contributors should contact repository managers at their institution or fill out the feedback form.

Any user may suggest a correction or an update using the feedback form.

Documentation

When publications are deposited in the repository, additional documentation is normally not required. For other data types, whenever the data are not self-explanatory, depositors should provide the additional documentation necessary to understand, interpret, and reuse them.

Documentation should contain information about:

  • the context of data collection
  • data collection methods
  • structure and organization of data files
  • data quality and reliability
  • any changes to raw data and algorithms used to transform data (if applicable)
  • data confidentiality, access and use conditions
  • variable names and descriptions (if applicable)
  • file format and software used, as well as software required to open data files (in case of formats that are not widely used).

This information should be placed in a README.txt file (in the TXT format). The README file should also contain the main metadata and the persistent Handle assigned by the repository. In case the README file is incomplete or insufficiently detailed, the repository manager may edit it or require the depositor to provide additional information when reviewing the submission. README files may also be subject to enhanced curation.
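A README skeleton covering the items listed above could be generated as in the following sketch. The section headings, the placeholder text, and the function name are assumptions made for illustration; DAIS does not prescribe this exact layout.

```python
from typing import Optional

# Headings derived from the documentation items listed above (wording assumed).
README_SECTIONS = [
    "Context of data collection",
    "Data collection methods",
    "Structure and organization of data files",
    "Data quality and reliability",
    "Changes to raw data and algorithms used",
    "Confidentiality, access and use conditions",
    "Variable names and descriptions",
    "File formats and required software",
]


def readme_skeleton(title: str, handle: Optional[str] = None) -> str:
    """Build plain-text README content with one empty section per item."""
    lines = [
        f"Title: {title}",
        f"Handle: {handle or '(assigned on publication)'}",
        "",
    ]
    for section in README_SECTIONS:
        lines.extend([section.upper(), "(to be completed)", ""])
    return "\n".join(lines)
```

The returned text can be saved as README.txt and completed by the depositor. The Handle line is left as a placeholder here because the Handle is assigned only once the item is approved and published.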

Change management

Workflows are subject to change based on inputs from repository managers and a feasibility assessment by the development team. Changes are usually undertaken to address potential security issues, to comply with legal requirements, best practice, or technical requirements set by aggregators, and to respond to user needs. The following procedure is used to manage workflow changes:

  1. All stakeholders (repository managers, Designated Community members, including the management of the participating institutions, TRAP-RCUB members) are encouraged to provide feedback, suggest improvements, or indicate flaws. Designated Community members may make suggestions through local repository managers, through the DAIS Administrator & RCUB user support coordinator, or through the feedback form accessible from the footer on each page in the repository. General users may make suggestions via the feedback form.
  2. All suggestions are pooled together by the DAIS Administrator & RCUB user support coordinator, who identifies meaningful suggestions and records them as "stories" in the backlog of the Jira agile project management tool used by the TRAP-RCUB team. Jira "stories" containing user suggestions are placed in appropriate Jira projects, supplied with a description and enriched with tags (keywords). If possible, "stories" are linked to related project elements in Jira and additional information (examples, technical documentation, etc.) is provided (if available).
  3. Suggestions are discussed by the TRAP-RCUB team during regular meetings. The following criteria are taken into account when assessing suggestions: feasibility, paying special attention to security, the integrity of the platform, and the time (hours) required to develop a feature; the number of users benefiting from the action; and global developments in the area (e.g. standards or expected technical improvements). If required, meetings with the Designated Community are organized. In case a major software, hardware, or workflow change is required, the management of the participating institutions and the management of RCUB are involved.
  4. Once a user suggestion is selected for implementation, its status in Jira will be changed to "selected for development", whereas the issue type will be changed to "task". The priority level will be defined and the task will be assigned to the team members responsible for its implementation. Suggestions that are not accepted are not removed from the Jira backlog. TRAP-RCUB team members add comments explaining the reason for rejecting a suggestion or postponing its implementation. This is helpful in dealing with recurring suggestions that are not feasible or prioritized.
  5. Before making changes to software or hardware, measures ensuring possible restoration of the system are taken. The virtual machine is cloned and all changes are tested on the clone. Before any intervention on the production machine, a snapshot is created in the virtualization system, to enable roll-back and prevent data loss.
  6. The system documentation in Confluence, the code on the local Git server, and the manuals for repository managers and end-users are updated to describe the new situation and indicate what has been changed. All relevant Jira "tasks" will be marked as done and additional information relating to the implementation of change will be provided. A brief summary of changes will be included in the manual for repository managers (at the beginning).
  7. The DAIS Administrator & RCUB user support coordinator will duly inform repository managers about the changes. In case of workflow changes, training will be organized for repository managers and end-users.