DAIS - Digital Archive of the Serbian Academy of Sciences and Arts: Workflows

This public wiki documents DAIS, the Digital Archive of the Serbian Academy of Sciences and Arts.

Registration

Registration is done by completing the registration form (please use an institutional email). Upon registration, a repository manager will assign appropriate permissions to eligible users, enabling them to deposit their work and access content that is not publicly available. Only Internal users and, in some cases, Associates may be granted the permission to deposit and access restricted content.

NB: Merely filling out the registration form does not grant users the right to deposit or to access restricted content. External users should not submit registrations. If they need information about restricted content, they may use the feedback form.

Submission

Users can deposit new items by using a web-based submission form or by asking a repository manager to make the deposit on their behalf. Only registered users who have been granted appropriate credentials can deposit data.

Depositors should meet a set of requirements during the submission step:

In order to help depositors in meeting the requirements, training and consultations are provided prior to data submission. This helps in ensuring data and metadata quality, resolving legal issues, and reducing costs linked to data ingest and curation. Keeping in mind that the interest in depositing research data has only recently emerged in the Designated Community and that Internal users produce various types of research data in various formats, it is often necessary to develop, assess, and test workflows. In such cases, additional support is provided to users: through a series of consultations, repository managers and users jointly define workflows and decide on the optimal formats, metadata, and access control.

In order to deposit content in the repository, one needs to log in and launch the submission procedure in accordance with user guidelines. The submission interface is divided into several steps. Each step has a set of mandatory fields. Depositors are not allowed to move to the next step unless all mandatory fields are filled in (see Metadata).
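The rule that a depositor cannot advance until all mandatory fields are filled in can be sketched as follows. This is only an illustration and not the actual DAIS or DSpace implementation: the step names, the field lists, and the functions `missing_fields` and `can_proceed` are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical step names and field lists, for illustration only;
# the actual DAIS submission form defines its own steps and mandatory fields.
MANDATORY_FIELDS: Dict[str, List[str]] = {
    "describe": ["dc.title", "dc.contributor.author", "dc.date.issued"],
    "rights": ["dc.rights"],
}


def missing_fields(step: str, values: Dict[str, str]) -> List[str]:
    """Return the mandatory fields of `step` that are absent or blank."""
    return [
        field
        for field in MANDATORY_FIELDS[step]
        if not values.get(field, "").strip()
    ]


def can_proceed(step: str, values: Dict[str, str]) -> bool:
    """A depositor may move on only when nothing mandatory is missing."""
    return not missing_fields(step, values)
```

A field that contains only whitespace is treated as empty here, so it blocks the step in the same way as a missing field.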

Contributors are free to provide any additional information they consider relevant in an additional description field (dc.description.other). ORCID(s) (if available) are added during curation by repository managers.

The same submission interface is used for all submissions, and it currently meets user needs efficiently. If additional fields are needed, the repository development team can easily adjust the submission form.

Rights and licences

Depositors must have the necessary rights to submit a resource in the repository. During the submission process, depositors will be required to define access rights and assign a licence to the resource. DAIS supports Creative Commons licences for Open Access content.

Depositors must be willing and able to grant the Serbian Academy of Sciences and Arts the non-exclusive rights to both preserve and make their work available through the repository by accepting a non-exclusive Distribution Licence.

Reviewing submissions

Once an item is submitted, a repository manager reviews it to ensure that the metadata are correct and sufficient, that files meet relevant technical requirements, and that the access rights and licence are appropriate. Deposits are not publicly visible before approval by a repository manager. During this step, repository managers add and correct metadata and establish links between different versions (if applicable). If necessary, they may also contact the depositor to request additional information or file conversion to a preferred format (see Preferred file formats).

In case of publications, repository managers check the quality of deposited files and they may also seek to replace low-quality files with high-quality ones, if possible (e.g. if a contributor submits a scanned document though a born-digital version is available). If all the requirements are met, the repository manager will approve the item (publish it in the repository).

If the submission is inappropriate or is not in line with the Content policy, the repository manager may reject it. A submission may also be rejected if it fails to meet requirements in terms of metadata and data quality. Upon rejection, the depositor will receive an e-mail explaining the reasons for rejection and, if applicable, instructions on how to correct and resubmit the item (see Submission under review).

Once an item is approved and published, a set of automated actions is launched: a PID (Handle) is assigned, readable text is extracted from PDF files into a TXT file and included in the search index, and a thumbnail for the landing page is generated (for PDFs and image files) (see DSpace documentation).

Metadata import

New items may also be imported into the repository using the external service Ellena (MultiLoad module). MultiLoad supports metadata import via CrossRef and Dissem.in, as well as bulk metadata import in the EndNote XML and RIS formats. Each import must be approved by a repository manager in MultiLoad, where every item is checked and its metadata may be corrected and enriched before import. This feature is currently used only by repository managers, but may be enabled (with some limitations) for trusted users, if necessary.

DAIS does not use the native DSpace batch item importer. It is not disabled, but its use is discouraged because Ellena offers better functionality.
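For readers unfamiliar with the RIS format mentioned above: a RIS file is plain text in which each line starts with a two-character tag followed by two spaces, a hyphen, and a value, and each record ends with an `ER` tag. The sketch below is a minimal reader written for illustration only. It is not how Ellena or MultiLoad process files, the function name `parse_ris` is an assumption, and the sample record is a made-up placeholder.

```python
import re
from typing import Dict, List

# A RIS line looks like "AU  - Doe, Jane": tag, two spaces, hyphen, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Split RIS text into records; each tag maps to a list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            # Tags such as AU may repeat, so values are collected in a list.
            current.setdefault(tag, []).append(value.strip())
    return records


# Placeholder record, not taken from the repository.
SAMPLE = "\n".join([
    "TY  - JOUR",
    "AU  - Doe, Jane",
    "AU  - Roe, Richard",
    "TI  - An example title",
    "PY  - 2020",
    "ER  - ",
])

records = parse_ris(SAMPLE)
```

Keeping every tag as a list makes it straightforward for a reviewer to inspect and correct repeated fields, such as multiple authors, before anything is imported.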

Curation

In DAIS, each community has at least one community manager, who organizes collections, manages users, validates deposits within the community, and enriches metadata either manually or by relying on the external applications integrated with DAIS.

All deposits are subject to basic curation and most deposits are also subject to enhanced curation. Enhanced curation is set as the standard to be achieved, which means that items that currently fail to meet high standards (due to poor metadata, low-quality scans, no OCR performed, non-preferred formats, etc.) will be subject to additional curation at a later stage. In addition, scanned text documents are gradually being replaced with PDF/A-compliant OCRed files. If necessary, curation may involve conversion to formats suitable for long-term preservation.

A set of customized external tools has been developed to enable enhanced curation:

  • Ellena performs metadata normalization, metadata import (in the EndNote XML and RIS formats), and bulk corrections of metadata;
  • NomadLite uses text mining to retrieve funding information and APIs to find Web of Science and Scopus IDs; once checked and verified by repository managers, the retrieved information is automatically inserted into appropriate metadata fields;
  • ReportMaker discovers missing metadata by running predefined searches.

When necessary, automated maintenance procedures are set up to resolve some issues (e.g. file renaming to eliminate unsupported characters, thumbnail creation, etc.).
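As an illustration of the kind of file renaming mentioned above, the following sketch reduces a file name to a conservative set of ASCII characters. It is not the procedure actually used in DAIS: the function name `safe_filename`, the set of allowed characters, and the fallback name `file` are all assumptions.

```python
import re
import unicodedata


def safe_filename(name: str) -> str:
    """Return a version of `name` limited to ASCII letters, digits, '.', '_' and '-'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Decompose accented letters (e.g. "š" becomes "s" plus a combining mark)
    # and drop everything that is not ASCII. Characters with no ASCII base
    # letter (e.g. Cyrillic or "đ") are removed rather than transliterated.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of disallowed characters into a single underscore.
    ascii_stem = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_stem).strip("._")
    clean_ext = re.sub(r"[^A-Za-z0-9]+", "", ext).lower()
    if not ascii_stem:
        ascii_stem = "file"  # assumed fallback when nothing usable remains
    return f"{ascii_stem}.{clean_ext}" if clean_ext else ascii_stem
```

For example, `safe_filename("Izveštaj 2020 (final).pdf")` returns `Izvestaj_2020_final.pdf`. A production procedure would also need to handle name collisions and transliteration of non-Latin scripts, which this sketch leaves out.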

The following curation tasks are performed by repository managers on a regular basis:

  • normalization of authors' and contributors' names via Ellena by assigning ORCIDs (if available) or internal identifiers; the TRAP-RCUB development team has developed an alerting service that informs repository managers about newly registered ORCIDs for researchers from their institutions;
  • adding missing funding information retrieved by NomadLite;
  • adding Web of Science and Scopus identifiers retrieved by NomadLite.

Curation also involves the mapping of "shared" items (e.g. a book co-published by two participating institutions, or research outputs resulting from joint research conducted by multiple participating institutions) into all relevant collections, with the aim of increasing their discoverability.

Version control

Changes to deposited files by depositors or end users are not permitted. If necessary, an updated version may be deposited and the earlier version may be withdrawn from public view (see Withdrawing a published item). If multiple versions of the same content are available in the repository, there will be links between earlier and later versions and the most recent version will be clearly identified.

Correcting errors and updating the metadata

Once an item is approved and published in the repository, contributors do not have sufficient permissions to change the metadata or the content file(s). Only repository managers can do this. If an update or a correction is needed, contributors should contact repository managers at their institution or fill out the feedback form.

Any user may suggest a correction or an update using the feedback form.

Documentation

When publications are deposited in the repository, additional documentation is normally not required. For other data types, depositors should provide any additional documentation needed to understand, interpret, and reuse the data whenever the data are not self-explanatory.

Documentation should contain information about:

  • the context of data collection
  • data collection methods
  • structure and organization of data files
  • data quality and reliability
  • any changes to raw data and algorithms used to transform data (if applicable)
  • data confidentiality, access and use conditions
  • variable names and descriptions (if applicable)
  • file format and software used, as well as software required to open data files (in case of formats that are not widely used).

This information should be placed in a plain-text README.txt file. The README file should also contain the main metadata and the persistent Handle assigned by the repository. If the README file is incomplete or insufficiently detailed, the repository manager may edit it or require the depositor to provide additional information during the validation phase. README files may also be subject to enhanced curation.
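A depositor could start from an empty README skeleton that follows the list above. The sketch below generates one. The layout, the placeholder text, and the function name `readme_skeleton` are assumptions for illustration, since DAIS does not prescribe a particular template here.

```python
from typing import List

# Section headings taken from the documentation checklist above.
SECTIONS: List[str] = [
    "Context of data collection",
    "Data collection methods",
    "Structure and organization of data files",
    "Data quality and reliability",
    "Changes to raw data and algorithms used to transform data",
    "Data confidentiality, access and use conditions",
    "Variable names and descriptions",
    "File format and software used or required",
]

PLACEHOLDER = "[to be completed by the depositor]"


def readme_skeleton(title: str, handle: str) -> str:
    """Build the text of a README.txt with one empty section per heading."""
    lines = [f"Title: {title}", f"Handle: {handle}", ""]
    for heading in SECTIONS:
        lines += [heading.upper(), PLACEHOLDER, ""]
    return "\n".join(lines)


text = readme_skeleton("Example dataset", "[Handle assigned by the repository]")
```

The resulting string can be saved as README.txt and filled in section by section. Sections marked "if applicable" in the list can be left with a short note that they do not apply.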