DAIS - Digital Archive of the Serbian Academy of Sciences and Arts: Preservation plan

From TRAP-RCUB

Revision as of 11:58, 28 September 2021

This public wiki is about the DAIS – Digital Archive of the Serbian Academy of Sciences and Arts

Preservation policy

SASA, SASA institutes and RCUB are committed to the long-term preservation of items deposited in the repository and strive to adopt current best practices in digital preservation. They aim to preserve the repository content for re-use, while retaining its authenticity and ensuring the readability of data files. Efforts are also made to mitigate the risks of deterioration, damage, data loss and corruption, as well as the obsolescence of file formats, storage media and means of dissemination.

Roles and responsibilities

Access to repository administration functions is strictly limited to authorized staff. All staff involved in repository maintenance and daily operations have well-defined roles and are familiar with the relevant policies and with their roles in supporting and implementing the preservation policy.

The participating institutions appoint a number of repository managers, who serve as contact points both for depositors and for the team responsible for software development and technical support (at UoB-RCUB; see Organization scheme). The work of repository managers is funded by the participating institutions (through regular salaries).

UoB-RCUB is responsible for hosting, regular backups, software upgrades and development, additional features, user support and training, and the implementation of interoperability standards. UoB-RCUB has appointed a dedicated team (TRAP-RCUB) responsible for repository development, which also serves as a steering body.

Retention

Metadata and files deposited in the repository are stored permanently. Content may be removed only in exceptional circumstances. Records may be withdrawn from the repository in case of:

  • Proven copyright violation;
  • Plagiarism;
  • Falsified research;
  • Research containing major errors;
  • Threat to national security.

Withdrawn items are not deleted per se, but are removed from public view. The metadata of withdrawn items will not be searchable. Withdrawn items' Handles and URLs are retained indefinitely.
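
The withdrawal rules above can be illustrated with a small model in which withdrawal sets a flag instead of deleting the record. This is a minimal Python sketch under stated assumptions: the `Item` and `Repository` classes, the example Handles (`123456789/...`) and the tombstone message are hypothetical and do not reflect how DSpace implements withdrawal internally.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    handle: str            # persistent identifier (hypothetical example values)
    title: str
    withdrawn: bool = False


@dataclass
class Repository:
    items: dict[str, Item] = field(default_factory=dict)

    def deposit(self, item: Item) -> None:
        self.items[item.handle] = item

    def withdraw(self, handle: str) -> None:
        # Withdrawal only flips a flag; the record itself is never deleted.
        self.items[handle].withdrawn = True

    def search(self, term: str) -> list[str]:
        # Metadata of withdrawn items is excluded from public search.
        term = term.lower()
        return [h for h, item in self.items.items()
                if not item.withdrawn and term in item.title.lower()]

    def resolve(self, handle: str) -> str:
        # The Handle still resolves, but to a tombstone instead of the content.
        item = self.items[handle]
        if item.withdrawn:
            return f"{handle}: item withdrawn"
        return f"{handle}: {item.title}"


repo = Repository()
repo.deposit(Item("123456789/1", "Flora of Serbia"))
repo.deposit(Item("123456789/2", "Serbian flora atlas"))
repo.withdraw("123456789/2")
print(repo.search("flora"))          # ['123456789/1']
print(repo.resolve("123456789/2"))   # 123456789/2: item withdrawn
```

In this sketch, keeping the record and only filtering it out of search is what allows the Handle to remain resolvable after withdrawal.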

Data integrity and authenticity

Only registered users can deposit items, and registered-user status is granted only to internal users. Accordingly, SASA and SASA institutes are responsible for verifying user identity. Provenance information is saved for each item. Once an item is approved, only repository managers can change its metadata and bitstreams. Submissions are reviewed by qualified staff to ensure metadata quality and completeness; compliance with data format, best-practice and preservation requirements; and data integrity and quality, and to resolve potential legal issues. End users cannot modify submitted or approved items (metadata or bitstreams). If necessary, users may deposit a new version. Each version is assigned a unique and persistent identifier (Handle), and relations between the versions are established in the metadata.
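
The versioning rule can be sketched as follows: an earlier version is never overwritten; the new version becomes a separate record with its own Handle, and the two records are linked through relation metadata. This is a minimal Python illustration under stated assumptions: the class, the function, the example Handles and the relation field names (`dc.relation.isversionof`, `dc.relation.hasversion`, modelled on Dublin Core refinements) are hypothetical and are not taken from the DAIS configuration. Since end users cannot edit approved items, the update to the earlier record stands for an action performed by a repository manager.

```python
from dataclasses import dataclass, field


@dataclass
class ItemVersion:
    handle: str                                   # each version gets its own Handle
    metadata: dict[str, list[str]] = field(default_factory=dict)


def add_new_version(previous: ItemVersion, new_handle: str,
                    metadata: dict[str, list[str]]) -> ItemVersion:
    """Register a new version as a separate record and link both records."""
    if new_handle == previous.handle:
        raise ValueError("a new version must receive a new Handle")
    # Copy the submitted metadata so later edits do not leak between records.
    new = ItemVersion(new_handle, {k: list(v) for k, v in metadata.items()})
    # Relation field names are assumptions modelled on Dublin Core refinements.
    new.metadata.setdefault("dc.relation.isversionof", []).append(previous.handle)
    previous.metadata.setdefault("dc.relation.hasversion", []).append(new_handle)
    return new


v1 = ItemVersion("123456789/10", {"dc.title": ["Annual report"]})
v2 = add_new_version(v1, "123456789/11", {"dc.title": ["Annual report (revised)"]})
print(v2.metadata["dc.relation.isversionof"])   # ['123456789/10']
print(v1.metadata["dc.relation.hasversion"])    # ['123456789/11']
```

The earlier record keeps its original title and Handle; only a relation pointing to the newer version is added.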

DSpace ensures the integrity of both data and metadata over time, regardless of possible changes in the physical storage media. To verify that a digital object has not been altered or corrupted, the repository periodically checks data integrity. The checks include verifying MD5 checksums and metadata integrity, and testing that URLs resolve correctly.
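
The checksum part of these checks follows a standard fixity pattern: a digest recorded at ingest is later recomputed from the stored file and compared. DSpace provides its own Checksum Checker for this purpose; the Python sketch below is not that tool but an illustration of the principle, assuming a hypothetical manifest that maps file paths (relative to a storage root) to the MD5 digests recorded at ingest.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored checksums with freshly computed ones.

    `manifest` (hypothetical) maps a relative file path to the MD5 digest
    recorded at ingest. Returns "ok", "mismatch" or "missing" per file.
    """
    report = {}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            report[rel_path] = "missing"
        elif md5_of_file(path) != expected.lower():
            report[rel_path] = "mismatch"
        else:
            report[rel_path] = "ok"
    return report
```

Reading files in chunks keeps memory use constant even for large scanned publications. In this sketch a `mismatch` or `missing` status is only reported; deciding how to recover the affected file (e.g. from backup) is left to the operator.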

Independent understandability of data

Data is described at the individual resource level using metadata. The metadata schema is generic and flexible enough to describe a variety of resources from a wide range of research disciplines. Relations to other publications and related data can also be established in the metadata. Metadata properties can be mandatory, recommended or optional (see Metadata).

Since the designated community is multidisciplinary, repository managers work in close cooperation with the depositors, as described under Reviewing submissions. Repository managers perform metadata quality checks and enhance metadata as described under Curation.
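
A metadata quality check of the kind described above can start with a simple completeness test that flags missing mandatory and recommended properties. The sketch below is a minimal Python illustration; the DSpace-style field names in `MANDATORY` and `RECOMMENDED` are hypothetical placeholders, and the authoritative lists are those on the Metadata page.

```python
# Hypothetical field sets; the authoritative lists are on the DAIS Metadata page.
MANDATORY = {"dc.title", "dc.contributor.author", "dc.date.issued", "dc.type"}
RECOMMENDED = {"dc.description.abstract", "dc.subject", "dc.identifier.doi"}


def missing_fields(record: dict[str, list[str]], fields: set[str]) -> list[str]:
    """Return the fields that are absent or contain only blank values."""
    return sorted(f for f in fields
                  if not any(value.strip() for value in record.get(f, [])))


def assess_record(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Report gaps; a record with no mandatory gaps is at least minimally described."""
    return {
        "missing_mandatory": missing_fields(record, MANDATORY),
        "missing_recommended": missing_fields(record, RECOMMENDED),
    }


record = {
    "dc.title": ["Example title"],
    "dc.contributor.author": ["Author, A."],
    "dc.date.issued": ["1951"],
    "dc.type": ["article"],
    "dc.subject": ["   "],          # a blank value counts as missing
}
print(assess_record(record))
```

A record for an older scanned publication described only with mandatory metadata would show an empty `missing_mandatory` list but a long `missing_recommended` list, which makes such records easy to queue for later enhancement.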

Currently, the quality of metadata in DAIS varies. The publications covered by the Open Access mandate (i.e. those resulting from research funded by the Ministry for Education, Science and Technological Development since 2018) and recent publications by SASA and its institutes are described using rich metadata (funding information, abstracts, keywords, various identifiers, and relations to other versions, publications and data). In the case of earlier publications, metadata records may contain only the minimum information necessary to identify the publication (mandatory metadata). This is especially the case with scanned publications where OCR has not been performed and where manual transcription would be required to provide more detailed descriptions. These metadata records will be enhanced at a later stage.

Digital continuity

Considerable effort is invested in ensuring digital continuity of archived data, i.e. data usability over time. The needs of the designated community, the development of technology and organizational changes are monitored in order to be able to plan and apply required actions in a timely manner.

DAIS has the right to copy, transform, store and provide access to the data. This right is granted by depositors upon submission. The repository has the right to convert file formats if this is necessary to ensure permanent access to a resource.

During the Submission and Reviewing steps, efforts are made to accept only file formats suitable for long-term preservation. However, this is not always possible, and in some cases a compromise is made in the best interest of the designated community. Priority is given to data collection and ingestion into the repository, in order to mitigate the risk of data loss and to make the content available to the designated community as soon as possible. A number of PDF files obtained from publishers or by scanning do not conform to the PDF/A standard, but they are still accepted because converting them to a preferred format would delay ingestion into the repository (increasing the risk of data loss) and access to the content. These files will be converted to PDF/A by the repository development team to achieve optimal format normalization.
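
Identifying which deposited PDFs still need conversion can be partly automated. PDF/A files declare their conformance in XMP metadata through the `pdfaid:part` property, so a quick triage can look for that declaration. The Python sketch below is a heuristic only, under stated assumptions: it checks what a file claims, not whether it actually conforms, so a dedicated validator (e.g. veraPDF) would still be needed; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the XMP declaration in attribute form (pdfaid:part="1")
# or element form (<pdfaid:part>2</pdfaid:part>).
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def claimed_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part a file claims (1, 2, 3, ...), or None.

    Only the declaration is inspected; conformance is NOT validated.
    """
    if not data.startswith(b"%PDF-"):
        return None
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None


def triage(paths: list[Path]) -> dict[str, list[str]]:
    """Split PDFs into those declaring PDF/A and candidates for conversion."""
    result: dict[str, list[str]] = {"declares_pdfa": [], "convert": []}
    for path in paths:
        part = claimed_pdfa_part(path.read_bytes())
        result["declares_pdfa" if part is not None else "convert"].append(path.name)
    return result
```

If a file's metadata stream is compressed, the declaration is not visible as plain bytes and the file ends up under `convert`; this errs on the side of re-checking rather than silently accepting a file.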

Poorly documented proprietary file formats are accepted only exceptionally: when files cannot be converted to a preferred format without compromising data integrity, or when it is necessary to capture and archive data that have already been published elsewhere (e.g. .opj files published as supplementary information to a journal article). Efforts are made to keep such cases to a minimum. Converting these files to preferred formats requires collaboration with the original data creators.

Operational continuity and disaster recovery

DAIS is hosted by the University of Belgrade Computer Centre on a virtual machine in a Proxmox environment under a CentOS operating system. Hardware resources are incrementally adjusted to the database size and the number of visitors. The repository database is stored on a PostgreSQL 9.5 server inside the production-level virtual machine. Database export is enabled.

The software platform of DAIS is based on DSpace 5.10. The core DSpace code and Java code have been left unmodified to simplify DSpace upgrades. Major modifications have been made to the configuration, the localization files and the XMLUI configuration. The system has been enriched with additional applications (displaying citation counts from the Web of Science, Scopus and Dimensions, as well as Altmetric Attention Scores; displaying a recommended citation; full ORCID integration; displaying human-readable funding information in the selected interface language). The source code of the customized version of DSpace and of all additional applications is stored on a local Git server accessible only to the repository development team. Detailed documentation on the software, installation, configuration, maintenance and troubleshooting is available on Confluence. This makes procedures easy to replicate and ensures continuity in case of staff changes.

Backups are performed regularly at the virtual machine level. Both live instances and their passive backups reside on redundant hardware RAID setups. The monitoring and alerting service MONIT, maintained by the RCUB team, continuously monitors the operation of the repository and alerts system administrators to unexpected events. Local firewall tools, such as iptables and Fail2ban, are used to protect and restrict access to the DAIS instance. The repository follows a regular upgrade cycle and, where possible, established and widely accepted best practices.

In case of major software configuration changes or updates, the virtual machine is cloned and all changes are tested on the clone. Before any intervention on the production machine, a snapshot is created in the virtualization system, to enable roll-back and prevent data loss. End-users are duly informed about planned changes and upgrades.

Continuity of access

According to the law, SASA is the national academy and the most prominent scholarly institution in Serbia. The institutes are independent legal entities, but their work is closely tied to the mission and activities of SASA (e.g. joint projects, co-publishing projects, joint conferences). In line with their mission and their role as publicly funded institutions, SASA, SASA institutes and RCUB (as the Outsource Partner) seek to provide reliable and secure archiving for the diverse outputs of SASA and SASA institutes, while ensuring easy access to, and the widest possible dissemination of, Open Access content. The current level of funding is sufficient to maintain and develop DAIS. Development, maintenance and data security are ensured through an SLA with RCUB.

SASA and SASA institutes are able to preserve data access in case of unexpected emergency budget cuts. DAIS is easy to keep running and its service costs are low. All repository managers are employed under regular contracts at the participating institutions, and their repository management activities do not incur any additional cost. The SLA with RCUB provides for a Post-Cancellation Service Time, i.e. a period after the termination of the SLA during which the repository remains available with minimum maintenance services. Accordingly, even if funding is disrupted, the services will be kept running, providing sufficient time to find a sustainable solution.

Hardware security

The computer hardware that runs the repository is the property of RCUB. A dedicated team at RCUB takes care of its configuration, maintenance, security, software updates and development. RCUB also has a dedicated team responsible for infrastructure security: RCUB security officers are responsible for general network security, server security and service maintenance, and they collaborate closely with the repository development team. Servers and network devices are kept in a dedicated area with physical access strictly limited to authorized staff, and access to the backup facilities is likewise strictly limited. The premises are equipped with fire alarms and a fire-retardant system. Uninterrupted power is ensured by an automatic stand-by electric power generator. Dedicated staff members are physically present on the premises 24/7. Remote security services are also provided.