Preserving data

Preserving research data refers to the practice of keeping data available and usable in the longer term, beyond the end of your research project.

Why preserve data?

The main reasons for data preservation are:

  • Ensuring that your research can be verified and reproduced
  • Maintaining data for future reuse, e.g. for further research or teaching

Funders, publishers, and institutions (including Ghent University) increasingly require certain research data to be retained for a specified period and/or for a specific purpose.

Ghent University's RDM policy framework expects ‘relevant’ research data to be preserved for a minimum of 5 years. First and foremost, this means data that are reasonably needed to verify and reproduce published scientific claims. Data with high reuse potential are also worth keeping for the longer term.

Preserving vs. storing data

Preserving data from completed research is different from storing and backing up data files while your research is still ongoing. The latter typically involves data that are mutable; the former concerns data (or milestone versions of data) that are ‘frozen’ and not in active use.

Long-term preservation requires active steps to prevent data from becoming unavailable or unusable over time, for example because of:

  • Outdated software or hardware
  • Storage media degradation
  • A lack of sufficient descriptive and contextual information to keep data understandable

In other words, data preservation involves more than simply not deleting the data files created and stored in the course of your research project!

What to keep?

Not all research data (or all versions or parts of them) can or need to be kept for the long term.

Maintaining data in a usable form for the longer term takes effort and has a considerable cost. Selecting which (parts of) data to keep, and for how long, is therefore an essential component of data preservation.

As a researcher, you have a key role in deciding what to retain and what to discard, as you know your data best. Such decisions may depend on factors such as:

  • The type of data involved
  • The norms in your discipline
  • Whether you are keeping data for potential future reuse, for verification, or for other purposes. Depending on the purpose, you may need to keep the raw data or data in a more processed form (or perhaps you want to preserve different forms of the same dataset for different purposes, and for different retention periods).

Appraisal and selection of research data is still an evolving field, but some generic, high-level criteria are emerging to guide decisions on what to keep. Common criteria for keeping data include:

  • Legal or ethical requirements to keep (certain) data for a specified retention period (e.g. for clinical trials data)
  • Funder, institutional or publisher policies
  • High potential reuse value of the data
  • Great scientific, historical, or cultural significance of the data
  • The data are unique and/or cannot easily be re-created
  • The benefits outweigh the costs of data preservation

Conversely, there can be valid reasons to dispose of data (or parts or versions of them) once your research is finished (e.g. duplicate copies, superseded versions, …) or later, after the applicable retention period has expired.

It is also important to document and justify your choices to keep or remove data, for example in your Data Management Plan.
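One lightweight way to record such decisions is a small machine-readable log kept alongside your Data Management Plan. The Python sketch below is only an illustration: the field names, the example labels, and the choice to count the retention period from the decision date are all assumptions, and the 5-year default simply mirrors the minimum mentioned above. The actual starting point and length of a retention period depend on the policy that applies to your data.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class RetentionDecision:
    """One documented choice to keep or dispose of a dataset (hypothetical fields)."""
    dataset: str              # label of the dataset, part, or version
    action: str               # "keep" or "dispose"
    reason: str               # short justification for the choice
    decided_on: date          # assumed start of the retention period
    retention_years: int = 5  # assumed default, mirrors the 5-year minimum

    def review_date(self) -> date:
        """Earliest date on which a kept dataset could be reconsidered."""
        target_year = self.decided_on.year + self.retention_years
        try:
            return self.decided_on.replace(year=target_year)
        except ValueError:
            # 29 February with a non-leap target year: fall back to 28 February.
            return self.decided_on.replace(year=target_year, day=28)


def to_json(decisions: list[RetentionDecision]) -> str:
    """Serialize the decisions, adding a review date for data that are kept."""
    rows = []
    for decision in decisions:
        row = asdict(decision)
        row["decided_on"] = decision.decided_on.isoformat()
        row["review_date"] = (
            decision.review_date().isoformat() if decision.action == "keep" else None
        )
        rows.append(row)
    return json.dumps(rows, indent=2)
```

A call such as to_json([RetentionDecision("survey_raw_v2", "keep", "needed to verify published results", date(2024, 3, 1))]) yields a JSON record that can be stored with the project documentation.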

Where to keep data?

Research data and documentation selected for retention should be kept in a suitable location and in a secure manner to ensure that they remain available and usable beyond the end of your project, with appropriate access rights.

Where appropriate, depositing data in an established, trustworthy data repository (sometimes also called a data center, data archive or scientific database) is generally the preferred option.

However, sometimes it may not be possible or appropriate to deposit data in an external repository, e.g. for legal, ethical, contractual, practical, or other reasons. In such cases, research data selected for preservation will need to be kept in-house.

Non-digital research data and materials

RDM mostly focuses on digital research data. However, you may also collect analogue research data (e.g. paper surveys) as part of your research, or other non-digital materials that, strictly speaking, do not constitute research data (e.g. samples). Sometimes such non-digital data and materials also need to be retained after the end of your project.

  • Consider whether digitizing the data is an option (e.g. this may be worthwhile for data that will be kept permanently for future reuse).
  • If not, your Faculty, Department, research group, lab etc. may offer facilities to retain your data for verification or legal compliance purposes for a finite retention period. An example is the Faculty of Psychology and Educational Sciences’ Archive for Research Material.
  • Contact rdm.support@ugent.be if you have paper research data that could merit permanent preservation for future reuse.

There are also repositories for non-digital materials that you can make use of.

Preparing data for preservation

Keeping data findable, understandable and effectively reusable requires some preparation and effort on your part, such as keeping files organized, migrating files to sustainable formats, preparing a data package with data, documentation and metadata, and having the access rights and reuse permissions in place.
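As one possible illustration of preparing a data package, the Python sketch below writes a checksum manifest for a folder of frozen files and later checks the files against it, which can help detect silent changes such as those caused by storage media degradation. The folder layout, the manifest file name MANIFEST.sha256, and the function names are assumptions made for this example; repositories may prescribe their own packaging and fixity conventions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path, manifest_name: str = "MANIFEST.sha256") -> dict[str, str]:
    """Hash every file in the package and write one 'digest  path' line per file."""
    checksums = {
        path.relative_to(package_dir).as_posix(): sha256_of(path)
        for path in sorted(package_dir.rglob("*"))
        if path.is_file() and path.name != manifest_name
    }
    lines = [f"{digest}  {name}" for name, digest in checksums.items()]
    (package_dir / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return checksums


def changed_files(package_dir: Path, manifest_name: str = "MANIFEST.sha256") -> list[str]:
    """List files that are missing or no longer match the manifest."""
    problems = []
    manifest = (package_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. in the manifest of an empty package
        digest, name = line.split("  ", 1)
        target = package_dir / name
        if not target.is_file() or sha256_of(target) != digest:
            problems.append(name)
    return problems
```

Running build_manifest(Path("my_dataset")) once the data are frozen, and changed_files(Path("my_dataset")) at later intervals, gives a simple integrity check; an empty list means every file recorded in the manifest is still present and unchanged.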

Many repositories have their own requirements or instructions for depositing data. Check them in advance so you can adequately prepare your data for deposit.

For further guidance on the use of data repositories, check our research tip.

Preserving research data after the end of your project is made significantly easier if you properly plan for data management from the outset, and implement good RDM practices during the active research phase.
