Research Data Management (RDM)

Resources to help researchers manage their research data, with an emphasis on Canadian tools.

Store Your Data

Your data are critical to your research.

Do you have a plan in place to store the information generated during your research project? 

Think about what would happen if you were to lose some or all of your data:

  • Could your project recover from such a setback?
  • How much time would you lose?
  • How much money would it cost you?
  • Would you be liable for the loss of time and/or data?

Proper storage of research data pays dividends throughout your project.
         (from Scott Summers' presentation on data security and storage, UK Data Service, 2016)

Research data storage can be broken down into 3 phases:

  1. Active research phase
    • Short-term storage during the research project. Data is being actively collected, refined, and analysed. Active storage must be secure, support mediated access for collaborators and research team members, and ideally support file versioning. A research project may require much more storage capacity in this phase to work with the data than will ultimately be required to store the final dataset.
  2. Evidence phase
    • Medium-term storage implemented at the end of the research project. Data is deposited or otherwise stored and documented in order to support project outputs, e.g. as evidence supporting a publication. Data should be accessible to those who require it (e.g. peer reviewers), but this does not mean it must be openly available. If data is being shared or made open, however, that typically happens in this phase. The choice of storage method may be influenced by external requirements (e.g. from funders, publishers, or agencies) for retaining or destroying data within a specified timeframe.
  3. Preservation phase
    • Long-term or permanent archival storage. This phase may overlap with the evidence (medium-term) phase, depending on the storage solution chosen. Not all data needs to be, or should be, archived; consider whether the data is of particular value to a discipline or is subject to extended retention requirements (e.g. clinical trial data). Archived data may be openly accessible, restricted, or kept entirely offline (a "dark archive") and available only on request.

Things to consider when weighing your storage options:

  • How much data will your project generate? This is something to consider during the planning phase, because storage costs should be factored into the overall data management plan.
  • Who will need access to the data during the project's active phase? Collaborative research poses additional challenges for storage and access.
  • Will the project involve confidential or sensitive information? If so, extra precautions are needed to avoid accidental disclosure.
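
To make the first question concrete, a rough back-of-the-envelope estimate can be written as a minimal Python sketch. The function name `estimate_storage_gb`, the working-space overhead factor, and the example figures are hypothetical assumptions for illustration only; the default of three copies reflects the 3-2-1 principle described under the best practices for data storage. Sizes use decimal units (1 GB = 1000 MB).

```python
def estimate_storage_gb(files_per_month, avg_file_mb, months,
                        copies=3, working_overhead=0.0):
    """Rough storage estimate in decimal gigabytes (1 GB = 1000 MB).

    Hypothetical planning model, not a standard formula:
      raw data      = files_per_month * avg_file_mb * months
      working space = raw data * (1 + working_overhead), where the overhead
                      covers intermediate/processed files in the active phase
      total         = working space * copies (e.g. 3 under the 3-2-1 principle)
    """
    raw_mb = files_per_month * avg_file_mb * months
    working_mb = raw_mb * (1 + working_overhead)
    return working_mb * copies / 1000


# Illustrative figures only: 200 files/month of about 50 MB each over
# 24 months, with working files assumed to double the raw size.
print(estimate_storage_gb(200, 50, 24, working_overhead=1.0))  # 1440.0
```

Even modest per-file sizes add up once working files and backup copies are counted, which is why storage costs belong in the plan from the start.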

Common Storage Options

Institutional cloud storage
Examples: KPU OneDrive, Teams, or SharePoint (R: drive)
Hosted by a cloud service provider (e.g. Microsoft, Amazon, Google) and managed by the researcher's school, institution, or department.
Pros:
  • Data accessible from anywhere with an internet connection
  • Highly robust security, backup, and recovery practices
  • High capacity: 1 TB per user
Cons:
  • Managing file-sharing settings for collaboration with research partners may be complicated
  • Files unavailable offline unless configured to store local copies

Institutional network drive
Examples: K: drive
Hosted and managed by the researcher's school, institution, or department.
Pros:
  • Data accessible whenever needed
  • Physically more secure
  • Backed up regularly, reducing potential for data loss
  • Potentially expandable capacity
Cons:
  • May be complicated for external research partners to access
  • Limited baseline capacity (1 GB)

KPU-issued desktop or laptop computers
Pros:
  • More secure: login-protected
  • Convenient for active-phase storage and processing
  • Portable (laptops)
  • Moderate storage capacity (GBs to TBs)
Cons:
  • Difficult to collaborate with external research partners
  • Susceptible to hardware failure, damage, loss, or theft
  • Manual backup needed
  • Storage capacity not upgradeable

Personal desktop or laptop computers
Pros:
  • Convenient for short-term storage and processing
  • Portable (laptops)
  • Relatively inexpensive to upgrade or expand storage capacity (GBs to TBs)
Cons:
  • Difficult to collaborate with external research partners
  • Susceptible to hardware failure, damage, loss, or theft
  • Manual backup needed

Commercial cloud storage
Examples: Google Drive, AWS, Dropbox
Pros:
  • Convenient; accessible anywhere with an internet connection
  • Free or inexpensive
  • Easy collaboration
Cons:
  • Not secure or suitable for sensitive data
  • Relatively low capacity at free tiers (2-15 GB)

External storage media
Examples: external hard drives, flash drives, and optical discs
Pros:
  • Convenient
  • Inexpensive
  • Portable
Cons:
  • Not secure or suitable for sensitive data
  • Very easily lost, stolen, or damaged
  • Prone to file corruption
  • Prone to degradation over time
  • Limited storage capacity

Note: In some cases, well-secured and encrypted local drives that are not connected to a network, and are backed up rigorously, are appropriate for storing very sensitive data. 

Protecting your research data "requires paying attention to physical security, network security and security of computer systems and files to prevent unauthorized access or unwanted changes to data, disclosure or destruction of data."[1] The more sensitive the data, the more stringent the security measures need to be.

The Government of Canada is concerned about risks of theft and espionage related to federally funded research data and has provided guidance on its Safeguarding your Research website.

Here are a few aspects to consider:

Physical security
  • protects your data from disasters (e.g., flooding or fire)
  • prevents unauthorized access to computers or storage facilities where data and documents related to your project are kept
Encryption: encoding data in a way that makes it readable only by someone who has an access code, key, or password.
  • protects sensitive or confidential information
  • makes data transmission from one site to another more secure
  • restricts access only to authorized persons

The UK Data Service has detailed information about encryption techniques and tools, and has tutorial videos demonstrating the most commonly used encryption software.

Access control: who needs access to your data, and how will you manage that access?
  • restrict physical access to computers or storage media to members of the research team
  • employ password protection on all computers used during your research
  • encrypt files and provide access keys only to members of the research team


Sources:

[1] Louise Corti, et al., 2014. Managing and Sharing Research Data: A Guide to Good Practice, Los Angeles: Sage.

Backing up your data 

Backing up data means making frequent copies of files, usually for short-term storage during a project's active phase or for long-term storage during its static phase. Data files can be lost due to hardware or software failure; they can be accidentally altered or deleted; or they can become corrupted, rendering them unreadable or error-ridden. To avoid problems resulting from data loss, researchers should ensure that their data is properly backed up. A well-thought-out backup strategy should be an integral part of the overall data management plan. (adapted from UK Data Service)

Why back up your data?

  • reduce the risk of data loss, especially if that data cannot be reproduced
  • save time and money
  • recover research data with minimal disruption if something does go wrong
  • limit your liability

Things to consider when backing up your data: if you're using networked storage, discuss your requirements with the administrators. Questions to ask include:

  • How frequently are their drives backed up?
  • How long do they store backed up copies? 
  • Do they perform complete or partial backups?
  • Do they validate the backups to ensure the integrity of the data?
  • How do they recover files in the event of a problem?

Incremental (or partial) backups copy only the files or data that have changed since the previous backup, and are performed on a regular basis.

Complete backups, on the other hand, duplicate your project's entire data collection.
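The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for dedicated backup tools: the function name `backup` is hypothetical, and "changed since the previous backup" is approximated by comparing each file's modification time with the copy already in the destination folder.

```python
import shutil
from pathlib import Path


def backup(source, destination, incremental=True):
    """Copy files from source to destination; return the relative paths copied.

    incremental=False: complete backup, every file is copied.
    incremental=True:  only files missing from the destination, or whose
                       modification time is newer than the existing copy,
                       are copied (an approximation of "changed since the
                       previous backup").
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if (incremental and dst_file.exists()
                and dst_file.stat().st_mtime >= src_file.stat().st_mtime):
            continue  # unchanged since the last backup, so skip it
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also copies metadata, including the modification time,
        # so the next incremental run can compare timestamps.
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    return copied
```

Running it once with `incremental=False` produces a complete backup; later runs with the default `incremental=True` copy only new or modified files. Note that this sketch never deletes anything from the destination, so files removed from the source remain in the backup.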

Keep multiple copies of backup and archive files in several locations, in case of software or hardware failure, theft or tampering, or natural disasters.

Other considerations:

Protecting non-digital or textual data: ideally all non-digital data should be digitized. Items that cannot be digitized need to be managed in a way that keeps them secure and permits access on request.

File formats: use open or standardized formats rather than proprietary formats for both short-term and long-term storage of data.

Organization: establish and adhere to a protocol for naming and organizing backup copies so that files are easy to locate and identify.
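As one possible protocol, backup copies could be named `<project>_<YYYY-MM-DD>_v<NN>.<extension>`. The convention, the `backup_name` function, and the example project name below are hypothetical; the sketch simply builds names in that pattern and rejects any that break it. Using ISO 8601 dates (year-month-day) means an alphabetical listing of backup files is also chronological.

```python
import re
from datetime import date

# Hypothetical naming convention: <project>_<YYYY-MM-DD>_v<NN>.<extension>
# Project names are limited to letters, digits, and hyphens so that the
# underscore can act as an unambiguous field separator.
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+$"
)


def backup_name(project: str, when: date, version: int, extension: str) -> str:
    """Build a backup file name and check it against the convention."""
    name = f"{project}_{when.isoformat()}_v{version:02d}.{extension}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


print(backup_name("soil-survey", date(2024, 3, 5), 2, "zip"))
# soil-survey_2024-03-05_v02.zip
```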

Tools: macOS has a built-in backup utility called Time Machine; Windows has built-in backup and restore utilities; Linux users have access to a variety of backup and restore utilities.

There are also many third-party backup utilities available for all platforms, some open-source, some commercial.

Best Practices for Data Storage

1. As part of your overall data management plan, design a detailed data storage, security, and backup policy for your project, and review it periodically during the project's active phase.

2. Adhere to the 3-2-1 principle:

  • Keep 3 copies of research data
  • Use 2 different storage media
  • Store 1 copy off-site

3. Back up data files regularly. Spot-check backed-up files manually and verify them (e.g. using checksums) to ensure the integrity of the data.

4. Use portable media (USB drives, portable hard drives, CDs, or DVDs) only for working copies of research data, not for master copies, and never for sensitive or personal data. Encrypt these devices to protect their contents in the event of loss or theft.

5. Ensure data integrity by refreshing storage media: magnetic and optical storage media can degrade over time, so copy data onto new media periodically.

6. Employ open or standard file formats for data storage to ensure that files will be readable in the future.

7. Create meaningful file names (including version information) to aid in organizing and locating files and folders.
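
Best practice 3 mentions verifying backups with checksums. A minimal Python sketch using the standard-library `hashlib` module is shown below: it computes a SHA-256 digest of each file in a source folder and compares it with the digest of the corresponding file in a backup folder. The function names `sha256_of` and `verify_backup`, and the assumption that the backup mirrors the source folder layout, are illustrative choices rather than a prescribed tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source, backup):
    """Return relative paths of files missing from the backup or whose
    checksums differ from the source (an empty list means all copies match)."""
    source, backup = Path(source), Path(backup)
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        copy = backup / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(src_file):
            problems.append(rel.as_posix())
    return problems
```

An empty result means every file in the source has an identical copy in the backup. Storing the digests alongside the backup (for example in a text file) would also let you re-check the copies later without access to the originals.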
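
The 3-2-1 principle (best practice 2) lends itself to a simple checklist. The Python sketch below records each copy of the data with a hypothetical `Copy` structure (a free-text description, a storage-medium label, and whether it is kept off-site) and reports which parts of the rule a storage plan does not meet. The names and example media labels are assumptions; the check only compares labels, so it cannot tell whether two differently named media would actually fail independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    description: str   # e.g. "working copy"
    medium: str        # storage medium label, e.g. "external hard drive"
    off_site: bool     # kept away from the primary work location?


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the parts of the 3-2-1 principle that the plan does not meet.

    An empty list means the plan keeps at least 3 copies, on at least
    2 different storage media, with at least 1 copy off-site.
    """
    problems = []
    if len(copies) < 3:
        problems.append(f"copies kept: {len(copies)}; keep at least 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        problems.append(f"storage media used: {len(media)}; use at least 2")
    if not any(c.off_site for c in copies):
        problems.append("no off-site copy; store at least 1 off-site")
    return problems


plan = [
    Copy("working copy", "KPU-issued laptop", off_site=False),
    Copy("local backup", "encrypted external hard drive", off_site=False),
    Copy("remote backup", "institutional cloud storage", off_site=True),
]
print(check_3_2_1(plan))  # []
```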

A good reference for all things RDM is Managing and Sharing Research Data: A Guide to Good Practice by Louise Corti, et al. (2nd ed., Sage, 2020).