Share Your Data

What is a data repository?

an archival service providing reliable long-term care for digital objects with research value
preserves, manages, and provides access to many types of digital materials in a variety of formats
may offer curation for data and metadata to enable search, discovery, and reuse
ideally, provides sufficient control for the digital material to remain authentic, reliable, accessible and usable on a continuing basis

from CODATA Research Data Management Terminology

Why deposit your research data into a data repository?

Repositories can assist with

meeting publisher data policy requirements
managing data
supplying a persistent identifier in order for you or others to cite your data
facilitating discovery of your data
preserving your data long-term

Choosing a Data Repository

**NEW** Repository Options in Canada: A Portage Guide. Find it in English or French.

Current best practice is to submit data to a discipline-specific, community-recognized repository where possible. If no suitable discipline-specific repository exists, a generalist repository is the next best option.

Does my data have to be stored in Canada?

If your data does not contain any personal information, there are no restrictions on where it can be stored.

For data that does contain sensitive personal information, B.C.'s Freedom of Information and Protection of Privacy Act (FIPPA) was amended on November 25, 2021 to permit public bodies to disclose personal information outside of Canada, provided that the disclosure complies with the Personal Information Disclosure for Storage Outside of Canada Regulation. The regulation requires public bodies to undertake a Privacy Impact Assessment (PIA) for projects where sensitive personal information is to be stored outside of Canada.

However, "sensitive personal information" is not defined in the regulation. It is up to the public body (in this case, the researcher(s), in collaboration with the KPU Privacy Office and Research Ethics Board) to use their best judgment in determining the sensitivity of the data in question, and thus whether a PIA is required. Researchers should carefully consider the potential risks of transferring and housing sensitive data outside Canada, such as whether the host nation's privacy laws are compatible with Canada's.

A useful overview of the FIPPA amendment can be found here.

What kinds of data repositories are available?

Generalist - Generalist repositories accept data from any discipline. A Generalist Repository Comparison Chart of popular generalist repositories is published by Fairsharing.org.

Disciplinary - Disciplinary or domain-specific repositories accept data only from a specific field of research. To find a domain-specific repository suitable for your data, check out Fairsharing.org or Re3data.org, a searchable "global registry of research data repositories from a diverse range of academic disciplines. It provides information on repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions."

Some publishers provide lists of recommended repositories for depositing publication-related data: PLOS, SpringerNature, Scientific Data

The government of Canada provides some examples of (primarily biomedical) research outputs and corresponding appropriate, publicly accessible repositories here.

Note that some repositories allow for self-deposit while some are based on submission to a larger body, and that the levels of curation vary from repository to repository. Below is a comparison of some well-known repositories:

Generalist Repositories

FRDR - Federated Research Data Repository *PREFERRED*

Canadian: data are transferred and stored in Canada
hosted by the Digital Research Alliance of Canada
any Principal Investigator or their designate affiliated with a Canadian academic institution can deposit data at no direct cost
can house datasets of any size; 1 TB default free storage, more available on request
not suitable for restricted data; data must be made publicly available under a CC license, though embargo periods are supported
published datasets are assigned a persistent identifier (DOI - Digital Object Identifier) so the data can be easily found and cited
all datasets are appraised for long-term preservation
research librarians from the Canadian Association of Research Libraries (CARL) curate and approve deposited items

Borealis Dataverse

deposit requires an institutional subscription; KPU is not yet a member, but is investigating
Canadian: data are transferred and stored in Canada
hosted by U of T Library
storage allocation based on institution size, minimum 1 TB per institution
2.5 GB file size limit
default data license is CC0, but can be changed by depositor
published datasets are assigned a DOI

Harvard Dataverse

hosted by Harvard University
free storage up to 1 TB
2.5 GB file size limit
default data license is CC0, but can be changed by depositor
published datasets are assigned a DOI

Dryad

operated by a U.S. nonprofit membership organization
open source and open access
requires an ORCID or institutional membership
$120 USD Data Publishing Charge (DPC) per submission unless the submitter is based at a member institution, an associated journal or publisher has an agreement with Dryad to sponsor the DPC, or the submitter is based in a fee-waiver country
deposited data are preserved in Merritt, a CoreTrustSeal certified repository maintained by the California Digital Library
datasets must be published under a CC0 license
storage up to 300 GB per data publication (more available on request); 10 GB file size limit
published datasets are assigned a DOI

Figshare

open access
20 GB free storage per account
Figshare+ available for larger datasets (Data Publishing Charge (DPC) per submission)
published datasets are assigned a DOI

Open Science Framework Storage

hosted by the Center for Open Science
open source
data stored on Google Cloud; can be configured to store data in Canada (Montréal)
supports public and private projects
free storage up to 50 GB for public projects and 5 GB for private projects; 5 GB individual file size limit
publicly registered datasets are assigned a DOI

Zenodo

hosted by CERN
accepts research outputs from all fields of research
open source
free up to 50 GB per dataset
published datasets are assigned a DOI
research output is stored safely for the future in the same infrastructure as CERN's own Large Hadron Collider research data
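As a rough illustration of how the size limits in the comparison above might be used to shortlist candidates, here is a minimal Python sketch. The `Repo` class and `shortlist` function are hypothetical helpers, not part of any repository's tooling. The figures are copied from the comparison above, 1 TB is assumed to mean 1000 GB, and Borealis is left out because deposit depends on an institutional subscription. Size is only one criterion; fees, licensing, curation, and storage location still need to be checked separately.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Repo:
    name: str
    free_storage_gb: Optional[float]  # None = no figure given in the comparison
    max_file_gb: Optional[float]      # None = no per-file limit given


# Figures taken from the comparison above; 1 TB is assumed to be 1000 GB.
REPOS = [
    Repo("FRDR", 1000, None),
    Repo("Harvard Dataverse", 1000, 2.5),
    Repo("Dryad", 300, 10),
    Repo("Figshare", 20, None),
    Repo("OSF Storage (public project)", 50, 5),
    Repo("Zenodo", 50, None),
]


def shortlist(total_gb: float, largest_file_gb: float,
              repos: List[Repo] = REPOS) -> List[str]:
    """Return names of repositories whose stated limits fit the dataset."""
    fits = []
    for repo in repos:
        if repo.free_storage_gb is not None and total_gb > repo.free_storage_gb:
            continue  # dataset exceeds the stated free storage
        if repo.max_file_gb is not None and largest_file_gb > repo.max_file_gb:
            continue  # at least one file exceeds the per-file limit
        fits.append(repo.name)
    return fits


if __name__ == "__main__":
    # A 40 GB dataset whose largest single file is 4 GB.
    print(shortlist(total_gb=40, largest_file_gb=4))
```

Several repositories offer more storage on request, so a dataset that falls outside this shortlist may still be accepted after contacting the repository.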

Disciplinary Repositories

OpenICPSR

social, behavioural, and health sciences data

free to deposit and access
storage up to 30 GB and 1000 files per deposit

Qualitative Data Repository (QDR)

social sciences

QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences. (from homepage) Data depositors may be charged a fee which helps cover the costs of curation and preservation.

IEEEDataPort

engineering

Accepts datasets up to 2 TB. Designed to perform four functions:

Enable individuals and institutions to indefinitely store and make datasets easily accessible to a broad set of researchers, engineers and industry;
Enable researchers, engineers and industry to gain access to datasets that can be analyzed to advance technology;
Facilitate data analysis by enabling access to data in the AWS Cloud and by enabling the downloading of datasets;
Supports reproducible research. (from homepage)

Discipline-specific repositories are also known as subject-specific or domain-specific repositories. Many areas of research are supported by discipline-specific repositories hosted by a variety of international groups.

Scholarly journals are increasingly requiring the sharing of associated research data as a condition of article publication. Many publishers require a Data Availability Statement to be included with submissions, and some either recommend or require that data be deposited in a publicly-accessible data repository.

You can often find data sharing policies in the “Instructions for Authors” or “Author Guidelines” section of the journal's website.

Examples of author guidelines with a data sharing requirement:

GigaScience

"GigaScience requires authors to deposit the data set(s) supporting the results reported in submitted manuscripts in a publicly-accessible data repository. ... This section should be included when supporting data are available and must include the name of the repository and the permanent identifier or accession number and persistent hyperlinks for the data sets (if appropriate). The following format is recommended:

"The data set(s) supporting the results of this article is(are) available in the [repository name] repository, [cite unique persistent identifier]."" (from GigaScience Instruction for Authors).

Canadian Journal of Fisheries and Aquatic Sciences

"Supply a data availability statement that says whether any, all, or portions of the data underpinning the work are available to others.

If data are available, specify how data can be accessed and under what conditions data can be reused. Supply repository name, persistent unique identifier (PID: DOI/compact identifier/accession number), and web link.
If data are not available, explain why (e.g., describe the ethical, legal, or commercial restrictions)."

(From Canadian Science Publishing Author Guidelines)

"For primary biodiversity data authors are strongly encouraged to place all species distribution records in a publicly accessible database such as the national Global Biodiversity Information Facility (GBIF) nodes (www.gbif.org) or data centres endorsed by GBIF, including BioFresh (www.freshwaterbiodiversity.eu) for freshwater data and the Ocean Biogeographic Information System (OBIS, http://www.obis.org/) for marine biodiversity data, which also holds supporting measurements taken alongside the species occurrence data." (from Canadian Journal of Fisheries and Aquatic Sciences Scope of the Journal and Guidelines for Papers).

Major Publisher Data Policies

Publisher Recommended Repositories

Some scholarly publishers provide resources to help submitters choose a data repository for deposit, including recommending specific repositories that meet their data sharing requirements. For example:

Springer Nature repository guidance
PLOS recommended repositories

When sharing research that involves sensitive data, protecting confidentiality is critical. Sensitive data includes personally identifiable information from human participants, as well as confidential industry data.

Research that includes the collection of sensitive data, such as some health research, must ensure that privacy is protected, and that confidentiality extends to data deposit. Creating a Data Management Plan (DMP) at the start of the project is highly recommended.
Researchers who plan to deposit research data collected from human participants must ensure that those plans are included in their research ethics application.
Informed consent documents must include a separate provision for data sharing. Consent to collection of data for a specific study does not imply consent to sharing of that data for reuse.
Prior to undertaking the research, determine if the data will need to be de-identified or anonymized. This task can be time consuming, may affect the research project timelines, and may affect the budget.
When choosing a repository for depositing sensitive data, the security measures, confidentiality review, and access controls offered will be a key consideration.
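To give a sense of what the de-identification step mentioned above can involve for tabular data, here is a minimal Python sketch. It drops direct identifiers and replaces a participant ID with a keyed hash. The column names, the `pseudonymize` function, and the sample row are all hypothetical. This is pseudonymization, not full anonymization: combinations of the remaining fields (for example age and postal code) can still identify people, so the output would still need review before deposit.

```python
import hashlib
import hmac
import secrets

# Hypothetical column names for direct identifiers; adjust to the real dataset.
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone"})


def pseudonymize(rows, id_field, key, drop=DIRECT_IDENTIFIERS):
    """Drop direct identifiers and replace id_field with a keyed hash.

    rows: list of dicts, one per participant record
    key:  secret bytes, kept separately from the data and never deposited
    """
    cleaned = []
    for row in rows:
        record = {k: v for k, v in row.items() if k not in drop}
        digest = hmac.new(key, str(row[id_field]).encode("utf-8"),
                          hashlib.sha256).hexdigest()
        record[id_field] = digest[:12]  # shortened pseudonym for readability
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # generate once per project and store securely
    sample = [{"participant_id": "P001", "name": "Example Person",
               "email": "person@example.org", "score": 17}]
    print(pseudonymize(sample, "participant_id", key))
```

Using a keyed hash rather than a plain hash means that someone without the key cannot recompute the pseudonyms from a list of known participant IDs.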

Data citation is a standardized method for secondary users of data to give credit to the data's creator, just like citing books or journal articles referenced in a research paper. Researchers are encouraged both to cite the data of other researchers and to create data citations for their own datasets, which increases the likelihood that their data will be cited.

The benefits of data citation include:

the acknowledgement of research data as a valuable contribution to the scientific record
the ability to reference data as standalone research output
providing appropriate credit to data creators
supporting reproducibility and verification of study results
enabling reuse of data for future research
allowing researchers to track the usage and impact of their data

The Joint Declaration of Data Citation Principles has established eight principles to guide the development of data citations. These principles are: importance, credit and attribution, evidence, unique identification, access, persistence, specificity and verifiability, and interoperability and flexibility.

There are various ways to format a data citation, but the most important components are: creator, publication date, title, publisher, and identifier (e.g. DOI or Handle, preferably formatted as a URL). A user should be able to find the exact same dataset using the information given in the citation.

A basic, widely-accepted citation format:

Creator (PublicationYear): Title. Publisher. Identifier

Example:

Bowlby, Heather; Gibson, Jamie. 2021. Implications of life history uncertainty when evaluating status in the Northwest Atlantic population of white shark (Carcharodon carcharias). Dryad. https://doi.org/10.5061/dryad.vhhmgqnqk

A more comprehensive citation may include additional elements, most commonly: version, ResourceType (in this case usually [dataset]), date accessed (if no version is given but the dataset is likely to change over time), subset identifier/query PID.

Creator (PublicationYear): Title. Version. ResourceType. Publisher. Identifier. Date Accessed

Example:

Cohen, M. A., & Miller, T. R. (1991). Cost of mental health care for victims of crime in the United States (ICPSR 6581). Version V1. [Data set]. ICPSR. https://doi.org/10.3886/ICPSR06581.v1. Accessed 2022-09-07
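The two citation templates above can be assembled programmatically. The following Python sketch is one possible way to do it; `format_data_citation` and `doi_to_url` are hypothetical helper names, and the punctuation follows the "Creator (PublicationYear): Title. ..." templates rather than any particular style guide. A bare DOI is rewritten as a URL, in line with the recommendation to format the identifier as a URL.

```python
from typing import Optional, Sequence


def doi_to_url(identifier: str) -> str:
    """Turn 'doi:10.x/y' or '10.x/y' into an https://doi.org/ URL."""
    ident = identifier.strip()
    if ident.lower().startswith("doi:"):
        ident = ident[4:].strip()
    if ident.startswith("10."):
        return "https://doi.org/" + ident
    return ident  # already a URL, a Handle, or another identifier type


def format_data_citation(creators: Sequence[str], year: int, title: str,
                         publisher: str, identifier: str,
                         version: Optional[str] = None,
                         resource_type: Optional[str] = None,
                         accessed: Optional[str] = None) -> str:
    """Build a citation from the core elements plus optional extras."""
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    if resource_type:
        parts.append(f"[{resource_type}].")
    parts.append(f"{publisher}.")
    url = doi_to_url(identifier)
    if accessed:
        parts.append(f"{url}. Accessed {accessed}")
    else:
        parts.append(url)
    return " ".join(parts)


if __name__ == "__main__":
    print(format_data_citation(
        ["Bowlby, Heather", "Gibson, Jamie"], 2021,
        "Implications of life history uncertainty when evaluating status in "
        "the Northwest Atlantic population of white shark "
        "(Carcharodon carcharias)",
        "Dryad", "doi:10.5061/dryad.vhhmgqnqk"))
```

Where a repository supplies its own default citation for a dataset, that citation is usually the safer choice; a helper like this is mainly useful for data that has no generated citation.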

Data repositories (e.g. Dataverse, Dryad, ICPSR) often provide a default data citation for each deposited dataset, in a variety of different styles. Further, traditional citation styles are starting to prescribe standards for citing data. For example, APA has a template for citing published data, and one for citing unpublished raw data.

Metadata is used to describe data so that other researchers can find it and use it appropriately. There are different metadata standards to choose from depending on your area of research. Some examples of metadata used for a research dataset include:

title of the dataset
creator(s)
date (created or published)
method used to generate the data
source of the data
terms to describe the content
technical descriptions including file names, formats, versions, etc.
access information such as license and download links or data request forms

Support services for your chosen repository should be able to assist with determining what metadata to record. The most common metadata standards used for data management are Dublin Core and the DDI (Data Documentation Initiative).
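As an illustration, the descriptive elements listed above can be recorded as a simple machine-readable record. The Python sketch below uses Dublin Core element names as keys; the mapping from the list above to those elements is one reasonable reading, not an official crosswalk. The choice of required fields is an assumption, and values marked PLACEHOLDER are not real metadata. The title, creators, year, publisher, and DOI come from the Dryad citation example in this guide.

```python
import json

# Assumption: a minimal set of fields to insist on before deposit.
REQUIRED = ("title", "creator", "date", "identifier", "rights")


def missing_fields(record, required=REQUIRED):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


# Keys are Dublin Core element names.
record = {
    "title": ("Implications of life history uncertainty when evaluating "
              "status in the Northwest Atlantic population of white shark "
              "(Carcharodon carcharias)"),
    "creator": ["Bowlby, Heather", "Gibson, Jamie"],
    "date": "2021",
    "publisher": "Dryad",
    "identifier": "https://doi.org/10.5061/dryad.vhhmgqnqk",
    "rights": "CC0",  # Dryad datasets must be published under CC0
    "description": "PLACEHOLDER: method used to generate the data",
    "source": "PLACEHOLDER: source of the data",
    "subject": ["PLACEHOLDER: terms describing the content"],
    "format": ["PLACEHOLDER: file formats and versions"],
}

if __name__ == "__main__":
    print("Missing:", missing_fields(record))
    print(json.dumps(record, indent=2))
```

A check like `missing_fields` is a convenient way to catch gaps early, but the repository's own deposit form remains the authority on which fields are mandatory.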

Some disciplines have their own metadata schemas. Each schema has its own specified elements and structure. The Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.

Here are some examples from Curtin University Library.

Discipline	Metadata standard
General	Dublin Core (DC) Metadata Object Description Schema (MODS) Metadata Encoding and Transmission Standard (METS)
Arts	Categories for the Description of Works of Art (CDWA) Visual Resources Association (VRA Core)
Astronomy	Astronomy Visualization Metadata (AVM)
Biology	Darwin Core
Ecology	Ecological Metadata Language (EML)
Geographic	Content Standard for Digital Geospatial Metadata (CSDGM)
Social sciences	Data Documentation Initiative (DDI)

KPU Library
