Please note, these guidelines are relevant to all of our journals. Make sure that you check your chosen journal’s web pages for specific guidelines too.
We want our authors and readers to trust the research that is published in our journals. To that end, we support the entire research community in achieving best practice in both the sharing and archiving of research data.
For guidance on how to present experimental results included within your article, please see our experimental reporting requirements page.
Why is data sharing important?
Data sharing is central to improving many aspects of research culture.
- It supports the validation of data to maintain high standards of research reproducibility
- It increases transparency and encourages trust in the scientific process
- It enables and encourages the reuse of new findings
- The formal citation of data ensures all researchers involved in producing the data can gain credit for their research outputs
- It may also be a formal requirement placed on researchers by funders or institutions
What is research data?
Research data generally refers to the results of observations or experiments that validate your research findings. It forms part of a wider group of useful materials associated with your research project, including but not limited to:
- raw or processed data and metadata files, e.g. spectra, images, structure files
- software and code, including software settings
- models
- algorithms
Research data typically refers to digital, machine-readable files and we encourage authors to make data available in standard formats that can be opened and re-used by others.
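As an illustration of what a machine-readable standard format can look like, the sketch below writes paired spectral data (for example an IR or UV-vis trace) to a plain two-column CSV file with a header row. It uses only Python's standard `csv` module; the function name, the column labels and the values are hypothetical placeholders, not a format prescribed by these guidelines.

```python
import csv
import io


def write_xy_csv(stream, x_values, y_values, x_label="wavenumber_cm-1", y_label="absorbance"):
    """Write paired x/y data as a two-column CSV with a header row.

    Hypothetical helper: the default column labels are placeholders;
    use labels (and units) that describe your own data.
    """
    if len(x_values) != len(y_values):
        raise ValueError("x and y must have the same number of points")
    writer = csv.writer(stream)
    writer.writerow([x_label, y_label])  # header row names each column
    for x, y in zip(x_values, y_values):
        writer.writerow([x, y])  # one data point per row


# Illustrative placeholder values, written to an in-memory buffer.
buffer = io.StringIO()
write_xy_csv(buffer, [4000.0, 3999.5, 3999.0], [0.012, 0.013, 0.015])
print(buffer.getvalue())
```

When writing to disk rather than to an in-memory buffer, open the file with `open(path, "w", newline="")`, as the `csv` module documentation recommends, so that line endings are not doubled on Windows.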
Our data sharing policy
The Royal Society of Chemistry believes that where possible, all data associated with the research in a manuscript should be Findable, Accessible, Interoperable and Reusable (FAIR), enabling other researchers to replicate and build on that research.
We strongly encourage authors to deposit the data underpinning their research in appropriate repositories.
For all submissions to Royal Society of Chemistry journals, any data required to understand and verify the research in an article must be made available on submission. To comply, we suggest that authors deposit their data in an appropriate repository. Where this isn’t possible, we ask authors to include the data as part of the article’s Supplementary Information.
Some journals may have additional subject-specific requirements for sharing and/or publishing supporting data, so please check the journal-specific guidelines.
Recommended repositories
A data repository is an external storage space for researchers to deposit datasets associated with their research. Data should be submitted to a discipline-specific, community-recognised repository where possible, or alternatively to an institutional repository, or a generalist repository if no subject discipline repository is available for the given data type.
The choice of repository is the author’s decision, provided it is in line with institutional or funder guidelines. The exception to this is small molecule crystal data, which must be deposited with the Cambridge Crystallographic Data Centre (CCDC).
Choosing a repository
The Royal Society of Chemistry supports the TRUST principles (Transparency, Responsibility, User focus, Sustainability and Technology) for repository selection. We strongly encourage the use of repositories that offer persistent identifiers, such as DOIs, for deposited datasets. These help to make robust connections between datasets and papers, e.g. via the inclusion of a Data Availability Statement.
Authors should also consult the following resources when selecting a repository:
- RSC guidance on repositories for specific data types
- Institutional guidelines
  - Please consult your subject librarian or research data support service for any local guidance, e.g. on depositing data in institutional data repositories
- Funder guidelines
  - Please consult your funder on specific compliance requirements, such as the creation of research data management plans (DMPs)
- Repositories of repositories
  - The following websites may help when searching for and selecting subject-specific repositories.
Subject specific repositories
The Royal Society of Chemistry encourages the use, where possible, of subject-specific rather than general repositories, and recommendations by data type are given below.
Where deposition in a particular repository is required for submission, this is indicated in the tables below.
Data type | Repository | File / standard |
---|---|---|
Crystal structure (organic / organometallic / metal organic); required for all RSC journals | Cambridge Structural Database (CSD), managed by the Cambridge Crystallographic Data Centre (CCDC) | Crystallographic information file (.cif) |
Crystal structure (biological) | Protein Data Bank (PDB) | Macromolecular CIF (mmCIF) |
Crystal structure (inorganic) | Inorganic Crystal Structure Database (ICSD), deposition via CCDC | Crystallographic information file (.cif) |
Crystal structure (powder) | Either the International Centre for Diffraction Data (ICDD) or the Cambridge Structural Database (CSD) | Powder Diffraction File (PDF) |
CryoEM | Electron Microscopy Data Bank | MRC file |
bio-NMR | Biological Magnetic Resonance Data Bank (BMRB) | NMR Self-defining Text Archival and Retrieval (NMR-STAR); format conversion tools available |
Data type | Repository | File / standard |
---|---|---|
Software / code (please also refer to the ‘Software and Code’ guidelines for Data Availability Statements and data citations) | GitHub | Please also consider archiving code in combination with a repository that can issue a DOI |
Software / code (please also refer to the ‘Software and Code’ guidelines for Data Availability Statements and data citations) | Code Ocean | |
Models of biochemical reaction networks | BioModels database | Formats include SBML and PharmML |
Data type | Repository | File / standard |
---|---|---|
Atomic coordinates | Nonspecific / consider general or institutional repository | We recommend any standard structure file, such as xyz, cif or pdb, or a text file giving the structure in Cartesian, fractional, z-matrix or other common representation |
Input/configuration files and program output | Nonspecific / consider general or institutional repository | We recommend sharing the standard input and output formats generated by the simulation software |
Materials simulation data, including electronic structure and molecular dynamics | NOMAD | See repository guidelines |
Computational materials science | Materials Cloud | See repository guidance |
Computational chemistry files | ioChem-BD: The Computational Chemistry Results Repository | See repository documentation |
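For the atomic coordinates row above, the plain XYZ format is one widely readable option. This is a minimal sketch, assuming Cartesian coordinates in angstroms (the usual XYZ convention): the first line holds the number of atoms, the second a free-text comment, and each following line an element symbol with its x, y and z coordinates. The helper name and the water geometry are illustrative only, not data from any study.

```python
def to_xyz(atoms, comment=""):
    """Return an XYZ-format string for a list of (element, (x, y, z)) tuples.

    Hypothetical helper; assumes Cartesian coordinates in angstroms.
    """
    if "\n" in comment:
        raise ValueError("the XYZ comment must fit on a single line")
    # Line 1: atom count; line 2: free-text comment.
    lines = [str(len(atoms)), comment]
    for element, (x, y, z) in atoms:
        # One atom per line: element symbol followed by x, y, z.
        lines.append(f"{element} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Illustrative, approximate water geometry.
water = [
    ("O", (0.0, 0.0, 0.1173)),
    ("H", (0.0, 0.7572, -0.4692)),
    ("H", (0.0, -0.7572, -0.4692)),
]
print(to_xyz(water, "water, illustrative geometry"), end="")
```

Writing the returned string to a file with a `.xyz` extension gives a text file that most structure viewers and computational chemistry packages can read.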
Data type | Repository | File / standard |
---|---|---|
NMR | Nonspecific / consider general or institutional repository | There is no single, widely accepted data standard. We encourage the deposition of a zip file of the raw instrument data (the entire file directory for the experiment, including the FID and associated files). Processed spectra may also be included. |
IR / Raman | Nonspecific / consider general or institutional repository | .csv, .xlsx, or other machine-readable format |
UV-vis | Nonspecific / consider general or institutional repository | .csv, .xlsx, or other machine-readable format |
EPR | Nonspecific / consider general or institutional repository | .dsc, .dta |
bio-NMR | Biological Magnetic Resonance Data Bank (BMRB) | NMR Self-defining Text Archival and Retrieval (NMR-STAR); format conversion tools available |
Mass spectral data for small chemical molecules, metabolomics, exposomics | MassBank | See contributor guidance |
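For the NMR recommendation above (a zip file of the entire experiment directory, including the FID and associated files), a sketch using Python's standard `zipfile` module is shown below. It keeps the folder layout intact so that processing software can still locate related files. The directory and file names in the demonstration (`10`, `fid`, `acqus`, `pdata/1/1r`) mimic one vendor's layout and are assumptions for illustration; check your own instrument's directory structure.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_experiment_dir(exp_dir, zip_path):
    """Archive every file under exp_dir, preserving the directory layout.

    Hypothetical helper; the experiment folder itself becomes the
    top-level entry inside the zip file.
    """
    exp_dir = Path(exp_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(exp_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so "10/..." is kept.
                zf.write(path, arcname=path.relative_to(exp_dir.parent).as_posix())
    return Path(zip_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small placeholder experiment directory (names are assumptions).
        exp = Path(tmp) / "10"
        (exp / "pdata" / "1").mkdir(parents=True)
        for name in ["fid", "acqus", "pdata/1/1r"]:
            (exp / name).write_bytes(b"placeholder")
        archive = zip_experiment_dir(exp, Path(tmp) / "nmr_exp10.zip")
        with zipfile.ZipFile(archive) as zf:
            print(sorted(zf.namelist()))
```

Write the archive outside the directory being zipped; otherwise the partially written zip file can end up inside itself.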
Data type | Repository | File / standard |
---|---|---|
Electrophoretic gels and blots | Nonspecific / consider general or institutional repository | Please deposit raw, unedited files in a high-resolution image format (e.g. TIFF) |
Microscopy (e.g. SEM, TEM, STM) | Nonspecific / consider general or institutional repository | Please deposit raw, unedited files in a high-resolution image format (e.g. TIFF) |
Coherent X-ray images | Coherent X-ray Imaging Data Bank (CXIDB) | CXI file (see repository guidance) |
Bioimages, multidimensional life sciences image data (cell and tissue) | Image Data Resource (IDR) | See repository for supported formats |
Data type | Repository | File / standard |
---|---|---|
Materials (various) | Materials Data Facility | Multiple; see repository |
Materials simulation data, including electronic structure and molecular dynamics | NOMAD | See repository guidelines |
Computational materials science | Materials Cloud | See repository guidance |
Data type | Repository | File / standard |
---|---|---|
All proteomics data | Any ProteomeXchange member | See relevant target repository |
Proteomics mass spectrometry | PRIDE (Proteomics Identification Database) | Multiple; see repository |
Human geno- and phenotype data, epigenetics | Database of Genotypes and Phenotypes (dbGaP) | Multiple; see dbGaP |
Human genetic variation data (≤50 bp), e.g. single-base nucleotide substitutions, small-scale deletions or insertions | dbSNP | Multiple; see dbSNP |
Human genomic structural variation data (>50 bp), e.g. insertions, deletions, translocations | Database of Genomic Structural Variation (dbVar) | Excel and VCF files; see dbVar |
Genetic variation data (all species) | European Variation Archive (EVA) | VCF files; see EVA |
Gene expression data, array- and sequence-based | Gene Expression Omnibus | See repository |
High-throughput functional genomics data | ArrayExpress | See repository |
Protein-protein, protein-DNA/RNA and molecular interactions | IntAct molecular interaction database (IntAct) | Multiple; see IMEx Consortium |
miRNA sequences and annotation | miRBase: the microRNA database | Multiple; see miRBase |
Metabolomics | MetaboLights | See repository |
Metabolomics | Metabolomics Workbench | See repository |
Data type | Repository | File / standard |
---|---|---|
DNA & RNA sequence data | Any INSDC member repository | |
Genome sequence data | Genome Sequence Archive (GSA) | Multiple; see GSA guidance on suitable types |
Metagenomics sequence data | MGnify | See MGnify guidance on sequence data |
Protein sequences | Universal Protein Resource (UniProt) | |
Data type | Repository | File / standard |
---|---|---|
Atmospheric and earth observation research, environmental data | CEDA Archive (Centre for Environmental Data Analysis) | See repository |
Environmental and ecological data | Environmental Data Initiative (EDI) | See EDI |
Geochemical, geochronological, and petrological data | EarthChem | See EarthChem |
Climate or Earth system research, climate model data | World Data Center for Climate (WDCC) | See WDCC guidance |
Data type | Repository | File / standard |
---|---|---|
Functional enzymology data (kinetic and experimental data) | Standards for Reporting Enzymology Data (STRENDA DB) | See STRENDA |
Flow cytometry data | FlowRepository | See FlowRepository |
Protein circular dichroism and protein synchrotron radiation circular dichroism | Protein Circular Dichroism Data Bank (PCDDB) | See repository |
Data type | Repository | File / standard |
---|---|---|
Intermolecular and supramolecular interactions of molecular systems, binding, assembly, and interaction phenomena | SupraBank | JSON (DataCite), CDX (for 2D/3D molecule structure), PNG, proprietary formats |
General repositories
Where subject specific or institutional/funder repositories are not available, authors may wish to choose a general repository, such as:
Repository name | Information on costs |
---|---|
Dryad Digital Repository | Fees apply |
figshare | Fees apply |
Harvard Dataverse | Contact repository for datasets over 1 TB |
Open Science Framework | Free of charge |
Science Data Bank | Free of charge |
Zenodo | Donations towards sustainability encouraged |
Chemotion | Free of charge |
Data availability statements
To maintain high standards of transparency and research reproducibility, and to promote the reuse of new findings, a data availability statement (DAS) must be submitted alongside all articles.
Data availability statements provide information about where the data, software or code supporting the results reported in a published article can be found. Where applicable, they should include links to any datasets analysed or generated during the study that have been shared in an external data repository, listing the database, accession number, DOI, URL or any other relevant details. The full URL of each dataset should be given in the text rather than embedded behind link text. Authors are also encouraged to include data citations to associated datasets in the reference section of the article.
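The full-URL requirement can be met mechanically. The sketch below is a hypothetical helper, not an official tool: it turns a DOI written as a bare identifier, a `doi:` string or a `dx.doi.org` link into the `https://doi.org/DOI` form used in the example statements. Its pattern check is deliberately simple (a `10.` prefix, a numeric registrant code, a slash and a suffix), so it may reject some unusual but valid DOIs and it does not confirm that a DOI actually resolves.

```python
import re

# Simple sanity check: "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Common ways a DOI is written; compared case-insensitively below.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(doi):
    """Return the full https://doi.org/ link for a DOI written in a common form."""
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            # Drop the prefix and any space left after "DOI:".
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("DOI: 10.5517/ccdc.csd.cc22912j"))
```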
The data availability statement can provide information about the data presented in an article (e.g. in figures or tables) or give a reason why data are not available (e.g. human health data). If supporting data or code have been included in the article’s Supplementary Information, this should also be stated here.
If data for the article cannot be made available, for example, due to legal or ethical confidentiality requirements, then the DAS should state this.
A data availability statement must be included at the end of the article under the heading “Data availability”, after the conflicts of interest statement and before any acknowledgements.
The following are some examples of DAS that you can use:
- Data for this article, including [description of data types], are available at [name of repository] at [URL – format https://doi.org/DOI].
- The data supporting this article have been included as part of the Supplementary Information.
- Crystallographic data for [compound number] have been deposited at the [name of repository, such as CCDC / ICSD / PDB] under [accession number] and can be obtained from [URL of data record, format https://doi.org/DOI].
- The code for [description of software] can be found at [URL to code location] with [DOI – see guidelines below for citing software and code]. The version of the code employed for this study is version [XXX].
- This study was carried out using publicly available data from [name of repository] at [URL] with [accession number].
- The data analysis scripts of this article are available in the interactive notebook [name of notebook, e.g. Google Colab] at [URL].
- Data for this article are available at [name of repository] at [URL – format https://doi.org/DOI]. Data collected from human participants, described in [Fig. X], are not available for confidentiality reasons.
- No primary research results, software or code have been included and no new data were generated or analysed as part of this review.
The following statement is generally not acceptable: “Data are available upon request from the authors.”
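A draft statement can be filled from the first example template and screened for the "available upon request" phrasing with a few lines of Python. This is a heuristic sketch: the function names and the placeholder DOI are hypothetical, and the regular expression only catches common variants of the wording, so it does not replace reading the statement.

```python
import re

# Heuristic: catches "available upon request", "available on request" and
# "available on reasonable request", in any letter case.
UPON_REQUEST = re.compile(r"available\s+(?:up)?on\s+(?:reasonable\s+)?request", re.IGNORECASE)


def repository_das(description, repository, doi):
    """Fill the first example DAS template, giving the data location as a full https://doi.org/ link."""
    return (f"Data for this article, including {description}, are available at "
            f"{repository} at https://doi.org/{doi}.")


def flags_upon_request(statement):
    """Return True if the statement relies on 'available upon request' wording."""
    return UPON_REQUEST.search(statement) is not None


# "10.0000/placeholder" is a placeholder, not a real DOI.
das = repository_das("IR spectra and UV-vis data", "Zenodo", "10.0000/placeholder")
print(das)
print(flags_upon_request(das))
print(flags_upon_request("Data are available upon request from the authors."))
```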
Data and software citation
Citing datasets and code ensures effective and robust research dissemination. We strongly encourage Royal Society of Chemistry authors to formally cite associated datasets as bibliographic references.
Doing this will:
- help readers to discover your data
- allow funders to easily link to articles and data associated with science they support
- provide formal credit to repositories and data creators.
Citing data
For author-generated datasets that are directly associated with the article:
We encourage authors to add data citations as bibliographic references within Data Availability Statements, alongside the information on datasets associated with the study and where to find them.
For other datasets associated with previous studies:
We encourage authors to add data citations as bibliographic references within the main text as they are mentioned. Data citation is encouraged as an alternative to informal references or mentions of local identifiers.
Suggested reference format for data citations:
[Name of data creators, format: A. Name, B. Name and C. Name], [Year], [Name of repository / type of dataset: deposition number], [DOI, or URL if not available, of the dataset].
Example
P. Cui, D. P. McMahon, P. R. Spackman, B. M. Alston, M. A. Little, G. M. Day and A. I. Cooper, 2019, CCDC Experimental Crystal Structure Determination: 1915306, DOI: 10.5517/ccdc.csd.cc22912j
Please also refer to the guidelines from the relevant repository on which information to provide in a citation.
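The suggested reference format is regular enough to generate from structured metadata. The sketch below assembles a data citation in that order and, given the details of the example above, reproduces it exactly. The function names are hypothetical, and the `DOI: ` prefix simply follows the example rather than any repository-specific rule.

```python
def join_names(names):
    """Join names in the 'A. Name, B. Name and C. Name' style (hypothetical helper)."""
    if not names:
        raise ValueError("at least one data creator is required")
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def data_citation(creators, year, dataset, doi=None, url=None):
    """Build a data citation: creators, year, repository/dataset, then DOI (or URL if no DOI)."""
    if doi:
        locator = f"DOI: {doi}"
    elif url:
        locator = url
    else:
        raise ValueError("provide a DOI, or a URL if no DOI exists")
    return f"{join_names(creators)}, {year}, {dataset}, {locator}"


example = data_citation(
    ["P. Cui", "D. P. McMahon", "P. R. Spackman", "B. M. Alston",
     "M. A. Little", "G. M. Day", "A. I. Cooper"],
    2019,
    "CCDC Experimental Crystal Structure Determination: 1915306",
    doi="10.5517/ccdc.csd.cc22912j",
)
print(example)
```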
Citing software and code
We encourage authors to add formal bibliographic references for software and code associated with their articles in Data Availability Statements and/or to directly credit use of other software and code by adding citations to the main text of their article at the relevant point.
Authors are asked to provide, in the reference, the names of all code creators, the name of the repository, and a DOI (a URL can be provided if a DOI is not available). We strongly recommend using CRediT (the Contributor Roles Taxonomy from CASRAI) for standardised contribution descriptions.
Please cite the specific release where possible.
Suggested reference format for code citations:
[Name of code creators, format: A. Name, B. Name and C. Name], [Year], [Name of code repository / type of code], [DOI, or URL if not available; where code has been deposited in both GitHub and Zenodo, as per the guidelines above, the Zenodo DOI is preferred for bibliographic references].
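The same approach works for code. In this sketch an archived DOI (for example from Zenodo) is preferred over a plain URL (for example a GitHub link) when both are supplied, following the preference stated above. Appending the release version to the name field is an assumption of this sketch, prompted by the advice to cite a specific release; it is not part of the suggested format. All names, the DOI and the URL in the usage line are placeholders.

```python
def join_names(names):
    """Join names in the 'A. Name, B. Name and C. Name' style (hypothetical helper)."""
    if not names:
        raise ValueError("at least one code creator is required")
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def code_citation(creators, year, code_name, doi=None, url=None, version=None):
    """Build a code citation; a DOI (e.g. Zenodo) is used in preference to a URL (e.g. GitHub)."""
    if doi:
        locator = f"DOI: {doi}"
    elif url:
        locator = url
    else:
        raise ValueError("provide a DOI, or a URL if no DOI exists")
    # Assumption: the release version is appended to the name field.
    name_field = f"{code_name}, version {version}" if version else code_name
    return f"{join_names(creators)}, {year}, {name_field}, {locator}"


# Placeholder values only.
citation = code_citation(["A. Name", "B. Name"], 2024, "Zenodo / analysis scripts",
                         doi="10.0000/placeholder",
                         url="https://github.com/example/analysis",
                         version="1.2.0")
print(citation)
```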