Data Release Description

Reporting standards

GigaByte has long supported efforts promoting the use of reporting guidelines (see more in our editorial, FAIRsharing recommendations, and our Editorial Policies and Reporting Standards page). 

Criteria

Data Release articles highlight and help to contextualize exceptional and openly available datasets to encourage reuse. All data can be linked to the Data Release via our GigaDB repository. Data Releases focus on a particular dataset and provide detailed methodology on data production, validation, and potential reuse. They support the FAIR Principles for scientific data management and stewardship, which state that research data should be Findable, Accessible, Interoperable and Reusable. Manuscripts containing more detailed biological, medical or technical analyses of data may be more suitable for our sister publication, GigaScience Journal, which publishes Research Articles.

One of the aims of a Data Release is to incentivize the more rapid release of data, before subsequent detailed analysis has been carried out. We also publish Data Release articles in coordination with, or after, the publication of an analysis paper, but we expect the Data Release to add value, especially in cases where the analysis paper has already been published. Data Releases should include additional detail that might not have been appropriate in the research paper, including information on data collection, detailed data validation, and information on exactly how these data can be reused. The Data Release can also include intermediate or processed data, code and other supporting information that would enable reproduction and that was not released with the companion paper.

Editorial assessment will take into account the following, balancing the relative importance with state-of-the-art and community need:

  • How well the data meet the FAIR (Findable, Accessible, Interoperable and Reusable) principles.
  • Reuse potential.
  • Data that are well documented, include extensive metadata, provide unique information allowing novel uses of data, and add value to similar data already available.

We recommend authors provide brief highlights of these points in their cover letter.

Note that we do not consider Data Release articles for data from published papers where the data should have been released at the time of publication. This applies specifically to papers that were published more than one year ago.

We will consider Data Release articles where the authors made the data available in all ways possible at the time of publication of the analysis paper, but technology or a lack of data storage at the time made complete sharing impossible, or where the paper provides added value (e.g. additional related datasets or other material that increases reuse).

Manuscript File formats

The following word processor file formats are acceptable for the main manuscript document:

  • Microsoft Word (DOC, DOCX)
  • Portable Document Format (PDF)
  • TeX/LaTeX (use our Overleaf template)

For creating manuscripts in LaTeX, GigaByte recommends the use of its own LaTeX class files. Our class files are available online at Overleaf and also as a downloadable package via the links below.

Overleaf is a free, collaborative online LaTeX editor that allows you to write your manuscript in a TeX or rich text environment, to generate PDF outputs as you write, and to share your manuscript with co-authors and collaborators. When ready for submission, the manuscript files can be downloaded from Overleaf and submitted to the journal’s online submission system. Please use the following Overleaf Data Release Template.

Data Access Requirements and Submission

The dataset(s) described in the Data Release must be available for our reviewers to assess along with the manuscript. The data can be hosted on your own website, in an open repository, or temporarily in the private data hosting region of our GigaDB repository.

Should the manuscript be approved for publication, the accompanying datasets must be accessible by any researcher wishing to use them under a Creative Commons CC0 waiver, without restrictions, such as the need for a material transfer agreement. Authors must clearly acknowledge any work upon which they are building, both published and unpublished, and cite such work when possible in the reference list with a Digital Object Identifier (DOI).

GigaByte is able to host very large datasets of all types. There is no Data Processing Charge for datasets below 1 TB; contact the editors if you require more. The journal provides a direct link from the published manuscripts to the journal’s affiliated database, GigaDB. On acceptance for publication, the datasets linked by GigaDB are given a DOI, which allows easy and more permanent access to the data, provides a format suitable for citing data in the reference section of future articles that use these data, and enables data tracking and credit*.

If you have sensitive data only appropriate for controlled-access databases (e.g. medical data), data deposition in GigaDB is not appropriate; however, these data must be available for peer review and be deposited in a suitable controlled-access database upon publication. We will provide citable DOIs for these data by creating a GigaDB dataset with links to the controlled-access data repositories, and will host information and request forms for researchers wishing to access these data. For human data that are consented to be openly shared, you may be asked to include a blank unsigned version of the consent form alongside the data. Please also confirm that you have followed all national guidelines on data collection and release in the country where the research was carried out, for example confirming you have Ministry of Science and Technology (MOST) approval in China. Please contact us at database@gigasciencejournal.com, and we will be happy to answer any questions or provide advice.

For data submission with your manuscript: If your manuscript is considered suitable for review, after an initial editorial assessment for scope and scale, a member of the GigaDB database team will contact you to help make your data available to the reviewers.

At data submission, you need to include information about the data, not just submission of the data alone. For this, please provide our curators the following information about the data:

  1. A title that is specific to the data itself. (Typically such titles include a phrase like "The data for…").
  2. Abstract: a brief description of the dataset(s) included and their potential use for the scientific community.
  3. Author list for the dataset (This can differ from that of the submitted manuscript).
  4. Data types (e.g., sequence assembly, transcriptome, imaging, movies, etc.).
  5. Organism(s) and/or tissue(s) for each data type.
  6. Estimate of dataset size.
  7. Readme file: This file should contain information about the files (to be) hosted in a dataset, including any file naming convention used and the directory structure of any compressed archive (.tar) files. For more general information about Readme Files, the following links may be useful: http://en.wikipedia.org/wiki/README and http://www.wikihow.com/Write-a-Read-Me.
  8. Link(s) to any relevant data that is publicly available, or any related accession numbers in other repositories (Note that any data being submitted to GigaByte that has a community-approved data repository must be submitted there. We will be happy to help you with this process.)
  9. Acknowledgements, if any. Be sure to include a list of grants and funding agencies, and information on (with URL links to) any consortia or projects associated with these data.
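
As an illustration of item 7, the directory structure of a compressed archive can be listed automatically and pasted into the Readme file. The following is a minimal sketch using only the Python standard library; the function names and the demo file name (assembly/sample01.fasta) are hypothetical and not part of any GigaDB tooling.

```python
import io
import tarfile


def describe_archive(fileobj):
    """Return one Readme line per archive member: kind, size in bytes, path."""
    lines = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive.getmembers():
            kind = "dir " if member.isdir() else "file"
            lines.append(f"{kind}  {member.size:>10}  {member.name}")
    return lines


def build_demo_archive():
    """Create a tiny in-memory .tar so the sketch runs without real data."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"ACGT\n"
        # Hypothetical file name, used for illustration only.
        info = tarfile.TarInfo(name="assembly/sample01.fasta")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in describe_archive(build_demo_archive()):
        print(line)
```

For a real submission, the in-memory demo archive would be replaced by the archive file opened in binary mode, and the listing would sit alongside a description of the file naming convention.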

*GigaDB is tracked by the Web of Science Data Citation Index.

Preparing main manuscript text

Template/Structure

Title page
This should list: the title of the article, which should include an accurate, clear and concise description of the reported work, avoiding abbreviations; and the full names, institutional addresses, e-mail addresses and ORCID iDs for all authors. The corresponding author should also be indicated. Should the authors of the work prefer, they can also designate authorship as a Consortium or Project. In such a case, a contact author must still be provided, and a complete list of authors, institutions, e-mail addresses, etc. should be included in the Authors' contributions section.

Abstract
Recommended 150 words
The abstract of the manuscript should cover: the interest or relevance of these data for the broader community; a very brief preview of the data type(s) produced, the methods used, and information relevant to data validation; and the potential uses of these data and their implications for the field. Please minimize the use of abbreviations and do not cite references in the abstract. As this article type is focused on describing a dataset, conclusions or interpretive insights are not required.

Research Area (choose one from list of 10)
Classifications (choose two from list of 90)

Data Description

A brief introductory statement providing the background and purpose for collecting these data should be presented for a non-specialist audience. A clear, concise description of the data, the protocol(s) for data collection, data curation and quality control, as well as potential uses, should then follow.

We recommend including the following detail and breaking up the manuscript text into the following sections:

Context
Please state what motivated you to collect these data, explaining why they are of value to the scientific community and giving some background on their potential interest. Relating this work to previous studies and any related public datasets can help in understanding its utility and potential for reuse.

Methods
This section should outline the experimental design and can be further broken up into subheadings: a) Sampling strategy and b) Steps. Please include details on any specimen sampled, including any species, strain or cell line identifiers, and any voucher or biobank numbers for where the specimen is stored. Also include a brief description of the conditions and parameters used for data production and processing (normalization, feature extraction, etc.). Related methods can also be grouped under corresponding subheadings. This section should provide enough detail to allow other researchers to interpret and repeat the study. In addition to the design of the study, it needs to provide detail on the type of reagents, instrumentation and kits used (including any identifiers such as RRIDs), and follow best practice for reporting guidelines and data standards (including those listed in our FAIRsharing page).

Please include and cite the URLs (or DOIs if snapshots have been archived in a repository such as Zenodo or GigaDB) of any publicly available bioinformatics tools that are used in the production of the data, including the exact version used. We strongly encourage the use of workflow management systems such as Galaxy and myExperiment, and container systems such as Docker, to save the details of the methods, which aids reproducibility as well as conciseness. Our GigaDB repository and GigaGalaxy server can also be used to archive data, workflows and snapshots of the code with an accompanying DOI. Our GitHub page can be used to host a dynamic forkable version of the code if the authors have not used a code repository themselves. We also offer integration with Code Ocean, a cloud-based computational reproducibility platform.

GigaByte encourages and assists with the submission of detailed protocols to the open access repository protocols.io. Please enter the details into protocols.io, issue a DOI, and cite the protocols.io record from the Methods section.

Authors benefit greatly from posting their methods in protocols.io: the protocols are consistently formatted, allow inclusion of all the details, are fully searchable (unlike supplementary files), and can be updated to new versions as the basic methodology changes over time. This saves authors considerable time in the future, as the methods need only be cited rather than rewritten in each new manuscript.

Data Validation and quality control
Experiments supporting and demonstrating the technical quality of the dataset should be included in the manuscript, alongside any quality metrics. Data Releases can be published alongside traditional Research Articles, and any previous or in-press research using the data should be highlighted, as it will aid validation and demonstrate use.

Re-use potential
Instructions and suggestions of downstream applications that may help other researchers with reuse of the data are required. This section can also promote discussion on possible ways the data presented might be used in or have a relationship with other areas of research that may not be directly apparent in the work.

Availability of source code and requirements (if used in the paper)

List the following:

Project name: e.g. My bioinformatics project
Project home page: e.g. http://sourceforge.net/projects/mged
Operating system(s): e.g. Platform independent
Programming language: e.g. Java
Other requirements: e.g. Java 1.3.1 or higher, Tomcat 4.0 or higher
License: e.g. GNU GPL, FreeBSD etc.
RRID: if applicable, e.g. RRID: SCR_014986

Any source code needs to be under an Open Source Initiative approved license and, where practicable, compiled running software should be made available.

Data Availability Statement

GigaByte requires authors to deposit the datasets supporting the results reported in submitted manuscripts in a publicly accessible data repository such as GigaDB (see GigaDB database terms of use for complete details). This section should be included when supporting data are available and must include the name of the repository and the permanent identifier or accession number and persistent hyperlinks for the datasets (if appropriate).

Submission of data into GigaDB does NOT serve as a substitution for submission of data to community-mandated databases. For a complete list of community available databases please see the Reporting Standards section of our Editorial Policies.

Following the Joint Declaration of Data Citation Principles, where appropriate we ask that each dataset be cited where it is first mentioned in the manuscript and be included in the reference list. If a DOI has been issued to a dataset, please always cite it using the DOI rather than the less stable URL the DOI resolves to (e.g., http://dx.doi.org/10.5524/100044 rather than http://gigadb.org/dataset/100044). For more see:

Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [https://www.force11.org/datacitation]
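
To make the DOI-versus-URL distinction concrete, the following minimal sketch normalises a DOI written in several common ways into a single resolver link, and rejects a landing-page URL that is not a DOI. The function name and the list of accepted prefixes are assumptions made for illustration; the default resolver simply mirrors the form used in the example above.

```python
# Assumption: the default resolver mirrors the link form used in the example above.
DEFAULT_RESOLVER = "http://dx.doi.org/"

# Common ways a DOI is written; this list is illustrative, not exhaustive.
KNOWN_PREFIXES = (
    "doi:",
    "http://dx.doi.org/",
    "https://dx.doi.org/",
    "http://doi.org/",
    "https://doi.org/",
)


def doi_link(identifier, resolver=DEFAULT_RESOLVER):
    """Return a resolver link for a DOI, or raise ValueError if it is not one."""
    doi = identifier.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI name starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return resolver + doi


if __name__ == "__main__":
    print(doi_link("doi:10.5524/100044"))
```

A repository landing page such as http://gigadb.org/dataset/100044 raises ValueError in this sketch, which is the intended behaviour: the reference list should carry the DOI form.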

A list of available scientific research data repositories can be found in re3data and FAIRsharing.

Declarations

List of abbreviations

If abbreviations are used in the text they should be defined in the text at first use, and a list of abbreviations should be provided in alphabetical order.

Ethics approval and consent to participate

Manuscripts reporting studies involving human participants, human data or human tissue must:

  • include a statement on ethics approval and consent (even where the need for approval was waived)
  • include the name of the ethics committee that approved the study and the committee’s reference number if appropriate

Studies involving animals must include a statement on ethics approval, and the animals must have been treated in a humane manner in line with the ARRIVE guidelines.

See our editorial policies for more information.

If your manuscript does not report on or involve the use of any animal or human data or tissue, this section is not applicable to your submission. Please state “Not applicable” in this section.

Consent for publication

If your manuscript contains any individual person’s data in any form, consent to publish must be obtained from that person, or in the case of children, their parent or legal guardian. All presentations of case reports must have consent to publish. You can use your institutional consent form. You should not send the form to us on submission, but we may request to see a copy at any stage (including after publication). Please also confirm you have followed national guidelines on data collection and release in the place the research was carried out, for example confirming you have Ministry of Science and Technology (MOST) approval in China.

If your manuscript does not contain any individual person’s data, please state “Not applicable” in this section.

Competing interests

All financial and non-financial competing interests must be declared in this section. See our editorial policies for a full explanation of competing interests. Where an author gives no competing interests, the listing will read 'The author(s) declare that they have no competing interests'. If you are unsure whether you or any of your co-authors have a competing interest please contact the editorial office.

Funding

All sources of funding for the research reported should be declared. The role of the funding body in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript should be declared. Please use FundRef to report funding sources and include the award/grant number, and the name of the Principal Investigator of the grant.

Authors' contributions

The individual contributions of authors to the manuscript should be specified in this section. Guidance and criteria for authorship can be found in our editorial policies. We recommend following a standardised taxonomy such as the NISO/CASRAI CRediT (Contributor Roles Taxonomy).

Acknowledgements

Please acknowledge anyone who contributed towards the article who does not meet the criteria for authorship including anyone who provided professional writing services or materials.

Authors should obtain permission to acknowledge from all those mentioned in the Acknowledgements section. If you do not have anyone to acknowledge, please write "Not applicable" in this section.

See our GigaScience Editorial for a full explanation of acknowledgements and authorship criteria.

Group authorship: if you would like the names of the individual members of a collaboration group to be searchable through their individual PubMed records, please ensure that the title of the collaboration group is included on the title page and in the submission system and also include collaborating author names as the last paragraph of the “Acknowledgements” section. Please add authors in the format First Name, Middle initial(s) (optional), Last Name. You can add institution or country information for each author if you wish, but this should be consistent across all authors.

Please note that individual names may not be present in the PubMed record at the time a published article is initially included in PubMed as it takes PubMed additional time to code this information.

Authors' information

You may choose to use this section to include any relevant information about the author(s) that may aid the reader's interpretation of the article and understanding of the standpoint of the author(s). This may include details about the authors' qualifications, current positions they hold at institutions or societies, or any other relevant background information. Please refer to authors using their initials. Note this section should not be used to describe any competing interests.

Endnotes

Endnotes should be designated within the text using a superscript lowercase letter and all notes (along with their corresponding letter) should be included in the Endnotes section. Please format this section in a paragraph rather than a list.

References

All references, including URLs, must be numbered consecutively, in square brackets, in the order in which they are cited in the text, followed by any in tables or legends. GigaByte believes that data, software and protocols should be treated as legitimate, citable objects of research and accorded the same importance in the scholarly record as citations of other research objects, such as publications. Therefore we follow the guidelines of the Data Citation and Software Citation Principles.

GigaScience follows the recommendations of the FORCE11 Software Citation Implementation Working Group. Software citation elevates software to the level of a first-class object in the digital scholarly ecosystem, consistent with its immense significance. The software should be cited in the references and include the following information: Creator (the authors or project that developed the software); Title (the name of the software); Publication venue (ideally an archive or repository); Date (when the software was published); Version (if unknown, the date of access should be used); and Identifier (a persistent identifier such as a DOI, or a URL to where the software exists). If an article exists that describes the software, it should be cited as an additional reference, as well as citing the software itself.

For more see: Katz, D. S., et al., (2020) Recognizing the value of software: a software citation guide. F1000 Research. https://doi.org/10.12688/f1000research.26932.2
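
The six elements listed above can be treated as a small record. The following minimal sketch assembles them into a single reference string, following the layout of the "Software with persistent identifier" example in the reference style list; the class and field names are hypothetical, and the layout is an assumption rather than a prescribed format.

```python
from dataclasses import dataclass, fields


@dataclass
class SoftwareCitation:
    """The six elements of a software citation (field names are illustrative)."""
    creator: str      # authors or project that developed the software
    title: str        # name of the software
    venue: str        # archive or repository where it was published
    date: str         # when the software was published
    version: str      # version, or the date of access if unknown
    identifier: str   # DOI, or a URL to where the software exists

    def format(self):
        """Return one reference string; raise ValueError if an element is empty."""
        for field in fields(self):
            if not getattr(self, field.name).strip():
                raise ValueError(f"missing citation element: {field.name}")
        return (
            f"{self.creator} ({self.date}). {self.title} "
            f"(Version {self.version}). {self.venue}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = SoftwareCitation(
        creator="Piccolo S, Hill K, Suh E, Dayton J.",
        title="srp33/ShinyLearner: Gigascience",
        venue="Zenodo",
        date="2019, November 15",
        version="1",
        identifier="http://doi.org/10.5281/zenodo.3543724",
    )
    print(citation.format())
```

Checking for empty elements before formatting is a design choice made here to surface an incomplete citation early; it is not a journal requirement.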

Web links and URLs
All web links and URLs, including links to the authors' own websites, should be given a reference number and included in the reference list rather than within the text of the manuscript. They should be provided in full, including both the title of the site and the URL, as well as the date the site was accessed, in the following format:

The PANGAEA Database. https://pangaea.de/. Accessed 15 Jan 2015.

If an author or group of authors can clearly be associated with a web link (e.g. for blogs) they should be included in the reference.

Only articles and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited; unpublished abstracts, unpublished data and personal communications should not be included in the reference list, but may be included in the text and referred to as "unpublished observations" or "personal communications" giving the names of the involved researchers. Obtaining permission to quote personal communications and unpublished data from the cited colleagues is the responsibility of the author.

Examples of the GigaByte reference style are shown below.

Example reference style:

Article within a journal
Smith JJ. The world of science. Am J Sci. 1999;36:234-5.

Article within a journal (no page numbers)
Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience. 2013;2:10.

Article within a journal by DOI
Slifka MK, Whitton JL. Clinical implications of dysregulated cytokine production. Dig J Mol Med. 2000; doi:10.1007/s801090000086.

Article within a journal supplement
Frumin AM, Nussbaum J, Esposito M. Functional asplenia: demonstration of splenic activity by bone marrow scan. Blood. 1979;59 Suppl 1:26-32.

Book chapter, or an article within a book
Wyllie AH, Kerr JFR, Currie AR. Cell death: the significance of apoptosis. In: Bourne GH, Danielli JF, Jeon KW, editors. International review of cytology. London: Academic; 1980. p. 251-306.

OnlineFirst chapter in a series (without a volume designation but with a DOI)
Saito Y, Hyuga H. Rate equation approaches to amplification of enantiomeric excess and chiral symmetry breaking. Top Curr Chem. 2007. doi:10.1007/128_2006_108.

Complete book, authored
Blenkinsopp A, Paxton P. Symptoms in the pharmacy: a guide to the management of common illness. 3rd ed. Oxford: Blackwell Science; 1998.

Online document
Bloggs J. Title of subordinate document. In: The dictionary of substances and their effects. Royal Society of Chemistry. 2015. http://www.rsc.org/dose/title of subordinate document. Accessed 20 May 2015.

Online database
Healthwise Knowledgebase. US Pharmacopeia, Rockville. 1998. http://www.healthwise.org. Accessed 21 Sept 1998.

Supplementary material/private homepage
Bloggs J. Title of supplementary material. 2014. http://www.privatehomepage.com. Accessed 20 Sept 2015.

University site
Bloggs, J: Title of preprint. http://www.uni-heidelberg.de/mydata.html (2015). Accessed 15 Dec 2015.

FTP site
Bloggs, J: Trivial HTTP, RFC2169. ftp://ftp.isi.edu/in-notes/rfc2169.txt (2015). Accessed 17 Jan 2015.

Organization site
ISSN International Centre: The ISSN register. http://www.issn.org (2006). Accessed 18 Jan 2007.

Dataset with persistent identifier
Zheng L-Y, Guo X-S, He B, Sun L-J, Peng Y, Dong S-S, et al. Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience Database. 2011. http://dx.doi.org/10.5524/100012.

Software without persistent identifier but with version and identifier (URL)
SAMtools (2020). SAMtools (Version 1.11)
https://github.com/samtools/samtools/releases/tag/1.11

Software with persistent identifier
Piccolo S, Hill K, Suh E, Dayton J. (2019, November 15). srp33/ShinyLearner: Gigascience (Version 1). Zenodo.
http://doi.org/10.5281/zenodo.3543724

Protocol with persistent identifier
Mofiz E, Holt D, Seemann T, Currie BJ, Fischer K, Papenfuss AT. Draft genome assembly using parasitic mite population NGS DNA sample from mites extracted from host wound environment. protocols.io. 2016. http://dx.doi.org/10.17504/protocols.io.exwbfpe.

Code Ocean integration with persistent identifier
Geib SM, Hall B, Derego T et al. Genome Annotation Generator NCBI for submission [Source Code] [Database]. CodeOcean. 2018; https://doi.org/10.24433/CO.fceb0521-a26d-441f-9fe0-bccc6a250fc9.

Certificate of executable computation with persistent identifier
Eglen SJ. (2020, February 18). CODECHECK Certificate 2020-001. Zenodo. http://doi.org/10.5281/zenodo.3674056

See Information for Authors for information on how to format figures, tables and additional files.

Submit your manuscript in ReView