Data Submission Information

General Data Submission Information

Once you have submitted your manuscript, the GigaDB curators will contact you to help with data submission. Note that the editors will not send your manuscript to reviewers until the required data are available to the reviewers, so it is best to have your data in the proper submission form at the time of manuscript submission. Below you will find details on how to handle supplementary information, how to format and submit different types of data, naming conventions, etc. The GigaDB curators will be happy to help if you have any questions or concerns. Data and software also need to be credited according to the Data Citation principles and the recommendations of the FORCE11 Software Citation Implementation Working Group.

Guidance for handling material commonly found in supplements

Since the GigaByte journal does not accept traditional "supplemental files", the information below explains how best to handle such material: either as a data file hosted in GigaDB, or by reconsidering how, and IF, the details you had intended to include in supplemental files should be represented at all. To help with this, we have prepared a list of items commonly found in supplemental files and how we recommend they be represented:

Additional methods - These should be translated into an online methods tool such as protocols.io and cited in the main manuscript.

Additional details of analysis - These should be translated into an online methods tool such as protocols.io and cited in the main manuscript.

Diagrams of the experiment - You should consider the utility of these carefully. If they are required as part of the narrative of your manuscript, they should be included in the manuscript. We would not expect to see these as data files in GigaDB, since they are not data that can be reused in any way. If a workflow is required, it should be translated into a machine-readable workflow (e.g. in Common Workflow Language, CWL) and uploaded to GigaDB as part of the associated dataset.

Further discussion of results - You should consider the utility of these carefully. If they are required as part of the narrative of your manuscript, they should be included in the manuscript. We would not expect to see these as data files in GigaDB, since they are not "data" that can be reused in any way.

Images of gels - The original images (un-manipulated and un-annotated) should be uploaded to GigaDB as part of the associated dataset. We would not normally expect to see "manuscript figure ready" versions of the manipulated images in GigaDB. If you believe they are required as part of the narrative, they should be included in the main manuscript.

Images (e.g. of subjects/samples, microscopy slides, etc.) - The original images (un-manipulated and un-annotated) should be uploaded to GigaDB as part of the associated dataset. We would not normally expect to see "manuscript figure ready" versions of the manipulated images in GigaDB. However, we understand that there may be scenarios where the authors wish to provide composite images, such as multiple staining events overlaid; these should be included in GigaDB.

Lists of things - You should consider the utility of these carefully. If they are required as part of the narrative of your manuscript, they should be included in the manuscript. If they are large lists of results or input data/accessions, they should be converted to a machine-readable format and uploaded to GigaDB as part of the associated dataset.

Lists of PCR primers - These should be included in the relevant experimental methods, submitted to an online methods platform such as protocols.io, and cited in the main manuscript.

Lists of BLAST hits - The full results files should be uploaded to GigaDB as machine-readable data files as part of the associated dataset. If a summary of the results is required, you should consider its utility carefully: if it is essential to the narrative of your manuscript, it should be included in the manuscript. No summary file should be included in the associated dataset.

Raw data files - These should be submitted to recognized community repositories if available; otherwise they should be submitted to GigaDB as part of the manuscript’s associated dataset.

Scripts and/or bespoke analysis code - If appropriate, these could be uploaded to a code repository such as GitHub and cited in the manuscript; otherwise they should be submitted to GigaDB as part of the associated dataset.

Summary Tables of analysis results - You should consider the utility of these carefully. If they are required as part of the narrative of your manuscript, they should be included in the manuscript. We would not expect to see these as data files in GigaDB, since they can be re-created from the actual data files.

Tables of analysis results - These should be converted to a machine-readable format and uploaded to GigaDB as part of the associated dataset. (A note on converting Excel tables to CSV/TSV: be aware of date formats and merged cells.)

Visualizations of data* - You should consider the utility of these carefully. If they are required as part of the narrative of your manuscript, they should be included in the manuscript. We would not expect to see these as data files in GigaDB, since they are not "data" that can be reused in any way. However, we would expect to see the data files that are being visualized (unless they are already hosted in external stable repositories such as INSDC). Because we can embed iframes and other interactive content in the full-text version of our papers, visualization tools that can be integrated as dynamic content can be used to display the data; in that case, a screenshot should be submitted alongside the paper as a thumbnail for the PDF.

   *- Visualization of data includes (but is not limited to) things like Gene orientation diagrams, Circos plots, graphs, networks, heatmaps, geographical maps showing locations, phylogenetic trees, Hi-C Contact maps, Venn diagrams, alignment maps, karyotype diagrams, etc.
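As an illustration of the note above on converting Excel tables to CSV/TSV, the following is a minimal sketch of a check you could run on an exported file before upload. It is not a GigaDB tool: the function name scan_table and the date-like pattern are assumptions made for this example. The heuristic flags values such as "1-Mar", which can appear when a spreadsheet silently converts an identifier into a date, and empty cells, which may be left behind when merged cells are exported.

```python
import csv
import io
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Heuristic (an assumption for this sketch): values such as "1-Mar" or
# "Sep-02" often mean a spreadsheet converted an identifier into a date.
DATE_LIKE = re.compile(rf"(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{2}})", re.IGNORECASE)


def scan_table(text: str, delimiter: str = ",") -> list[tuple[int, int, str]]:
    """Return (row, column, reason) for cells worth checking by hand.

    Rows and columns are counted from 1, and the header row is row 1.
    """
    flagged = []
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            value = cell.strip()
            if DATE_LIKE.fullmatch(value):
                flagged.append((r, c, "date-like value"))
            elif value == "":
                flagged.append((r, c, "empty cell (possibly from a merged cell)"))
    return flagged
```

A flagged cell is only a prompt to compare the export against the original spreadsheet; genuine dates and genuinely empty cells will be flagged too.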

Which data files should I include in my dataset? 

A dataset is essentially a collection of files and/or links to externally hosted items that are all related to a particular "unit of work", usually a manuscript. Note that GigaDB is designed to host "supporting data"; it is NOT a "supplemental file" server.

Since the exact files required to make up a dataset depend heavily on the "unit of work" to which they relate, we can only provide generic guidelines, but our curators are always on hand for any specific queries you may have.

We would expect to see all data and scripts used in the unit of work. This should be sufficient to enable full reproducibility and transparency of the unit of work in conjunction with the published methods (either in a manuscript or in protocols.io) and openly available software tools.

In addition, when uploading your files you should always include a list of md5sum values for all uploaded files, so that we can confirm file integrity after transfer, and a readme.txt file listing all files with a brief (up to 200 characters) description of each. Note - the readme file will be reformatted into a standardized GigaDB format by the GigaDB team.
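One way to produce the list of md5sum values is sketched below. This is an illustrative example rather than an official GigaDB script: the function names are assumptions, and the "digest, two spaces, relative path" layout is chosen only because it matches the output of the md5sum command-line tool.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> list[str]:
    """Return one 'digest  relative/path' line per file under dataset_dir."""
    lines = []
    for path in sorted(p for p in dataset_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(dataset_dir).as_posix()
        lines.append(f"{md5_of_file(path)}  {relative}")
    return lines
```

The returned lines can be joined with newlines and written to a text file (for example a hypothetical md5sums.txt) that is uploaded alongside the data. Write that file after building the manifest so that it does not list itself.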

We have prepared guidelines for the more common dataset types; please follow these instructions on how to make your data available: http://gigadb.org/site/guide

 

File naming conventions:

The file name should be the full name of the file, including the relative file path.

- Ideally the filename should be meaningful in some way. If appropriate, it may refer to a particular point within the associated manuscript, but usually that reference would be expected in the file description (e.g. "gene-expression-fig1.csv" would be better than "Fig1.csv").

- Full file-path names must be unique within the dataset.

- Filenames should only include the following characters: a-z, A-Z, 0-9, underscore (_), hyphen (-), plus (+) and period (.).

- Filenames should not include spaces; we recommend using the underscore (_) in place of spaces.

- All files should be machine readable (e.g. no PDF, Excel or Word documents).

- The file extension should be relevant to the format of the file, e.g. csv tabular data should have the file extension ".csv".

See here for a list of file types and regular file extensions.
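The naming rules above can be checked automatically before upload. The sketch below is one possible approach, not a GigaDB validator. It rests on several assumptions: paths use "/" as the separator, each path component is checked against the permitted characters, and the listed extensions stand in for "PDF, Excel or Word documents". The function name check_file_paths is hypothetical.

```python
import re
from collections import Counter

# Characters permitted in each path component, per the list above.
ALLOWED_COMPONENT = re.compile(r"[A-Za-z0-9_\-+.]+")

# Assumption: these extensions stand in for "PDF, Excel or Word documents".
NON_MACHINE_READABLE = {".pdf", ".xls", ".xlsx", ".doc", ".docx"}


def check_file_paths(paths: list[str]) -> dict[str, list[str]]:
    """Map each problematic relative path to a list of human-readable issues."""
    issues: dict[str, list[str]] = {}
    counts = Counter(paths)
    for path in paths:
        found = []
        if counts[path] > 1:
            found.append("path is not unique within the dataset")
        for component in path.split("/"):
            if not ALLOWED_COMPONENT.fullmatch(component):
                found.append(f"disallowed characters in '{component}'")
        lowered = path.lower()
        if any(lowered.endswith(ext) for ext in NON_MACHINE_READABLE):
            found.append("format is not machine readable")
        if found:
            issues[path] = found
    return issues
```

Paths that pass every check are simply absent from the result, so an empty dictionary means no problems were found. The check cannot tell whether a name is meaningful or whether the extension matches the actual file format; those still need a manual look.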

 

How to write the file descriptions:

We expect the description to state what the file contains and, if appropriate, which part of the associated unit of work it relates to, e.g. "Abundance matrix of specific OTUs in samples analysed. Displayed in Figure 2 of associated manuscript.". You MUST avoid the use of carriage returns within the description.
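A description can be checked against these requirements with a few lines of code. The following is a minimal sketch under stated assumptions: the function name check_description is hypothetical, the 200-character default comes from the readme guidance given earlier, and line feeds are rejected along with carriage returns on the assumption that any line break would cause the same problem.

```python
def check_description(description: str, max_length: int = 200) -> list[str]:
    """Return the problems found in a single file description."""
    problems = []
    # Assumption: line feeds are treated the same as carriage returns.
    if "\r" in description or "\n" in description:
        problems.append("contains a carriage return or line break")
    if not description.strip():
        problems.append("is empty")
    if len(description) > max_length:
        problems.append(f"is longer than {max_length} characters")
    return problems
```

An empty list means the description passed. The max_length argument can be raised in cases where a longer description is permitted.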

File compression requirements:

It is normal to compress large files to reduce transfer times, and we encourage the use of gzip or bzip2 for this. Zip is acceptable, but where possible we prefer gzip or bzip2 for individual files. Where a file has been compressed, it should carry two consecutive file extensions, e.g. "filename.fasta.gz".

In some instances it is appropriate to combine multiple files into a single archive file; for this we strongly recommend the use of tar, with or without the addition of gzip. The file extension should reflect this, e.g. directoryName.tar or directoryName.tar.gz

For tar files we allow longer descriptions, since their potentially more varied content may need more explanation. You MUST avoid the use of carriage returns within the description.
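The extension conventions for compressed and archived files can be followed automatically. The sketch below uses only the Python standard library; the function names gzip_file and tar_directory are assumptions made for this example, and the command-line gzip and tar tools produce equivalent results.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def gzip_file(source: Path) -> Path:
    """Compress one file, appending '.gz' so the original extension is kept."""
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    return target


def tar_directory(directory: Path, compress: bool = True) -> Path:
    """Archive a directory as directoryName.tar or directoryName.tar.gz."""
    suffix = ".tar.gz" if compress else ".tar"
    target = directory.with_name(directory.name + suffix)
    mode = "w:gz" if compress else "w"
    with tarfile.open(target, mode) as archive:
        # arcname keeps paths inside the archive relative to the directory name.
        archive.add(directory, arcname=directory.name)
    return target
```

Both functions write the output next to the input and leave the original untouched, so the md5sum value of the compressed file, rather than the original, is the one to report for the uploaded copy.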

How to handle data hosted in repositories outside of GigaDB:

Data published in external repositories (e.g., Zenodo, GitHub, Dryad, and INSDC members) must be linked from within the GigaDB dataset. Below is a list of some commonly used data repositories for different data types and how we expect those to be represented in a GigaDB dataset.

  • GitHub (for software) 
  • Dryad 
  • Zenodo  
  • DataOne
  • Dataverse
  • OSF 
  • FigShare
  • Mendeley Data 
  • INSDC databases (SRA, etc.) 
  • ProteomeXchange (PRIDE, etc.)
  • MetabolomeXchange (Metabolights, etc.)
  • NeuroVault (OpenfMRI, etc.)

Institutional repositories - Where your institute hosts its own repository, we will assess it on a case-by-case basis. As a rule of thumb, if it assigns a DOI and has a clear open license, it is highly likely to be acceptable to us.

Other - please contact database@gigasciencejournal.com to discuss any other repositories you may wish to use.