Published online: 1 July 2020
Editorial
GigaByte: Publishing at the Speed of Research
Cite this article as... 

Scott C. Edmunds, Laurie Goodman, GigaByte: Publishing at the Speed of Research, Gigabyte, 1, 2020. https://doi.org/10.46471/gigabyte.1

Current practices in scientific publishing are unsuitable for rapidly changing fields and for presenting updatable data sets and software tools. To address this, and as part of our continuing effort to push scientific publishing to match the needs of modern research, we are delighted to announce the launch of GigaByte, an online, open-access, open-data journal that aims to publish research following the software paradigm: CODE, RELEASE, FORK, UPDATE and REPEAT. Building on the success of GigaScience in promoting data sharing and reproducibility of research, its new sister journal, GigaByte, aims to take this even further. With a focus on short articles, a questionnaire-style review process, and the custom-built publishing infrastructure from River Valley Technologies, we now have a cutting-edge, XML-first publishing platform designed specifically to make the entire publication process easier, quicker, more interactive, and better suited to the speed needed to communicate modern research.

Gigabyte
2709-4715
GigaScience Press
Sha Tin, New Territories, Hong Kong SAR
Future’s Past
In 2012, we launched GigaScience [1] as a new type of journal — one that provides standard scientific publishing linked directly to a database that hosts all the relevant data. Aiming to address Buckheit and Donoho’s 1995 complaint that “an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data that produced the result.” [2], we were inspired to launch a new type of journal that focussed on making sure the actual scholarship of all types of research was included with the article narrative.
Eight years on, we have achieved many of these aims. With data publishing becoming more mainstream as the major publishers have followed our lead, and with the journal receiving awards (e.g. the 2018 PROSE Awards winner for “Innovation in Journal Publishing”) [3] and other forms of recognition for our past efforts, we are hungry to do more. One of our biggest frustrations, and a major part of what has held us back, has been the legacy publishing infrastructure, which for the most part still incorporates print-based production processes and has a decades-old codebase. It is no longer fit for purpose in this non-print, online, data-centric digital age. In our collaborations and integrations with the growing ecosystem of open science platforms, changes to scientific publishing always had to be made ad hoc and shoehorned into legacy publishing workflows. Technical changes and updates to such systems are typically slow and buggy, and they scale poorly. Moreover, making changes is very expensive, so to make business sense they have to be useful for a large number of journals, all with different needs. Modern, web-literate platforms dominate other fields, so when we publish modern, web-literate research outputs, we should publish them in the same manner. Yet the vast majority of publishing platforms remain highly focussed on static, hand-typeset PDFs rather than modern, digital-first technologies based on XML (Extensible Markup Language).
An inspirational spark came to us from GigaScience Editorial Board Member Carole Goble, who won the FORCE11 Vision Prize in 2013 with her proposal “Don’t Publish. Release!” [4]. She proposed applying a 21st century software-release-style paradigm instead of an 18th century print-a-book paradigm to scholarly communication. And thus the germ of the idea that would eventually spawn GigaByte was set in motion.
Among the winners of the 2015 FORCE11 Vision Prize was Kaveh Bazargan from River Valley Technologies (RVT), who presented “Why are we so Attached to Attachments? Let’s Ditch Them and Improve Publishing” [5]. In this talk, Kaveh presented embryonic ideas that would eventually become RVT’s end-to-end publishing solution, which includes manuscript submission, content management, and hosting using collaborative online platforms (Figure 1). Sensing this was exactly what we needed to meet our open science objectives, we began a collaboration to develop a publishing process that, in addition to providing on-the-fly article production, would create more interactive articles that can be versioned and forked, thereby following the “Don’t Publish. Release!” concept.
Figure 1.
Recording and slides of Kaveh Bazargan’s talk “Why are we so attached to attachments? Let’s ditch them and improve publishing”, presented in the “Vision” track at FORCE2015 in Oxford on 13th January 2015 https://youtu.be/aFzRVqTNi-8
Our collaboration with RVT, with its new publishing strategies and technologies, aims to address as many of these bottlenecks as possible. Moving forward, it allows us to evolve, fine-tune, and reduce publishing time and cost without all the heavy lifting needed to change current publishing processes.
Three, Two, One, Launch!
For GigaByte, we are launching with two article types: Data Release and Technical Release articles, focussed respectively on datasets and software/computational workflow papers, as these are the most obvious Research Objects that require the iterative form of the CODE, RELEASE, FORK, UPDATE and REPEAT approach. The assessment of the work in these articles will focus solely on whether the information would be usable to people in both broad and specialist communities, whether the work is scientifically sound, and whether all associated Research Objects are open, accessible, and follow best (FAIR) practices for sharing [6].
The next step in this iterative pathway will be addressed with an article type we plan to release in the future, namely an Update article. This article type will do two things. First, it will allow publication of substantive additional data and software versions that are immediately useful to the community but would typically have a much-delayed release, or indeed might never be released, because the authors need to carry out and add analyses to make these “publishable” in the current publishing paradigm. Second, it will greatly reduce the time to write and peer-review articles, as the majority of the narrative (e.g. the Introduction and similar sections) of an Update does not change in any substantive way and can be linked instead of rewritten, allowing reviewers to focus only on the added information in assessing the manuscript. This type of article is made possible by leveraging the ability of River Valley’s XML-first platform to change and produce content on the fly, near instantaneously. Watch this space for more information on this new article type.
Work Around It or Break Through It
With GigaScience, we pushed to break barriers in transparency, reproducibility, and data and tool accessibility: these are our foundational principles. GigaByte, on top of having no obstacles to accessing article text, supporting data, and code, plans to break through even more barriers to continue to move toward truly open science. As a medium built for the web rather than one to replace dead trees, GigaByte aims to move even further beyond the static PDF. To get there, we are collaborating with River Valley Technologies, whose custom-built hosting platform is flexible and can handle myriad widgets and dynamic features. We are encouraging submitting authors to include dynamic features and widgets in their papers to allow the greatest interaction with their work. If there is something we are not currently able to do, we and our partner River Valley Technologies are ready and excited to investigate, explore, and evolve to make it possible. We will, of course, continue to carry out open and transparent peer review as standard [7], and peer reviewers will be credited for their hard work with DataCite DOIs that they can display in their online CV and ORCID profiles [8].
Adding to this, we are looking for ways to make the review process more streamlined, to allow reviewers, who all have limited time, to focus primarily on the specific points we want assessed. To do this we will be using a questionnaire-style review, but the questionnaires will also include optional commenting areas so that reviewers are not blocked by a static form from giving their much-desired additional thoughts. Post-publication peer review and further interaction are encouraged through Hypothes.is integration, which allows collaborative annotation of our content. Readers therefore don’t just have to passively consume information; they can actively connect and update it.
Another barrier to maximising the utility of scientific research to the international community is the lack of focus on, and tools for, aiding the reader of the articles. As a clear example, language and jargon issues limit the ability of researchers not fluent in English to engage fully in the international scientific arena, and the ability of citizen scientists to contribute and provide perspectives not constrained by current scientific preconceptions. The dynamic River Valley Technologies platform allows integrated language support. It also includes options for readers to select how they want to view papers, including, for example, the ability to view the article in a dyslexic-friendly font. To demonstrate the potential for novel ways of viewing content built on the open XML, we also showcase the open source eLife Lens manuscript viewer, which makes full use of the highly structured XML files and allows you to view relevant content in side-by-side panes [9]. We will continue to identify barriers to speed and dynamic presentation in scientific publishing and to find new ways to improve the reader’s interaction with the content.
A final barrier… the cost of publishing. Time is money, as they say, and reducing the time from research to publication is a major goal of ours. But while the advent of open access publishing has eliminated barriers to accessing information, it has not eliminated the cost of publishing. To be sustainable, open access publishers have moved to Article Processing Charges (APCs) to cover costs. In moving to open access, publishers have eliminated a major barrier to information, but another barrier has been erected: the ability of some researchers to afford to publish. While we cannot eliminate our costs, we are working hard to make GigaByte APCs as low as possible, and certainly well under the current mean APC in Europe of €1,975 (∼$2,160) [10]. A major cost of doing business in publishing goes into the production of articles, which is typically a manual process of laying out and creating PDF and XML formats of an article; a lot of this work is still wedded to the print process. The RVT publishing platform has a fully automated production process, enabling us to eliminate that cost and pass those savings on to authors. Additionally, our aim is only to cover costs: not to take advantage of grant funds that are aimed at promoting research, and not to make a profit for any business or investor. Following the Fair Open Access Principles, we will also provide transparency on our costs so that authors and granting organizations have a clear understanding of what an APC covers [11]. For authors looking for the greatest savings, that time is now, as there are no APCs for the first 6 months.
Great fleas have little fleas upon their backs to (Giga)byte ’em
With the launch of GigaByte and our formal call for papers, we include some articles to give people an idea of the types of functionality our platform provides in addition to the text and the links to data and source code. An example of our new interactive approach to publishing is our launch Data Release: “Data for 3D Printing Enlarged Museum Specimens for the Visually Impaired”. This presents enlarged museum specimens that were 3D printed for various interactive exhibits at the National Museum in Bloemfontein, South Africa [12]. As well as describing the data production and re-use potential, the article makes the digitised versions of these interactive museum exhibits equally interactive for GigaByte readers, with links to downloadable 3D models in our GigaDB repository [13] and an embedded Sketchfab window that allows the reader to inspect the model and interact with it through their browser [14]. We also include links to the Thingiverse repository [15], where the 3D printing community can find and download the models and share back their adaptations. Taken from this paper, Figure 2 shows an example of a pseudoscorpion (Feaella capensis). Fun fact: these are also known as ‘book scorpions’, as these tiny arachnids were first described by Aristotle, who likely came across them among scrolls in a library where they would have been feeding on booklice. This article’s data for these book scorpions is a brilliant virtual demonstration of how we have moved well beyond books and scrolls, which are prone to an ecosystem of beasts living off them, to a digital world that allows us to disseminate, recreate, and remix knowledge free of barriers such as geography and access (and arachnophobia).
Figure 2.
Interactive Sketchfab view of the pseudoscorpion Feaella capensis. Data from du Plessis et al., GigaByte, 2020. https://sketchfab.com/3d-models/feaeallidae-feaella-capensis-47e22b7875dc40a49668fe788b5e8af2
Although digital resources are not prone to booklice, they do have their own risks with “bit rot” and other forms of digital degradation. To mitigate these, we follow best practices, using Crossref and DataCite digital object identifiers for our textual and data content, and we are members of the CLOCKSS sustainable dark archive to ensure the long-term survival of our textual content. Our integrated GigaDB repository is undergoing CTS certification, and we insist that the data supporting our articles are hosted in trusted data repositories under CC0 public domain waivers.
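As a small illustration of why persistent identifiers help here, the sketch below normalises the different ways a DOI is commonly written (a `doi:` prefix, the older `dx.doi.org` resolver, or a full `https://doi.org/` link) into a single resolvable URL. It is only a sketch under stated assumptions: the helper name `doi_to_url` and the set of accepted prefixes are hypothetical and are not part of any GigaByte, Crossref, or DataCite tooling.

```python
import re

# Hypothetical helper for illustration only; the accepted prefixes are an assumption.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def doi_to_url(raw: str) -> str:
    """Return an https://doi.org/ URL for a DOI written in a common citation form."""
    # Drop surrounding whitespace, any resolver or "doi:" prefix, and a trailing
    # sentence full stop (assumes the DOI itself does not end in a full stop).
    doi = _DOI_PREFIX.sub("", raw.strip()).rstrip(".")
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    for text in (
        "doi:10.1186/2047-217X-1-1.",
        "http://dx.doi.org/10.17504/protocols.io.bc4niyve",
        "https://doi.org/10.46471/gigabyte.1",
    ):
        print(doi_to_url(text))
```

Whatever form a citation takes, the identifier after the prefix is what stays stable, so the same record can still be reached if the hosting platform or the landing-page URL changes.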
Another area where we’d like to move beyond the textual narrative is how to handle presentation of methods in an article. The traditional “Materials & Methods” format isn’t the best medium for explaining detailed step-by-step methods or complicated computational pipelines. Both computational and wet-lab protocols are much better handled by workflow management systems and protocols repositories. In GigaScience, we encourage authors to make use of protocols.io to post their detailed methods and then simply cite that in the article [16]. In GigaByte, we are moving to make this even more closely linked to the article. Among our launch articles is a new frog genome [17] demonstrating where the protocols have been integrated into the protocols.io repository (Figure 3) [18].
Figure 3.
Protocols.io widget for the “repetitive element annotation protocol” http://dx.doi.org/10.17504/protocols.io.bc4niyve
We hope you enjoy these first exemplar articles, and the innovations seen here are just the start of a process. Working on new infrastructure provides us with a blank canvas to adapt and make changes that were previously heavily constrained by the legacy publishing infrastructure. Future additions to our roadmap include publishing Update articles (noted above), increasing the interactive nature of our papers with more plugins, and bringing in execution and data quality checks in the data and code review process. This will be done both via automated tools and using independent validation through efforts like the CODECHECK certificate of reproducible computation first showcased in GigaScience [19].
We encourage you to contact any of the editors to begin conversations about specific needs in your research communities for promoting large-data access, sharing, use, and reuse.
Abbreviations
APC: Article Processing Charge; CTS: Core Trust Seal; DOI: Digital Object Identifier; FAIR: Findable, Accessible, Interoperable, Re-usable; RVT: River Valley Technologies; XML: Extensible Markup Language.
Author contributions
Writing original draft: S.C.E., L.G.; conceptualization: S.C.E.
Competing interests
All authors are employees of GigaScience Press and BGI.
Acknowledgements
The authors would like to thank BGI and River Valley Technologies for their support in launching the journal, and Kaveh Bazargan for feedback on this editorial.
References
1. Goodman L, Edmunds SC, Basford AT, Large and linked in scientific publishing. Gigascience, 2012; 1(1): doi:10.1186/2047-217X-1-1.
2. Buckheit JB, Donoho DL, WaveLab and Reproducible Research. In: Antoniadis A, Oppenheim G (eds), Wavelets and Statistics. New York: Springer 1995; pp. 55-81.
3. AAP. (2018): 2018 Award Winners - PROSE Awards. https://proseawards.com/winners/2018-award-winners/.
4. Goble C, (2013): Don’t publish. Release! FORCE11 Vision Prize Winner. https://www.force11.org/presentation/dont-publish-release.
5. Bazargan K, (2015): Why are we so attached to attachments? Let’s ditch them and improve publishing. Zeeba TV. http://zeeba.tv/why-are-we-so-attached-to-attachments-lets-ditch-them-and-improve-publishing/.
6. Wilkinson MD et al., The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, 2016; 3: 160018.
7. Edmunds SC, Peering into peer-review at GigaScience. Gigascience, 2013; 2(1): 1, doi:10.1186/2047-217X-2-1.
8. Bazargan M, (2018): River Valley implements ORCID to give credit to reviewers. River Valley Technologies Press Release. https://rivervalleytechnologies.com/river-valley-implements-orcid-to-give-credit-to-reviewers/.
9. eLife. (2013): Seeing through the eLife Lens: A new way to view research. Inside eLife. https://elifesciences.org/inside-elife/0414db99/seeing-through-the-elife-lens-a-new-way-to-view-research.
11. Fair Open Access Alliance. The Fair Open Access principles. https://www.fairopenaccess.org/the-fair-open-access-principles/.
12. du Plessis A, Els J, le Roux S, Tshibalanganda M, Pretorius T, Data for 3D printing enlarged museum specimens for the visually impaired. Gigabyte, 2020; 1: https://doi.org/10.46471/gigabyte.3.
13. du Plessis A, Els J, le Roux S, Tshibalanganda M, Pretorius T, 3D printing data from enlarged museum specimens. GigaScience Database, 2019; https://doi.org/10.5524/100648.
14. du Plessis A, (2019): Feaeallidae Feaella Capensis - Download Free 3D model by GigaScience (@GigaDB) [47e22b7]. Sketchfab. https://sketchfab.com/3d-models/feaeallidae-feaella-capensis-47e22b7875dc40a49668fe788b5e8af2.
15. du Plessis A, (2019): 3D Museum Specimens - Scorpions, Pseudoscorpions, Birds, Mites by GigaScience. Thingiverse. https://www.thingiverse.com/thing:3869332.
16. Teytelman L, Stoliartchouk A, Kindler L, Hurwitz BL, Protocols.io: Virtual Communities for Protocol Development and Discussion. PLOS Biol., 2016; 14: e1002538.
17. Li Q, Guo Q, Zhou Y, Tan H, Bertozzi T, Zhu Y, Li J, Donnellan S, Zhang G, A draft genome assembly of the eastern banjo frog Limnodynastes dumerilii dumerilii (Anura: Limnodynastidae). Gigabyte, 2020; 1: https://doi.org/10.46471/gigabyte.2.
18. Li Q, (2020): Repetitive element annotation for the eastern banjo frog genome assembly. protocols.io. https://doi.org/10.17504/protocols.io.bgkbjusn.
19. Piccolo SR, Lee TJ, Suh E, Hill K, ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data. Gigascience, 2020; 9: doi:10.1093/gigascience/giaa026.