Genome assembly and annotation of the Sharp-nosed Pit Viper Deinagkistrodon acutus based on next-generation sequencing data

The study of the currently known >3,000 species of snakes can provide valuable insights into the evolution of their genomes. Deinagkistrodon acutus, also known as Sharp-nosed Pit Viper, one hundred-pacer viper or five-pacer viper, is a venomous snake with significant economic, medicinal and scientific importance. Widely distributed in southeastern China and South-East Asia, D. acutus has been primarily studied for its venom. Here, we employed next-generation sequencing to assemble and annotate a highly continuous genome of D. acutus. The genome size is 1.46 Gb; its scaffold N50 length is 6.21 Mb, the repeat content is 42.81%, and 24,402 functional genes were annotated. This study helps to further understand and utilize D. acutus and its venom at the genetic level.


CONTEXT
Deinagkistrodon acutus is a species of venomous pit viper, a member of the suborder Ophiopodes and the Viperidae family. It is commonly known as the Sharp-nosed Pit Viper, as well as hundred-pacer viper, five-pacer viper, Chinese moccasin, and Long-nosed Agkistrodon (Figure 1) [1,2]. Mainly acting in the lungs, D. acutus venom is predominantly hemotoxic and can lead to abnormal coagulation and promote tissue damage, edema and acute renal failure, among other reactions [3]. D. acutus is widely distributed in southeastern China, Laos and northern Vietnam, and has significant commercial and medicinal value due to its large body size and venom [4,5]. At present, research focuses mainly on the toxic components of the venom and on the symptoms of patients bitten by D. acutus. The use of the venom has also been studied, for example the in vitro antibacterial, antithrombotic and anticoagulant activity of specific venom proteins [6][7][8][9]. High-quality genomes facilitate the discovery of genes associated with the snake's venom, which in turn can help researchers better understand and utilize the diverse bioactivities of the venom.
Based on next-generation sequencing data, our study assembled and annotated the genome of D. acutus. Our research provides essential data support for the discovery and utilization of genes related to snake venoms, and for a better understanding of the phylogeny and evolution of snakes.
Protocol collected from protocols.io for sequencing snake genomes [10]: https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.5jyl8j6e9g2w/v2
Next, these results were input into GCE with the heterozygous mode (k-mer depth peak of 21) to evaluate genome size, heterozygosity and other parameters [11].
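The k-mer-based genome size evaluation mentioned above can be illustrated with a minimal Python sketch. It uses the textbook approximation (genome size ≈ total k-mer count / depth at the main peak), not GCE's own model, which additionally fits heterozygous and repeat peaks. The function name, the `min_depth` cutoff and the toy histogram are assumptions made for illustration only.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers and ignored
    (the cutoff is an illustrative assumption, not a value from the study).
    Returns (peak_depth, estimated_size_in_kmer_positions).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in usable.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (hypothetical numbers) with a main peak at depth 21.
toy = {1: 1000, 2: 200, 19: 50, 20: 80, 21: 100, 22: 80, 23: 50}
peak, size = estimate_genome_size(toy)
print(peak, size)
```

In practice the histogram would come from a k-mer counter run on the sequencing reads, and a dedicated tool such as GCE should be preferred for heterozygous genomes.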

MATERIALS AND METHODS
Sample collection and sequencing
The stLFR data were used to generate the genome assembly using Supernova (v2.1.1, RRID:SCR_016756). To make the assembled sequences more complete, we used GapCloser (v1.12-r6, RRID:SCR_015026) and the WGS sequencing data to fill gaps. Also, to remove redundant sequences from the genome, we used redundans (v0.14a) [12]. The final genome was obtained using the method described in Figure 2.
We used de novo prediction and homology-based approaches to identify the repetitive regions in the genome assembly. The homology-based prediction was performed using Blastall (v2.2.26) [12]. Specifically, we mapped the protein sequences from the UniProt database (release-2020_05) of Pseudonaja textilis, Crotalus tigris, Thamnophis elegans and Notechis scutatus to the D. acutus genome assembly. Annotation and assessment were performed according to the protocol described by Liu et al. [10].
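For readers unfamiliar with the scaffold N50 statistic reported for this assembly, the following Python sketch shows the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the example lengths are illustrative assumptions and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths: total is 54, so half is 27,
# and 10 + 9 + 8 = 27 is reached at the scaffold of length 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Applied to the scaffold lengths of a real assembly (for example, read from a FASTA index), the same function would return the value reported as scaffold N50.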

DATA VALIDATION AND QUALITY CONTROL
The LINE and LTR contents were 29.53% and 11.99%, respectively (Table 3). Repetitive sequences are important for the self-replication of genetic information, and are closely related to the inheritance and variation of species.
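As an illustration of how a repeat-content percentage such as those above can be derived from annotation coordinates, the sketch below merges overlapping repeat intervals so that no base is counted twice and then divides by the sequence length. The interval format (half-open start/end pairs on a single sequence), the function names and the example numbers are assumptions; the annotation tools used in the study report these statistics themselves.

```python
def covered_bases(intervals):
    """Count bases covered by half-open (start, end) intervals on one sequence.

    Overlapping or adjacent intervals are merged so each base counts once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap or adjacency: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_percentage(intervals, sequence_length):
    """Percentage of a sequence covered by the given repeat intervals."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * covered_bases(intervals) / sequence_length


# Hypothetical annotations on a 100-bp sequence: 0-20 (merged) plus 30-40.
print(repeat_percentage([(0, 10), (5, 20), (30, 40)], 100))
```

For a whole genome, the covered bases would be summed per scaffold and divided by the total assembly length.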
A total of 24,402 functional genes were annotated (Table 4). The results of our gene ontology (GO) enrichment analysis showed that the functional genes of our genome are enriched in biological processes (BP), cellular components (CC) and molecular functions (MF). Among these, cellular process, membrane and binding are the most represented terms in BP, CC and MF, respectively. Our KEGG pathway enrichment analysis using functional genes showed that signal transduction-related genes are crucial in D. acutus (Figure 4). Also, the largest number of enriched pathways are related to metabolism. The phylogenetic tree we generated (Figure 5) shows that our data can be used for building species phylogenetic trees. Our tree is consistent with the current knowledge on snake genomes [14]. By comparing our assembled genome data to the chromosome-level genome data of D. acutus [1], we demonstrated the successful assembly and annotation of a highly continuous genome of D. acutus.

REUSE POTENTIAL
Our data can be used as a reference genome for others to study D. acutus. In addition, it can be used in conjunction with other snake genomes to study the phylogeny and evolution of snakes. Finally, our genome provides data supporting research on snake venom and related toxicology studies.

Figure 1. An individual of D. acutus photographed by Diancheng Yang.

A specimen of D. acutus (NCBI:txid36307) weighing 781 g was obtained from Huangshan City, in Anhui (China), for genome assembly and annotation. The liver, stomach, kidney and muscle tissues were collected for RNA extraction. Additionally, two other muscle tissues were taken for DNA extraction before Whole Genome Sequencing (WGS) and single-tube long fragment read (stLFR) sequencing. We extracted the D. acutus DNA, constructed the library and performed paired-end sequencing according to the protocol described by Liu et al. (Figure 2) [10]. Sample collection and experimental procedures were approved by the Institutional Review Board of BGI (BGI-IRB E22017).

Figure 3. Distribution of transposable elements (TEs) in the D. acutus genome. The TEs include DNA and RNA transposons (i.e., DNAs, LINEs, LTRs and SINEs). (a) Divergence rate distribution of the de novo sequences. (b) Divergence rate distribution of known sequences.

Table 1. Genome assembly data relative to the D. acutus genome assembled in this study.

Table 2. Statistics for repetitive sequences in the D. acutus genome.

Table 3. Statistics for the repetitive sequences (de novo) from our D. acutus genome.

Table 4. Functional annotation result of our D. acutus genome.