The genome assembly and annotation of the white-lipped tree pit viper Trimeresurus albolabris

Trimeresurus albolabris, also known as the white-lipped pit viper or white-lipped tree viper, is a highly venomous snake distributed across Southeast Asia and the cause of many snakebite cases. In this study, we report the first whole genome assembly of T. albolabris, obtained with next-generation sequencing from a specimen collected in Mengzi, Yunnan, China. The assembled genome of this male T. albolabris individual is 1.51 Gb in length and has a repeat-element content of 38.42%. Using this genome, we identified 21,695 genes, 99.17% of which could be annotated using gene functional databases. We validated the genome assembly and annotation using a phylogenetic tree, which included six species and was built from single-copy genes of nuclear genomes. This research will contribute to future studies on Trimeresurus biology and the genetic basis of snake venom.


INTRODUCTION
Trimeresurus albolabris, also known as the white-lipped pit viper, white-lipped tree viper, white-lipped bamboo pit viper, and green tree pit viper, is a venomous snake species belonging to the family Viperidae [1]. It is a relatively small snake, with adults typically measuring around 70-90 cm in length, and is known for its distinctive appearance, with a white stripe running down the center of its upper lip [2] (Figure 1). This species has been reported in China, Vietnam, Thailand, Laos, Cambodia, India, Bangladesh, Myanmar, and West Java and is one of the most common medically important venomous snakes in Southeast Asia [3]. T. albolabris is a highly venomous snake. Its bite can be dangerous to humans, causing symptoms ranging from pain and swelling to more severe ones, such as shock, spontaneous bleeding, defibrination, and other complications of thrombocytopenia and leukocytosis [4,5]. Notably, the venom of T. albolabris contains metalloproteinases [6,7], a thrombin-like enzyme [8], and other venom components [5,9]. The genome of this species is crucial for studying venom proteomics, particularly for drug discovery, developing antivenom therapies, and understanding the evolution of venomous species [12-14]. However, a complete genome of T. albolabris has not been published yet [15].

Z. Niu et al.
Here, we report the first whole genome with high continuity of a male T. albolabris individual, collected from Mengzi, Yunnan, China. The genome was generated using single-tube long fragment read (stLFR) [16] and whole genome sequencing (WGS) technologies. Our T. albolabris genome had a repeat element content of 38.42% and a total size of 1.51 Gb. This new genome assembly provides a valuable resource for future studies on snake venom and the genetic underpinnings of the Trimeresurus species.

METHOD
The detailed stepwise protocols used in this study are gathered in a protocols.io collection, with the minor adaptations outlined below (Figure 2) [17].

Sample collection and sequencing
A male T. albolabris sample was captured in Mengzi, Yunnan, China. To preserve its quality, this specimen was frozen in dry ice (at −80 °C) immediately after collection and identification, both for storage and transportation. The protocols we used for DNA extraction, library construction, and sequencing can be found in a protocols.io protocol collection [17]. The heart, stomach, liver, and kidneys were used for RNA sequencing. Additionally, a muscle sample was used for stLFR and WGS sequencing. The genome assembly and annotation workflow is also included in the same protocols.io protocol [17].
This study, including sample collection, experimental procedures, and research design, was approved by the Institutional Review Board of Beijing Genomics Institute (BGI-IRB E22017). Throughout the research, we strictly followed the guidelines established by the BGI-IRB, ensuring compliance with ethical and regulatory standards.

Figure 2. A protocols.io collection of protocols for sequencing snake genomes [17]. https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.4r3l27ez4g1y/v1

RESULTS
Sequencing produced a total of 387.48 Gb of paired-end (fastq 1 and fastq 2) data, comprising 204.61 Gb of short-read data obtained through WGS sequencing and 182.87 Gb of long-read data obtained through stLFR sequencing (Tables 1 and 2).
We generated the first whole genome assembly of T. albolabris with high continuity, with a total genome size of 1.51 Gb, a GC content of 39.97%, and a scaffold N50 length of 381.55 kb (Table 3). The assembled T. albolabris genome consists of 10,016 contigs longer than 1,000 base pairs, with a total length of 1.50 Gb, accounting for 99.14% of the genome's total length.
We detected repetitive elements in the T. albolabris genome, accounting for 38.42% of the total genome. Among them, long interspersed nuclear elements (LINEs) made up the highest proportion, accounting for 23.94% of the genome, or approximately 362.35 Mb. This repeat content is comparable to that of previously sequenced snake genomes, such as those of Thamnophis elegans (42.02%) (accession No. PRJNA561996) and Crotalus tigris (42.31%) [36], which suggests that our estimate is plausible. The remaining types of transposable elements, including DNA transposons, long terminal repeats (LTRs), and short interspersed nuclear elements (SINEs), accounted for 6.90%, 5.83%, and 1.24%, respectively (Figure 3; Tables 4 and 5).
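The scaffold N50 reported above is a length-weighted median: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function name `n50` and the toy scaffold lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches half of the summed length; the length of
    the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in bp), not data from this assembly.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 80: 100 + 80 = 180 covers half of 300
```

Applied to the scaffold lengths of a real assembly (for example, read from a FASTA index), the same function would give a value comparable to the figure in Table 3, provided the same minimum-length filter is used.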
Using homology-based, de novo, and RNA-sequencing annotation methods, we successfully identified 21,695 protein-coding genes in our T. albolabris genome assembly.
We compared our assembly to those of Notechis scutatus (GCA_900518725.1), Pseudonaja textilis (GCA_900518735.1), and Thamnophis elegans (GCA_009769535.1), all of which are available from the NCBI database. Our analysis revealed no significant differences in the distribution of transcript mapping lengths, coding sequences (CDS) lengths, or the quantity of exons and introns. Additionally, our analysis predicted the presence of 250 miRNAs, 179 tRNAs, and 301 snRNAs within the T. albolabris genome (Table 6).
Further analyses using KEGG enrichment revealed that Environmental Information Processing, Organismal Systems, and Metabolism pathways were the most abundant, with Signal Transduction pathways being the most prominent.Among the Organismal Systems pathways, 1,774 Immune System genes and 1,551 Endocrine System genes were the most abundant (Figure 4a).In addition, based on the results of our GO analysis, we found that 7,900 genes are related to binding, while 7,740 genes are related to cellular processes (Figure 4b).

DATA VALIDATION AND QUALITY CONTROL
We employed BUSCO v5.2.2 to assess the quality and completeness of our genome assembly [40].The results of our BUSCO analysis revealed that our assembly achieved 85.3% completeness when evaluated against the vertebrata_odb10 database (Figure 5), indicating that our assembly is of relatively high quality and completeness.
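BUSCO reports completeness as a one-line summary in which the complete fraction (C) is the sum of the single-copy (S) and duplicated (D) fractions, alongside the fragmented (F) and missing (M) fractions and the number of BUSCOs searched (n). The sketch below parses such a line, assuming the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout written by BUSCO v5 short summaries; the function name and the example line are illustrative, and the numbers in it are invented placeholders, not results from this study.

```python
import re

# Assumed layout of the BUSCO v5 one-line summary.
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a recognised BUSCO summary line")
    fields = match.groupdict()
    result = {key: float(fields[key]) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(fields["n"])
    return result


# Placeholder values for illustration only.
example = "C:90.0%[S:88.0%,D:2.0%],F:4.0%,M:6.0%,n:100"
summary = parse_busco_summary(example)
print(summary["C"], summary["n"])  # 90.0 100
```

Checking that S + D matches C, and that C + F + M is close to 100, is a quick sanity test when summaries from several assemblies are collected into one table.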
To further assess the quality of our assembly, we constructed a phylogenetic tree using the protein sequences of seven different amphibian and reptile species (Anolis carolinensis, Chelonia mydas, Deinagkistrodon acutus, Ophiophagus hannah, Python bivittatus, Xenopus tropicalis, and Alligator mississippiensis) as well as the protein sequences of Gallus gallus, Homo sapiens, Mus musculus, and Danio rerio downloaded from NCBI. The resulting phylogenetic tree is consistent with previous research, indicating that our data can accurately identify related species (Figure 6).

REUSE POTENTIAL
We present the first genome assembly of the white-lipped tree pit viper. These data provide new resources for studying the viper's biology and evolution, as well as the genetic foundation of its venom.

Figure 3. Distribution of transposable elements (TEs) in our T. albolabris genome. The TEs include DNA transposons (here indicated as DNA) and RNA transposons (i.e., LINEs, LTRs, and SINEs). (a) Distribution of the de novo sequence divergence rate. (b) Distribution of the known sequence divergence rate.

Figure 6. Phylogenetic tree reconstructed using nuclear genome single-copy genes. The numbers on the branches of the phylogenetic tree represent the branch lengths obtained using OrthoFinder.
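A tree built from single-copy genes, such as the one in Figure 6, relies on orthogroups in which every species contributes exactly one gene. The sketch below illustrates that selection step on an in-memory table of gene counts; it is not the OrthoFinder implementation, and the function name, orthogroup identifiers, and counts are hypothetical.

```python
def single_copy_orthogroups(gene_counts, species):
    """Return orthogroup IDs with exactly one gene in every listed species.

    gene_counts maps an orthogroup ID to a dict of {species: gene count};
    a species missing from that dict is treated as having zero genes.
    """
    return sorted(
        orthogroup
        for orthogroup, counts in gene_counts.items()
        if all(counts.get(name, 0) == 1 for name in species)
    )


# Hypothetical counts for three of the species named in this study.
species = ["Trimeresurus albolabris", "Deinagkistrodon acutus", "Gallus gallus"]
toy_counts = {
    "OG0000001": {species[0]: 1, species[1]: 1, species[2]: 1},
    "OG0000002": {species[0]: 2, species[1]: 1, species[2]: 1},  # duplicated
    "OG0000003": {species[0]: 1, species[1]: 1},                 # one species absent
    "OG0000004": {species[0]: 1, species[1]: 1, species[2]: 1},
}
print(single_copy_orthogroups(toy_counts, species))
# ['OG0000001', 'OG0000004']
```

Requiring a strict one-to-one relationship discards duplicated and partially missing families, which reduces the number of usable genes but avoids confusing paralogs with orthologs in the alignment.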

Table 2. Summary statistics of T. albolabris stLFR and RNA sequenced reads.

Table 3. Summary of the features of the T. albolabris genome.

Table 4. Statistics of the repetitive sequences identified in our T. albolabris genome.

Table 5. Summary of the TEs in our T. albolabris genome.

Table 6. Statistics for the miRNA, tRNA, rRNA, and snRNA identified in our T. albolabris genome.

Table 7. Results of gene functional annotation.