The genome assembly and annotation of the Oriental rat snake Ptyas mucosa

The Oriental rat snake Ptyas mucosa is a common non-venomous colubrid snake whose range spans most of South and Southeast Asia. P. mucosa is widely bred for use in traditional medicine, scientific research, and handicrafts. Genomic resources for P. mucosa could therefore help in evaluating its efficacy in traditional medicine and in studying the habitat of this species. Here, we present a highly contiguous P. mucosa genome with a size of 1.74 Gb. Its scaffold N50 is 9.57 Mb, and its longest scaffold is 78.3 Mb. Its GC content is 37.9%, and its BUSCO completeness is 86.6%. The genome was assembled using single-tube long fragment read (stLFR) data; repeat sequences total 735 Mb, giving a repeat content of 42.19%. Finally, 24,869 functional genes were annotated in this genome. This study may help in understanding P. mucosa and support medicinal research.


INTRODUCTION
Known as the Oriental rat snake (Figure 1) [1], Indian rat snake, or Dhaman, Ptyas mucosa is a common non-venomous species of colubrid snake. With over 300 genera and 2,000 species, the colubrid family is the largest snake family [2]. Although excitable and fast-moving, the rat snake is harmless to humans and preys on small reptiles, birds, and mammals, including rodents. For this reason, farmers in some areas bring in Oriental rat snakes from other locations to catch mice and protect their crops. Adults usually subdue prey by sitting on it and overpowering it with their body weight rather than by constriction, a hunting behaviour rarely observed in other snake species [3]. When threatened, Oriental rat snakes inflate their necks, imitating the king cobra or Indian cobra to deter potential predators [4].
In southern China, P. mucosa is commonly eaten, and its skin is used to make the membranes of the erhu, a traditional musical instrument [5].
Traditional Chinese medicine uses its gallbladder to prepare a medicinal wine for treating many diseases [6]. Overhunting once reduced its numbers significantly, but artificial breeding has since allowed them to recover gradually [6]. In this study, we present a highly contiguous genome of P. mucosa with a size of 1.74 Gb. The genome was assembled from single-tube long fragment read (stLFR) sequencing data and corrected using whole-genome sequencing data. Its repeat content reached 42.19%. This genome provides an important basis for follow-up studies of the biology of P. mucosa. In particular, a high-quality reference genome and transcriptome data can support future targeted breeding.

MAIN CONTENT
Context
In this study, we present a highly contiguous genome assembly of P. mucosa with a total size of 1.74 Gb. The scaffold N50 is 9.57 Mb, and the longest scaffold is 78.3 Mb (Table 1). The genome has a GC content of 37.9%, and BUSCO (v5.2.2; RRID:SCR_015008) analysis indicates a completeness of 86.6% (Figure 2). These statistics indicate a highly contiguous and largely complete assembly.
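The scaffold N50 and GC content reported above are standard assembly statistics. The short Python sketch below shows how they are conventionally computed from scaffold lengths and sequences; the function names and toy inputs are illustrative assumptions, not the authors' pipeline. Excluding N (gap) positions from the GC denominator is a common convention, but tools differ on this point.

```python
from collections.abc import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


def gc_content(sequence: str) -> float:
    """Fraction of G+C among unambiguous bases; N (gap) positions are excluded."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


# Toy inputs (hypothetical, not taken from the P. mucosa assembly).
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
print(gc_content("ATGCNNGGCCAT"))         # 0.6
```

In the toy example the total is 300 bases, so the N50 is the first length at which the running sum reaches 150, which is 70.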
Here, we report the draft reference genome sequence of P. mucosa. These data will be a valuable resource for the study of non-venomous snakes.

Methods
Detailed stepwise protocols are gathered in a protocols.io collection, with the minor adaptations outlined below [7] (Figure 3).

Sample collection and sequencing
In 2021, an adult P. mucosa (NCBI:txid31142) individual was collected from Hezhou City, Guangxi Province, China, for genome assembly and RNA sequencing. The snake was identified as P. mucosa on the basis of its morphology. The individual died of natural causes; its samples were transferred to dry ice, quickly frozen, and kept at −80 °C until further use. We isolated eight tissues and organs for RNA sequencing: the heart, small intestine, large intestine, lung, liver, stomach, kidney, and muscle. Genomic DNA was extracted for whole-genome sequencing using the AxyPrep genomic DNA kit (AxyPrep, USA). Total RNA was isolated using TRIzol reagent (Invitrogen, USA) following the manufacturer's guidelines. RNA quality, purity, and quantity were assessed using a Qubit 3.0 fluorometer (Life Technologies, USA) and an Agilent 2100 Bioanalyzer System (Agilent, USA). cDNA libraries were generated by reverse transcription of RNA fragments ranging from 200 to 400 bp. In addition, the liver sample was used for stLFR sequencing and a genome survey. A genome survey analyses second-generation sequencing data through k-mers to estimate genome size, heterozygosity, repeat sequence proportion, GC content, and other genomic information.

Figure 3. A protocols.io collection of the standard protocols for sequencing snake genomes [7]. https://dx.doi.org/10.17504/protocols.io.5jyl8j6e9g2w/v2
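The k-mer genome survey described above rests on a simple relation: genome size is approximately the total number of error-free k-mers divided by the depth of the main peak of the k-mer frequency histogram. The Python sketch below illustrates this under stated assumptions: canonical k-mers are counted, and depths below a hypothetical cutoff `min_depth` are treated as sequencing errors. The paper does not name its survey tool, and dedicated tools such as Jellyfish and GenomeScope additionally model heterozygosity and repeats, which this sketch does not.

```python
from collections import Counter

# Complement table for unambiguous bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def depth_histogram(counts):
    """Map k-mer depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth):
    """Genome size ~ (total k-mers at depth >= min_depth) / (depth of the main peak).

    Depths below the hypothetical cutoff min_depth are treated as sequencing errors.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical): a large error peak at depth 1, main peak at depth 11.
hist = {1: 1000, 2: 50, 10: 80, 11: 100, 12: 90}
print(round(estimate_genome_size(hist, min_depth=3)))  # 271
```

Here the k-mers at depth 3 or above total 2,980 and the main peak sits at depth 11, so the estimate is 2980 / 11, about 271 bases. In real data the cutoff is usually read off the first trough of the histogram rather than fixed in advance.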

Genome survey, assembly, annotation, and assessment
The stLFR sequencing data were assembled using Supernova (v2.1.1; RRID:SCR_016756) [8]. NextPolish (v1.0.5) [9] was then used with the whole-genome sequencing (WGS) data to perform a second round of correction and a third round of polishing of this assembly. To obtain a haploid representation of the genome, duplicated sequences were purged using the purge_dups pipeline (RRID:SCR_021173) [10]. Genome completeness was evaluated using BUSCO (v5.2.2) in genome mode with the vertebrata_odb10 lineage dataset [11].
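BUSCO reports completeness as a one-line summary of complete (single-copy S and duplicated D), fragmented (F), and missing (M) BUSCOs out of n genes in the lineage dataset. The sketch below parses that line and checks that C = S + D and C + F + M ≈ 100%. The example string is hypothetical and is not this genome's result; the parser name and tolerance are illustrative choices, and the full BUSCO short summary also lists the counts directly.

```python
import math
import re

# One-line BUSCO summary, e.g. "C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:3354"
_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into percentages (C, S, D, F, M) and n."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    # Percentages are rounded to one decimal place, so allow a small tolerance.
    if not math.isclose(result["S"] + result["D"], result["C"], abs_tol=0.2):
        raise ValueError("C should equal S + D")
    if not math.isclose(result["C"] + result["F"] + result["M"], 100.0, abs_tol=0.2):
        raise ValueError("C + F + M should be 100%")
    return result


# Hypothetical summary line (leading tab as it appears in the report file).
summary = parse_busco_summary("\tC:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:3354")
approx_complete = round(summary["C"] / 100 * summary["n"])  # approximate count
print(summary["C"], approx_complete)
```

Because the pattern is matched with `re.search`, any leading whitespace or trailing fields on the line are ignored.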

Results
In P. mucosa, repeat sequences total 735 Mb, a repeat content of 42.16% (Tables 2 and 3). We analysed the content of various repetitive elements and identified several repeat families within the P. mucosa genome. Long interspersed nuclear elements (LINEs) accounted for 35.51%, long terminal repeat (LTR) retrotransposons for 9.15%, and DNA transposons for 4.66% (Figure 4), making LINEs the most abundant class of repeats. Previous studies have shown that, although snake species share similar genome sizes, their transposable element (TE) content varies considerably, while the types of TEs present show limited diversity. In particular, species with a longer evolutionary history tend to show greater diversity in TE content.
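Repeat percentages depend on how overlapping annotations are handled: when elements of different classes overlap (for example, when de novo and homology-based searches are combined), per-class percentages can sum to more than the total repeat content. The sketch below is a hypothetical illustration, not the authors' method. It merges half-open intervals so that each base is counted once per class and once in the total, and reports each as a percentage of genome length. The tuple layout `(scaffold, start, end, repeat_class)` is an assumption.

```python
from collections import defaultdict


def merged_length(intervals):
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_percentages(annotations, genome_length):
    """annotations: iterable of (scaffold, start, end, repeat_class) tuples.

    Returns {repeat_class: % of genome}, plus "total" for the union of all classes.
    """
    by_class = defaultdict(lambda: defaultdict(list))
    union = defaultdict(list)
    for scaffold, start, end, repeat_class in annotations:
        by_class[repeat_class][scaffold].append((start, end))
        union[scaffold].append((start, end))

    def covered(per_scaffold):
        # Merge within each scaffold only; coordinates on different scaffolds are unrelated.
        return sum(merged_length(iv) for iv in per_scaffold.values())

    result = {cls: 100 * covered(per) / genome_length for cls, per in by_class.items()}
    result["total"] = 100 * covered(union) / genome_length
    return result


# Toy annotations (hypothetical) on a 200-bp "genome".
toy = [
    ("s1", 0, 50, "LINE"),
    ("s1", 45, 55, "LINE"),
    ("s1", 40, 60, "LTR"),
    ("s2", 0, 10, "DNA"),
]
print(repeat_percentages(toy, genome_length=200))
# {'LINE': 27.5, 'LTR': 10.0, 'DNA': 5.0, 'total': 35.0}
```

In the toy case the class percentages sum to 42.5% while the union is 35.0%, because the LINE and LTR annotations overlap on scaffold s1.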
A total of 24,869 functional genes were annotated using KEGG. The KEGG pathways with the most annotated genes were those related to Human Diseases, Organismal Systems, and Metabolism, and within the Environmental Information Processing category, Signal Transduction contained the most genes. Moreover, GO enrichment of P. mucosa genes across 25 biological process categories assigned 247 genes to immune system process and two genes to detoxification (Figure 5).
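The category counts above and the term "enrichment" correspond to two different computations. The sketch below shows both under stated assumptions. The first counts distinct genes per category; because one gene can map to several categories, category totals can exceed the number of annotated genes. The second applies a one-sided hypergeometric test, a common choice for over-representation analysis that the paper does not confirm as the authors' method. The gene-to-category mapping and all counts are hypothetical, and in practice p-values across many terms would also need multiple-testing correction.

```python
from collections import defaultdict
from math import comb


def genes_per_category(gene_to_categories):
    """Count distinct genes per category (a gene may belong to several categories)."""
    members = defaultdict(set)
    for gene, categories in gene_to_categories.items():
        for category in categories:
            members[category].add(gene)
    return {category: len(genes) for category, genes in members.items()}


def hypergeom_sf(k, n, K, N):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background; K: background genes in the term;
    n: genes in the study set; k: study-set genes in the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical gene-to-category mapping.
toy = {
    "g1": ["Metabolism", "Organismal Systems"],
    "g2": ["Metabolism"],
    "g3": ["Human Diseases"],
}
print(genes_per_category(toy))
# {'Metabolism': 2, 'Organismal Systems': 1, 'Human Diseases': 1}
print(hypergeom_sf(k=2, n=2, K=3, N=10))
```

For the last call, drawing 2 genes from a background of 10 in which 3 belong to the term, the chance that both belong to it is 3/45, about 0.067.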

Figure 1. A picture of a P. mucosa individual, by Probophilic (CC0, Wikimedia Commons).

Figure 2. BUSCO assessment result of our P. mucosa genome.

Figure 5. Gene annotation information of P. mucosa. (a) KEGG enrichment of P. mucosa. (b) GO enrichment of P. mucosa.

Table 1. Summary of the features of our P. mucosa genome.

Table 2. Statistics for the repetitive sequences identified in our P. mucosa genome.

Table 3. Summary of TEs in our P. mucosa genome.