Genome assembly and annotation of the king ratsnake, Elaphe carinata

The king ratsnake (Elaphe carinata), of the genus Elaphe, is a common, large, non-venomous snake widely distributed in Southeast and East Asia. It is an economically important farmed species. Although non-venomous, the king ratsnake preys on venomous snakes, such as cobras and pit vipers. However, the immune and digestive mechanisms of the king ratsnake remain unclear. Despite the species' economic and research importance, we lack genomic resources that would benefit toxicology, phylogeography, and immunogenetics studies. Here, we used single-tube long fragment read sequencing to generate the first contiguous genome of a king ratsnake from Huangshan City, Anhui province, China. The genome size is 1.56 Gb with a scaffold N50 of 6.53 Mb. Repetitive elements total approximately 621 Mb, accounting for 42.26% of the genome. Additionally, we predicted 22,339 protein-coding genes, including 22,065 with functional annotations. Our genome is a potentially useful addition to those available for snakes.


DATA DESCRIPTION
The king ratsnake (Elaphe carinata) belongs to the family Colubridae and the genus Elaphe.
It is a large oviparous snake [1] found in many provinces of southeastern China. The southern edge of its distribution reaches northern Guangdong, Guangxi, and Taiwan, while the northern edge lies in the Beijing-Tianjin area (Figure 1). The king ratsnake is also found in northern Vietnam and on several Japanese islands (the Ryukyu Islands, including the Senkaku Islands) [2,3]. E. carinata mainly inhabits mountainous and hilly areas, and generally feeds on rodents, birds, and eggs. Its juveniles differ significantly from adults in color and size. When threatened, E. carinata can use its anal glands to secrete a foul-smelling fluid [3]. King ratsnakes are farmed in many countries as an important food source because they provide a large amount of protein [4]. According to the China Red Data Book of Endangered Animals [5], the king ratsnake is listed as a vulnerable species.
The common name "king ratsnake" refers to its habit of eating other snakes, a habit made possible by a unique protein in its blood. The non-venomous king ratsnake exhibits a strong antagonistic effect against the venom of various venomous snakes, including those with blood-circulating poisons (such as the bamboo leaf green snake (Trimeresurus stejnegeri) and the sharp-nosed viper (Deinagkistrodon acutus)) and those with neurotoxins (such as the many-banded krait (Bungarus multicinctus), one of the most lethal snakes in the world). However, the exact immune mechanism for this protection and the pathways for digesting these poisons are unknown.
The development of genome research technology has advanced research on reptile evolution, including the origin of snakes and the production of their toxins [6,7]. However, limited research has been dedicated to the natural antivenoms of snakes. As snake antivenoms are the only treatments that effectively prevent or reverse the effects of snake venoms [8], the genome of the king ratsnake may provide new insight into antivenoms and aid in the study of its digestive mechanisms.
In the present study, we assembled the first highly contiguous E. carinata genome using single-tube long fragment read (stLFR) sequencing data combined with next-generation sequencing for gap filling and removal of redundant contigs. The resulting genome, comparable in size to that of a previously sequenced corn snake, Pantherophis guttatus [9], but more contiguous, is a valuable resource for future studies. For instance, it could support studies on snake evolution and venom immunity.

MAIN CONTENT
Context
The king ratsnake has a long history of captive breeding, and its reproduction and the viruses it carries have been well studied [10,11]. However, there is insufficient research on its immune resistance and a general lack of genomic resources. Here, we provide the de novo assembly of a highly contiguous king ratsnake genome with a size of 1.56 Gb based on stLFR sequencing data. The maximal scaffold length is 49.75 Mb, the scaffold N50 length is 6.53 Mb, and the contig N50 is 44.05 kb, with a GC content of 40.25%. Compared to many other published snake genome sequences, the genome we assembled is highly contiguous.
Our draft genome sequence of E. carinata will be an invaluable resource for understanding snake venom resistance.
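The scaffold and contig N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal Python sketch of that calculation, offered only as an illustration. The function name `n50` and the toy lengths are hypothetical and are not taken from the assembly or from the tools used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths for illustration only, not values from the E. carinata assembly.
toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_scaffolds))  # prints 8
```

Applied to the scaffold lengths of an assembly, the same function gives the scaffold N50; applied to contig lengths, it gives the contig N50.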

Methods
Experimental procedures used in this study and more detailed methods are available via a protocol collection hosted on protocols.io (Figure 2) [12].

Samples and ethics statement
An adult E. carinata (NCBI:txid74364) individual from Huangshan City in Anhui province was collected for DNA and RNA sequencing. After the individual died naturally, the samples were transferred to dry ice, quickly frozen, and kept at −80 °C until further use.
For RNA sequencing, we used tissues from four organs: liver, stomach, kidney, and muscle. However, we performed stLFR sequencing using muscle samples only. Sample collection and experimental studies were both approved by the Institutional Review Board of BGI (BGI-IRB E22017). All procedures were carried out following the guidelines of the BGI-IRB.

Nucleic acid isolation, library preparation, and sequencing
We extracted DNA according to the method described by Wang et al. [10]. An stLFR co-barcoded DNA library was constructed using the MGIEasy stLFR Library Prep Kit (MGI, China). Sequencing was performed using a BGISEQ-500 sequencer. The genomic DNA kit (AxyPrep, USA) was used to isolate DNA for whole-genome sequencing (WGS). Total RNA was extracted using the TRIzol reagent (Invitrogen, USA) according to the manufacturer's instructions. The integrity and concentration of DNA and RNA samples were assessed using a Qubit 3.0 Fluorometer (Life Technologies, USA) and an Agilent 2100 Bioanalyzer System (Agilent, USA). Finally, we used 200-400 bp RNA fragments to construct cDNA libraries by reverse transcription (Table 1).

Genome
stLFR was used to generate the E. carinata genome assembly as it is a fast and cost-effective sequencing technology. After gap filling and removal of redundant contigs, the total size of the genome assembly is 1.67 Gbp (Table 2).
Genome-wide repetitive elements are generally important for eukaryotic evolution [35]. In E. carinata, repetitive elements accounted for 42.26% of the genome, with a total length of 621 Mb (Tables 3 and 4). Among all repetitive elements, long interspersed nuclear elements (LINEs) accounted for 72.56%, DNA transposons for 4.78%, and unknown types for 0.90% (Figure 3). This suggests that the content and quantity of repetitive elements are one source of differences between species.

Annotation
A total of 22,065 genes were functionally annotated. Annotations from the TrEMBL database accounted for the largest proportion, reaching 97.92% (Table 5).
In addition, all genes were annotated with KEGG, which showed the highest number in pathways such as Human Diseases, Organismal Systems, and Metabolism, while the highest [...] Processing. Additionally, GO annotation of E. carinata genes revealed that, among 25 biological process categories, 251 genes were related to immune system processes, and two genes were related to detoxification (Figure 4). Among the 288,969 genes, 280,103 (96.9% of the total) were assigned to 22,072 orthogroups.
Consistent with previous studies [37], our data can be used to construct phylogenetic trees that cluster closely related species (Figure 6).
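The proportions reported in this section can be checked directly from the gene counts. The short Python sketch below recomputes two of them: the share of predicted protein-coding genes that received a functional annotation (22,065 of the 22,339 genes given in the abstract) and the share of genes assigned to orthogroups (280,103 of 288,969). The helper name `percent` is hypothetical and serves only this illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Counts as reported in the text.
annotated = percent(22_065, 22_339)    # functionally annotated genes
clustered = percent(280_103, 288_969)  # genes assigned to orthogroups

print(annotated)  # prints 98.77
print(clustered)  # prints 96.93
```

The second value agrees with the 96.9% stated in the text.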

REUSE POTENTIAL
The king ratsnake has both nutritive and medicinal value, and the growth and development of individuals and snake eggs have been widely studied [38]. However, there are insufficient studies and genomic data on its immune system. Only Sun et al. have investigated the development of the immune system during the embryonic stage of the king ratsnake [39].
Our data can be combined with other snake genome data for phylogenetic studies to reconstruct the evolutionary history of snakes and other reptiles. In addition, our genomic data can provide new insights into the study of the immune system, snake venom resistance genes, and their mechanisms of action.

Figure 3. Distribution of transposable elements (TEs), such as DNA transposons (DNA) and RNA transposons, in our E. carinata genome. RNA transposons include LINEs, long terminal repeats (LTRs), and short interspersed nuclear elements (SINEs). (a) Distribution of divergence rates for de novo sequences. (b) Distribution of divergence rates for known sequences. (c) Proportion and distribution of repeating elements.

Figure 5. BUSCO assessment result of the E. carinata genome.

Figure 6. Maximum-likelihood tree reconstructed using protein sequences. The numbers represent the branch lengths. The colored squares represent bootstraps/metadata from 0.310873 to 1.

Table 1. Summary of our sequencing data of E. carinata.

Table 2. Summary of the features of our E. carinata genome.

Table 3. Content of various repeat sequences in our E. carinata genome.

Table 4. Summary of TEs in our E. carinata genome.

Table 5. Summary of the annotation results in our E. carinata genome.